Detailed Information

Self-supervised RGB-NIR Fusion Video Vision Transformer Framework for rPPG Estimation (open access)

Authors
Soyeon Park; Bo-Kyeong Kim; Suh-Yeon Dong
Issue Date
Oct-2022
Publisher
Institute of Electrical and Electronics Engineers Inc.
Keywords
Computational modeling; Estimation; Feature extraction; Heart rate; near-infrared; remote heart rate measurement; RGB; rPPG; self-supervised learning; Spatiotemporal phenomena; Task analysis; Transformers; video vision transformer
Citation
IEEE Transactions on Instrumentation and Measurement, v.71
Journal Title
IEEE Transactions on Instrumentation and Measurement
Volume
71
URI
https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/152372
DOI
10.1109/TIM.2022.3217867
ISSN
0018-9456
1557-9662
Abstract
Remote photoplethysmography (rPPG) is a technology that estimates heart rate (HR) without contact, using facial videos. Because rPPG signal estimation is low-cost, it is widely used for non-contact health monitoring. Recent rPPG-based HR estimation studies rely heavily on supervised feature learning from ordinary RGB videos. However, RGB-only methods are significantly affected by head movements and varying illumination conditions, and the large-scale labeled rPPG data that supervised learning methods need for reliable performance is difficult to obtain. To address these problems, we present the first self-supervised transformer-based fusion learning framework for rPPG estimation. We propose an end-to-end Fusion Video Vision Transformer (Fusion ViViT) network that extracts long-range local and global spatiotemporal features from videos and converts them into video sequences to enhance the rPPG representation. In addition, the transformer's self-attention integrates the complementary spatiotemporal representations of RGB and near-infrared (NIR) video, which in turn enables robust HR estimation even under complex conditions. We use contrastive learning as the self-supervised learning scheme. We evaluate our framework on public datasets containing RGB videos, NIR videos, and physiological signals. For near-instant HR (approximately 6 s) estimation on a large-scale rPPG dataset with various scenarios, our framework achieved an RMSE of 14.86, competitive with the state-of-the-art accuracy for average HR (approximately 30 s). Furthermore, transfer learning on a driving rPPG dataset showed stable HR estimation performance with an RMSE of 16.94, demonstrating that our framework can be utilized in the real world.
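The abstract only summarizes the architecture; as a rough illustration of how self-attention can fuse RGB and NIR spatiotemporal tokens into one joint sequence, here is a minimal PyTorch sketch. All names and dimensions (FusionBlock, embed_dim=128, eight tokens per modality) are assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch of RGB-NIR token fusion via self-attention.
# Not the paper's implementation; shapes and names are assumptions.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Concatenate RGB and NIR spatiotemporal tokens and let
    multi-head self-attention mix information across modalities."""
    def __init__(self, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )

    def forward(self, rgb_tokens, nir_tokens):
        # rgb_tokens, nir_tokens: (batch, num_tokens, embed_dim)
        x = torch.cat([rgb_tokens, nir_tokens], dim=1)  # joint token sequence
        attn_out, _ = self.attn(x, x, x)                # cross-modal mixing
        x = self.norm(x + attn_out)
        return x + self.mlp(x)

# Example: 8 spatiotemporal tokens per modality
rgb = torch.randn(2, 8, 128)
nir = torch.randn(2, 8, 128)
fused = FusionBlock()(rgb, nir)  # (2, 16, 128)
```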
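The abstract names contrastive learning as the self-supervised scheme but gives no loss details. A generic InfoNCE-style contrastive loss, assuming positive pairs are two embedded views of the same clip, might look like the following; the paper's actual pair construction and loss may differ.

```python
# Generic InfoNCE-style contrastive loss sketch; the positive-pair
# definition here (two views of the same clip) is an assumption.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same clips."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (batch, batch) cosine similarities
    targets = torch.arange(z1.size(0))   # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(4, 128), torch.randn(4, 128))
```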
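HR is commonly recovered from a predicted rPPG waveform via its dominant spectral peak, and RMSE is the error metric the abstract reports; the sketch below shows one generic way to compute both. The 30 fps sampling rate and 0.7-4.0 Hz HR band are common defaults, not values taken from the paper; the 180-sample window corresponds to the roughly 6 s "near-instant" setting mentioned in the abstract.

```python
# Generic HR-from-rPPG recovery via the dominant spectral peak.
# Sampling rate and HR band are common defaults, not the paper's values.
import numpy as np

def estimate_hr_bpm(rppg: np.ndarray, fs: float = 30.0) -> float:
    """Return heart rate in BPM from an rPPG window sampled at fs Hz."""
    rppg = rppg - rppg.mean()                      # remove DC offset
    freqs = np.fft.rfftfreq(len(rppg), d=1.0 / fs)
    power = np.abs(np.fft.rfft(rppg)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)         # ~42-240 BPM
    return 60.0 * freqs[band][np.argmax(power[band])]

def rmse(pred_bpm: np.ndarray, true_bpm: np.ndarray) -> float:
    return float(np.sqrt(np.mean((pred_bpm - true_bpm) ** 2)))

# A ~6 s window at 30 fps has 180 samples.
t = np.arange(180) / 30.0
fake_rppg = np.sin(2 * np.pi * 1.5 * t)  # synthetic 1.5 Hz pulse = 90 BPM
print(estimate_hr_bpm(fake_rppg))        # 90.0
```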
Files in This Item
Go to Link
Appears in Collections
ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Dong, Suh Yeon
College of Engineering (Division of Artificial Intelligence Engineering)
