Scale-Aware Visual-Inertial Depth Estimation and Odometry Using Monocular Self-Supervised Learning
- Authors
- Lee, Chungkeun; Kim, Changhyeon; Kim, Pyojin; Lee, Hyeonbeom; Kim, H. Jin
- Issue Date
- Mar-2023
- Publisher
- Institute of Electrical and Electronics Engineers (IEEE)
- Keywords
- Odometry; Deep learning; Loss measurement; Depth measurement; Cameras; Self-supervised learning; Coordinate measuring machines; monocular depth estimation; self-supervised learning; visual-inertial odometry
- Citation
- IEEE Access, v.11, pp. 24087–24102
- Pages
- 16
- Journal Title
- IEEE Access
- Volume
- 11
- Start Page
- 24087
- End Page
- 24102
- URI
- https://scholarworks.sookmyung.ac.kr/handle/2020.sw.sookmyung/152012
- DOI
- 10.1109/ACCESS.2023.3252884
- ISSN
- 2169-3536
- Abstract
- For real-world applications with a single monocular camera, scale ambiguity is an important issue. Self-supervised data-driven approaches cannot resolve this ambiguity without additional data that carry scale information, so state-of-the-art deep-learning-based methods learn the scale from additional sensor measurements. In that regard, the inertial measurement unit (IMU) is a popular sensor for mobile platforms because it is lightweight and inexpensive. However, unlike supervised learning, which can learn the scale from ground-truth information, learning the scale from an IMU is challenging in a self-supervised setting. We propose a scale-aware monocular visual-inertial depth estimation and odometry method with end-to-end training. To learn the scale from IMU measurements in this end-to-end monocular self-supervised setup, we propose a new loss function, the preintegration loss, which trains scale-aware ego-motion by comparing the predicted ego-motion with the ego-motion integrated from IMU measurements. Since gravity and bias must be compensated when integrating IMU measurements into ego-motion, we design the network to predict the gravity and the bias in addition to the ego-motion and the depth map. The overall performance of the proposed method is compared with state-of-the-art methods on the popular outdoor KITTI driving dataset and on an author-collected indoor driving dataset. On KITTI, the proposed method is competitive with state-of-the-art monocular depth estimation and odometry methods, achieving a root-mean-square error of 5.435 m on the KITTI Eigen split and absolute trajectory errors of 22.46 m and 0.2975 degrees on KITTI odometry sequence 09. Unlike other up-to-scale monocular methods, the proposed method estimates metric-scaled depth and camera poses. Additional experiments on the author-collected indoor driving dataset qualitatively confirm accurate metric-depth and metric-pose estimation.
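To make the abstract's key idea concrete, below is a minimal sketch (in PyTorch, not the authors' implementation) of a preintegration-style loss: IMU samples between two frames are integrated with the network-predicted gravity and accelerometer bias compensated, and the integrated motion is compared against the network-predicted relative pose. The function names, the simple Euler integration on SO(3), the omission of gyroscope bias, and the plain norm penalties are all illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of a preintegration-style loss (illustrative only).
# Assumptions: N IMU samples between two frames; a network-predicted relative
# pose (R_pred, t_pred); network-predicted gravity g_pred and accelerometer
# bias ba_pred; an initial velocity v0 from elsewhere in the pipeline.
import torch

def so3_exp(w):
    """Rodrigues' formula: map a rotation vector (3,) to a 3x3 rotation matrix."""
    theta = w.norm().clamp(min=1e-8)
    k = w / theta
    K = torch.zeros(3, 3)
    K[0, 1], K[0, 2] = -k[2], k[1]
    K[1, 0], K[1, 2] = k[2], -k[0]
    K[2, 0], K[2, 1] = -k[1], k[0]
    return torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def preintegration_loss(acc, gyro, dt, R_pred, t_pred, v0, g_pred, ba_pred):
    """Integrate IMU samples and penalize disagreement with the predicted pose.

    acc, gyro:       (N, 3) accelerometer / gyroscope samples between frames
    dt:              (N,)   sample intervals [s]
    R_pred, t_pred:  (3, 3), (3,) network-predicted relative rotation / metric translation
    v0:              (3,)   initial velocity estimate
    g_pred, ba_pred: (3,)   network-predicted gravity and accelerometer bias
    """
    R = torch.eye(3)           # integrated rotation
    v = v0.clone()             # integrated velocity
    p = torch.zeros(3)         # integrated position
    for i in range(acc.shape[0]):
        # Bias-compensated body acceleration, rotated to the start frame,
        # with the predicted gravity added back in.
        a_world = R @ (acc[i] - ba_pred) + g_pred
        p = p + v * dt[i] + 0.5 * a_world * dt[i] ** 2
        v = v + a_world * dt[i]
        R = R @ so3_exp(gyro[i] * dt[i])   # simple Euler step on SO(3)
    # Compare the IMU-integrated motion with the network prediction.
    trans_err = (p - t_pred).norm()
    rot_err = (R.transpose(0, 1) @ R_pred - torch.eye(3)).norm()
    return trans_err + rot_err
```

Because every integration step is a differentiable torch operation, gradients from this loss reach the predicted pose as well as the predicted gravity and bias, which is what lets the metric scale be learned end to end in a self-supervised setup.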
- Appears in Collections
- Engineering > Department of Mechanical Systems Engineering > 1. Journal Articles
