Efficient deformable modeling network for multi-view 3D object detection

Abstract

Multi-view 3D object detection is a critical component of camera-based autonomous driving systems. While Bird’s-Eye View (BEV) methods provide strong spatial reasoning, they often suffer from loss of vertical (height) information and high computational overhead. More recent sparse query-based approaches improve efficiency but still struggle to align 3D queries with image features and to keep optimization stable during training. In this work, we present a deformable modeling framework that advances sparse query-based 3D object detection through geometry- and motion-aware representation learning. Our approach introduces (i) a 4D query encoding that jointly models object position, scale, orientation, and velocity; (ii) structured denoising across all box parameters to improve stability early in training; and (iii) distance-aware feature sampling that improves multi-view feature alignment. We further employ a lightweight 2D detector for query initialization, eliminating the need for depth supervision. All components operate independently of the image backbone, so they can be integrated with both Convolutional Neural Network (CNN) and Transformer-based backbones. Experiments on the nuScenes validation set show that, among camera-only detectors with a ResNet-50 backbone, our method achieves the highest mean Average Precision (mAP), 45.5%, and the second-highest nuScenes Detection Score (NDS), 55.1%, slightly outperforming Stream Position Embedding Transformation (StreamPETR) and closely matching Divided View Position Embedding (DVPE) despite using fewer input frames. Our approach also converges twice as fast and achieves leading results on key translation, scale, and attribute error metrics (mean Average Translation Error (mATE), mean Average Scale Error (mASE), and mean Average Attribute Error (mAAE)), validating its effectiveness and efficiency as a modular enhancement for modern 3D object detection systems.
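The two headline scores are linked: NDS, as defined for the nuScenes benchmark, combines mAP with five true-positive (TP) error metrics (mATE, mASE, mAOE, mAVE, mAAE) as NDS = (1/10)[5·mAP + Σ(1 − min(1, mTP))]. The following is a minimal Python sketch of that published definition, assuming mAP and all errors are given as plain floats with mAP in [0, 1]; the function and constant names are illustrative and are not the nuscenes-devkit API.

```python
"""Sketch of the nuScenes Detection Score (NDS) definition."""

from typing import Mapping

# The five mean true-positive error metrics that enter NDS.
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(m_ap: float, tp_errors: Mapping[str, float]) -> float:
    """Combine mAP (in [0, 1]) with the five mean TP errors into NDS.

    Each TP error is clipped at 1.0 and converted to a score (1 - error),
    so an error of 1.0 or worse contributes nothing. mAP is weighted by 5
    and the final sum is divided by 10, so NDS also lies in [0, 1].
    """
    missing = [name for name in TP_METRICS if name not in tp_errors]
    if missing:
        raise KeyError(f"missing TP metrics: {missing}")
    tp_score = sum(1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS)
    return (5.0 * m_ap + tp_score) / 10.0
```

Because mAP carries half of the total weight while each TP error contributes at most one tenth, a method can rank first on mAP yet lower on NDS if, for example, its orientation (mAOE) or velocity (mAVE) errors are higher. With hypothetical inputs mAP = 0.5 and every TP error equal to 0.5, the sketch returns 0.5.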

Keywords

4D query denoising; Auxiliary 2D detector; Multi-view 3D Object Detection; Object Detection; Sparse query-based framework
Authors
Lee, Han-Lim; Aisha, Qurat Ul Ain; Kim, Byung-Gyu
DOI
10.1007/s00521-026-11881-y
Publication date
2026-03
Type
Article
Journal
Neural Computing and Applications
Volume 38, Issue 5