Monocular 3D object detection for intelligent vehicles based on dynamic bins and depth uncertainty
Abstract
Existing monocular 3D object detectors for autonomous driving suffer from low recognition accuracy, which stems from three limitations: backbone networks capture multi-scale information poorly, 3D depth prediction errors are large, and depth information is encoded weakly. To address these limitations, a monocular 3D object detection algorithm for autonomous vehicles, MonoDBDU, is proposed. First, to improve multi-scale information capture in the backbone, a Bidirectional Attention-Gated Feature Fusion (BAGFF) module is proposed. Its attention gate uses deep, semantically rich features as guidance signals to dynamically weight shallow, high-resolution features, and a bidirectional hierarchical semantic transmission scheme builds the multi-scale feature fusion path. BAGFF is integrated with ResNeSt50 to form BAGFF-ResNeSt50, the backbone feature extraction network of MonoDBDU. Second, to reduce 3D depth prediction errors, a dynamic-bins depth predictor fused with depth uncertainty is designed. Uncertainty probability is incorporated into the depth features, and the width and center position of each bin are treated as learnable parameters, which reduces the depth errors introduced by both the depth prediction and the bin distribution. This predictor serves as the depth estimation component of MonoDBDU. Finally, to overcome the limited depth encoding capability of Transformers, a lightweight Transformer fused with depth uncertainty (DU-Transformer) is proposed, built as an encoder-decoder structure guided by depth uncertainty. Simulation experiments show that MonoDBDU improves AP3D and APBEV by 3.28 and 1.01, respectively, on the KITTI dataset at IoU = 0.7, and by 2.78 and 2.71, respectively, on the Waymo dataset at IoU = 0.7. Real-vehicle experiments further confirm the practicability and effectiveness of MonoDBDU.
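To make the dynamic-bins idea concrete, the following is a minimal NumPy sketch and not the MonoDBDU implementation. It assumes an AdaBins-style simplification in which only the bin widths are free parameters and the bin centers are derived from them, whereas the abstract describes both width and center position as learnable. The Laplacian uncertainty loss is likewise an assumption borrowed from common practice in monocular 3D detection. All function names and the depth range [d_min, d_max] are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def bin_centers(width_params, d_min, d_max):
    # Assumption: widths come from a softmax over free parameters, so they
    # are positive and always sum to the full depth range.
    widths = softmax(width_params) * (d_max - d_min)
    edges = d_min + np.concatenate([[0.0], np.cumsum(widths)])
    # Centers are the midpoints of consecutive bin edges.
    return 0.5 * (edges[:-1] + edges[1:])

def expected_depth(bin_logits, centers):
    # Depth is the probability-weighted sum of bin centers.
    probs = softmax(bin_logits, axis=-1)
    return np.sum(probs * centers, axis=-1)

def laplacian_uncertainty_loss(pred, target, log_sigma):
    # Assumed loss form: a large predicted uncertainty down-weights the L1
    # error but is penalized by the log_sigma term.
    l1 = np.abs(pred - target)
    return np.mean(np.sqrt(2.0) * np.exp(-log_sigma) * l1 + log_sigma)

# Four equal-width bins over a hypothetical 0-80 m range.
centers = bin_centers(np.zeros(4), d_min=0.0, d_max=80.0)  # [10, 30, 50, 70]
depth = expected_depth(np.zeros(4), centers)  # uniform probabilities
```

In a trainable version, `width_params`, `bin_logits`, and `log_sigma` would be network outputs or parameters updated by gradient descent; the sketch only shows the forward computation.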