Small-object detection on roads in complex environments using visual-feature guidance

  • Abstract: Environment perception in complex scenes is critical to the safety of autonomous driving. To improve vehicle perception under low-light foggy conditions, this paper first introduces a depth-aware atmospheric scattering model, built on the clear-weather KITTI dataset, to synthesize realistic low-light foggy scene data. A multi-layer channel fusion module (MLCFM) is then designed within the YOLOv11 (You Only Look Once) framework; by splitting and regrouping feature channels, it strengthens feature extraction at every level. A semantics-importance-driven dynamic multi-scale detection head further improves multi-scale perception, and the adaptive training sample selection (ATSS) strategy adaptively optimizes positive/negative sample assignment to boost small-object detection. On the augmented KITTI dataset, the improved network raises detection accuracy on the Car, Cyclist, and Pedestrian classes by 2.2, 11.8, and 7.8 percentage points, respectively, with an overall mAP@0.5 gain of 7.3 percentage points; visualization analysis and ablation studies further verify the effectiveness of each module in complex environments.

     

    Abstract: Accurate detection of small objects in complex road environments is essential for ensuring the safety, reliability, and robustness of autonomous driving systems. Under adverse conditions such as low illumination and fog, the performance of conventional vision-based perception systems degrades significantly. Images captured in such environments often exhibit reduced contrast, blurred textures, occluded details, and indistinct object boundaries due to insufficient lighting, light scattering by fog droplets, and atmospheric attenuation. These degradations increase the likelihood of missed and false detections, posing substantial risks in urban traffic scenarios where vulnerable road users, including pedestrians and cyclists, frequently appear. To address these challenges, this study proposes a visual-feature-guided small-object detection framework with systematic enhancements in three areas: training data construction, network architecture design, and adaptive sample allocation.

    First, to overcome the scarcity of low-light foggy training data, a depth-aware atmospheric scattering model is developed based on the clear-weather KITTI dataset. The model simulates light scattering and attenuation under low-light foggy conditions by incorporating scene depth, fog density, and illumination intensity. A low-illumination rendering strategy is introduced, and the realism of the generated images is evaluated with the AGGD metric, enabling the creation of diverse and realistic nighttime foggy images. This data augmentation substantially improves the model's generalization capability under extreme weather conditions.

    Second, in network design, a multi-layer channel fusion module (MLCFM) is introduced within the YOLOv11 framework. By splitting, regrouping, and adaptively weighting feature channels across levels, MLCFM preserves low-level texture details while enhancing high-level semantic discrimination, both of which are essential for small-object detection. In addition, a semantics-importance-driven dynamic multi-scale fusion structure adjusts fusion weights according to the semantic contribution of features at each scale. This mechanism strengthens the detection of small objects such as pedestrians and cyclists while retaining global context for larger objects such as vehicles, improving sensitivity to small objects without compromising overall scene understanding.

    Finally, to address the difficulty of distinguishing targets from complex backgrounds and the imbalance between positive and negative samples in foggy scenes, the adaptive training sample selection (ATSS) strategy is introduced. ATSS determines positive and negative sample assignments dynamically from the spatial distribution and statistical characteristics of candidate bounding boxes, directing more attention to hard samples and reducing training instability under challenging conditions.

    Extensive experiments, including joint testing and ablation studies on the self-constructed low-light foggy dataset and the original KITTI dataset, demonstrate the effectiveness of the proposed approach. Detection accuracies for the Car, Cyclist, and Pedestrian categories improve by 2.2, 11.8, and 7.8 percentage points, respectively, with an overall mean average precision (mAP@0.5) gain of 7.3 percentage points. Visualization results further show that the enhanced network produces clearer and more precise bounding boxes, substantially reducing missed and false detections. In summary, this study presents a systematic small-object detection framework with innovations in training data generation, feature-aware network design, and adaptive sample allocation. The proposed method effectively improves small-object detection under low-light foggy conditions, supporting the safety and reliability of autonomous-driving perception in complex environments.
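The depth-aware fog synthesis described above follows the standard atmospheric scattering formulation I(x) = J(x)·t(x) + A·(1 − t(x)) with transmission t(x) = exp(−β·d(x)). A minimal NumPy sketch of that synthesis step is given below; the fog density `beta`, airlight `airlight`, and low-light `gain` values are hypothetical defaults, not the parameters used in the paper.

```python
import numpy as np

def add_fog(image, depth, beta=1.2, airlight=0.8, gain=0.5):
    """Apply the atmospheric scattering model I = J*t + A*(1 - t),
    with transmission t = exp(-beta * depth), then darken the result
    globally to approximate low illumination.

    image: H x W x 3 float array in [0, 1] (clear-weather frame)
    depth: H x W float array of per-pixel scene depth
    """
    t = np.exp(-beta * depth)[..., None]      # per-pixel transmission
    foggy = image * t + airlight * (1.0 - t)  # scattering + attenuation
    return np.clip(foggy * gain, 0.0, 1.0)    # low-light rendering

# Toy example: a uniformly bright frame whose depth grows left to right,
# so the right side should drift toward the (dimmed) airlight value.
img = np.full((4, 4, 3), 0.9)
dep = np.tile(np.linspace(0.0, 3.0, 4), (4, 1))
out = add_fog(img, dep)
```

Because `airlight` is below the source intensity here, distant pixels end up darker than near ones after the global gain, which matches the intended low-light foggy appearance.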
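The MLCFM's exact split-and-regroup design is not specified in the abstract. The sketch below shows only the generic channel split/interleave (channel-shuffle) operation that "splitting and regrouping feature channels" alludes to, in NumPy for brevity; the adaptive per-channel weighting is omitted.

```python
import numpy as np

def channel_split_regroup(feat, groups=2):
    """Split channels into `groups`, then interleave them, so that
    channels originating from different groups become adjacent.

    feat: C x H x W feature map; C must be divisible by `groups`.
    """
    c, h, w = feat.shape
    assert c % groups == 0, "channel count must divide evenly into groups"
    # (groups, C/groups, H, W) -> swap the two channel axes -> flatten back
    return (feat.reshape(groups, c // groups, h, w)
                .transpose(1, 0, 2, 3)
                .reshape(c, h, w))

# Toy feature map where channel c is filled with the value c,
# making the regrouped channel order easy to read off.
feat = np.arange(8).reshape(8, 1, 1).astype(float) * np.ones((8, 2, 2))
mixed = channel_split_regroup(feat, groups=2)
# channel order becomes [0, 4, 1, 5, 2, 6, 3, 7]
```

Interleaving lets a subsequent grouped or depthwise convolution see channels from both halves, which is the usual motivation for this kind of regrouping.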
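ATSS, as originally formulated, selects for each ground-truth box the top-k candidate anchors by center distance and thresholds their IoUs at mean plus standard deviation, followed by a center-in-box check. The sketch below covers only the statistical thresholding step, on made-up IoU values.

```python
import numpy as np

def atss_threshold(ious):
    """ATSS per-ground-truth IoU threshold: mean + std of the IoUs
    of that box's candidate anchors."""
    return ious.mean() + ious.std()

def assign_positives(candidate_ious):
    """Mark candidates whose IoU clears the adaptive threshold."""
    thr = atss_threshold(candidate_ious)
    return candidate_ious >= thr, thr

# Hypothetical IoUs of six candidate anchors against one ground-truth box:
# a loose cluster of low IoUs and a few strong matches.
ious = np.array([0.1, 0.15, 0.2, 0.55, 0.6, 0.7])
pos, thr = assign_positives(ious)
```

Because the threshold adapts to the IoU statistics of each ground-truth box, boxes with generally weak candidates (common for small objects in fog) still obtain positive samples, rather than being starved by a fixed global threshold.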

     
