Small Object Detection on Roads in Complex Environments Based on Visual Feature Guidance

  • Abstract: Environmental perception in complex scenes is critical to the safety of autonomous driving. To improve vehicle perception under low-illumination foggy conditions, this paper first introduces a depth-aware atmospheric scattering model, applied to the clear-weather KITTI dataset, to synthesize realistic low-illumination foggy scenes. A Multi-Layer Channel Fusion Module is then designed within the YOLOv11 (You Only Look Once) framework; by splitting and regrouping feature channels, it strengthens feature extraction at every level. Next, a semantics-importance-driven dynamic multi-scale fusion structure provides stronger multi-scale perception. Finally, the ATSS (Adaptive Training Sample Selection) strategy adaptively optimizes the assignment of positive and negative samples to further improve small object detection. Experiments and ablation studies show that the improved network raises detection accuracy on the Car, Cyclist, and Pedestrian classes by 2.2%, 11.8%, and 7.8%, respectively, with an overall mAP gain of 7.3%, validating the effectiveness of the proposed method; visualizations and quantitative metrics are used to analyze each module's contribution in depth.
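The depth-aware fog synthesis described above can be sketched with the standard atmospheric scattering model, I(x) = J(x)·t(x) + A·(1 − t(x)) with transmission t(x) = exp(−β·d(x)). The paper's exact rendering pipeline is not given here, so the function below is a minimal NumPy sketch: the parameter values and the gamma-curve darkening used for low-illumination rendering are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def simulate_low_light_fog(image, depth, beta=0.08, airlight=0.7, gamma=2.2):
    """Render a low-illumination foggy image from a clear image and a depth map.

    Standard atmospheric scattering model:
        I(x) = J(x) * t(x) + A * (1 - t(x)),   t(x) = exp(-beta * d(x))
    followed by a gamma curve to darken the scene (low-light rendering).
    beta (fog density), airlight A, and gamma are illustrative values.
    """
    img = image.astype(np.float32) / 255.0       # J(x): clear scene radiance
    t = np.exp(-beta * depth)[..., None]         # transmission from depth (meters)
    foggy = img * t + airlight * (1.0 - t)       # attenuation + airlight scattering
    low_light = foggy ** gamma                   # gamma > 1 darkens the whole scene
    return (low_light * 255.0).clip(0, 255).astype(np.uint8)

# Toy example: a uniform gray image whose depth increases from left to right,
# so the right-hand columns are pulled toward the (brighter) airlight.
image = np.full((4, 4, 3), 128, dtype=np.uint8)
depth = np.tile(np.linspace(1.0, 60.0, 4), (4, 1))
out = simulate_low_light_fog(image, depth)
```

Distant pixels converge to the airlight value regardless of their original color, which is why fog erases the contrast that small-object detectors rely on.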

     

    Abstract: The capability to detect small objects in complex road environments is crucial for the safety and robustness of autonomous driving systems. Under adverse weather conditions such as low illumination and fog, images captured by visual sensors suffer from reduced contrast, blurred details, and unclear object boundaries caused by insufficient lighting, droplet occlusion, and atmospheric scattering, which easily leads to missed and false detections. To address this challenge, this paper proposes a visual-feature-guided small object detection method for complex environments, covering three aspects: training data construction, network architecture design, and sample allocation strategy. First, to overcome the lack of nighttime foggy training data, a depth-aware atmospheric scattering model is designed based on the clear-weather KITTI dataset. The model realistically simulates the scattering and attenuation of light in fog by accounting for scene depth, fog density, and illumination intensity, and introduces a low-illumination rendering strategy to generate diverse, realistic nighttime foggy images; the expanded dataset significantly improves generalization under extreme weather. Second, on the detection network side, a Multi-Layer Channel Fusion Module (MLCFM) is proposed within the YOLOv11 framework. By splitting, reorganizing, and adaptively weighting feature channels at different levels, the module preserves low-level texture details while enhancing high-level semantic discrimination, effectively extracting the features critical for small objects. Then, a semantics-importance-driven dynamic multi-scale fusion structure is designed: it dynamically adjusts fusion weights according to the semantic contribution of features at each scale to different object categories, strengthening the perception of small targets such as pedestrians and cyclists while preserving global context for larger targets such as vehicles. Finally, the ATSS (Adaptive Training Sample Selection) strategy is adopted to adaptively optimize the assignment of positive and negative samples, further improving small object detection. Experiments and ablation studies show that the improved network raises detection accuracy on the Car, Cyclist, and Pedestrian classes by 2.2%, 11.8%, and 7.8%, respectively, with an overall mAP gain of 7.3%, confirming the effectiveness of the proposed method.
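The ATSS strategy adopted for sample assignment follows a simple published rule: for each ground-truth box, take the k anchors whose centers are closest to the box center, set an adaptive IoU threshold to the mean plus one standard deviation of those candidates' IoUs, and keep candidates above the threshold whose centers lie inside the box. The sketch below is a simplified single-level NumPy version; how the paper integrates it with YOLOv11's multi-level anchors is not specified here.

```python
import numpy as np

def iou(anchors, gt):
    """IoU between [N,4] anchor boxes and one [4] GT box (x1, y1, x2, y2)."""
    x1 = np.maximum(anchors[:, 0], gt[0]); y1 = np.maximum(anchors[:, 1], gt[1])
    x2 = np.minimum(anchors[:, 2], gt[2]); y2 = np.minimum(anchors[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_a + area_g - inter)

def atss_assign(anchors, gt, k=9):
    """Select positive anchors for one GT box with the ATSS rule:
    1) take the k anchors whose centers are closest to the GT center,
    2) set the IoU threshold to mean + std of the candidates' IoUs,
    3) keep candidates above the threshold whose centers fall inside the GT.
    (The full method picks top-k per pyramid level; this is single-level.)"""
    centers = (anchors[:, :2] + anchors[:, 2:]) / 2.0
    gt_center = np.array([(gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0])
    dist = np.linalg.norm(centers - gt_center, axis=1)
    cand = np.argsort(dist)[:k]                  # distance-based candidates
    ious = iou(anchors[cand], gt)
    thresh = ious.mean() + ious.std()            # adaptive, per-GT threshold
    inside = ((centers[cand] >= gt[:2]) & (centers[cand] <= gt[2:])).all(axis=1)
    return cand[(ious >= thresh) & inside]

# Toy example: one well-aligned anchor among mostly poor ones.
anchors = np.array([[10, 10, 50, 50],
                    [0, 0, 20, 20],
                    [40, 40, 80, 80],
                    [100, 100, 140, 140],
                    [12, 12, 48, 48]], dtype=float)
gt = np.array([10, 10, 50, 50], dtype=float)
pos = atss_assign(anchors, gt)
```

Because the threshold adapts to the IoU statistics of each object, small objects, whose candidate IoUs are uniformly low, still receive positive samples instead of being starved by a fixed global threshold.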

     

