Abstract:
Robust object detection under adverse weather conditions remains a pressing challenge for autonomous driving and intelligent transportation, because single-sensor systems are prone to performance degradation in rain, fog, or snow. To address this issue, we propose SeparateFusion, a novel multisensor fusion framework that integrates four-dimensional (4D) millimeter-wave radar and LiDAR data
via a deep neural network. By leveraging the resilience of radar to weather interference and the high spatial resolution of LiDAR, SeparateFusion delivers accurate, stable perception across diverse environments. The core idea is to treat geometry and semantics as complementary signals that should be modeled along separate but interacting paths, preserving the strengths of each modality while mitigating noise and misalignment early in the pipeline.

The architecture comprises two key modules: the geometry–semantic enhancement (GSE) encoder for early three-dimensional (3D) fusion and the bird's-eye-view (BEV) feature enhancement module (BMM) for two-dimensional (2D) feature refinement. In practice, the BMM pairs a lightweight multiscale gating unit with a Mamba-based mixer to refine BEV features. In the first stage, LiDAR and radar point clouds are independently projected into a shared pillar grid, ensuring spatial alignment. The GSE encoder then enhances the geometric and semantic information of each modality separately: geometric features capture structural layouts from point coordinates, while semantic features encode attributes such as intensity, Doppler velocity, and reflectivity. The encoder applies neighborhood-aware updates that preserve spatial continuity in the geometric stream, while semantic cues guide cross-modal correspondence. Restricting cross-modal interaction primarily to the semantic subspace mitigates discretization and registration errors that might otherwise propagate through deeper layers, while the geometric stream preserves neighborhood structure for stable aggregation. Following this enhancement, pillar-level features are extracted, enabling early multimodal fusion that aligns the modalities while preserving their individual advantages. In the second stage, the fused features are transformed into a BEV representation. The BMM processes this representation with the MambaVisionMixer structure to capture both local and long-range dependencies in the spatial domain, and a gating mechanism suppresses redundant or noisy signals so that the network focuses on discriminative information for detection. This two-stage design balances fine-grained geometry–semantic modeling in 3D space with high-level spatial reasoning in BEV space, contributing to strong robustness against weather-related degradation.

Extensive experiments on the View-of-Delft (VoD) dataset show that our method consistently outperforms both state-of-the-art single-sensor detectors and existing multisensor fusion approaches, achieving a mean average precision of 71.47% across the entire test area and 85.74% within the driving corridor, with notable gains in both global and lane-focused detection scenarios. Category-wise analysis further indicates consistent improvements for vehicles and vulnerable road users, with the clearest benefits at longer ranges, where LiDAR sparsification and reflectivity decay are more severe. We follow the standard VoD protocol for training and evaluation, using the same splits and metrics, and provide implementation details to facilitate reproducibility. Additional evaluations on simulated fog and snow datasets confirm that SeparateFusion maintains clear advantages over previous methods in low-visibility conditions, indicating strong generalization capability.
Ablation studies further validate the contributions of the GSE encoder and the BMM, showing that removing either component leads to a significant drop in detection accuracy and highlighting the complementary nature of early 3D geometry–semantic enhancement and later-stage BEV feature gating. In summary, SeparateFusion introduces a structured two-stage fusion approach for integrating radar and LiDAR data, combining early 3D geometry–semantic enhancement with later-stage BEV refinement under adaptive gating. The method achieves significant improvements over strong single-sensor and existing fusion-based object detectors under challenging weather, laying a promising foundation for next-generation all-weather intelligent perception in safety-critical applications.
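To make the two-stage design described above more concrete, the following minimal PyTorch sketch illustrates the overall flow: separate geometric and semantic streams with cross-modal interaction confined to the semantic subspace, followed by gated refinement of BEV features. This is an illustrative sketch, not the paper's implementation; the class names (GSEEncoder, BEVGatedMixer), channel widths, and layer choices are assumptions, and a plain depthwise convolution stands in for the MambaVisionMixer.

```python
# Illustrative sketch of the two-stage radar-LiDAR fusion (assumed names and sizes).
import torch
import torch.nn as nn


class GSEEncoder(nn.Module):
    """Stage 1 sketch: per-pillar geometry/semantic streams; cross-modal mixing only on semantics."""

    def __init__(self, geo_dim=3, sem_dim=4, hidden=64):
        super().__init__()
        self.geo_mlp = nn.Sequential(nn.Linear(geo_dim, hidden), nn.ReLU())
        self.sem_mlp = nn.Sequential(nn.Linear(sem_dim, hidden), nn.ReLU())
        # Cross-modal interaction is restricted to the semantic subspace.
        self.sem_cross = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)

    def forward(self, geo_lidar, sem_lidar, geo_radar, sem_radar):
        # geo_*: (B, P, geo_dim) pillar-wise coordinates; sem_*: (B, P, sem_dim) attributes
        g_l, g_r = self.geo_mlp(geo_lidar), self.geo_mlp(geo_radar)   # geometric streams stay separate
        s_l, s_r = self.sem_mlp(sem_lidar), self.sem_mlp(sem_radar)
        s_l2, _ = self.sem_cross(s_l, s_r, s_r)                        # LiDAR semantics attend to radar
        s_r2, _ = self.sem_cross(s_r, s_l, s_l)                        # radar semantics attend to LiDAR
        # Early fusion: concatenate per-pillar features from both modalities.
        return torch.cat([g_l + s_l2, g_r + s_r2], dim=-1)            # (B, P, 2*hidden)


class BEVGatedMixer(nn.Module):
    """Stage 2 sketch: a gating branch modulates a spatial mixer over the BEV map
    (a depthwise conv is used here in place of the MambaVisionMixer)."""

    def __init__(self, channels=128):
        super().__init__()
        self.mixer = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, bev):  # bev: (B, C, H, W)
        # Residual path keeps the original features; the gate suppresses noisy responses.
        return bev + self.gate(bev) * self.mixer(bev)


if __name__ == "__main__":
    B, P = 2, 1000
    enc = GSEEncoder()
    fused = enc(torch.randn(B, P, 3), torch.randn(B, P, 4),
                torch.randn(B, P, 3), torch.randn(B, P, 4))
    bev = torch.randn(B, 128, 160, 160)   # pillar features scattered onto an assumed BEV grid
    print(fused.shape, BEVGatedMixer()(bev).shape)
```

In this sketch, keeping attention out of the geometric branch is what mirrors the claim that neighborhood structure is preserved for stable aggregation while semantic cues carry the cross-modal correspondence.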