UAV Tiny Object Detection Network Based on Frequency-Domain Perception and Lossless Feature Transmission

  • Abstract: To address the problems of vanishing tiny-object features and background occlusion in UAV aerial images, this paper proposes the Refined Spatial-aware Distribution Network (RSD-Net). The architecture targets two structural deficiencies of existing detectors: frequency-agnostic feature perception and information loss during downsampling. Specifically: (1) a Stage-Adaptive Feature Extraction (SA-C3k2) module is designed, which uses explicit edge sharpening and frequency-domain filtering to adaptively enhance high-frequency textures in shallow layers and suppress background noise in deep layers; (2) a Rep-parameterized Spatial-preserving Distribution Neck (RSD-Neck) is constructed, which combines SPD-Conv lossless downsampling with global context modeling to prevent semantic dilution during cross-scale feature fusion; (3) a Dual-Prior Perception Head (DP-Head) is introduced, which fuses explicit visual priors with implicit geometric distribution priors to achieve robust localization quality estimation. Experiments show that RSD-Net improves mAP50 by 4.99% and 5.08%, and mAP50:95 by 3.42% and 7.2%, on VisDrone2019-DET and NWPU VHR-10, respectively. Generalization tests on TinyPerson also yield measurable gains, verifying the model's robustness in cross-domain scenarios.
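The shallow-layer edge sharpening in SA-C3k2 is built on the fixed 3×3 Scharr kernels, which estimate horizontal and vertical gradients with better rotational symmetry than Sobel. The following is a minimal NumPy sketch of the idea; the residual-style `scharr_sharpen` helper and its `alpha` weight are illustrative assumptions, not the paper's exact branch design.

```python
import numpy as np

# Standard 3x3 Scharr kernels (horizontal gradient; vertical is the transpose).
SCHARR_X = np.array([[-3, 0, 3],
                     [-10, 0, 10],
                     [-3, 0, 3]], dtype=np.float64)
SCHARR_Y = SCHARR_X.T

def conv2d_valid(img, kernel):
    """Naive 2-D valid cross-correlation (no padding), for illustration only."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def scharr_sharpen(img, alpha=0.5):
    """Add the Scharr gradient magnitude back onto the image as a
    residual term (hypothetical stand-in for the shallow edge branch)."""
    gx = conv2d_valid(img, SCHARR_X)
    gy = conv2d_valid(img, SCHARR_Y)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    out = img.copy()
    out[1:-1, 1:-1] += alpha * mag  # interior pixels only ("valid" region)
    return out
```

On a flat region the gradient magnitude is zero and the input passes through unchanged; only high-frequency structure (edges of tiny objects) receives the additive boost, which matches the stated goal of enhancing shallow-layer texture without amplifying uniform background.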


    Abstract: Unmanned Aerial Vehicle (UAV)-based aerial photography holds immense potential, yet detecting tiny objects remains a significant challenge due to extreme scale variations, complex background interference, and the tendency for feature information to vanish during network transmission. Existing detectors often suffer from structural limitations, specifically frequency-agnostic feature extraction and irreversible information loss from downsampling. To address these issues, this paper proposes the Refined Spatial-aware Distribution Network (RSD-Net), a novel end-to-end architecture designed to establish a full-link spatial awareness mechanism for robust tiny object detection. First, to resolve the mismatch between feature extraction and physical attributes, a Stage-Adaptive Feature Extraction (SA-C3k2) module is designed. Unlike traditional static convolutions, SA-C3k2 incorporates a frequency-domain adaptation mechanism. It applies the Scharr operator in shallow layers to explicitly sharpen tiny-object edges, enhancing high-frequency texture signals, while employing a learnable Gaussian kernel in deep layers to suppress background noise. This design adaptively balances feature retention with noise suppression. Second, to prevent semantic dilution during cross-scale feature fusion, a Rep-parameterized Spatial-preserving Distribution Neck (RSD-Neck) is constructed. Because traditional strided convolutions undersample the feature map and thus violate the Nyquist sampling criterion, this module instead integrates Space-to-Depth Convolution (SPD-Conv) to achieve lossless downsampling and fine-grained feature alignment. Additionally, it employs a Rep-parameterized Local Adjacent Fusion (Rep-LAF) block to model global context, establishing a high-fidelity pathway for feature transmission. Third, a Dual-Prior Perception Head (DP-Head) is introduced to enhance localization quality estimation.
By fusing explicit visual texture priors (derived from gradient magnitude) with implicit geometric distribution priors (derived from regression statistics), a "visual-statistical" dual verification mechanism is established, which significantly improves localization robustness in ambiguous scenarios. Extensive experiments on the VisDrone2019-DET and NWPU VHR-10 datasets demonstrate the effectiveness of the proposed method. Compared with baseline models, RSD-Net improves mAP50 by 4.99% and 5.08%, and mAP50:95 by 3.42% and 7.2%, respectively, while maintaining a lightweight parameter size (5.04M). Furthermore, generalization tests on the TinyPerson dataset verify the model's superior cross-domain robustness, proving its capability to efficiently handle pixel-level tiny objects in diverse aerial environments.
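The "lossless downsampling" that RSD-Neck borrows from SPD-Conv is, at its core, a space-to-depth rearrangement: every pixel of an H×W×C feature map is moved into the channel dimension of an (H/2)×(W/2)×4C map before any convolution is applied, so no spatial sample is discarded, in contrast to a stride-2 convolution that reads only every other position. A minimal NumPy sketch (the function name and channel-last layout are illustrative choices):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange an (H, W, C) map into (H//block, W//block, C*block**2).

    Every input pixel reappears in the output, so the spatial reduction
    is information-preserving, unlike strided convolution.
    """
    h, w, c = x.shape
    assert h % block == 0 and w % block == 0, "H and W must divide by block"
    x = x.reshape(h // block, block, w // block, block, c)
    x = x.transpose(0, 2, 1, 3, 4)  # gather each block's pixels together
    return x.reshape(h // block, w // block, c * block * block)
```

In SPD-Conv this rearrangement is followed by a non-strided convolution that mixes the stacked channels; the sketch above covers only the rearrangement step, which is the part that makes the operation lossless.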

