UAV Tiny Object Detection Network Based on Frequency-Domain Perception and Lossless Feature Transmission
Graphical Abstract
Abstract
Unmanned Aerial Vehicle (UAV)-based photography holds immense potential, yet detecting tiny objects remains a significant challenge due to extreme scale variations, complex background interference, and the tendency for feature information to vanish during network transmission. Existing detectors often suffer from structural limitations, specifically frequency-agnostic feature extraction and irreversible information loss from downsampling. To address these issues, this paper proposes the Refined Spatial-aware Distribution Network (RSD-Net), a novel end-to-end architecture that establishes a full-link spatial awareness mechanism for robust tiny object detection. First, to resolve the mismatch between feature extraction and the physical attributes of objects, a Stage-Adaptive Feature Extraction (SA-C3k2) module is designed. Unlike traditional static convolutions, SA-C3k2 incorporates a frequency-domain adaptation mechanism: it applies the Scharr operator in shallow layers to explicitly sharpen tiny-object edges and enhance high-frequency texture signals, while employing a learnable Gaussian kernel in deep layers to suppress background noise. This design adaptively balances feature retention with noise suppression. Second, to prevent semantic dilution during cross-scale feature fusion, a Rep-parameterized Spatial-preserving Distribution Neck (RSD-Neck) is constructed. Because conventional strided convolutions undersample feature maps and thus violate the Nyquist sampling criterion, this module integrates Space-to-Depth Convolution (SPD-Conv) to achieve lossless downsampling and fine-grained feature alignment. Additionally, it employs a Rep-parameterized Local Adjacent Fusion (Rep-LAF) block to model global context, establishing a high-fidelity pathway for feature transmission. Third, a Dual-Prior Perception Head (DP-Head) is introduced to enhance localization quality estimation.
By fusing explicit visual texture priors (derived from gradient magnitude) with implicit geometric distribution priors (derived from regression statistics), DP-Head establishes a "visual-statistical" dual-verification mechanism that significantly improves localization robustness in ambiguous scenarios. Extensive experiments on the VisDrone2019-DET and NWPU VHR-10 datasets demonstrate the effectiveness of the proposed method. Compared with the baseline models, RSD-Net improves mAP50 by 4.99% and 5.08%, and mAP50:95 by 3.42% and 7.2%, respectively, while maintaining a lightweight model size (5.04M parameters). Furthermore, generalization tests on the TinyPerson dataset verify the model's superior cross-domain robustness, proving its capability to efficiently handle pixel-level tiny objects in diverse aerial environments.
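To make the "lossless downsampling" claim concrete, the following is a minimal NumPy sketch of the space-to-depth rearrangement underlying SPD-Conv: spatial resolution is halved by moving 2x2 pixel blocks into the channel dimension, so no samples are discarded (unlike a stride-2 convolution, which drops three of every four positions). The function name and channel ordering here are illustrative, not the authors' implementation.

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange spatial blocks into channels: (C, H, W) -> (C*block^2, H/block, W/block).

    Every input value is preserved, which is the sense in which the
    downsampling is 'lossless'; a stride-2 conv would subsample instead.
    """
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0, "spatial dims must divide the block size"
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)                  # (C, b, b, H/b, W/b)
    return x.reshape(c * block * block, h // block, w // block)

# Toy feature map: 2 channels, 4x4 spatial grid.
feat = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
out = space_to_depth(feat)
print(out.shape)  # (8, 2, 2)

# Lossless: the output contains exactly the same multiset of values.
assert np.sort(out.ravel()).tolist() == np.sort(feat.ravel()).tolist()
```

In a full SPD-Conv block, this rearrangement is followed by a non-strided convolution that mixes the expanded channels back down, letting the network learn which fine-grained details to keep.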
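The Scharr-based texture prior can likewise be sketched: the gradient magnitude of a feature map is large at edges and near zero in flat background, which is why it serves both as a high-frequency sharpening signal in SA-C3k2 and as an explicit visual prior in DP-Head. This is the generic Scharr formulation with zero padding, written as a plain loop for clarity; it is not the paper's code.

```python
import numpy as np

# Standard 3x3 Scharr kernels for horizontal and vertical gradients.
SCHARR_X = np.array([[-3, 0, 3],
                     [-10, 0, 10],
                     [-3, 0, 3]], dtype=np.float32)
SCHARR_Y = SCHARR_X.T

def scharr_magnitude(img):
    """Per-pixel gradient magnitude via the Scharr operator (zero-padded)."""
    h, w = img.shape
    pad = np.pad(img, 1)
    gx = np.zeros((h, w), dtype=np.float32)
    gy = np.zeros((h, w), dtype=np.float32)
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * SCHARR_X).sum()
            gy[i, j] = (patch * SCHARR_Y).sum()
    return np.hypot(gx, gy)

# A vertical step edge: flat regions give zero response, the edge a strong one.
img = np.zeros((6, 6), dtype=np.float32)
img[:, 3:] = 1.0
mag = scharr_magnitude(img)
assert mag[2, 0] == 0.0   # flat background
assert mag[2, 2] > 0.0    # adjacent to the step edge
```

For a real detector one would implement this as a fixed-weight depthwise convolution so it runs on the GPU alongside the learned layers.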