Rotated Object Detection Method Based on Adaptive Angle Classification and Dynamic Sample Matching

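  • Summary: Rotated object detection aims to accurately identify objects distributed in arbitrary orientations and is commonly used in complex scenarios such as remote sensing and industrial character recognition. To address challenges such as discontinuous angle regression and unstable sample matching, this paper proposes a new detection framework based on angle classification. The method makes two key improvements on top of YOLOv8. First, it designs a shape-aware adaptive angle smooth label (SA-ASL) that incorporates the target's geometric shape, converting angle prediction from a regression problem into an adaptive label classification problem and improving the accuracy and stability of angle prediction. Second, it introduces a progressive dynamic positive/negative sample matching mechanism that fuses horizontal and rotated IoU, improving the quality of positive sample selection during training. The method achieves an mAP of 0.786 on the public DOTA dataset and an mAP of 0.924 on an industrial character dataset, showing good generalization and robustness and demonstrating its practical value for rotated object detection tasks.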

     

Abstract: Rotated object detection (ROD) is a critical subtask in computer vision, especially in real-world applications such as aerial remote sensing and industrial character detection, where objects frequently appear in arbitrary orientations with diverse aspect ratios. Unlike standard object detection, which assumes axis-aligned bounding boxes, ROD requires precise estimation of both object location and orientation. Conventional rotation regression methods suffer from angle periodicity and discontinuity issues, leading to unstable training and inaccurate predictions. Moreover, densely packed scenes with complex backgrounds make positive and negative sample assignment highly sensitive, often resulting in suboptimal convergence. To tackle these challenges, this work proposes a rotation-aware object detection approach based on YOLOv8, enhanced by two key components: a shape-aware adaptive angle classification strategy and a progressive dynamic matching mechanism.

The angle classification strategy replaces traditional continuous angle regression with discrete classification. Angle annotations are converted into soft label vectors using a circular Gaussian window function, which preserves the periodic nature of angles. A novel aspect of this design is the integration of target shape information: the smoothing parameter of the label distribution is adaptively adjusted based on the object’s aspect ratio. For elongated targets such as ships or text lines, a narrow window is applied to enforce sharp classification around the true angle, enabling fine-grained orientation discrimination. Conversely, for square-like or low-aspect-ratio objects, a wider window is applied to tolerate angular ambiguity, stabilizing training across diverse target geometries. This shape-aware mechanism addresses discontinuities at angular boundaries and enhances classification accuracy in multi-oriented detection tasks.

To complement the angle classification, a progressive dynamic sample matching mechanism is developed to improve the quality of positive sample selection during training. Instead of relying solely on rotated IoU (rIoU), which is unreliable at early stages when angle predictions are inaccurate, the method begins with horizontal IoU (hIoU) and gradually introduces rIoU via linear interpolation as training proceeds. The final matching score incorporates three components: classification confidence, IoU-based localization quality, and a cosine-based angle consistency term. This unified metric guides the selection of the top-K positive samples per ground-truth object, so that high-quality matches are emphasized while low-quality or ambiguous matches are suppressed. This progressive transition improves training stability, accelerates convergence, and enhances rotation alignment between predictions and ground truth.

Extensive experiments are conducted on two datasets. On the DOTA dataset, which includes multiple object classes with diverse orientations and aspect ratios, the proposed method achieves a mean Average Precision (mAP) of 0.786, with improvements across several categories, particularly those with high aspect ratios such as ships, vehicles, and containers. On a custom industrial character dataset, which consists of densely arranged, multi-oriented alphanumeric components captured under complex conditions, the method achieves an mAP of 0.924, which validates its ability to generalize to tasks resembling scene text detection. Ablation studies are conducted to isolate the impact of each component. The shape-aware classification contributes a 4.3% improvement in angle-sensitive categories, while the dynamic matching strategy results in smoother loss curves and more concentrated attention heatmaps.

The method preserves the anchor-free structure and real-time inference capability of YOLOv8 while significantly enhancing its performance in rotation-sensitive contexts. All modifications are lightweight and can be integrated into existing pipelines without structural changes to the backbone.
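
As a rough illustration of the two components described in the abstract, the Python sketch below builds a circular Gaussian soft label whose width depends on the aspect ratio, blends hIoU with rIoU linearly over training, and ranks candidates by a combined score. It is not the authors' implementation. The bin count, the rule mapping aspect ratio to window width, the exact form of the cosine angle term, the multiplicative combination of the three score components, and all function and parameter names are assumptions made for illustration only.

```python
import math


def angle_soft_label(theta_deg, aspect_ratio, num_bins=180,
                     sigma_max=6.0, sigma_min=1.0):
    """Circular Gaussian soft label over num_bins angle classes on [0, 180).

    Assumed rule (not from the paper): sigma = sigma_max / aspect ratio,
    clipped below at sigma_min, so elongated boxes get a narrower window.
    """
    ar = max(aspect_ratio, 1.0 / aspect_ratio)   # fold so that ar >= 1
    sigma = max(sigma_min, sigma_max / ar)       # window width in bins
    center = (theta_deg % 180.0) * num_bins / 180.0
    label = []
    for k in range(num_bins):
        d = abs(k - center)
        d = min(d, num_bins - d)                 # wrap-around distance
        label.append(math.exp(-d * d / (2.0 * sigma * sigma)))
    return label


def blended_iou(h_iou, r_iou, epoch, total_epochs):
    """Linear interpolation from horizontal IoU to rotated IoU over training."""
    alpha = epoch / max(total_epochs - 1, 1)
    alpha = min(max(alpha, 0.0), 1.0)            # 0 at the start, 1 at the end
    return (1.0 - alpha) * h_iou + alpha * r_iou


def match_score(cls_conf, iou, pred_deg, gt_deg, a=1.0, b=1.0):
    """Combine confidence, IoU quality and a cosine angle-consistency term.

    The product form and the exponents a, b are assumptions.
    """
    delta = math.radians(pred_deg - gt_deg)
    # Period-180 cosine term: 1 when aligned, 0 when 90 degrees apart.
    angle_term = 0.5 * (1.0 + math.cos(2.0 * delta))
    return (cls_conf ** a) * (iou ** b) * angle_term


def select_topk(scores, k):
    """Indices of the k highest-scoring candidates for one ground-truth box."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

With these assumed defaults, `angle_soft_label(90, 8.0)` yields a sharply peaked label, whereas `angle_soft_label(90, 1.0)` spreads weight over neighbouring bins, mirroring the narrow-window and wide-window behaviour described for elongated and square-like objects.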

     
