Abstract:
Accurate traffic-sign detection is a foundational capability for intelligent transportation systems and autonomous driving; however, it remains a formidable challenge in real-world environments characterized by small sign scales, severe occlusion, highly variable lighting, and complex backgrounds. Owing to inherent limitations in feature extraction, scale sensitivity, and robustness, traditional convolutional neural network (CNN)-based detectors often struggle to maintain reliable performance when traffic signs appear at long distances or are partially hidden by vehicles, foliage, or roadside infrastructure. To overcome these limitations, this paper presents an enhanced RT-DETR-based approach tailored to occluded-traffic-sign detection under resource-constrained conditions. First, recognizing the scarcity of publicly available data that accurately reflect occlusion scenarios, we curated the Traffic Sign Dataset under Occlusion Conditions (TSDOC), which comprises 4698 high-resolution images annotated across eight common traffic-sign categories (including prohibitory, warning, and indicative signs), with 3572 images allocated for training and 1126 for testing. TSDOC systematically simulates real driving environments by incorporating diverse occlusion types, such as partial masking by other vehicles, foreign-object attachment, dynamic shadows, and varying degrees of weather-induced visibility reduction, enabling rigorous evaluation of detection methods under complex, safety-critical scenarios that closely mirror roadside conditions. Second, to improve the representation of small and occluded objects without incurring excessive computational overhead, we redesigned the RT-DETR backbone by replacing the standard ResNet-18 BasicBlock with a novel composite dilated residual block (CDRB). Each CDRB integrates a dilated reparameterization block (DRB) into an inverted residual mobile block (iRMB), combining multi-scale dilated convolutions, which capture the long-range pixel dependencies essential for reconstructing partially visible sign features, with structural reparameterization, which streamlines the inference graph for reduced latency. Consequently, the modified backbone achieves a 26.0% reduction in parameter count and a 12.5% decrease in floating-point operations (GFLOPs) compared with the baseline RT-DETR-R18, while maintaining or improving feature discrimination for occluded targets. Third, for faster convergence and enhanced localization precision, particularly for small and partially occluded signs, we introduce the dynamic scaled IoU loss (DS-IoU), a novel joint loss function that combines Inner-IoU's auxiliary-bounding-box strategy, governed by a dynamically adjustable scaling factor Ratio, with the minimal point distance metric from MPDIoU. This adaptive formulation emphasizes interior-region overlap and geometric consistency during training, effectively replacing the conventional GIoU loss and enabling the model to focus on the most informative spatial regions under challenging conditions. Comprehensive experiments demonstrate the effectiveness of the proposed approach. On the TSDOC, TT100K, and CCTSDB2021 benchmarks, the proposed model achieves a mean average precision (mAP) of 94.2%, 92.8%, and 91.7%, respectively (gains of 4.7%, 3.1%, and 2.4% over RT-DETR), with a real-time inference speed of 112.8 frames per second, an 18.5% improvement over RT-DETR. Ablation studies show that replacing the backbone with the CDRB yields a 2.8% mAP increase, while DS-IoU further boosts recall under occlusion by 3.7%. Together, the lightweight architecture and optimized loss function deliver higher detection accuracy and efficiency in occluded-traffic-sign scenarios, making the model well suited for deployment in resource-constrained embedded systems.
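To make the occlusion types concrete, the minimal sketch below synthesizes two of them (foreign-object masking and a cast shadow) on an input image. This is a hypothetical augmentation routine shown purely for illustration, not TSDOC's curation pipeline; the function name, patch sizes, and darkening factor are all assumptions.

```python
import numpy as np

def simulate_occlusion(img, rng=None, max_patches=3):
    """Illustrative sketch only: paint random opaque patches (foreign
    objects / partial masking) and darken a horizontal band (cast shadow)
    on an HxWx3 uint8 image. Not the TSDOC curation pipeline."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    h, w = out.shape[:2]
    # Opaque patches approximating partial masking by nearby objects.
    for _ in range(rng.integers(1, max_patches + 1)):
        ph, pw = rng.integers(h // 8, h // 3), rng.integers(w // 8, w // 3)
        y, x = rng.integers(0, h - ph), rng.integers(0, w - pw)
        out[y:y + ph, x:x + pw] = rng.integers(0, 256, size=3, dtype=np.uint8)
    # A band with reduced brightness approximating a dynamic shadow.
    y0 = rng.integers(0, h // 2)
    band = out[y0:y0 + h // 4].astype(np.float32) * 0.5
    out[y0:y0 + h // 4] = band.astype(np.uint8)
    return out
```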
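As a concrete picture of the backbone change, the PyTorch sketch below implements a CDRB-style block as the abstract describes it: parallel dilated depthwise convolutions (the DRB part) embedded in an inverted-residual, iRMB-style structure with a skip connection. The dilation rates, expansion factor, activation, and normalization placement are assumptions, and the inference-time merging of the dilated branches into a single equivalent kernel (the reparameterization that cuts latency) is noted but omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedRepDWConv(nn.Module):
    """Sketch of a DRB-style module: parallel depthwise 3x3 convolutions
    with different dilation rates, summed. At inference the branches can
    be merged into one equivalent large-kernel depthwise convolution
    (structural reparameterization); that merging step is omitted here."""
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d,
                      groups=channels, bias=False)
            for d in dilations
        ])
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        # Summing branches keeps spatial size (padding == dilation for k=3).
        return self.bn(sum(branch(x) for branch in self.branches))


class CDRB(nn.Module):
    """Sketch of a CDRB-style block: 1x1 expansion, multi-scale dilated
    depthwise convs, 1x1 projection, and a residual skip (iRMB-style)."""
    def __init__(self, channels, expansion=2):
        super().__init__()
        hidden = channels * expansion
        self.expand = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.SiLU(),
        )
        self.drb = DilatedRepDWConv(hidden)
        self.project = nn.Sequential(
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.project(F.silu(self.drb(self.expand(x))))
```

A block like this would serve as a drop-in replacement for a stride-1 ResNet-18 BasicBlock; downsampling stages would need a strided variant without the identity skip.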
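The DS-IoU loss can likewise be sketched directly from its stated ingredients: an Inner-IoU term computed on auxiliary boxes rescaled by the factor Ratio about each box center, plus MPDIoU's normalized corner-distance penalties. The combination below is a minimal reading of that description, not the authors' exact formulation; the term weighting and the normalization constants are assumptions.

```python
import torch

def ds_iou_loss(pred, target, ratio=1.0, img_w=640, img_h=640, eps=1e-7):
    """Sketch of a DS-IoU-style loss for (N, 4) boxes in (x1, y1, x2, y2)
    format: Inner-IoU over ratio-scaled auxiliary boxes plus MPDIoU's
    corner-distance penalties. Not the authors' exact formulation."""
    # Centers and sizes of predicted and target boxes.
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    tcx, tcy = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    tw, th = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]

    # Inner-IoU: IoU of auxiliary boxes shrunk or grown by `ratio`
    # about each box's center.
    ix1 = torch.max(pcx - pw * ratio / 2, tcx - tw * ratio / 2)
    iy1 = torch.max(pcy - ph * ratio / 2, tcy - th * ratio / 2)
    ix2 = torch.min(pcx + pw * ratio / 2, tcx + tw * ratio / 2)
    iy2 = torch.min(pcy + ph * ratio / 2, tcy + th * ratio / 2)
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    union = pw * ph * ratio**2 + tw * th * ratio**2 - inter
    inner_iou = inter / (union + eps)

    # MPDIoU term: squared distances between matching corners,
    # normalized by the squared image diagonal.
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    norm = img_w ** 2 + img_h ** 2

    return (1 - inner_iou + (d1 + d2) / norm).mean()
```

Per the abstract, Ratio is adjusted dynamically during training rather than held fixed; a constant default is shown here for simplicity.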