Abstract:
A lightweight railway panoramic segmentation method based on channel mixing and cross-scale enhancement was proposed to address two challenges in high-speed railway scene images: low segmentation accuracy and the difficulty of achieving lightweight, real-time panoramic segmentation. The model consists of three main components: a lightweight CS_FasterNet feature extraction network, a multiscale feature interaction enhancement module, and a prediction fusion output module. First, a lightweight CS_FasterNet feature extraction network was proposed on the basis of the FasterNet network. FasterNet reduces redundant computation through partial convolution, increasing processing speed while preserving high detection accuracy. However, the original design applies filters to only a portion of the input channels, potentially limiting feature extraction for the remaining channels. This limitation was addressed by combining partial convolution with channel shuffling to optimize the aggregation of local and global feature information, together with feature recombination techniques that reduce computational complexity and improve feature extraction in railway scenes. Second, a multiscale feature interaction enhancement module was designed to operate on the features extracted by CS_FasterNet, strengthening feature representation and improving the network’s image segmentation ability. This module consists of an attention-based intrascale feature interaction module and a cross-scale feature fusion module. The attention-based intrascale feature interaction module applies a multihead attention mechanism to extract pixel-level semantic features from high-level image features, expanding the receptive field and capturing fine-grained information.
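The interplay of partial convolution and channel shuffling described above can be illustrated with a minimal sketch. It is not the CS_FasterNet implementation: the function names, the 1-D signals (used in place of 2-D feature maps for brevity), the number of convolved channels, and the group count are all assumptions made for illustration.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1-D correlation that keeps the input length (odd kernel)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


def partial_conv(channels, weights):
    """Convolve only the first len(weights) channels; pass the rest through.

    weights[o][i] is the kernel linking input channel i to output channel o,
    so the convolved slice behaves like a regular convolution on cp channels.
    """
    cp = len(weights)
    conv_out = []
    for o in range(cp):
        acc = [0.0] * len(channels[0])
        for i in range(cp):
            y = conv1d_same(channels[i], weights[o][i])
            acc = [a + b for a, b in zip(acc, y)]
        conv_out.append(acc)
    # Channels beyond cp are copied unchanged (the identity branch).
    return conv_out + [list(c) for c in channels[cp:]]


def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style channel shuffle)."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # View as [groups][per_group], transpose to [per_group][groups], flatten.
    return [channels[g * per_group + i]
            for i in range(per_group) for g in range(groups)]


# Hypothetical 4-channel feature map; only the first channel is filtered.
features = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
filtered = partial_conv(features, weights=[[[1, 1, 1]]])
mixed = channel_shuffle(filtered, groups=2)
```

After the shuffle, a channel that was left untouched moves toward the front of the channel list, so a later partial convolution can filter it. This is one plausible reading of how shuffling compensates for the unfiltered channels.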
The cross-scale feature fusion module adopts both bottom-up and top-down fusion paths to integrate the feature maps output by the backbone network at different scales, improving multiscale feature utilization and enabling comprehensive extraction of local details and global information. Finally, the prediction fusion module was refined to integrate the semantic and instance results. In the panoramic segmentation task, the Soft NMS method was used to improve the accuracy of pixel classification. Soft NMS lowers the confidence scores of detection boxes according to their intersection over union (IoU) with a higher-scoring box and uses the resulting Gaussian-weighted scores to identify the true detection box, leading to improved segmentation accuracy and more refined panoramic segmentation outputs. Experimental results indicate that the proposed lightweight model outperformed the comparison methods in both frames per second (FPS) and computational complexity; a higher frame rate indicates faster segmentation, and lower computational complexity favors lightweight segmentation. Compared with the UPSNet method, the proposed method increased the processing speed by approximately 11.5 frames per second and improved the quality of the panoramic segmentation by approximately 9.9%. The method achieves accurate, real-time panoramic segmentation across various railway scenarios.
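The Gaussian score decay used by Soft NMS can be sketched as follows. This is a generic illustration of the commonly used rule s_i ← s_i · exp(−IoU² / σ), not the prediction fusion module itself; the box format (x1, y1, x2, y2), the value of sigma, and the score threshold are assumptions.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft NMS: return (index, decayed score) pairs in selection order."""
    remaining = [(s, i) for i, s in enumerate(scores)]
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best = max(range(len(remaining)), key=lambda k: remaining[k][0])
        best_score, best_idx = remaining.pop(best)
        kept.append((best_idx, best_score))
        decayed = []
        for s, i in remaining:
            # Decay the score smoothly with overlap instead of discarding the box.
            s *= math.exp(-(iou(boxes[best_idx], boxes[i]) ** 2) / sigma)
            if s >= score_threshold:
                decayed.append((s, i))
        remaining = decayed
    return kept


# Hypothetical detections: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
```

In this example the second box overlaps the top-scoring box heavily, so its score is reduced and it is ranked below the non-overlapping third box, whereas hard NMS with a typical IoU threshold would have removed it entirely.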