Lightweight railway panoramic segmentation based on channel shuffle and cross-scale enhancement

陈永; 周方春; 周建宇

doi:10.13374/j.issn2095-9389.2024.10.23.001

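Abstract: To address the low accuracy of panoramic segmentation in high-speed railway scenes and the difficulty of achieving lightweight, real-time segmentation, a lightweight railway panoramic segmentation method based on channel shuffle and cross-scale enhancement is proposed. First, building on FasterNet, a lightweight CS_FasterNet feature extraction network is proposed; partial convolution and channel shuffle optimize the aggregation of feature information, enabling lightweight feature extraction for panoramic segmentation in railway scenes. Second, a multi-scale feature interaction enhancement module is designed that uses feature interaction and cross-feature fusion to capture both local detail and global information, improving the quality of the extracted image features. Finally, the prediction fusion module is improved to fuse the semantic and instance results, raising segmentation accuracy and yielding finer panoramic segmentation output. Experimental results show that the proposed lightweight model outperforms the comparison methods on evaluation metrics such as frames processed per second and computational cost. Compared with UPSNet, the proposed method processes approximately 11.5 frames per second and improves panoramic segmentation quality by approximately 9.9%, achieving accurate, real-time panoramic segmentation of images across different railway scenes.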
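The interplay of partial convolution and channel shuffle mentioned above can be illustrated with a minimal pure-Python sketch. It is not the CS_FasterNet implementation: channels are represented as list items, `op` is a stand-in for a convolution, and the names `partial_apply`, `channel_shuffle`, `n_active`, and `groups` are hypothetical. The sketch only shows why shuffling helps: after a partial operation touches a subset of channels, a group-wise shuffle interleaves processed and unprocessed channels so that later partial operations see both.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def partial_apply(channels: Sequence[T], op: Callable[[T], T], n_active: int) -> List[T]:
    """Apply `op` to the first `n_active` channels and pass the rest through unchanged.

    This mimics the idea of partial convolution: only a subset of channels is filtered.
    """
    if not 0 <= n_active <= len(channels):
        raise ValueError("n_active must lie between 0 and the number of channels")
    return [op(c) for c in channels[:n_active]] + list(channels[n_active:])


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Interleave channels across `groups` equal-sized groups.

    Equivalent to viewing the channel axis as (groups, per_group),
    transposing to (per_group, groups), and flattening again.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("the number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


if __name__ == "__main__":
    feats = [f"c{i}" for i in range(8)]
    # Stand-in "convolution": mark the channels that were processed.
    mixed = partial_apply(feats, lambda c: c + "*", n_active=2)
    print(channel_shuffle(mixed, groups=4))
```

With eight channels, two active channels, and four groups, the processed channels `c0*` and `c1*` end up in different halves of the shuffled list, so a subsequent partial operation over the leading channels would cover channels that the first one skipped. The ratio of active channels and the number of groups here are illustrative choices, not values taken from the paper.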

Abstract: A lightweight railway panoramic segmentation method based on channel shuffling and cross-scale enhancement was proposed to address the low accuracy of panoramic segmentation in high-speed railway scene images and the difficulty of achieving lightweight, real-time segmentation. The model consists of three main components: a lightweight CS_FasterNet feature extraction network, a multiscale feature interaction enhancement module, and a prediction fusion output module. First, a lightweight CS_FasterNet feature extraction network was built on the FasterNet network. FasterNet uses partial convolution to reduce redundant computation, which raises processing speed while preserving high detection accuracy. However, the original design applies filters to only a portion of the input channels, which may limit feature extraction for the remaining channels. This limitation was addressed by combining partial convolution with channel shuffling to optimize the aggregation of local and global feature information, together with feature recombination techniques that reduce computational complexity and improve lightweight feature extraction in railway scenes. Second, following the lightweight CS_FasterNet feature extraction, a multiscale feature interaction enhancement module was designed to strengthen feature representation and improve the network's image segmentation ability. This module consists of an attention-based intrascale feature interaction module and a cross-scale feature fusion module. The attention-based intrascale feature interaction module applies a multihead attention mechanism to extract pixel-level semantic features from high-level image features, expanding the receptive field and capturing fine-grained information.
The cross-scale feature fusion module uses both bottom-up and top-down fusion paths to integrate the feature maps that the backbone network outputs at different scales, improving the utilization of multiscale features and enabling comprehensive extraction of local details and global information. Finally, the prediction fusion module was refined to integrate the semantic and instance results. In the panoramic segmentation task, the Soft-NMS method was used to improve the accuracy of pixel classification. Rather than discarding overlapping detection boxes outright, Soft-NMS lowers their confidence scores according to their intersection over union (IoU) with a higher-scoring box, using a Gaussian weighting of the score to identify the true detection box; this leads to improved segmentation accuracy and more refined panoramic segmentation outputs. Experimental results indicate that the proposed lightweight model outperformed the comparison methods in both processing frame rate and computational complexity, where a higher frame rate indicates faster segmentation and lower computational complexity favors lightweight deployment. Compared with the UPSNet method, this method increased the processing frame rate by approximately 11.5 frames per second and improved the quality of the panoramic segmentation by approximately 9.9%. This method achieves accurate, real-time panoramic segmentation across various railway scenarios.
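The Gaussian score decay described for Soft-NMS can be sketched as follows. This is a generic, pure-Python illustration of Gaussian Soft-NMS on axis-aligned boxes, not the paper's prediction fusion module: the box format `(x1, y1, x2, y2)`, the function names, and the default values `sigma=0.5` and `score_threshold=0.001` are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms_gaussian(
    boxes: Sequence[Box],
    scores: Sequence[float],
    sigma: float = 0.5,              # assumed width of the Gaussian penalty
    score_threshold: float = 0.001,  # assumed cutoff for discarding boxes
) -> List[Tuple[int, float]]:
    """Return (original index, final score) pairs in the order boxes are selected.

    Overlapping boxes are not removed outright; each remaining score is
    multiplied by exp(-IoU^2 / sigma) relative to the box just selected.
    """
    remaining = [(box, score, i) for i, (box, score) in enumerate(zip(boxes, scores))]
    kept: List[Tuple[int, float]] = []
    while remaining:
        # Select the highest-scoring remaining box.
        best = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_box, best_score, best_idx = remaining.pop(best)
        kept.append((best_idx, best_score))
        # Decay the scores of the others according to their overlap with it.
        decayed = []
        for box, score, idx in remaining:
            score *= math.exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_threshold:
                decayed.append((box, score, idx))
        remaining = decayed
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    for idx, score in soft_nms_gaussian(boxes, scores):
        print(idx, round(score, 3))
```

In this example the second box overlaps the first heavily, so its score is reduced and it is ranked last instead of being deleted, while the non-overlapping third box keeps its score unchanged. Hard NMS with a typical IoU threshold would have removed the second box entirely.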
