Abstract:
Retinal vessel segmentation holds significant value in medical research, playing an indispensable role in the screening of diseases such as diabetes, hypertension, and glaucoma. However, most current retinal vessel segmentation methods rely on convolutional neural networks, which struggle to model long-range dependencies and global context. These limitations often lead to poor segmentation of small vessels, particularly at the ends of vessel branches where contrast with the background is low. To tackle these challenges, this paper proposes a new retinal vessel segmentation model, Dual Swin Transformer Fusion (DS-TransFusion). The model uses a dual-scale encoder subnetwork based on the Swin Transformer, which can find correspondences and align features from heterogeneous inputs. Given an input retinal image, the model first splits it into nonoverlapping patches at two different sizes, which are fed into the two encoder branches to extract coarse-grained and fine-grained features of the retinal vessels. At the skip connections, DS-TransFusion introduces a Transformer interactive fusion attention (TIFA) module. At its core, TIFA uses a multiscale attention (MA) mechanism to enable efficient interaction between multiscale features: it integrates features from the two branches at different scales, achieves effective feature fusion, enriches cross-view context modeling and semantic dependencies, and captures long-range correlations between different image views, thereby improving segmentation performance. In addition, to integrate multiscale representations in the hierarchical backbone, DS-TransFusion inserts an MA module between the encoder and decoder; this module learns feature dependencies across scales and collects global correspondences among multiscale feature representations, further refining the segmentation. On the public data sets STARE, CHASEDB1, and DRIVE, DS-TransFusion achieved accuracies of 96.50%, 97.22%, and 97.80%, and sensitivities of 84.10%, 84.55%, and 83.17%, respectively. These results show that DS-TransFusion effectively improves the accuracy of retinal vessel segmentation and accurately segments small vessels, validating the effectiveness and superiority of the method and suggesting its potential to provide more accurate segmentation results for auxiliary disease screening.
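To make the dual-branch idea concrete, the sketch below illustrates a dual-scale patch embedding and a simplified cross-attention fusion in PyTorch. This is a minimal illustration, not the authors' implementation: the patch sizes (4 and 8), embedding dimension, head count, and the exact fusion formulation are assumptions, since the abstract does not specify these details of the TIFA module.

```python
# Minimal sketch (not the authors' code): a dual-scale patch embedding plus a
# cross-attention fusion step, illustrating the idea of feeding two patch sizes
# into separate branches and fusing them with multiscale attention.
import torch
import torch.nn as nn

class DualScalePatchEmbed(nn.Module):
    """Split one image into nonoverlapping patches at two sizes."""
    def __init__(self, in_ch=3, dim=96, p_small=4, p_large=8):
        super().__init__()
        # Strided convolutions implement nonoverlapping patch embedding.
        self.fine = nn.Conv2d(in_ch, dim, kernel_size=p_small, stride=p_small)
        self.coarse = nn.Conv2d(in_ch, dim, kernel_size=p_large, stride=p_large)

    def forward(self, x):
        f = self.fine(x).flatten(2).transpose(1, 2)    # (B, N_fine, dim)
        c = self.coarse(x).flatten(2).transpose(1, 2)  # (B, N_coarse, dim)
        return f, c

class CrossScaleFusion(nn.Module):
    """Fuse the two branches with cross-attention (fine-branch queries attend
    to coarse-branch keys/values); a simplified stand-in for TIFA."""
    def __init__(self, dim=96, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, fine, coarse):
        q, kv = self.norm_q(fine), self.norm_kv(coarse)
        fused, _ = self.attn(q, kv, kv)
        return fine + fused  # residual connection

if __name__ == "__main__":
    x = torch.randn(1, 3, 224, 224)        # dummy fundus image
    fine, coarse = DualScalePatchEmbed()(x)
    out = CrossScaleFusion()(fine, coarse)
    print(out.shape)                        # torch.Size([1, 3136, 96])
```

In this simplified form, the fine branch (56 x 56 patches at size 4) queries the coarse branch (28 x 28 patches at size 8), so fine-grained vessel features are enriched with coarse-grained context; the paper's actual module operates at the skip connections and models richer cross-view interactions.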