Abstract:
Lightweight super-resolution reconstruction is an important technique in image processing and has been widely applied across domains. Although current image super-resolution methods, such as those based on convolutional neural networks and Transformer models, have achieved significant success, they still face considerable limitations, including large parameter counts, high computational complexity, and substantial memory requirements. To address these issues, a novel lightweight image super-resolution method, the Information Distillation and Shared Attention Network (IDSA-Net), is proposed to reduce computational burden while enhancing feature representation capability. First, an attention-sharing mechanism is introduced to construct an Attention-Sharing Distillation Module: subsequent modules reuse, rather than recompute, the spatial attention matrix that dominates the cost of self-attention, enabling cross-layer sharing of attention information and reducing computation. The module pairs convolutional local feature extraction with self-attention and sequence modeling units, purifying information during the feature extraction and distillation stages; this strengthens the network's ability to capture effective features and suppresses noise introduced during distillation. Second, a Large-Kernel Channel Information Purification Module is designed, which combines channel shuffle operations with the large-kernel depthwise separable convolutions of large-kernel attention to expand the receptive field. This improves the model's perception of multi-scale features and global context, allowing the distilled features to be reweighted so that useful information is strengthened and redundancy is suppressed.
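The cross-layer attention sharing described above can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the paper's exact module: the first block computes the quadratic-cost spatial attention matrix, and later blocks reuse the cached matrix, paying only for the value projection.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_block(x, w_q, w_k, w_v, shared_attn=None):
    """One self-attention block over an (N, d) token sequence.

    If shared_attn is None, the (N, N) spatial attention matrix is
    computed from scratch; otherwise the cached matrix is reused and
    only the value projection is computed, skipping the dominant
    O(N^2 * d) query/key cost.
    """
    v = x @ w_v
    if shared_attn is None:
        q, k = x @ w_q, x @ w_k
        shared_attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return shared_attn @ v, shared_attn

rng = np.random.default_rng(0)
N, d = 16, 8
x = rng.standard_normal((N, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))

# First block computes the attention matrix; the next block reuses it.
y1, attn = attention_block(x, w_q, w_k, w_v)
y2, _ = attention_block(y1, w_q, w_k, w_v, shared_attn=attn)
```

In a full network, the cached matrix would be shared across several distillation stages, so the quadratic attention cost is paid once per group of blocks rather than once per block.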
Additionally, a novel Instance Normalization-based Hybrid Attention Module is proposed to learn channel and spatial features. In the channel attention phase, average pooling is performed separately along the width and height of the input features to mitigate the information loss caused by conventional global pooling, and channel features are extracted with one-dimensional depthwise separable convolutions, avoiding the dimensionality reduction and expansion operations used in conventional methods; this better preserves the integrity of channel features and improves the precision of the representation. In the spatial attention phase, instance normalization is applied to each channel of every sample individually, so the model focuses on local structural features within the image rather than on global statistics. Finally, sub-pixel convolutional layers are employed for upsampling and image reconstruction. Quantitative and visual analyses from comparative and ablation experiments on four public benchmark datasets (Set5, Set14, BSD100, and Urban100) demonstrate that the proposed method outperforms eleven state-of-the-art image super-resolution methods in Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM), validating its effectiveness and accuracy.
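The final sub-pixel upsampling step rearranges channels into spatial resolution. The sketch below shows the standard pixel-shuffle rearrangement that sub-pixel convolution relies on (a preceding convolution expands channels by a factor of r^2; the shuffle then folds them into an r-times-larger feature map); it is a generic illustration in NumPy, not the paper's implementation.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: (C*r^2, H, W) -> (C, H*r, W*r).

    output[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w]
    """
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)     # split the r^2 factor off the channels
    x = x.transpose(0, 3, 1, 4, 2)   # reorder to (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

# A 12-channel 2x2 feature map becomes a 3-channel 4x4 map at scale r=2.
lr_feat = np.arange(3 * 4 * 2 * 2, dtype=float).reshape(12, 2, 2)
hr = pixel_shuffle(lr_feat, 2)
print(hr.shape)  # (3, 4, 4)
```

Because the rearrangement is a pure reshape/transpose, the upsampling itself adds no parameters; all learned capacity sits in the convolution that produces the r^2-expanded channels.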