Facial Expression Recognition Method Based on a U-Net-Improved VGG19 Model
Abstract
In response to several shortcomings of traditional facial expression recognition techniques, such as insufficient attention to key channel features, large parameter counts, and low recognition accuracy, this paper proposes an improved VGG19 model that incorporates ideas from the U-Net architecture. While retaining the deep feature extraction capability of VGG19, the model employs specially designed convolutional layers and skip connections, using feature cropping and concatenation to efficiently integrate multi-scale features. This design ensures that features from different layers are fused harmoniously, which is crucial for accurate facial expression recognition. In addition, the paper introduces an improved SEAttention module for facial expression recognition tasks. The innovation lies in replacing the module's original activation function with the Mish activation function; the module dynamically adjusts the weights of individual channels so that important features are emphasized while redundant ones are suppressed. This selective focus speeds up network convergence and improves the model's ability to detect subtle changes in facial expressions. Furthermore, the fully connected layers were restructured: the first and second fully connected layers were replaced with convolutional layers while the last fully connected layer was retained, and the node counts of the original network's fully connected layers were reduced from 4096, 4096, and 1000 to 7. This addresses the large parameter size of the VGG19 network and strengthens its resistance to overfitting. Extensive experiments were conducted on the FER2013 and CK+ datasets. Compared with the original VGG19 model, the improved model raises recognition accuracy by 1.58% on FER2013 and by 4.04% on CK+. The parameter efficiency of the model was also evaluated, showing a reduction in parameter count without sacrificing performance; this balance between model complexity and accuracy highlights the practical applicability of the proposed method in real-world scenarios. In conclusion, integrating the U-Net architecture and the SEAttention module into the VGG19 network yields significant advances in facial expression recognition. The improved model not only strengthens feature extraction and fusion but also resolves the problems of parameter size and computational efficiency. These innovations deliver state-of-the-art performance in facial expression recognition and constitute a meaningful contribution to computer vision and deep learning. The robustness and efficiency of the method make it a promising solution for applications requiring accurate real-time facial expression analysis, such as human-computer interaction, security systems, and affective computing. Future work will explore the model's adaptability to other datasets and further optimization techniques to enhance its performance and applicability.
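To make the two architectural changes described above concrete, the following is a minimal PyTorch sketch: an SE-style channel attention block whose internal activation is replaced with Mish, and a classifier head in which the first two fully connected layers become convolutions while a single retained fully connected layer maps to the 7 expression classes. The names `SEMish`, `make_head`, and `reduction`, as well as all layer widths other than the final 7-way output, are illustrative assumptions rather than identifiers or values taken from the paper.

```python
# Sketch of the modified SE attention block and classifier head, assuming
# a standard VGG19 backbone that emits 512-channel feature maps.
import torch
import torch.nn as nn

class SEMish(nn.Module):
    """Squeeze-and-Excitation channel attention with Mish replacing ReLU."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global average pool
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.Mish(),                                 # Mish in place of the usual ReLU
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                              # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # excite: reweight channels

def make_head(in_channels: int = 512, num_classes: int = 7) -> nn.Module:
    """Classifier head: convolutions replace the first two FC layers;
    one small FC layer (the retained one) outputs the 7 expression classes."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 512, kernel_size=1), nn.Mish(),
        nn.Conv2d(512, 256, kernel_size=1), nn.Mish(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(256, num_classes),                   # retained fully connected layer
    )

if __name__ == "__main__":
    feats = torch.randn(2, 512, 7, 7)                  # dummy VGG19 feature map
    logits = make_head()(SEMish(512)(feats))
    print(logits.shape)                                # torch.Size([2, 7])
```

Because the conv-based head is independent of the 4096-node dense layers of the original VGG19 classifier, it illustrates how the parameter count can drop sharply while the output dimensionality is fixed at the 7 expression categories.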