Abstract:
In response to the challenges faced by traditional facial expression recognition methods, such as insufficient attention to key channel features, excessive parameter counts, and low recognition accuracy, this study proposes an improved VGG19 model that incorporates concepts from the U-Net architecture. While preserving the deep feature extraction capability of VGG19, the model employs specially designed convolutional layers and skip connections. Feature cropping and concatenation allow the model to efficiently integrate multi-scale features, thereby enhancing the robustness and effectiveness of facial expression recognition. This design ensures the seamless fusion of features from different layers, which is crucial for accurate recognition because it maximizes the information retained from each layer. Additionally, this paper introduces an improved SEAttention module designed specifically for facial expression recognition. Its key modification is the replacement of the original activation function with the Mish activation function, which enables the module to adjust the weights of individual channels more effectively. This adjustment emphasizes important features while suppressing redundant ones, streamlining the recognition process; the selective focus accelerates network convergence and improves the model's ability to detect subtle changes in facial expressions, which is especially valuable in nuanced emotional contexts. Furthermore, the fully connected layers are modified: the first two are replaced with convolutional layers while the final fully connected layer is retained. This change reduces the number of nodes in these layers from
4096, 4096, and 1000 to just 7, effectively addressing the large parameter size of the VGG19 network. This modification also improves the model's resistance to overfitting, making it more robust on unseen data. Extensive experiments on the FER2013 and CK+ datasets demonstrate that the improved VGG19 model raises recognition accuracy by 1.58% and 4.04%, respectively, over the original version. The parameter efficiency of the model was also evaluated, showing a substantial reduction in the overall parameter count without compromising performance. This balance between model complexity and accuracy highlights the practical applicability of the proposed method in real-world facial expression recognition scenarios, allowing deployment in environments with limited computational resources. In conclusion, integrating the U-Net architecture and the enhanced SEAttention module into the VGG19 network leads to significant advances in facial expression recognition. The improved model not only strengthens feature extraction and fusion but also addresses the pressing problems of parameter size and computational efficiency. These innovations contribute to state-of-the-art performance in facial expression recognition, making the proposed method a meaningful contribution to computer vision and deep learning. Its robustness and efficiency highlight its potential for applications requiring accurate real-time facial expression analysis, such as human-computer interaction, security systems, and emotion-driven computing. Future work will explore the model's adaptability to other datasets and additional optimization techniques, aiming to further improve its performance and broaden its applicability.
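
To make the U-Net-style feature fusion mentioned in the abstract concrete, the following is a minimal PyTorch sketch of a crop-and-concatenate operation between a shallow, high-resolution feature map and a deep, low-resolution one. Which VGG19 stages are fused, the upsampling mode, and the function names are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of U-Net-style "crop and concatenate" multi-scale fusion.
import torch
import torch.nn.functional as F


def center_crop(t: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Center-crop a (B, C, H, W) tensor to spatial size (h, w)."""
    _, _, H, W = t.shape
    top, left = (H - h) // 2, (W - w) // 2
    return t[:, :, top:top + h, left:left + w]


def crop_and_concat(shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
    """Fuse a shallow, high-resolution map with a deep, low-resolution map."""
    # Upsample the deep map to the shallow map's spatial resolution
    deep_up = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                            align_corners=False)
    # Crop the shallow map if sizes still disagree, then concatenate channels
    shallow = center_crop(shallow, *deep_up.shape[-2:])
    return torch.cat([shallow, deep_up], dim=1)


if __name__ == "__main__":
    shallow = torch.randn(2, 128, 56, 56)   # earlier VGG19 stage (assumed shape)
    deep = torch.randn(2, 512, 14, 14)      # later VGG19 stage (assumed shape)
    fused = crop_and_concat(shallow, deep)
    print(fused.shape)                       # torch.Size([2, 640, 56, 56])
```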
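The improved SEAttention module described above could be sketched as a standard squeeze-and-excitation block whose bottleneck activation is replaced by Mish. The class name, reduction ratio, and layer sizes below are assumptions for illustration, not values specified in the paper.

```python
# Hypothetical sketch of an SE channel-attention block with Mish in place of
# the original activation function.
import torch
import torch.nn as nn


class MishSEAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)           # squeeze: global spatial average
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.Mish(),                                    # Mish replaces the original activation
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),                                 # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.avg_pool(x).view(b, c)                   # (B, C) channel descriptors
        w = self.fc(w).view(b, c, 1, 1)                   # learned channel weights
        return x * w                                      # excitation: re-weight channels


if __name__ == "__main__":
    feat = torch.randn(2, 512, 7, 7)
    se = MishSEAttention(channels=512)
    print(se(feat).shape)  # torch.Size([2, 512, 7, 7])
```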
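Finally, the classifier-head change (replacing the first two 4096-node fully connected layers with convolutional layers and keeping only a small final fully connected layer with 7 outputs, one per expression class) might look roughly like the sketch below. Kernel sizes, channel widths, and the use of global average pooling are assumptions made for illustration.

```python
# Hypothetical sketch of a VGG19 classifier head with the first two FC layers
# replaced by convolutions and a retained final FC layer of 7 outputs.
import torch
import torch.nn as nn


class ConvClassifierHead(nn.Module):
    def __init__(self, in_channels: int = 512, num_classes: int = 7):
        super().__init__()
        # Convolutional replacements for the former FC-4096 layers
        self.conv_head = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
            nn.Mish(),
            nn.Conv2d(256, 128, kernel_size=3, padding=1),
            nn.Mish(),
            nn.AdaptiveAvgPool2d(1),       # collapse spatial dims regardless of input size
        )
        # Retained final fully connected layer: 7 expression classes
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv_head(x).flatten(1)   # (B, 128)
        return self.fc(x)                  # (B, 7) class logits


if __name__ == "__main__":
    # VGG19 backbone output for a 224x224 input is typically (B, 512, 7, 7)
    feat = torch.randn(2, 512, 7, 7)
    head = ConvClassifierHead()
    print(head(feat).shape)  # torch.Size([2, 7])
```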