Abstract:
With the rapid development of machine learning and deep neural network and the popularization of intelligent devices, face recognition technology has rapidly developed. At present, the accuracy of face recognition has exceeded that of the human eyes. Moreover, the software and hardware conditions of large-scale popularization are available, and the application fields are widely distributed. As an important part of face recognition technology, facial expression recognition has been a widely studied subject in the fields of artificial intelligence, security, automation, medical treatment, and driving in recent years. Expression recognition, an active research area in human–computer interaction, involves informatics and psychology and has good research prospect in teaching evaluation. Micro-expression, which has great research significance, is a kind of short-lived facial expression that humans unconsciously make when trying to hide some emotion. Different from the general static facial expression recognition, to realize micro-expression recognition, besides extracting the spatial feature information of facial expression deformation in the image, the temporal-motion information of the continuous image sequence also needs to be considered. In this study, given that static expression features lack temporal information, so that the subtle changes in expression cannot be fully reflected, facial dynamic expression sequences were used to fuse spatial features and temporal features, and neural networks were used to provide good features in the field of image classification. Expression sequences were processed, and a micro-expression recognition method based on separate long-term recurrent convolutional network (S-LRCN) was proposed. First, the micro-expression data set was selected to extract the facial image sequence, and the transfer learning method was introduced to extract the spatial features of the expression frame through the pre-trained convolution neural network model, to reduce the risk of overfitting in the network training, and the extracted features of the video sequence were inputted into long short-term memory (LSTM) to process the temporal-domain features. Finally, a small database of learners’ expression sequences was established, and the method was used to assist teaching evaluation.