Abstract:
Deep neural network-based video classification models are widely used because of their strong performance on visual tasks. However, their broad deployment raises serious security concerns. Recent research shows that these models are highly susceptible to deception by adversarial examples: inputs laced with noise that is imperceptible to humans yet poses a substantial risk to the integrity and security of deep neural networks. Considerable research has focused on image-based adversarial examples, producing notable advances in understanding and defending against such attacks. Video-based adversarial attacks, however, present a different set of complexities and challenges: motion information, temporal coherence, and frame-to-frame correlation add dimensions that demand purpose-built solutions. The most straightforward adversarial attack uses the fast gradient sign method (FGSM). Unfortunately, FGSM attacks fall short in several respects: their attack success rates are far from satisfactory, the resulting perturbations are often easily identifiable, and their stealth does not hold up under rigorous scrutiny. This study therefore introduces a novel nonlinear conjugate gradient attack method inspired by nonlinear conjugate gradient descent. By relaxing the constraint on the search step size so that it satisfies the strong Wolfe conditions, we keep the loss value of the objective function increasing at every step. This enhancement preserves the search direction across iterations while steadily raising the loss, yielding more consistent results and ensuring that the attack achieves both a high success rate and good concealment after each iteration. Experimental results on the UCF101 dataset confirm the efficacy of our approach, which reaches a 91% attack success rate when the perturbation upper bound is 3/255. Our method consistently and markedly outperforms FGSM in attack success rate across various perturbation thresholds while offering superior stealth, and it strikes an effective balance between attack success rate and runtime, making it a meaningful contribution to the study of adversarial attacks on video classification models. The method frames the generation of video adversarial examples as an optimization problem, representing a step forward in the ongoing effort to develop robust, reliable, and efficient techniques for understanding adversarial attacks on deep neural network-based video classification models.
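For reference, the following equations give the standard textbook forms of the quantities named in the abstract: the FGSM update and the strong Wolfe conditions used for step-size selection in nonlinear conjugate gradient methods. These are the generic formulations, not necessarily the exact adaptation used in the proposed attack, which maximizes the classification loss under an $\ell_\infty$ perturbation bound and may therefore differ in sign conventions and parameter choices.

FGSM perturbs a clean input $x$ with true label $y$ by one signed gradient step of the loss $J$, bounded by $\epsilon$ (e.g., $\epsilon = 3/255$):
\[
x^{\mathrm{adv}} = x + \epsilon \cdot \operatorname{sign}\bigl(\nabla_x J(\theta, x, y)\bigr).
\]

A nonlinear conjugate gradient method instead builds a search direction $d_k$ from the current gradient $g_k = \nabla_x J(\theta, x_k, y)$ and the previous direction, for example with the Fletcher-Reeves coefficient:
\[
d_0 = g_0, \qquad
d_k = g_k + \beta_k d_{k-1}, \qquad
\beta_k = \frac{\lVert g_k \rVert^2}{\lVert g_{k-1} \rVert^2}.
\]

In their standard form (stated for minimizing an objective $f$ along a descent direction $p_k$), the strong Wolfe conditions require the step size $\alpha_k$ to satisfy
\[
f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^{\top} p_k,
\qquad
\bigl| \nabla f(x_k + \alpha_k p_k)^{\top} p_k \bigr| \le c_2 \bigl| \nabla f(x_k)^{\top} p_k \bigr|,
\]
with $0 < c_1 < c_2 < 1$. The first condition guarantees sufficient progress in the objective, and the second bounds the curvature along the search direction; adapted to loss maximization, they ensure the loss keeps increasing at each iteration, as described in the abstract.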