Abstract:
High-precision positioning and navigation technologies are crucial for the autonomous operation of unmanned aerial vehicles (UAVs), enabling them to determine their location and navigate to predetermined destinations without human intervention. In scenarios where satellite navigation is unavailable, image matching–based visual navigation becomes essential owing to its simple hardware and high accuracy in passive positioning; when combined with an inertial system, it forms a highly autonomous and precise navigation system. Compared with traditional simultaneous localization and mapping (SLAM) for visual navigation, which requires extensive computation for continuous point cloud mapping, scene matching ensures real-time performance without such demands. At the core of the scene-matching system is the registration of real-time captured images with preloaded reference images, a task complicated by the high-speed flight of UAVs and the diversity of image sources, which calls for a registration process that is fast and robust while maintaining high precision. To tackle these challenges, we developed a novel descriptor, the dimensionality reduction second-order oriented gradient histogram (DSOG), whose precision and robustness make it well suited to image matching. It extracts image features by describing per-pixel oriented-gradient characteristics and adopts a region-based feature extraction strategy. This is advantageous over point and line features, particularly for handling nonlinear intensity differences among heterogeneous images, and enables precise matching of image data collected by different sensors, satisfying the high-precision, all-weather navigation needs of aerial vehicles. Building upon this descriptor, we designed an optimized similarity-measurement matching template that improves the traditional fast similarity measurement computed with the fast Fourier transform (FFT) in the frequency domain, reducing the computational redundancy inherent in the matching process. Our framework has been rigorously evaluated on diverse multimodal image pairs, including optical–optical, optical–synthetic aperture radar (SAR), and optical–hyperspectral datasets. We compared our algorithm with current state-of-the-art image registration methods, including traditional feature–based approaches such as the histogram of oriented phase congruency (HOPC) and the radiation-variation insensitive feature transform (RIFT), as well as deep learning–based techniques such as LoFTR and SuperPoint. The results demonstrate that our method considerably improves computational efficiency while maintaining matching precision. Moreover, unlike deep learning algorithms that require extensive training data to generalize, our algorithm achieves the necessary generalization without such training. In particular, it achieves an average matching time of only 1.015 s for multimodal images, meeting the real-time performance and robustness requirements of UAV scene-matching navigation. Our study not only offers innovative solutions for enhancing the precision and reliability of UAV navigation systems but also carries substantial practical significance, with broad application potential in the military, civil, and commercial sectors, shaping the future of autonomous navigation in the aerospace industry.
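To make the descriptor idea concrete, the sketch below builds a generic dense per-pixel oriented-gradient histogram map of the kind the abstract alludes to. It is illustrative only and under stated assumptions: the function name, parameters, and the simple first-order gradient binning are ours, and it does not implement the second-order gradients or dimensionality reduction that define DSOG.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def oriented_gradient_descriptor(img, n_bins=8, sigma=1.5):
    """Dense per-pixel oriented-gradient histogram map of shape (H, W, n_bins).

    Generic sketch (not the paper's DSOG): gradient magnitude is soft-assigned
    to orientation bins, spatially smoothed so each pixel summarizes its local
    region, and L2-normalized per pixel.
    """
    img = img.astype(np.float64)
    gy, gx = np.gradient(img)                 # gradients along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)

    H, W = img.shape
    desc = np.zeros((H, W, n_bins))
    pos = ang / (np.pi / n_bins)              # fractional bin position
    lo = np.floor(pos).astype(int) % n_bins
    hi = (lo + 1) % n_bins
    w_hi = pos - np.floor(pos)

    rows, cols = np.indices((H, W))
    desc[rows, cols, lo] += mag * (1.0 - w_hi)   # linear soft assignment
    desc[rows, cols, hi] += mag * w_hi

    for b in range(n_bins):                   # regional (neighborhood) pooling
        desc[:, :, b] = gaussian_filter(desc[:, :, b], sigma)

    norm = np.linalg.norm(desc, axis=2, keepdims=True)
    return desc / (norm + 1e-12)
```

Because every pixel carries a small histogram rather than a sparse keypoint, such region-based maps tend to tolerate the nonlinear intensity differences between heterogeneous sensors better than point or line features.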
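The frequency-domain similarity measurement can likewise be sketched generically. The code below evaluates a sum-of-squared-differences (SSD) surface between dense descriptor maps using FFT-based correlation instead of sliding-window loops, which is the standard way such template matching is accelerated; it is a minimal sketch under our own naming and shape assumptions, not the authors' optimized matching template.

```python
import numpy as np

def fft_ssd_match(ref_desc, tpl_desc):
    """Find the best placement of tpl_desc (h, w, C) inside ref_desc (H, W, C)
    by minimizing SSD, with the cross-correlation term computed via FFT.

    Returns the (row, col) of the best-matching top-left corner.
    """
    H, W, C = ref_desc.shape
    h, w, _ = tpl_desc.shape

    # SSD = sum|R|^2 over the window - 2*(R corr T) + const; the constant
    # template energy can be dropped because only the argmin matters.
    ones = np.ones((h, w))
    cross = np.zeros((H, W))
    ref_energy = np.zeros((H, W))
    for c in range(C):
        R = np.fft.fft2(ref_desc[:, :, c])
        # Correlation = convolution with the flipped, zero-padded template.
        T = np.fft.fft2(tpl_desc[::-1, ::-1, c], s=(H, W))
        cross += np.real(np.fft.ifft2(R * T))
        E = np.fft.fft2(ref_desc[:, :, c] ** 2)
        O = np.fft.fft2(ones, s=(H, W))
        ref_energy += np.real(np.fft.ifft2(E * O))

    ssd = ref_energy - 2.0 * cross
    # Index (y, x) of the circular result corresponds to the window whose
    # bottom-right corner is (y, x); keep only positions free of wrap-around.
    valid = ssd[h - 1:H, w - 1:W]
    y, x = np.unravel_index(np.argmin(valid), valid.shape)
    return int(y), int(x)
```

Replacing an explicit search over all template placements with a few FFTs is what keeps this class of similarity measurements near real time; the paper's contribution, per the abstract, is to further trim the redundant computation in this frequency-domain pipeline.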