Abstract:
Visual simultaneous localization and mapping (SLAM) algorithms that rely on point features have long been used for indoor positioning. However, their positioning accuracy degrades in complex indoor environments, where intricate layouts, weakly textured surfaces, and varying lighting conditions undermine point-based tracking. Under these conditions, the algorithms struggle to estimate the sensor position and orientation accurately, which leads to unreliable mapping and navigation. A SLAM system that uses the Atlanta world assumption for visual odometry tracks the Atlanta coordinate system relative to the map to estimate camera rotation, thus avoiding the cumulative errors of inter-frame tracking. Therefore, a visual SLAM method that combines point, line, and plane features with semantic information and Atlanta constraints is proposed. First, an attention mechanism is introduced into the YOLOv8 semantic segmentation network so that it focuses more on planar edge features and is less affected by indoor lights and shadows, which yields accurate plane segmentation. The Atlanta coordinate system of the current image frame is then detected using planar normal vectors and ground semantic information. In this study, planar normal vectors are extracted via principal component analysis (PCA), and the ground normal vector is used to verify whether all planes in the current frame possess Atlanta structural features: the Atlanta structure requires each planar normal vector to be either perpendicular or parallel to the ground normal vector. Finally, an accurate Atlanta coordinate system is selected using a coordinate system state-marking rule to complete pose estimation. When the method is integrated with a traditional point-line feature SLAM system, the absence of an Atlanta coordinate system in the current frame introduces cumulative errors.
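The PCA normal extraction and the perpendicular-or-parallel test described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `plane_normal_pca` and `is_atlanta_consistent` and the 5-degree tolerance `tol_deg` are assumptions made for illustration, and the input is assumed to be the 3D points of one already-segmented plane.

```python
import numpy as np


def plane_normal_pca(points):
    """Estimate the unit normal of a roughly planar (N, 3) point set via PCA.

    The normal is the eigenvector of the covariance matrix with the smallest
    eigenvalue, i.e. the direction of least variance. Its sign is arbitrary.
    """
    points = np.asarray(points, dtype=float)
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    # eigh returns eigenvalues in ascending order with unit eigenvectors.
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, 0]


def is_atlanta_consistent(normal, ground_normal, tol_deg=5.0):
    """Check that a plane normal is parallel or perpendicular to the ground normal.

    tol_deg is a hypothetical angular tolerance, not a value from the paper.
    """
    cos_angle = abs(normal @ ground_normal) / (
        np.linalg.norm(normal) * np.linalg.norm(ground_normal)
    )
    # Taking |cos| folds the angle into [0, 90] degrees, so sign does not matter.
    angle = np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))
    parallel = angle <= tol_deg
    perpendicular = angle >= 90.0 - tol_deg
    return bool(parallel or perpendicular)
```

Under these assumptions, a floor or ceiling (normal parallel to the ground normal) and a vertical wall (normal perpendicular to it) both pass the check, whereas a ramp or tilted surface does not.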
The coordinate system state-marking rule lets the Atlanta coordinate systems correct one another, which effectively eliminates the influence of cumulative errors. If an Atlanta coordinate system exists in the current frame, it is used for drift-free rotation estimation, and point-line features between adjacent frames are used for translation estimation; the translation vector is then refined by nonlinear optimization of the reprojection errors. If no Atlanta coordinate system exists in the current frame, point-line features are used to estimate both the rotation and the translation between adjacent frames, and the rotation matrix and translation vector are refined by nonlinear optimization of the reprojection errors. Experimental results in indoor environments show that the mean average precision (mAP) of the improved semantic segmentation network for plane segmentation is 10.9% higher than that of YOLOv8. The average absolute trajectory error of the SLAM system is 29.3% lower than that of Manhattan SLAM and 16.78% lower than that of Atlanta SLAM. Overall, these results indicate that an Atlanta SLAM system integrated with semantic information can reduce the influence of cumulative errors in weakly textured indoor environments and effectively improve the positioning accuracy of the SLAM system in indoor settings.
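To make the first branch concrete (rotation fixed by the Atlanta coordinate system, translation refined from reprojection errors), the following is a minimal sketch under simplifying assumptions. It uses point features only, normalized image coordinates (intrinsics already removed), and a plain Gauss-Newton iteration; the abstract does not state which solver the authors use, and the names `project` and `estimate_translation` are hypothetical.

```python
import numpy as np


def project(points_cam):
    """Pinhole projection of (N, 3) camera-frame points to normalized image coordinates."""
    return points_cam[:, :2] / points_cam[:, 2:3]


def estimate_translation(R, points_world, observations, t0=None, iters=10):
    """Refine only the translation t by minimizing point reprojection error.

    R            : (3, 3) rotation, assumed known (e.g. from the Atlanta frame)
    points_world : (N, 3) 3D points
    observations : (N, 2) matching normalized image coordinates
    """
    t = np.zeros(3) if t0 is None else np.asarray(t0, dtype=float).copy()
    rotated = np.asarray(points_world, dtype=float) @ R.T  # R * X for every point
    for _ in range(iters):
        p = rotated + t
        x, y, z = p[:, 0], p[:, 1], p[:, 2]
        # Residuals interleaved as u0, v0, u1, v1, ...
        residual = (project(p) - observations).ravel()
        # Jacobian of (x/z, y/z) with respect to t.
        J = np.zeros((2 * len(p), 3))
        J[0::2, 0] = 1.0 / z
        J[0::2, 2] = -x / z**2
        J[1::2, 1] = 1.0 / z
        J[1::2, 2] = -y / z**2
        # Gauss-Newton step: least-squares solution of J * step = -residual.
        step = np.linalg.lstsq(J, -residual, rcond=None)[0]
        t += step
        if np.linalg.norm(step) < 1e-12:
            break
    return t
```

Because the rotation is held fixed, only three parameters are optimized. In the second branch, where no Atlanta coordinate system is available, the rotation would have to be added to the parameter vector as well, which this sketch does not cover.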