Indoor SLAM Method Based on the Atlanta World and Semantic Information

于乃功; 程启明; 闫金涵; 付一凡; 谢秋生

doi:10.13374/j.issn2095-9389.2024.11.26.002

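Abstract: Visual simultaneous localization and mapping (SLAM) algorithms based on point features suffer from poor localization accuracy in weakly textured indoor environments. To address this, a visual SLAM method using point, line, and plane features based on semantic information and Atlanta constraints is proposed. First, a YOLOv8 semantic segmentation method improved with an attention mechanism is used to extract accurate plane features. Next, the Atlanta coordinate system is detected using ground semantic information, which avoids interference from other orthogonal coordinate systems. Finally, an accurate coordinate system is selected through a coordinate system state-marking rule, and drift-free rotation estimation is performed. Experimental results show that the improved semantic segmentation network raises the plane segmentation mAP by 10.9% compared with YOLOv8, and that the average absolute trajectory error of the SLAM system is 29.3% lower than that of Manhattan SLAM. Overall, the results indicate that the Atlanta SLAM system fused with semantic information reduces the effect of cumulative error in weakly textured indoor environments and effectively improves the localization accuracy of SLAM systems indoors.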

Abstract: Visual simultaneous localization and mapping (SLAM) algorithms that rely on point features have long been used for indoor positioning. However, their positioning accuracy is poor in complex indoor environments, which often have complex layouts, weakly textured surfaces, and varying lighting conditions, all of which undermine the performance of traditional point-based SLAM. These algorithms struggle to estimate the sensor position and orientation accurately, leading to unreliable mapping and navigation. A SLAM system that uses the Atlanta world assumption for visual odometry tracks the Atlanta coordinate system relative to the map to estimate camera rotation, thus avoiding the cumulative errors of inter-frame tracking. Therefore, a visual SLAM method that combines point, line, and plane features based on semantic information and Atlanta constraints is proposed. First, an attention mechanism is introduced into the YOLOv8 semantic segmentation network so that it focuses more on planar edge features and is less affected by indoor lights and shadows, achieving accurate plane segmentation. Second, the Atlanta coordinate system for the current image frame is detected using planar normal vectors and ground semantic information. Planar normal vectors are extracted via principal component analysis (PCA), and the ground normal vector is used to verify whether all planes in the current frame possess Atlanta structural features: the Atlanta structure requires each planar normal vector to be either perpendicular or parallel to the ground normal vector. Finally, an accurate Atlanta coordinate system is selected using a coordinate system state-marking rule to complete pose estimation. When the method is integrated with a traditional point-line feature SLAM system, the absence of an Atlanta coordinate system in the current frame introduces cumulative errors. The coordinate system state-marking rule lets the Atlanta coordinate systems correct one another, which effectively eliminates the influence of these cumulative errors. If an Atlanta coordinate system exists in the current frame, it is used for drift-free rotation estimation, point-line features between adjacent frames are used for translation estimation, and the translation vector is refined by nonlinear optimization of the reprojection error. If there is no Atlanta coordinate system in the current frame, point-line features are used to estimate both the rotation and the translation between adjacent frames, and the rotation matrix and translation vector are refined by nonlinear optimization of the reprojection error. Experimental results in indoor environments show that the mean average precision (mAP) of the improved semantic segmentation network for plane segmentation is 10.9% higher than that of YOLOv8. The average absolute trajectory error of the SLAM system is 29.3% lower than that of Manhattan SLAM and 16.78% lower than that of Atlanta SLAM. Overall, this indicates that the Atlanta SLAM system integrated with semantic information can reduce the influence of cumulative errors in weakly textured indoor environments and effectively improve the positioning accuracy of the SLAM system in indoor settings.
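
The Atlanta structure test described in the abstract (plane normals from PCA, each required to be parallel or perpendicular to the ground normal) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `plane_normal_pca` and `satisfies_atlanta` and the 5-degree angular tolerance are assumptions made for illustration, and the abstract does not state which threshold the method uses.

```python
import numpy as np


def plane_normal_pca(points: np.ndarray) -> np.ndarray:
    """Estimate a unit plane normal from an (N, 3) array of 3D points via PCA."""
    centered = points - points.mean(axis=0)
    # eigh returns eigenvalues in ascending order, so the first eigenvector
    # is the direction of least variance, which is the plane normal.
    _, eigvecs = np.linalg.eigh(centered.T @ centered)
    return eigvecs[:, 0]


def satisfies_atlanta(ground_normal, plane_normals, tol_deg: float = 5.0) -> bool:
    """Return True if every plane normal is parallel or perpendicular
    to the ground normal, within tol_deg degrees (assumed tolerance)."""
    g = np.asarray(ground_normal, dtype=float)
    g = g / np.linalg.norm(g)
    cos_tol = np.cos(np.radians(tol_deg))  # |cos| at or above this: parallel
    sin_tol = np.sin(np.radians(tol_deg))  # |cos| at or below this: perpendicular
    for n in plane_normals:
        n = np.asarray(n, dtype=float)
        c = abs(float(g @ n)) / np.linalg.norm(n)
        if not (c >= cos_tol or c <= sin_tol):
            return False
    return True
```

Under these assumptions, a wall rotated by any angle about the vertical axis still passes the test, because its normal stays perpendicular to the ground normal. That is the property that separates the Atlanta world from the stricter Manhattan world, where all walls must also be mutually orthogonal.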

Indoor SLAM algorithm based on Atlanta world assumption and semantic information