Automatic data collection and annotation system for a pose estimation dataset designed for grasping detection

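Summary: Robotic grasping is widely used in logistics sorting, automated assembly, and medical surgery, and grasping detection is one of its key steps. As the cost of 3D sensors has fallen, grasping detection increasingly relies on depth cameras that capture paired color and depth images (RGB-D), and robotic grasping is increasingly implemented with pose estimation-based methods. However, most publicly available RGB-D pose estimation datasets require expensive 3D laser scanners to obtain 3D models of the target objects. Their annotation also depends on manual work, which is time-consuming and labor-intensive and hinders the creation of large-scale datasets. To address this, this paper designs and implements an automatic data collection and annotation system for pose estimation datasets. The system needs no 3D laser scanner. By capturing and analyzing only the RGB-D image sequences from a depth camera, it reconstructs a 3D model of the target object, automatically annotates the object's pose, and generates segmentation masks in the 2D images. In the experiments, the system was used to build a pose estimation dataset of 84 objects and 8400 RGB-D images. Comparing the automatic annotations with manual annotations shows that the segmentation masks overlap by up to 98%, and that the automatically annotated poses align the model point cloud with the scene point cloud at a 100% alignment rate. These results demonstrate the accuracy and reliability of the system's automatic annotations.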
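Reconstructing a model from RGB-D frames and turning an annotated pose into a 2D mask both rest on the pinhole camera model. The sketch below is not the system's actual pipeline, which is not detailed here. It assumes a standard pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and depth stored in meters, with 0 marking missing depth. `backproject` lifts each valid depth pixel to a 3D point in the camera frame. `render_mask` projects model points posed by a rotation `R` and translation `t` into the image and marks the pixels they land on. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def backproject(depth: Sequence[Sequence[float]], fx, fy, cx, cy) -> List[Vec]:
    """Lift every pixel with depth > 0 to a camera-frame 3D point."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # assumed convention: 0 means no depth measurement
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def project(p: Vec, fx, fy, cx, cy) -> Tuple[int, int]:
    """Pinhole projection of a camera-frame point to an integer pixel (u, v)."""
    x, y, z = p
    return round(fx * x / z + cx), round(fy * y / z + cy)


def render_mask(model, R, t, fx, fy, cx, cy, width, height) -> List[List[int]]:
    """Point-splat mask: 1 wherever a posed model point lands in the image."""
    mask = [[0] * width for _ in range(height)]
    for p in model:
        # Apply the annotated pose: q = R @ p + t (model frame -> camera frame).
        q = tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
        if q[2] <= 0:
            continue  # point is behind the camera
        u, v = project(q, fx, fy, cx, cy)
        if 0 <= u < width and 0 <= v < height:
            mask[v][u] = 1
    return mask
```

Splatting points leaves holes when the model is sparse. A production mask generator would more likely rasterize the reconstructed mesh with a depth test so that occluded surfaces are excluded.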

     

Abstract: Robotic grasping has extensive applications in fields such as logistics sorting, automated assembly, and medical surgery, and grasping detection is an important step in robotic grasping. As depth cameras have become cheaper, they have increasingly been used for grasping detection, which has promoted pose estimation-based methods for robotic grasping. However, most publicly available RGB-D pose estimation datasets rely on expensive equipment such as 3D laser scanners to obtain 3D models of the target objects. Their annotation also relies heavily on manual operation, which is time-consuming, labor-intensive, and ill-suited to building large-scale datasets. To address these issues, this study designs and implements an automatic data acquisition and annotation system for building datasets that support RGB-D image-based pose estimation for robotic grasping. The system is easy to deploy and does not require an expensive 3D laser scanner. Using only RGB-D image sequences captured by an off-the-shelf depth camera, it automatically reconstructs a 3D model of the target object, annotates the object's pose, and generates 2D image segmentation masks. As part of the automatic annotation algorithm, a novel minimum spanning tree (MST)-based normal propagation method is proposed to obtain consistently oriented normals. This prevents the deformation or tearing of the reconstructed 3D surface that inconsistent normal directions would otherwise cause. In the experiments, the system was used to create a pose estimation dataset of 84 objects and 8400 RGB-D images. It produced a 3D model of each object and annotated the object's segmentation mask and 6D pose in every RGB-D image. To evaluate the annotated segmentation masks, they were compared with the corresponding manually labeled masks.
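The MST-based normal propagation is only named here, not specified. As a point of reference, the sketch below implements the classic minimum-spanning-tree normal orientation scheme in the style of Hoppe et al. (1992), not the paper's novel variant. It first builds a symmetric k-nearest-neighbor graph over the points. Each edge is weighted by 1 - |n_i · n_j|, so nearly parallel normals are joined first. The tree is grown with Prim's algorithm from a seed point whose normal is turned toward an assumed camera viewpoint. Each newly reached normal that points against its tree parent is then flipped. The function names, the brute-force neighbor search, and the parameters `k` and `viewpoint` are illustrative assumptions.

```python
import heapq
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dist2(a: Vec, b: Vec) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2


def knn_graph(points: Sequence[Vec], k: int) -> List[List[int]]:
    """Symmetric k-nearest-neighbor adjacency lists (brute force, O(n^2 log n))."""
    adj = [set() for _ in points]
    for i, p in enumerate(points):
        nearest = sorted((dist2(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in nearest[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize so the graph is undirected
    return [sorted(s) for s in adj]


def orient_normals_mst(points, normals, k=6, viewpoint=(0.0, 0.0, 0.0)):
    """Return normals flipped so that neighbors along an MST agree in sign."""
    n = len(points)
    adj = knn_graph(points, k)
    out = [tuple(v) for v in normals]

    def weight(i, j):
        # Sign-independent cost: near-parallel normals get a weight close to 0.
        return 1.0 - abs(dot(normals[i], normals[j]))

    # Seed: the point nearest the (assumed) camera; orient its normal toward it.
    seed = min(range(n), key=lambda i: dist2(points[i], viewpoint))
    to_view = tuple(v - p for v, p in zip(viewpoint, points[seed]))
    if dot(out[seed], to_view) < 0:
        out[seed] = tuple(-c for c in out[seed])

    # Prim's algorithm: pop the cheapest edge leaving the tree; the popped
    # child is oriented to agree with the tree parent it was reached from.
    visited = [False] * n
    visited[seed] = True
    heap = [(weight(seed, j), seed, j) for j in adj[seed]]
    heapq.heapify(heap)
    while heap:
        _, parent, child = heapq.heappop(heap)
        if visited[child]:
            continue
        visited[child] = True
        if dot(out[parent], out[child]) < 0:
            out[child] = tuple(-c for c in out[child])
        for j in adj[child]:
            if not visited[j]:
                heapq.heappush(heap, (weight(child, j), child, j))
    return out
```

Points in a separate connected component of the k-NN graph are never reached and keep their input orientation. A practical implementation would seed each component separately and replace the O(n^2) neighbor search with a spatial index such as a k-d tree.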
The accuracy of the annotated masks was further assessed through the performance of an instance segmentation network trained on them. To evaluate the annotated poses, the model point cloud was registered to the scene point cloud using the annotated pose parameters. In addition, a category-level pose estimation network was trained on the annotated poses, so its performance directly reflects the accuracy of the annotations. The experimental results show that the overlap between the automatically annotated masks and the manually labeled masks exceeds 98%. They also show that the annotated poses achieve a 100% alignment rate, that is, the model point cloud can be aligned to every scene point cloud using the corresponding annotated pose parameters. These results demonstrate that the proposed system can create high-quality datasets for developing practical pose estimation solutions. It can provide a solid data foundation for future research on, and application of, deep learning models for robotic grasping detection.
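The mask overlap and the alignment rate are reported here without formal definitions. The sketch below shows one plausible reading under explicit assumptions. Mask overlap is taken as the intersection over union (IoU) of two binary masks. An image counts as aligned when at least `min_fitness` of the model points, once transformed by the annotated rotation `R` and translation `t`, have a scene point within distance `tau`. The alignment rate is then the fraction of aligned images. The function names and the default thresholds are hypothetical, not values from the paper.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels
Vec = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major


def mask_iou(auto: Mask, manual: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    assert len(auto) == len(manual), "masks must have the same height"
    inter = union = 0
    for row_a, row_m in zip(auto, manual):
        assert len(row_a) == len(row_m), "masks must have the same width"
        for a, m in zip(row_a, row_m):
            inter += bool(a) and bool(m)
            union += bool(a) or bool(m)
    return inter / union if union else 1.0  # two empty masks agree fully


def apply_pose(R: Mat3, t: Vec, p: Vec) -> Vec:
    """Map a model point into the scene (camera) frame: R @ p + t."""
    return tuple(R[r][0] * p[0] + R[r][1] * p[1] + R[r][2] * p[2] + t[r] for r in range(3))


def fitness(model, scene, R, t, tau):
    """Fraction of posed model points with a scene point within tau (brute force)."""
    tau2 = tau * tau
    hits = 0
    for p in model:
        q = apply_pose(R, t, p)
        if any(sum((a - b) ** 2 for a, b in zip(q, s)) <= tau2 for s in scene):
            hits += 1
    return hits / len(model)


def alignment_rate(samples, tau=0.005, min_fitness=0.9):
    """samples: list of (model_points, scene_points, R, t), one per annotated image."""
    aligned = sum(fitness(m, s, R, t, tau) >= min_fitness for m, s, R, t in samples)
    return aligned / len(samples)
```

In practice, the scene points for an image would come from back-projecting the depth pixels inside the object's mask, and a k-d tree would replace the brute-force nearest-neighbor search.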

     
