Abstract:
This paper proposes a reinforcement learning (RL)-based control framework using the proximal policy optimization (PPO) algorithm to address compliance issues in cooperative transportation tasks for heterogeneous tri-robot systems. The focus is on enhancing motion coordination and force adaptability of three heterogeneous robots during collaborative object transportation. A high-fidelity simulation environment was first constructed in the CoppeliaSim robotic simulator, where three robots with distinct kinematic and dynamic configurations were programmed to collaboratively manipulate a shared object. Comparative simulations were conducted between traditional force control methods and the proposed RL-based approach to evaluate performance in terms of trajectory tracking accuracy, motion smoothness, and system compliance. Under the RL control framework, the PPO algorithm was trained to optimize the robots' joint actions by maximizing a reward function designed to penalize trajectory deviations, excessive contact forces, and abrupt velocity changes. The simulation results demonstrate that the RL-controlled system achieves marked improvements in vertical (Z-axis) trajectory tracking precision. Specifically, the trajectory error of the object's center of mass in the Z-direction was reduced to 4.7% of that observed under conventional force control. Furthermore, Robot 2, selected as a representative agent owing to its central role in the task, exhibited significantly smoother motion characteristics under RL control. Its end-effector velocity variations in the horizontal (X–Y) plane were attenuated by 82% compared to force control, while angular velocity fluctuations in its primary rotational joint were reduced to 35% of the baseline values, indicating enhanced mechanical compliance and reduced oscillatory behavior. To validate the real-world applicability of the learned policies, a sim2real transfer methodology was implemented. The control strategies were deployed on a physical tri-robot platform comprising one six-degrees-of-freedom (DOF) industrial manipulator and two customized four-DOF collaborative robots, tasked with synchronously transporting a deformable payload. The experimental results agreed with the simulation predictions: the RL-based controller maintained superior Z-direction trajectory tracking performance, limiting errors to 5.4% of those under force control. Robot 2's motion compliance improved further in the physical experiments, with its X-direction velocity variations reduced to 20.7% of the force control benchmark. Joint-level analyses revealed that the angular velocity variations of Robot 2's third joint, a pivotal component for vertical motion compensation, were suppressed to 35.2% of the force control values, confirming the RL controller's ability to mitigate mechanical vibrations and adapt to dynamic payload interactions. The study also investigates the robustness of the RL framework to real-world uncertainties, including sensor noise, communication latency, and payload deformation. Despite these challenges, the RL controller maintained stable performance, achieving a 92% reduction in peak contact forces compared to force control during sudden payload shifts. Statistical analyses of the motion data further indicated that the RL-based system reduced the standard deviation of inter-robot coordination errors by 76% in simulation and 68% in the physical experiments, underscoring its consistency across domains. Both the simulation and experimental findings demonstrate that the PPO-based RL framework not only surpasses traditional force control in precision and compliance but also successfully bridges the sim2real gap. The framework's ability to learn adaptive policies in simulation and transfer them to physical robots with minimal fine-tuning highlights its potential for deployment in industrial applications requiring heterogeneous multi-robot collaboration. This work advances the field of compliant robotic control by providing a scalable, data-driven solution that harmonizes trajectory accuracy, motion smoothness, and real-world adaptability in complex cooperative tasks.
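The abstract characterizes the reward only qualitatively, as penalties on trajectory deviation, contact force magnitude, and abrupt velocity changes. The following is a minimal illustrative sketch of a per-step reward with that structure, assuming access to the object's center-of-mass position, a reference position, measured contact forces, and current and previous joint velocities; all variable names and weights are hypothetical and are not taken from the paper.

```python
import numpy as np

def compute_reward(obj_pos, ref_pos, contact_forces, joint_vel, prev_joint_vel,
                   w_track=1.0, w_force=0.01, w_smooth=0.1):
    """Reward shaped to penalize tracking error, large contact forces, and jerky motion."""
    tracking_error = np.linalg.norm(obj_pos - ref_pos)                  # object CoM deviation from reference
    force_penalty = np.sum(np.square(contact_forces))                    # discourage excessive contact forces
    smoothness_penalty = np.sum(np.square(joint_vel - prev_joint_vel))   # discourage abrupt velocity changes
    # PPO maximizes expected return, so each penalty enters with a negative sign.
    return -(w_track * tracking_error
             + w_force * force_penalty
             + w_smooth * smoothness_penalty)
```

Because PPO maximizes expected return, the penalties are negated; the relative weights would in practice be tuned to trade off tracking precision against compliance and smoothness.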