Abstract:
This paper proposes a reinforcement learning (RL)-based control framework using the proximal policy optimization (PPO) algorithm to address compliance issues in cooperative transportation tasks for heterogeneous tri-robot systems. The focus is on enhancing motion coordination and force adaptability of three heterogeneous robots during collaborative object transportation. A high-fidelity simulation environment was first constructed in the CoppeliaSim robotic simulator, where three robots with distinct kinematic and dynamic configurations were programmed to collaboratively manipulate a shared object. Comparative simulations were conducted between traditional force control methods and the proposed RL-based approach to evaluate performance in terms of trajectory tracking accuracy, motion smoothness, and system compliance. Under the RL control framework, the PPO algorithm was trained to optimize the robots' joint actions by maximizing a reward function designed to penalize trajectory deviations, excessive contact forces, and abrupt velocity changes. The simulation results demonstrate that the RL-controlled system achieves marked improvements in vertical (Z-axis) trajectory tracking precision. Specifically, the trajectory error of the object's center of mass in the Z-direction was reduced to 4.7% of that observed under conventional force control. Furthermore, Robot 2, selected as a representative agent owing to its central role in the task, exhibited significantly smoother motion characteristics under RL control. Its end-effector velocity variations in the horizontal (X–Y) plane were attenuated by 82% compared to force control, while angular velocity fluctuations in its primary rotational joint were reduced to 35% of the baseline values, indicating enhanced mechanical compliance and reduced oscillatory behavior. To validate the real-world applicability of the learned policies, a sim2real transfer methodology was implemented. The control strategies were deployed on a physical tri-robot platform comprising one six-degrees-of-freedom (DOF) industrial manipulator and two customized four-DOF collaborative robots, tasked with synchronously transporting a deformable payload. The experimental results agreed with the simulation predictions: the RL-based controller maintained superior Z-direction trajectory tracking performance, limiting errors to 5.4% of those under force control. Robot 2's motion compliance improved further in the physical experiments, with its X-direction velocity variations reduced to 20.7% of the force control benchmark. Joint-level analyses revealed that the angular velocity variations of Robot 2's third joint, a pivotal component for vertical motion compensation, were suppressed to 35.2% of the force control values, confirming the RL controller's ability to mitigate mechanical vibrations and adapt to dynamic payload interactions. The study also investigates the robustness of the RL framework to real-world uncertainties, including sensor noise, communication latency, and payload deformation. Despite these challenges, the RL controller maintained stable performance, achieving a 92% reduction in peak contact forces compared to force control during sudden payload shifts. Statistical analyses of the motion data further indicated that the RL-based system reduced the standard deviation of inter-robot coordination errors by 76% in simulation and 68% in the physical experiments, underscoring its consistency across domains. Both the simulation and experimental findings demonstrate that the PPO-based RL framework not only surpasses traditional force control in precision and compliance but also successfully bridges the sim2real gap. The framework's ability to learn adaptive policies in simulation and transfer them to physical robots with minimal fine-tuning highlights its potential for deployment in industrial applications requiring heterogeneous multi-robot collaboration. This work advances the field of compliant robotic control by providing a scalable, data-driven solution that harmonizes trajectory accuracy, motion smoothness, and real-world adaptability in complex cooperative tasks.
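The abstract characterizes the reward only qualitatively, as penalties on trajectory deviation, contact force magnitude, and abrupt velocity changes. The following is a minimal illustrative sketch of a per-step reward with that structure, assuming access to the object's center-of-mass position, a reference position, measured contact forces, and current and previous joint velocities; all variable names and weights are hypothetical and are not taken from the paper.

```python
import numpy as np

def compute_reward(obj_pos, ref_pos, contact_forces, joint_vel, prev_joint_vel,
                   w_track=1.0, w_force=0.01, w_smooth=0.1):
    """Reward shaped to penalize tracking error, large contact forces, and jerky motion."""
    tracking_error = np.linalg.norm(obj_pos - ref_pos)                  # object CoM deviation from reference
    force_penalty = np.sum(np.square(contact_forces))                    # discourage excessive contact forces
    smoothness_penalty = np.sum(np.square(joint_vel - prev_joint_vel))   # discourage abrupt velocity changes
    # PPO maximizes expected return, so each penalty enters with a negative sign.
    return -(w_track * tracking_error
             + w_force * force_penalty
             + w_smooth * smoothness_penalty)
```

Because PPO maximizes expected return, the penalties are negated; the relative weights would in practice be tuned to trade off tracking precision against compliance and smoothness.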