Intelligent cooperative exploration path planning for UAV swarm in an unknown environment
Abstract
Owing to the increasing complexity of tasks and the wide variability of environmental conditions, a single unmanned aerial vehicle (UAV) is often insufficient to meet practical mission requirements. Multi-UAV systems have great potential in applications such as search and rescue. During a search and rescue mission, UAVs obtain the location of the target to be rescued and then plan a path to it that avoids obstacles. Traditional path-planning algorithms require prior knowledge of the obstacle distribution on the map, which may be difficult to obtain in real-world missions. To remove this dependence on prior map information, this paper proposes a reinforcement learning-based approach for the collaborative exploration of multiple UAVs in unknown environments. First, considering the characteristics of collaborative exploration tasks and the constraints of UAV swarms, a Markov decision process is used to establish a game model and task objectives for the UAV swarm. To maximize the search and rescue success rate, the UAVs must satisfy dynamic and obstacle-avoidance constraints during mission execution. Second, a reinforcement learning-based method for multi-UAV collaborative exploration is proposed, in which the multiagent soft actor–critic (MASAC) algorithm iteratively trains the UAVs' collaborative exploration strategies: the actor network generates UAV actions, and the critic network evaluates the quality of these strategies. To improve the algorithm's generalization capability, training is conducted in randomly generated map environments. To prevent UAVs from becoming trapped by concave obstacles, rewards are computed from the path distance between each UAV and the target, obtained with a breadth-first search (BFS) algorithm, rather than from the straight-line distance. During exploration, each UAV continuously collects map information and shares it with all other UAVs.
Each UAV makes its own action decisions based on the environment and the information obtained from the other UAVs, and the mission is considered successful when multiple UAVs hover above the target. Finally, a virtual simulation platform for algorithm validation is developed with the Unity game engine. The proposed algorithm is implemented in PyTorch, and bidirectional interaction between the Unity environment and the Python algorithm is achieved through the ML-Agents (Machine Learning Agents) framework. Comparative experiments on this platform against a non-cooperative single-agent SAC algorithm show that the proposed method performs better in task success rate, task completion efficiency, and episode reward, validating its feasibility and effectiveness.
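For reference, the standard single-agent soft actor–critic (SAC) objectives with twin critics are written out below, since the abstract describes the actor and critic roles only in words. This is the textbook SAC formulation, not the paper's MASAC equations, which the abstract does not give. Here $\phi_j$ are critic parameters, $\bar{\phi}_j$ their target copies, $\theta$ the actor parameters, $\alpha$ the entropy temperature, $\gamma$ the discount factor, and $d$ the episode-termination flag. A common multi-agent variant conditions each critic on the joint observations and actions of all agents during training; whether this paper does so is an assumption here.

```latex
\begin{aligned}
y &= r + \gamma (1 - d) \left( \min_{j=1,2} Q_{\bar{\phi}_j}(s', \tilde{a}') - \alpha \log \pi_\theta(\tilde{a}' \mid s') \right),
\qquad \tilde{a}' \sim \pi_\theta(\cdot \mid s') \\
L_Q(\phi_j) &= \mathbb{E}_{(s, a, r, s', d)} \left[ \left( Q_{\phi_j}(s, a) - y \right)^2 \right], \qquad j = 1, 2 \\
L_\pi(\theta) &= \mathbb{E}_{s,\ \tilde{a} \sim \pi_\theta(\cdot \mid s)} \left[ \alpha \log \pi_\theta(\tilde{a} \mid s) - \min_{j=1,2} Q_{\phi_j}(s, \tilde{a}) \right]
\end{aligned}
```

The critic loss pushes $Q_{\phi_j}$ toward an entropy-augmented Bellman target. The actor loss favors actions with high estimated value while keeping the policy stochastic, which encourages exploration.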
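The sketch below illustrates why a path-distance reward helps with concave obstacles. It assumes a 2D occupancy grid (0 = free, 1 = obstacle) with 4-connected moves and a hypothetical `progress_reward` that pays the decrease in BFS path distance to the target. The grid convention, connectivity, and reward scale are all assumptions; the abstract does not state the paper's exact reward formula.

```python
import math
from collections import deque

# Hypothetical grid convention: 0 = free cell, 1 = obstacle.
FREE, OBSTACLE = 0, 1

def bfs_path_distance(grid, start, goal):
    """Number of 4-connected steps from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    if grid[start[0]][start[1]] == OBSTACLE or grid[goal[0]][goal[1]] == OBSTACLE:
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == FREE and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None  # goal is walled off from start

def progress_reward(grid, prev_pos, new_pos, goal, scale=1.0):
    """Hypothetical shaping term: reward the drop in BFS path distance to the goal."""
    d_prev = bfs_path_distance(grid, prev_pos, goal)
    d_new = bfs_path_distance(grid, new_pos, goal)
    if d_prev is None or d_new is None:
        return 0.0
    return scale * (d_prev - d_new)

# Example: a cup-shaped (concave) obstacle that opens downward, target above it.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
TARGET = (0, 2)

if __name__ == "__main__":
    inside, below = (2, 2), (3, 2)
    print(bfs_path_distance(GRID, inside, TARGET))        # 8
    print(bfs_path_distance(GRID, below, TARGET))         # 7
    print(math.dist(inside, TARGET), math.dist(below, TARGET))  # 2.0 3.0
    print(progress_reward(GRID, inside, below, TARGET))   # 1.0
```

In this example, leaving the cup increases the straight-line distance to the target (2.0 to 3.0), so a Euclidean reward would penalize the move. The BFS path distance drops from 8 to 7, so the move is rewarded, which keeps the agent from oscillating inside the concave region.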
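Each UAV continuously shares the map information it collects with the others. The minimal sketch below shows one way to fuse local maps. It assumes each UAV holds an occupancy grid aligned to a common frame, with -1 = unknown, 0 = free, and 1 = obstacle. The encoding and the conflict rule (obstacle wins) are hypothetical choices, not the paper's stated method.

```python
from typing import List

# Hypothetical cell encoding; the ordering UNKNOWN < FREE < OBSTACLE is what
# lets a simple max() implement the merge rule below.
UNKNOWN, FREE, OBSTACLE = -1, 0, 1
Grid = List[List[int]]

def merge_maps(maps: List[Grid]) -> Grid:
    """Fuse aligned local occupancy grids into one shared map.

    Any known cell overrides UNKNOWN, and if UAVs disagree about a cell,
    OBSTACLE wins (a conservative choice for collision avoidance).
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    merged = [[UNKNOWN] * cols for _ in range(rows)]
    for local in maps:
        for r in range(rows):
            for c in range(cols):
                merged[r][c] = max(merged[r][c], local[r][c])
    return merged

def share_maps(maps: List[Grid]) -> List[Grid]:
    """Give every UAV its own independent copy of the fused map."""
    merged = merge_maps(maps)
    return [[row[:] for row in merged] for _ in maps]

if __name__ == "__main__":
    uav_a = [[FREE, UNKNOWN], [OBSTACLE, UNKNOWN]]
    uav_b = [[UNKNOWN, FREE], [FREE, UNKNOWN]]
    print(merge_maps([uav_a, uav_b]))  # [[0, 0], [1, -1]]
```

Making the conflict rule "obstacle wins" trades some exploration efficiency for safety. A probabilistic (log-odds) occupancy map would be a natural alternative when sensor noise matters.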