Bi-level optimization-based scalable unmanned maritime search and rescue

  • Abstract: To address the problem that the complexity of cooperative search-and-rescue path planning for multiple maritime UAVs and unmanned surface vessels grows exponentially with the number of vehicles and of people in the water, this paper proposes a scalable search-and-rescue method that combines mission assignment with multi-agent reinforcement learning. First, based on the limited-communication conditions at sea and the drift of people in the water, a heterogeneous UAV/USV search-and-rescue simulation environment is constructed, and an optimization problem is formulated with vehicle velocities as decision variables and maximization of the rescue success rate as the objective. Second, a "mission assignment–unmanned search and rescue" bi-level optimization framework is proposed, which decomposes the large-scale rescue problem into a mission-assignment subproblem and small-scale rescue subproblems; the multi-agent proximal policy optimization (MAPPO) algorithm is used to train vehicle policies in a "6 drones and 80 people" scenario, achieving small-scale heterogeneous multi-vehicle cooperative rescue. Finally, a mission-assignment algorithm based on spatial information clustering (SIC) is proposed. Embedding SIC and MAPPO into the bi-level framework yields the SIC–MAPPO algorithm, which reuses the policies trained in small-scale scenarios in large-scale rescue scenarios. Simulation results show that, at a mean surface current velocity of 0.17 m·s⁻¹, directly applying the MAPPO policy trained in the "6 drones and 80 people" scenario to a "60 drones and 800 people" scenario yields a rescue success rate of only 13.44%, whereas the proposed SIC–MAPPO algorithm raises it to 93.36% and keeps the success rate above 85% at all intermediate scales, demonstrating scalability with respect to rescue scale. Comparison experiments with the ant colony optimization and greedy algorithms show that SIC–MAPPO adapts to both rescue scale and ocean environment: at a mean surface current velocity of 0.85 m·s⁻¹ in the "60 drones and 800 people" scenario, SIC–MAPPO achieves a rescue success rate of 80.975%, which is 2.425 percentage points higher than that of the ant colony optimization algorithm and 0.95 percentage points higher than that of the greedy algorithm.

     

    Abstract: Owing to their advantages of low cost, high mobility, and convenient deployment, unmanned aerial vehicles (UAVs) are increasingly used for tasks that require rapid response, automatic handling, or collaborative operation. Maritime search and rescue (MSR) is one scenario in which UAVs can be utilized. Existing studies focus primarily on single-UAV tasks such as multimodal feature fusion, marine-object detection, and path planning. Controlling UAVs in MSR presents several challenges. First, no communication base stations exist in the open ocean, and the complex electromagnetic environment causes severe channel fading, which hampers information exchange between UAVs. Second, ocean currents continuously change the positions of the rescue targets, worsening information uncertainty. Third, the scale of an MSR mission is typically unknown in advance, yet the computational complexity of UAV path-planning algorithms grows exponentially with mission scale. Hence, a simulation platform is established and a flexible bi-level optimization framework combining mission assignment with the UAVs' MSR policy is proposed. Considering the partially observable nature of the rescue scenario, the simulation platform incorporates three types of heterogeneous UAVs responsible for search, routing, and rescue. Owing to a detailed design of the system model and the rescue-mission procedure, the platform reliably simulates the MSR environment. Building on this platform, and exploiting the Markov nature of the MSR mission, multi-agent reinforcement learning (MARL) is used to learn the UAVs' MSR policy. The two main MARL frameworks are centralized training with decentralized execution (CTDE) and decentralized training with decentralized execution (DTDE).
To identify a suitable algorithm, three variants of the proximal policy optimization (PPO) algorithm spanning the DTDE and CTDE frameworks are trained and evaluated: independent proximal policy optimization (IPPO), multi-agent proximal policy optimization (MAPPO), and heterogeneous-agent proximal policy optimization (HAPPO). To enhance the scalability of the trained policy, a mission-assignment strategy based on spatial information clustering (SIC) is proposed. By applying the K-means clustering algorithm in multiple rounds to the spatial information of the rescue targets and UAVs, each UAV is assigned a focusing sequence and a buffer sequence at the start of the MSR mission. Combining the proposed SIC algorithm with the bi-level optimization framework yields the SIC–PPO algorithm, which enhances the scalability of all three PPO variants in the decentralized execution phase. In SIC–PPO, each UAV focuses on information pertaining to the rescue targets in its own focusing sequence, which is adaptively replenished from its buffer sequence. This mitigates overlapping coverage among multiple UAVs and increases search-and-rescue efficiency. Simulation results show that all three PPO variants converge under common ocean conditions at a small rescue scale. When UAV strategies trained with MAPPO in the "6 drones and 80 people" scenario are applied to the "60 drones and 800 people" scenario, the proposed SIC–MAPPO algorithm increases the search-and-rescue success rate from 13.44% to 93.36% compared with MAPPO alone. This demonstrates the scalability of SIC–PPO across mission scales, allowing large-scale MSR missions to be conducted efficiently. Compared with the ant colony optimization and greedy algorithms, the proposed SIC–MAPPO algorithm demonstrates scalability and stability in the ocean environment.
Under a mean ocean current velocity of 0.85 m·s⁻¹ in the "60 drones and 800 people" scenario, the SIC–MAPPO algorithm achieves a rescue success rate of 80.975%, which is 2.425 and 0.95 percentage points higher than those afforded by the ant colony optimization and greedy algorithms, respectively.
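As a rough illustration of the SIC idea only (not the paper's exact procedure), the sketch below clusters rescue-target positions with a plain single-round K-means, one cluster per UAV, and greedily assigns each UAV the nearest unclaimed cluster as its focusing sequence. All function and variable names are illustrative assumptions; the multi-round clustering and the buffer-sequence maintenance described above are omitted.

```python
# Illustrative SIC-style mission assignment: K-means over target positions,
# then nearest-centroid matching of UAVs to clusters. Names are hypothetical.
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on 2-D points; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialize from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assign each point to nearest centroid
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        for i, cluster in enumerate(clusters): # recompute centroids
            if cluster:
                centroids[i] = (sum(x for x, _ in cluster) / len(cluster),
                                sum(y for _, y in cluster) / len(cluster))
    return centroids, clusters


def assign_missions(uav_positions, target_positions):
    """Give each UAV the unclaimed cluster whose centroid is nearest to it."""
    k = len(uav_positions)
    centroids, clusters = kmeans(target_positions, k)
    assignment, unclaimed = {}, set(range(k))
    for u, upos in enumerate(uav_positions):
        best = min(unclaimed, key=lambda c: math.dist(upos, centroids[c]))
        unclaimed.remove(best)
        assignment[u] = clusters[best]         # this UAV's focusing sequence
    return assignment
```

In the paper's full scheme, clustering is repeated over multiple rounds and each UAV additionally keeps a buffer sequence that replenishes its focusing sequence as targets are rescued; the one-shot assignment above only conveys the spatial-partitioning intuition that reduces overlapping coverage.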

     
