WANG Sen, WANG Shizhen, LÜ Ningning, JING Youbo. Bi-level optimization-based scalable unmanned maritime search and rescue[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2025.08.11.001

Bi-level optimization-based scalable unmanned maritime search and rescue

Owing to their low cost, high mobility, and convenient deployment, unmanned aerial vehicles (UAVs) are increasingly used for tasks that require rapid response, automatic handling, or collaborative operation. Maritime search and rescue (MSR) is one scenario in which UAVs can be utilized. Existing studies focus primarily on single-UAV tasks such as multimodal feature fusion, marine-object detection, and path planning. The control of UAVs in MSR presents three challenges. First, no communication base stations exist at sea, and the complex electromagnetic environment causes severe channel fading, which degrades information exchange between UAVs. Second, ocean currents continuously change the positions of the rescue targets, which worsens information uncertainty. Third, the scale of an MSR mission is typically unknown in advance, yet the computational complexity of UAV path-planning algorithms increases exponentially with the mission scale. To address these challenges, a simulation platform is established and a flexible bi-level optimization framework that combines mission assignment with the UAVs' MSR policy is proposed. To reflect the partially observable nature of the rescue scenario, the simulation platform incorporates three types of heterogeneous UAVs responsible for search, routing, and rescue. Owing to a detailed design of the system model and rescue-mission procedure, the platform reliably simulates the MSR environment. On this basis, and exploiting the Markov nature of the MSR mission, multi-agent reinforcement learning (MARL) is used to learn the UAVs' MSR policy. The two main MARL frameworks are centralized training and decentralized execution (CTDE) and decentralized training and decentralized execution (DTDE).
To identify a suitable algorithm, three variants of the proximal policy optimization (PPO) algorithm covering the DTDE and CTDE frameworks are trained and evaluated: independent proximal policy optimization (IPPO), multi-agent proximal policy optimization (MAPPO), and heterogeneous-agent proximal policy optimization (HAPPO). To enhance the scalability of the trained policy, a mission-assignment strategy based on spatial information clustering (SIC) is proposed. By applying the K-means clustering algorithm in multiple rounds to the spatial information of the rescue targets and UAVs, each UAV is assigned a focusing sequence and a buffer sequence at the start of the MSR mission. Combining the proposed SIC algorithm with the bi-level optimization framework yields the SIC–PPO algorithm, which enhances the scalability of all three PPO variants in the decentralized execution phase. In SIC–PPO, each UAV attends only to information about the distinct subset of rescue targets in its own focusing sequence, which is adaptively replenished from its buffer sequence. The SIC–PPO strategy reduces overlapping coverage among UAVs and increases search-and-rescue efficiency. Simulation results show that all three PPO variants converge under common ocean conditions and at a small rescue scale. When MAPPO policies trained in the “6 drones and 80 people” scenario are applied to the “60 drones and 800 people” scenario, the proposed SIC–MAPPO algorithm increases the search-and-rescue success rate from 13.44% (MAPPO alone) to 93.36%. This demonstrates the scalability of the SIC–PPO algorithm across mission scales, which allows large-scale MSR missions to be conducted efficiently. Compared with ant colony optimization and greedy algorithms, the proposed SIC–MAPPO algorithm also demonstrates scalability and stability in the ocean environment.
Under a mean ocean current velocity of 0.85 m·s⁻¹ in the “60 drones and 800 people” scenario, the SIC–MAPPO algorithm achieves a rescue success rate of 80.975%, which is 2.425 and 0.95 percentage points higher than the rates achieved by the ant colony optimization and greedy algorithms, respectively.
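
The mission-assignment idea described above (clustering rescue targets spatially, then giving each UAV a focusing sequence that is refilled from a buffer sequence) can be illustrated with a minimal Python sketch. This is not the authors' SIC algorithm: it runs a single round of plain K-means rather than multiple rounds, matches UAVs to clusters greedily, and every name in it (`kmeans`, `assign_sequences`, `replenish`, `focus_len`) is a hypothetical choice made for illustration only.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumes k <= len(points)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels


def assign_sequences(uavs, targets, focus_len=3, seed=0):
    """Give each UAV a focusing and a buffer sequence of target indices."""
    centroids, labels = kmeans(targets, len(uavs), seed=seed)
    free = set(range(len(uavs)))  # clusters not yet claimed by a UAV
    plan = {}
    for u, pos in enumerate(uavs):
        # Greedy matching (an assumption): claim the nearest unclaimed cluster.
        c = min(free, key=lambda j: math.dist(pos, centroids[j]))
        free.remove(c)
        # Order that cluster's targets by distance from the UAV.
        ids = sorted((i for i, lab in enumerate(labels) if lab == c),
                     key=lambda i: math.dist(pos, targets[i]))
        plan[u] = {"focus": ids[:focus_len], "buffer": ids[focus_len:]}
    return plan


def replenish(entry, rescued_id):
    """Drop a rescued target and refill the focusing sequence from the buffer."""
    entry["focus"].remove(rescued_id)
    if entry["buffer"]:
        entry["focus"].append(entry["buffer"].pop(0))


# Hypothetical toy scenario: 3 UAVs and 20 targets in a 100 x 100 area.
demo_rng = random.Random(1)
demo_targets = [(demo_rng.uniform(0, 100), demo_rng.uniform(0, 100))
                for _ in range(20)]
demo_uavs = [(0.0, 0.0), (100.0, 0.0), (50.0, 100.0)]
demo_plan = assign_sequences(demo_uavs, demo_targets)
```

Because each cluster is claimed by exactly one UAV, every target index lands in exactly one UAV's focusing or buffer sequence, which is the property that limits overlapping coverage in this toy version. In a learned policy, only the targets in `focus` would presumably enter a UAV's observation, so the observation size stays fixed as the mission grows; that reading is an interpretation of the abstract, not a detail it states.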