UAV spatiotemporal crowdsourcing resource allocation for industrial scenarios based on deep reinforcement learning

  • Abstract: UAV spatiotemporal crowdsourcing resource allocation is an important task in industrial Internet of Things (IoT) energy management. Although existing methods consider Age-of-Information metrics that jointly capture time sensitivity and fairness, they ignore the impact of UAV no-fly zones and eavesdroppers on data freshness. This paper proposes a deep reinforcement learning-based method for UAV spatiotemporal crowdsourcing resource allocation that, under UAV no-fly-zone constraints and while transmitting jamming signals toward eavesdroppers to protect data security, minimizes the average Age of Information and the energy consumption of IoT devices, thereby obtaining the optimal UAV trajectories, jamming transmit power, and IoT transmit power. Resource allocation in UAV spatiotemporal crowdsourcing is, however, complex and challenging, mainly because the decision variables are of multiple types and are intricately related to system performance metrics that must satisfy quality-of-service requirements. We model the problem as a Markov decision process and solve it with an advanced deep reinforcement learning algorithm, namely the soft actor-critic (SAC) algorithm. The effectiveness and correctness of the proposed algorithm for the UAV spatiotemporal crowdsourcing resource allocation task are verified in multi-UAV scenarios. Moreover, compared with two other state-of-the-art deep reinforcement learning algorithms, namely the deep deterministic policy gradient and twin delayed deep deterministic policy gradient algorithms, SAC converges faster and finds better solutions. Finally, we analyze how to select the optimal number of UAVs.

     

    Abstract: Spatiotemporal crowdsourcing involves the use of various Internet of Things (IoT) devices distributed across industrial environments to collect and transmit spatiotemporal data related to industrial operations. Unmanned aerial vehicles (UAVs) play a crucial role in collecting these data from IoT devices, especially in spatiotemporal crowdsourcing tasks. In the realm of industrial IoT energy management, allocating spatiotemporal crowdsourcing resources to UAVs poses substantial challenges. Traditional approaches to this problem have focused on optimizing the Age of Information (AoI) to ensure timely and equitable data updates. However, these methods often overlook critical operational constraints such as UAV no-fly zones and the risk of data interception by eavesdroppers. These issues can adversely affect the freshness and integrity of the information being gathered and transmitted. To address these shortcomings, this paper presents a novel deep reinforcement learning-based framework for UAV spatiotemporal crowdsourcing resource allocation. This approach aims to minimize the average AoI across the network while also reducing the energy consumption of IoT devices. It incorporates the spatial constraints imposed by UAV no-fly zones and actively manages the transmission of jamming signals to mitigate the threat posed by eavesdroppers, thus ensuring data security. However, the complexity of allocating spatiotemporal crowdsourcing resources to UAVs is notable owing to the numerous decision variables, which increase linearly with the duration of the service. Furthermore, the relationship between performance metrics and decision variables is intricate, requiring adherence to quality-of-service requirements. This problem is formalized as a Markov decision process (MDP), providing a structured approach to model the decision-making scenario faced by UAVs in a dynamic environment.
To solve this MDP, we employ the soft actor-critic (SAC) algorithm, an advanced deep reinforcement learning method known for its sample efficiency and stability. The SAC algorithm is adept at handling the continuous action spaces typical of UAV flight-path and power-control problems, making it particularly well suited to our application. We rigorously test the proposed method in scenarios involving multiple UAVs, demonstrating the algorithm's effectiveness in managing the spatiotemporal allocation of resources. Our results show that the SAC algorithm converges faster and achieves better solutions than existing state-of-the-art methods, such as the twin delayed deep deterministic policy gradient (TD3) and deep deterministic policy gradient (DDPG) algorithms. Furthermore, the paper delves into the strategic selection of the optimal number of UAVs to balance the trade-offs between coverage, energy consumption, and operational efficiency. By analytically and empirically examining the impact of the UAV fleet size on system performance, we provide insights into configuring UAV networks to achieve optimal outcomes in terms of AoI, energy management, and security. In conclusion, our research introduces a robust and intelligent framework for UAV resource allocation. The demonstrated efficacy of the SAC algorithm in this context paves the way for its future application in other domains where secure, efficient, and intelligent resource management is paramount.
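The Age of Information objective optimized above can be illustrated with a minimal sketch. This is not the paper's system model; it assumes a simple slotted-time setting in which a device's AoI grows by one each slot and resets to one whenever the UAV successfully collects a fresh update, and all names (`simulate_aoi`, `success`) are illustrative.

```python
def simulate_aoi(success):
    """Track Age of Information (AoI) over discrete time slots.

    success[t] is True when a fresh update is successfully collected
    in slot t (AoI resets to 1); otherwise AoI grows by one slot.
    Returns the per-slot AoI trace.
    """
    aoi, trace = 0, []
    for ok in success:
        aoi = 1 if ok else aoi + 1
        trace.append(aoi)
    return trace

# Example: updates collected in slots 0 and 3 out of 5 slots.
trace = simulate_aoi([True, False, False, True, False])
avg_aoi = sum(trace) / len(trace)  # average AoI over the horizon
```

For the example schedule the trace is [1, 2, 3, 1, 2] and the average AoI is 1.8; a resource-allocation policy that schedules collections (and UAV trajectories enabling them) so as to lower this average is the kind of behavior the SAC agent is trained to learn.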

     
