Exploring the possibilities of MADDPG for UAV swarm control by simulation in a Pac-Man environment

Artem Novikov, Sergiy Yakovlev, Ivan Gushchin

Abstract


This paper explores the application of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm for training models that control UAV swarms in dynamic and adversarial scenarios. In a modified Pac-Man environment, Pac-Man represents a target UAV, and the Ghosts represent the UAV swarm that counteracts it. The grid-based representation of Pac-Man mazes is used as an abstraction of a two-dimensional terrain model: a plane of pathways with obstacles that corresponds to UAV flight conditions at a fixed altitude. The proposed approach provides a clear discretization of space, simplifying pathfinding, collision avoidance, and the planning of reconnaissance or interception routes; by combining decentralized local autonomy with centralized training, it enables UAVs to coordinate effectively and adapt quickly to changing conditions. This study evaluates the performance of adversaries controlled by MADDPG-trained models against heuristic navigation strategies such as A* and Breadth-First Search (BFS). Traditional rule-based pursuit and prediction algorithms, inspired by the behaviors of the Blinky and Pinky ghosts from the classic Pac-Man game, are included as benchmarks to assess the impact of learning-based methods. The purpose of this study was to determine the effectiveness of MADDPG-trained models in enhancing UAV swarm control by analyzing their adaptability and coordination capabilities in adversarial environments through computer modeling in simplified, mission-like 2D environments. Experiments conducted across varying levels of terrain complexity revealed that the MADDPG-trained model demonstrated superior adaptability and strategic coordination compared with the rule-based methods. Ghosts controlled by a model trained via MADDPG significantly reduced the success rate of Pac-Man agents, particularly in highly constrained environments, emphasizing the potential of learning-based adversarial strategies in UAV applications such as urban navigation, defense, and surveillance. Conclusions. MADDPG is a promising and robust framework for training models to control UAV swarms, particularly in adversarial settings. This study highlights its adaptability and its ability to outperform traditional rule-based methods in dynamic and complex environments. Future research should compare MADDPG-trained models with multi-agent search algorithms such as Expectimax, Alpha-Beta Pruning, and Monte Carlo Tree Search (MCTS) to better understand the advantages and limitations of learning-based approaches relative to traditional decision-making methods in collaborative and adversarial UAV operations. In addition, exploring 3D implementations that integrate maze height decomposition and flight restrictions, and incorporating cybersecurity considerations and real-world threats such as anti-drone systems and electronic warfare, will enhance the robustness and applicability of these methods in realistic UAV scenarios.
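To make the centralized-training/decentralized-execution scheme concrete, the following is a minimal sketch in PyTorch (not the authors' implementation) of the MADDPG structure the abstract describes: each ghost has its own actor that conditions only on its local observation, while per-agent critics are trained on the joint observations and actions of all agents. All dimensions, layer sizes, and names are illustrative assumptions.

```python
# Minimal MADDPG structure sketch (illustrative; all sizes are assumptions).
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 4, 16, 5  # e.g., 4 ghosts, 5 grid moves (incl. "stay")

class Actor(nn.Module):
    """Decentralized policy: conditions only on the agent's own observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM))

    def forward(self, obs):
        # Soft relaxation of the discrete grid moves so the policy stays
        # differentiable (Lowe et al. use Gumbel-Softmax for discrete actions).
        return torch.softmax(self.net(obs), dim=-1)

class CentralCritic(nn.Module):
    """Centralized critic: scores the JOINT observation-action of all agents."""
    def __init__(self):
        super().__init__()
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, all_obs, all_act):  # Q_i(o_1..o_N, a_1..a_N)
        return self.net(torch.cat([all_obs, all_act], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critics = [CentralCritic() for _ in range(N_AGENTS)]

def actor_update(i, obs_batch, act_batch, optimizer):
    """One deterministic-policy-gradient step for agent i.

    obs_batch: (B, N_AGENTS, OBS_DIM), act_batch: (B, N_AGENTS, ACT_DIM),
    sampled from a shared replay buffer during (centralized) training.
    """
    act = act_batch.clone()
    act[:, i] = actors[i](obs_batch[:, i])  # re-evaluate own action only
    q = critics[i](obs_batch.flatten(1), act.flatten(1))
    loss = -q.mean()                        # ascend the centralized Q-value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

At deployment only the trained actors are kept, so each UAV acts on its local observation alone; this is the decentralized execution the abstract refers to.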
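For the baselines, the sketch below (again illustrative, not the authors' code) shows the two ingredients the abstract mentions: BFS shortest-path search on the grid maze, and the classic Blinky/Pinky target rules, direct pursuit versus prediction, on which the rule-based ghosts are modeled. The grid encoding and function names are assumptions.

```python
# BFS baseline and rule-based ghost targets (illustrative sketch).
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest path on a 4-connected grid; grid[r][c] == 1 means wall."""
    rows, cols = len(grid), len(grid[0])
    prev, frontier = {start: None}, deque([start])
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == goal:                      # walk back through predecessors
            path = []
            while (r, c) != start:
                path.append((r, c))
                r, c = prev[(r, c)]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in prev):
                prev[nxt] = (r, c)
                frontier.append(nxt)
    return None  # goal unreachable

def blinky_target(pacman_pos):
    """Direct pursuit: chase Pac-Man's current tile."""
    return pacman_pos

def pinky_target(pacman_pos, pacman_dir):
    """Predictive ambush: aim four tiles ahead of Pac-Man's heading
    (the arcade original also shifts the target when facing up; omitted here)."""
    return (pacman_pos[0] + 4 * pacman_dir[0],
            pacman_pos[1] + 4 * pacman_dir[1])
```

Each rule-based ghost then follows `bfs_path(grid, ghost_pos, target)` toward its target tile, which is the pursuit/prediction benchmark the MADDPG-trained ghosts are compared against.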

Keywords


multi-agent reinforcement learning; navigation; adversarial UAV strategies; computer modeling






DOI: https://doi.org/10.32620/reks.2025.1.21
