Safe deep reinforcement learning method for guaranteed compliance with physical constraints in autonomous energy systems of critical infrastructure (a case study of healthcare facilities)

Maksym Kushnarov

Abstract


The study examines the intelligent management of energy resilience in modern healthcare facilities during critical situations, large-scale failures, and prolonged outages of the external centralized power supply. The aim is to develop a mathematical model and a Safe Deep Reinforcement Learning (Safe DRL) method that guarantees compliance with the strict physical and operational constraints of hospital energy systems, even during the intensive training phase of the neural network agent. The objectives are: to formalize the decision-making procedure in the energy system as a Constrained Markov Decision Process (CMDP); to develop a mathematical model with a dedicated safety layer based on Lyapunov functions; and to ensure high resilience and autonomy of the system through a decentralized Edge-Fog data processing architecture. The methods used are: CMDP theory, deep reinforcement learning based on the Actor-Critic architecture, Lyapunov stability theory for the analytical correction of actions, and simulation modeling of complex dynamic energy systems. The following results were obtained. A Safe DRL method was proposed and substantiated that integrates a Lyapunov-based projection directly into the training loop to correct the agent’s control actions immediately. This provides strict theoretical guarantees that the required State of Charge (SoC) of the battery systems is maintained and prevents energy system parameters from leaving the established safety limits, which is crucial for patient life support. The effectiveness of the proposed approach was confirmed by a series of numerical experiments in the specialized HospitalEnergyEnv environment.
Under a full blackout scenario, the agent managed resources adaptively and accurately, with no violation of the established physical limits at any point during autonomous operation. Conclusions. The scientific novelty of the results is as follows: the existing optimization model for building energy management systems (BEMS) has been improved by introducing an analytical safety projection mechanism, which minimizes the risk of emergency equipment shutdown while the artificial intelligence algorithms adapt; decentralized control methods for critical infrastructure based on Edge-Fog computing have been further developed, which significantly increases system fault tolerance when the connection to the global network is lost and yields quasi-optimal solutions in high-dimensional problems. The practical value of this work lies in the potential to create highly reliable autonomous energy systems for critical infrastructure facilities.
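
The safety-layer idea described in the abstract, correcting each control action before it reaches the plant, can be illustrated with a minimal sketch. The abstract does not give the Lyapunov function, the battery model, or the interface of HospitalEnergyEnv, so the sketch below substitutes a simpler technique: a one-step projection of the requested battery power onto the interval that keeps the next SoC inside fixed bounds, under a lossless linear battery model. All names and numbers (project_battery_power, step_soc, simulate, the 100 kWh capacity, the 0.2 to 0.9 SoC range) are hypothetical and chosen only for illustration; random actions stand in for an untrained agent.

```python
import random


def project_battery_power(p_raw_kw, soc, soc_min=0.2, soc_max=0.9,
                          capacity_kwh=100.0, dt_h=0.25, p_max_kw=50.0):
    """Clip the requested battery power (kW, positive = charging) so that
    the next-step SoC stays inside [soc_min, soc_max].

    Assumed model (not from the paper): soc_next = soc + p * dt / capacity.
    """
    # Power that would bring the SoC exactly to each bound in one step.
    p_low = (soc_min - soc) * capacity_kwh / dt_h
    p_high = (soc_max - soc) * capacity_kwh / dt_h
    # Also respect the assumed converter power rating.
    p_low = max(p_low, -p_max_kw)
    p_high = min(p_high, p_max_kw)
    # Projection of a scalar onto an interval is a clip.
    return min(max(p_raw_kw, p_low), p_high)


def step_soc(soc, p_kw, capacity_kwh=100.0, dt_h=0.25):
    """Advance the assumed lossless linear battery model by one time step."""
    return soc + p_kw * dt_h / capacity_kwh


def simulate(steps=200, soc0=0.5, seed=0):
    """Apply random raw actions (stand-in for an untrained agent) through
    the projection and return the resulting SoC trajectory."""
    rng = random.Random(seed)
    soc = soc0
    trajectory = [soc]
    for _ in range(steps):
        p_raw = rng.uniform(-80.0, 80.0)   # may exceed every limit
        p_safe = project_battery_power(p_raw, soc)
        soc = step_soc(soc, p_safe)
        trajectory.append(soc)
    return trajectory
```

Because the correction is applied to every action, including exploratory ones, the SoC bounds hold throughout training in this toy model, up to floating-point rounding. A Lyapunov-based projection as named in the abstract would replace the interval constraint with a decrease condition on a Lyapunov function, which this sketch does not attempt to reproduce.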

Keywords


Deep Reinforcement Learning; Lyapunov functions; energy resilience; microgrids; smart hospital; physical constraints; Edge-Fog computing

DOI: https://doi.org/10.32620/aktt.2026.2.12