Comparative analysis of the machine learning models determining COVID-19 patient risk levels

Kseniia Bazilevych, Olena Kyrylenko, Yurii Parfenyuk, Serhii Krivtsov, Ievgen Meniailov, Victoriya Kuznietcova, Dmytro Chumachenko

Abstract


The COVID-19 pandemic has posed unprecedented challenges to global healthcare systems, emphasizing the need for predictive tools that support resource allocation and patient care. The aim of this study is to evaluate and compare three machine learning methods (the Bayes criterion, logistic regression, and gradient boosting) for predicting the risk level of COVID-19 patients from their symptoms, status, and medical history. The object of the research is the process of patient state determination. The subject of the research is machine learning methods for patient state determination. To achieve the aim, the following tasks were formulated: to analyze methods and models for determining the state of COVID-19 patients; to develop classification models of patient state determination based on the Bayes criterion, logistic regression, and gradient boosting; to develop an information system; to conduct an experimental study using the machine learning methods; and to analyze the results of that study. Methods: using a dataset provided by the Mexican government, which covers over a million unique patients described by 21 distinct features, we developed an information system in the C# programming language. The system lets users select their preferred method for risk calculation, offering a real-time decision-support tool for healthcare professionals. Results: all three models achieved high accuracy, with subtle differences in sensitivity, precision, and F1-score; gradient boosting slightly outperformed the other models in overall accuracy. Conclusions: although gradient boosting was marginally superior in this study, each model has its merits, and the choice of method should depend on the specific needs and constraints of the healthcare system. This research underscores the potential of machine learning to strengthen pandemic response strategies, offering both scientific insights and practical tools for healthcare professionals.
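
The abstract compares the models by accuracy, sensitivity, precision, and F1-score. As a minimal illustration of how these four metrics relate for a binary high-risk/low-risk classifier, the sketch below computes them from confusion-matrix counts. The names `ConfusionCounts` and `classification_metrics` and the counts in the usage example are hypothetical and are not taken from the study or its C# information system.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary classifier (hypothetical container, not from the paper)."""
    tp: int  # high-risk patients predicted as high-risk
    fp: int  # low-risk patients predicted as high-risk
    tn: int  # low-risk patients predicted as low-risk
    fn: int  # high-risk patients predicted as low-risk


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity (recall), precision, and F1-score.

    Any ratio with a zero denominator is reported as 0.0.
    """
    total = c.tp + c.fp + c.tn + c.fn
    actual_pos = c.tp + c.fn
    predicted_pos = c.tp + c.fp

    accuracy = (c.tp + c.tn) / total if total else 0.0
    sensitivity = c.tp / actual_pos if actual_pos else 0.0
    precision = c.tp / predicted_pos if predicted_pos else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the study.
    example = ConfusionCounts(tp=80, fp=20, tn=90, fn=10)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Two models with nearly equal accuracy can still differ in sensitivity and precision, which is why the choice between them may depend on whether missed high-risk patients or false alarms are more costly in a given healthcare setting.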

Keywords


patient state determination; classification; machine learning; COVID-19; Bayes criterion; logistic regression; gradient boosting


DOI: https://doi.org/10.32620/reks.2023.3.01
