Agent-oriented data clustering for medical monitoring
Abstract
Medical data processing is one of the priority areas of machine learning. Data obtained during medical patient monitoring are usually complex and heterogeneous in nature. Solving clustering, classification, or forecasting problems on such data requires new methods, or improvements to existing ones, to increase the accuracy and effectiveness of decisions. The classical clustering approaches and the fuzzy c-means clustering method were analyzed. Based on multi-agent systems theory, it is proposed to use separate rules within the c-means method for selecting elites when forming clusters and for selecting the best clusters according to the chosen intra-cluster distance measures. Solving this problem yields the number of clusters and the number of elements in each of them. The quality of the method was tested on Fisher's iris data set using three measures of intra-cluster distance: the Mahalanobis distance, the Mahalanobis distance taking the membership function into account, and the Kullback-Leibler entropy. The highest accuracy, 98%, was obtained with the distance measured by the Kullback-Leibler entropy. Therefore, this measure was chosen to solve the clustering problem for medical monitoring data on prostate disease. The medical monitoring data were divided into four classes of patient states: “healthy persons”, “non-metastatic patients”, “metastatic patients” and “hormone-resistant patients”. The clustering accuracy on the medical data was 95.6%. In addition to accuracy, the confusion matrix and the ROC and LF curves were used to assess the quality of the method. The minimum value of the ROC curve was 0.96 for Fisher's iris data and 0.95 for the medical monitoring data, which indicates the high quality of the proposed clustering method. The loss function value is also quite small (-0.056 and -0.0176 for the two data sets, respectively), which indicates that the optimal number of clusters and the optimal distribution of data over them were obtained.
Based on the analysis of the obtained results, the proposed method can be recommended for use in medical information and diagnostic decision support systems for clustering monitoring data.
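For orientation, the baseline fuzzy c-means iteration that the proposed method builds on can be sketched as follows. This is a minimal sketch of the standard algorithm with squared Euclidean distance, not the agent-oriented variant described above: the elite-selection rules, the Mahalanobis distance, and the Kullback-Leibler measure are not reproduced. The function name, the fuzzifier m = 2, the tolerance, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative baseline, Euclidean distance).

    X: (n, d) data matrix; c: number of clusters; m > 1: fuzzifier.
    Returns (centers, U), where U[i, k] is the membership of point i
    in cluster k and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)

    def compute_centers(U):
        # Cluster centers are means of the data weighted by U**m.
        W = U ** m
        return (W.T @ X) / W.sum(axis=0)[:, None]

    for _ in range(max_iter):
        centers = compute_centers(U)

        # Squared Euclidean distances from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero

        # Membership update: u_ik proportional to d_ik^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    # Recompute centers so they match the final memberships.
    return compute_centers(U), U


if __name__ == "__main__":
    # Synthetic example: two well-separated 2-D blobs (assumed data).
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
                   rng.normal(5.0, 0.3, (30, 2))])
    centers, U = fuzzy_c_means(X, c=2)
    print(centers)
    print(U.argmax(axis=1))  # hard labels from the fuzzy memberships
```

In this sketch the number of clusters c is fixed in advance, whereas the method described in the abstract returns the number of clusters as part of its result; swapping the distance computation for another intra-cluster measure would be the natural extension point.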
DOI: https://doi.org/10.32620/reks.2022.1.08