DETERMINING THE PROBABILITY OF HEART DISEASE BASED ON DATA MINING METHODS

Ксения Алексеевна Базилевич, Евгений Сергеевич Меняйлов, Сергей Игоревич Горанина, Кирилл Андреевич Федулов

Abstract


Many diseases remain difficult to diagnose in time, and experts continue to look for better ways to diagnose and treat them. Data Mining methods offer one way to address the limited possibilities of timely diagnosis. The first method considered in the paper estimates logistic regression parameters from odds and probabilities. It is best suited to samples with a small number of parameters; on samples with a large number of parameters it loses accuracy. The second method estimates the probability of a disease using a Bayesian classifier. It is better suited to samples with a large number of parameters, since its accuracy does not degrade as the number of variables grows; however, the number of features must remain constant. With a variable number of attributes, using such a classifier in its explicit form leads to the loss of its covariance. The paper also discusses estimating logistic regression parameters by the maximum likelihood method, which has long been regarded as one of the best approaches to problems of this type: it is applicable in various fields and can be implemented on modern high-performance computers. Its disadvantage is its complexity. As a result of the study, methods that estimate the probability of disease for a patient with given parameters were identified, analyzed, and implemented. The results make it possible to assess a patient's state of health more accurately when diagnostic parameters are constantly changing.
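As a rough illustration of two of the approaches named in the abstract, the sketch below fits a logistic regression with a single binary predictor in two ways: directly from group odds, and by maximizing the log-likelihood with plain gradient ascent. It is not the authors' implementation. The data set, the function names, and the choice of gradient ascent are assumptions made for the example.

```python
import math

def sigmoid(z):
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_to_probability(odds):
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)

def fit_from_odds(rows):
    """Estimate (b0, b1) for one binary predictor from group odds.

    b0 is the log-odds of disease when x = 0,
    b1 is the log of the odds ratio between x = 1 and x = 0.
    """
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for x, y in rows:
        counts[(x, y)] += 1
    odds0 = counts[(0, 1)] / counts[(0, 0)]
    odds1 = counts[(1, 1)] / counts[(1, 0)]
    return math.log(odds0), math.log(odds1 / odds0)

def fit_max_likelihood(rows, lr=0.5, steps=5000):
    """Estimate (b0, b1) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(rows)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in rows:
            err = y - sigmoid(b0 + b1 * x)  # gradient of the log-likelihood
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical sample: x = risk factor present (1) or absent (0),
# y = disease present (1) or absent (0).
rows = ([(0, 0)] * 30 + [(0, 1)] * 10 +
        [(1, 0)] * 15 + [(1, 1)] * 25)

b0_odds, b1_odds = fit_from_odds(rows)
b0_mle, b1_mle = fit_max_likelihood(rows)
p_risk = sigmoid(b0_mle + b1_mle)  # estimated P(disease | x = 1)
```

With one binary predictor the two estimates coincide, which is why the odds-based approach is attractive for small numbers of parameters. With many predictors no such closed form exists in general, and an iterative maximum likelihood fit is needed.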


Keywords


classification; probability estimation; logistic regression; Bayes classifier; odds ratio method; maximum likelihood method.





DOI: https://doi.org/10.32620/oikit.2019.83.14
