Dimensionality cutback and deep learning algorithms efficacy as to the breast cancer diagnostic dataset

Gennady Chuiko, Denys Honcharov

Abstract


Breast cancer is a significant threat: it is the most frequently diagnosed form of cancer and one of the leading causes of mortality among women. Early diagnosis and timely treatment are crucial for saving lives and reducing treatment costs. Medical imaging techniques such as mammography, computed tomography, histopathology, and ultrasound are the contemporary approaches to detecting and classifying breast cancer. Machine learning practitioners prefer Deep Learning algorithms when analyzing large volumes of medical imaging data. However, the use of Deep Learning-based diagnostic methods in clinical practice remains limited despite their potential effectiveness. Deep Learning methods are complex and opaque, although their effectiveness can offset these drawbacks.

Research subject. Deep Learning algorithms implemented in the WEKA software and their efficacy on the Wisconsin Breast Cancer dataset.

Objective. To reduce the dimensionality of the dataset significantly without losing predictive power.

Methods. Computer experiments in the WEKA environment covering preprocessing and supervised and unsupervised Deep Learning on the full and reduced datasets, with estimates of their efficacy.

Results. Three sequential filtering steps notably reduced the dimensionality of the initial dataset, from 30 attributes down to four. Unexpectedly, all three Deep Learning classifiers implemented in WEKA (Dl4jMlp, Multilayer Perceptron, and Voted Perceptron) showed statistically indistinguishable performance. In addition, performance was statistically the same on the full and reduced datasets. For example, the percentage of correctly classified instances ranged from 95.9 % to 97.7 %, with a standard deviation below 2.5 %. Two neuron-based clustering algorithms (Self-Organizing Map, SOM, and Learning Vector Quantization, LVQ) showed similar results. The two clusters in all datasets are not well separated, but they accurately represent both preassigned classes, with Fowlkes–Mallows index (FMI) values ranging from 0.81 to 0.99.

Conclusion. The results indicate that the dimensionality of the Wisconsin Breast Cancer dataset, which is increasingly becoming the "gold standard" for diagnosing Malignant-Benign tumors, can be significantly reduced without losing predictive power. The Deep Learning algorithms in WEKA deliver excellent performance in both supervised and unsupervised learning, on the full and the reduced datasets alike.
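The Fowlkes–Mallows index reported in the results compares a clustering with the preassigned classes by counting pairs of instances that fall together in both. The following is a minimal Python sketch of that pair-counting definition. The function name `fowlkes_mallows` and the toy labels are illustrative assumptions and do not come from the paper, whose experiments were run in WEKA.

```python
from collections import Counter
from math import comb, sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Fowlkes-Mallows index between two labelings of the same instances."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    # Contingency counts: instances sharing each (class, cluster) combination.
    joint = Counter(zip(labels_true, labels_pred))
    # Pairs placed together in both labelings (TP).
    tp = sum(comb(n, 2) for n in joint.values())
    # Pairs placed together in the reference classes (TP + FN).
    same_class = sum(comb(n, 2) for n in Counter(labels_true).values())
    # Pairs placed together in the predicted clusters (TP + FP).
    same_cluster = sum(comb(n, 2) for n in Counter(labels_pred).values())
    if same_class == 0 or same_cluster == 0:
        return 0.0
    # Geometric mean of pairwise precision and recall.
    return tp / sqrt(same_class * same_cluster)


# Toy labels invented for illustration (not taken from the dataset).
classes = ["B", "B", "B", "M", "M", "M"]
clusters = [0, 0, 1, 1, 1, 1]
print(round(fowlkes_mallows(classes, clusters), 3))
```

The index equals 1 when the clusters reproduce the classes exactly, whatever the cluster numbering, and moves toward 0 as the agreement weakens. Values of 0.81 to 0.99 therefore indicate that the clusters follow the Malignant-Benign split closely.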

Keywords


breast cancer; Deep Learning algorithms; WEKA; Wisconsin Breast Cancer dataset; diagnosing Malignant-Benign tumors

DOI: https://doi.org/10.32620/reks.2024.4.08
