Empirical evaluation of feature selection and machine learning techniques to recommend clones for software refactoring

Manpreet Kaur, Dhavleesh Rattan, Madan Lal

Abstract


This article addresses the management of software clones. Software clones are duplicate code fragments that can exist in the same file or in different files of a software system. Software clone detection and management is a well-established research area. Clones should be managed to minimize their ill effects, because their presence can increase a system's maintenance cost and resource requirements. Refactoring is a commonly used technique for managing clones. A clone detection tool can find many clones in a software system, but not all detected clones are suitable for refactoring; a developer needs the subset of detected clones that can be easily refactored. This study aims to recommend software clones for refactoring using machine learning techniques. It evaluates the performance of fourteen machine learning algorithms and investigates the influence of three feature selection methods on clone recommendation accuracy. The tasks to be solved are as follows: selecting appropriate features from the datasets, developing machine learning-based models that can suggest suitable clones for refactoring, and selecting an effective combination of machine learning and feature selection algorithms for recommending clones for refactoring. The feature selection methods used are correlation, InfoGain, and ReliefF. The study is conducted on datasets from six open-source software systems written in Java. The experimental results show that the Decision Tree and LogitBoost classifiers achieve the highest accuracy, 94.44%, on the Lucene dataset. ReliefF yields the best performance among the feature selection methods, particularly when used with the Decision Tree algorithm. The study concludes that Random Committee, Random Forest, and Decision Tree perform best when paired with correlation, InfoGain, and ReliefF, respectively. Overall, the Decision Tree classifier combined with the ReliefF feature selection method delivers the highest average precision, recall, and F-score across the datasets.
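As a rough illustration of two ideas named in the abstract, the sketch below ranks features by the absolute value of their Pearson correlation with a binary "refactorable" label, then computes precision, recall, and F-score for a set of predictions. It is not the authors' implementation: the feature names (`clone_lines`, `instance_count`, `constant_noise`), the toy rows, and the predictions are hypothetical values invented for the example, and only the correlation method is shown (InfoGain and ReliefF are not).

```python
import math
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    names: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score each feature column by |correlation with the label|, best first."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, abs(pearson(column, labels))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


def precision_recall_f1(
    actual: Sequence[int], predicted: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, recall, and F-score for the positive class (label 1)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: one row per clone, label 1 = suitable for refactoring.
FEATURE_NAMES = ["clone_lines", "instance_count", "constant_noise"]
ROWS = [
    [30, 2, 5],
    [45, 3, 5],
    [12, 2, 5],
    [60, 4, 5],
    [8, 1, 5],
    [15, 1, 5],
]
LABELS = [1, 1, 0, 1, 0, 0]
PREDICTED = [1, 0, 0, 1, 1, 0]  # made-up classifier output

if __name__ == "__main__":
    for name, score in rank_by_correlation(ROWS, LABELS, FEATURE_NAMES):
        print(f"{name}: {score:.3f}")
    print(precision_recall_f1(LABELS, PREDICTED))
```

A feature that never varies, such as `constant_noise` here, scores zero and would be the first candidate to drop before training a classifier on the remaining columns.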

Keywords


Software clones; clone management; clone recommendation; clone refactoring; feature selection; machine learning

DOI: https://doi.org/10.32620/reks.2025.3.04
