Ensemble machine learning approaches for fake news classification

Halyna Padalko; Vasyl Chomko; Sergiy Yakovlev; Dmytro Chumachenko

doi:10.32620/reks.2023.4.01

Abstract

In today’s interconnected digital landscape, the proliferation of fake news has become a significant challenge with far-reaching implications for individuals, institutions, and societies. The rapid spread of misleading information undermines the credibility of genuine news outlets and threatens informed decision-making, public trust, and democratic processes. To address this problem, this study develops an ensemble machine learning model for fake news classification. The object of the research is the spread of fake news. The subject of the research is machine learning methods for misinformation classification. Methods: we employed three state-of-the-art algorithms: LightGBM, XGBoost, and Balanced Random Forest (BRF). Each model was trained on a dataset curated to encompass a diverse range of news articles, ensuring broad representation of linguistic patterns and styles. A distinctive feature of the proposed approach is its emphasis on token importance: by leveraging the tokens that had the greatest influence on classification outcomes, we improved the precision and reliability of the developed models. Results: the LightGBM model performed best among the three, with an F1-score of 97.74% and an accuracy of 97.64%. All three proposed models outperformed several existing models previously documented in the academic literature, which supports the efficacy of the proposed ensemble approach. In conclusion, this study contributes a robust and scalable solution to the problem of fake news detection. The findings pave the way for enhancing the integrity and veracity of information in an increasingly digitalized world, thereby safeguarding public trust and promoting informed discourse.
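The abstract reports results as an F1-score and an accuracy. As a reminder of how these two metrics are defined for a binary classifier, the following is a minimal sketch in plain Python. The function name `binary_metrics`, the choice of `1` as the positive class, and the toy labels are illustrative assumptions; they are not the study's code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical toy labels (1 = positive class, 0 = negative class); not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because F1 ignores true negatives while accuracy counts them, the two values generally differ on real data, which is consistent with the abstract reporting them separately.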

Keywords

fake news; classification; misinformation; disinformation; balanced random forest; XGBoost; LightGBM; WELFake; machine learning

RADIOELECTRONIC AND COMPUTER SYSTEMS
