A general method for real-time detection of information threats with a Ukraine case study
References
Robertson, S., & Zaragoza, H. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 2009, vol. 3, iss. 4, pp. 333–389. DOI: 10.1561/1500000019.
Xiong, L., Xiong, C., Li, Y., Tang, K.F., Liu, J., Bennett, P., Ahmed, J., & Overwijk, A. Approximate nearest neighbor negative contrastive learning for dense text retrieval. Proceedings of the International Conference on Learning Representations, ICLR, 2020. Available at: https://arxiv.org/abs/2007.00808 (accessed 12.05.2025).
Stanovsky, G., Michael, J., Zettlemoyer, L., & Dagan, I. Supervised open information extraction. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, Louisiana, USA, ACL, 2018, pp. 885–895. DOI: 10.18653/v1/N18-1081.
Hamborg, F., Donnay, K. & Gipp, B. Automated identification of media bias in news articles: an interdisciplinary literature review. International Journal on Digital Libraries, 2019, vol. 20, pp. 391–415. DOI: 10.1007/s00799-018-0261-y.
Shu, K., Bhattacharjee, A., Alatawi, F., Nazer, T.H., Ding, K., Karami, M., & Liu, H. Combating disinformation in a social media age. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2020, vol. 10, iss. 6, article no. e1385. DOI: 10.1002/widm.1385.
Doddapaneni, S., Khan, M. S. U. R., Venkatesh, D., Dabre, R., Kunchukuttan, A., & Khapra, M. M. Cross-lingual auto evaluation for assessing multilingual LLMs. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria, ACL, 2025, pp. 29297–29329. DOI: 10.18653/v1/2025.acl-long.1419.
Pires, T., Schlinger, E., & Garrette, D. How multilingual is Multilingual BERT? Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, ACL, 2019, pp. 4996–5001. DOI: 10.18653/v1/P19-1493.
Chen, D., Li, Z., Gu, B., & Chen, Z. Multimodal named entity recognition with image attributes and image knowledge. Database Systems for Advanced Applications. DASFAA, 2021, Lecture Notes in Computer Science, vol. 12682, pp. 183–198. DOI: 10.1007/978-3-030-73197-7_12.
Satvat, K., Gjomemo, R., & Venkatakrishnan, V.N. Extractor: Extracting attack behavior from threat reports. Proceedings of the 2021 IEEE European Symposium on Security and Privacy, EuroS&P, Vienna, Austria, IEEE, 2021, pp. 414–429. DOI: 10.1109/EuroSP51992.2021.00046.
Kapan, S., & Sora Gunal, E. Improved phishing attack detection with machine learning: A comprehensive evaluation of classifiers and features. Applied Sciences, 2023, vol. 13, iss. 24, pp. 1–13. DOI: 10.3390/app132413269.
Kondratyuk, D., & Straka, M. 75 languages, 1 model: Parsing Universal Dependencies universally. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing EMNLP-IJCNLP, Hong Kong, China, ACL, 2019, pp. 2779–2795. DOI: 10.18653/v1/D19-1284.
Lu, Y., Nie, Z., Cheng, T., Gao, Y., & Wen, J. R. Name disambiguation using a web connection. Proceedings of AAAI 2007 Workshop on Information Integration on the Web, IIWeb, Vancouver, Canada, AAAI, 2007, pp. 57–61. Available at: https://cdn.aaai.org/Workshops/2007/WS-07-14/WS07-14-010.pdf (accessed 12.05.2025).
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, AAAI, Portland, Oregon, USA, 1996, pp. 226–231. DOI: 10.5555/3001460.3001507.
Reimers, N., & Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Hong Kong, China, ACL, 2019, pp. 3982–3992. DOI: 10.18653/v1/D19-1410.
Gao, T., Yao, X., & Chen, D. SimCSE: Simple contrastive learning of sentence embeddings. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, EMNLP, 2021, pp. 6894–6910. DOI: 10.18653/v1/2021.emnlp-main.552.
Potthast, M., Kiesel, J., Reinartz, K., Bevendorff, J., & Stein, B. A stylometric inquiry into hyperpartisan and fake news. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, ACL, 2018, pp. 231–240. DOI: 10.18653/v1/P18-1022.
Corley, C., & Mihalcea, R. Measuring the semantic similarity of texts. Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, Ann Arbor, USA, ACL, 2005, pp. 13–18. DOI: 10.3115/1641356.1641359.
Schubert, E., Sander, J., Ester, M., Kriegel, H.-P., & Xu, X. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Transactions on Database Systems, TODS, 2017, vol. 42, iss. 3, pp. 1–21. DOI: 10.1145/3068335.
Gu, X., Angelov, P. P., Kangin, D., & Principe, J. C. A new type of distance metric and its use for clustering. Evolving Systems, 2017, vol. 8, iss. 3, pp. 167–177. DOI: 10.1007/s12530-017-9195-7.
Artetxe, M., & Schwenk, H. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Transactions of the Association for Computational Linguistics, 2019, vol. 7, pp. 597–610. DOI: 10.1162/tacl_a_00288.
Hutto, C. J., & Gilbert, E. VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the 8th International Conference on Weblogs and Social Media, ICWSM, Ann Arbor, Michigan, USA, AAAI, 2014, vol. 8, no. 1, pp. 216–225. DOI: 10.1609/icwsm.v8i1.14550.
Medhat, W., Hassan, A., & Korashy, H. Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 2014, vol. 5, no. 4, pp. 1093–1113. DOI: 10.1016/j.asej.2014.04.011.
Sun, C., Huang, L., & Qiu, X. Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, Minneapolis, Minnesota, USA, Association for Computational Linguistics, 2019, pp. 380–385. DOI: 10.18653/v1/N19-1035.
Joshi, A., Bhattacharyya, P., & Carman, M.J. Automatic sarcasm detection: A survey. ACM Computing Surveys, 2017, vol. 50, iss. 5, pp. 1–22. DOI: 10.1145/3124420.
Kaushik, D., Hovy, E., & Lipton, Z.C. Learning the difference that makes a difference with counterfactually-augmented data. Proceedings of the 8th International Conference on Learning Representations, ICLR, 2020. Available at: https://arxiv.org/abs/1909.12434 (accessed 12.09.2025).
Volkova, S., & Jang, J.Y. Misleading or falsification: Inferring deceptive strategies and types in online news and social media. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Beijing, PRC, ACM, 2018, pp. 575–583. DOI: 10.1145/3184558.3188728.
Zhang, L., Wang, S., & Liu, B. Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2018, vol. 8, no. 4, e1253. DOI: 10.1002/widm.1253.
Ghosh, D., & Veale, T. Fracking sarcasm using neural network. Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA, San Diego, USA, ACL, 2016, pp. 161–169. DOI: 10.18653/v1/W16-0425.
Cakebread‑Andrews, O., Ha, L. A., Frommholz, I., & Can, B. Error analysis of NLP models and non‑native speakers of English identifying sarcasm in Reddit comments. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Torino, Italy, ELRA and ICCL, 2024, pp. 6247–6256. Available at: https://aclanthology.org/2024.lrec-main.552/ (accessed 12.05.2025).
Breve, B., Caruccio, L., Cirillo, S., Deufemia, V., & Polese, G. Analyzing the worldwide perception of the Russia–Ukraine conflict through Twitter. Journal of Big Data, 2024, vol. 11, article no. 76, pp. 1–33. DOI: 10.1186/s40537-024-00921-w.
Manning, C. D., Raghavan, P., & Schütze, H. Introduction to Information Retrieval. Cambridge, Cambridge University Press, 2008. 506 p.
Joachims, T. Text categorization with support vector machines: learning with many relevant features. Proceedings of the 10th European Conference on Machine Learning, Chemnitz, Germany, AAAI, 1998, pp. 137–142. DOI: 10.1007/BFb0026683.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minnesota, USA, ACL, 2019, vol. 1, pp. 4171–4186. DOI: 10.18653/v1/N19-1423.
Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., & Hovy, E. Hierarchical attention networks for document classification. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA, ACL, 2016, pp. 1480–1489. DOI: 10.18653/v1/N16-1174.
Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., & Stoyanov, V. Unsupervised cross-lingual representation learning at scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL, 2020, pp. 8440–8451. DOI: 10.18653/v1/2020.acl-main.747.
Baccianella, S., Esuli, A., & Sebastiani, F. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. Proceedings of the Seventh International Conference on Language Resources and Evaluation, Valletta, Malta, ELRA, 2010, pp. 2200–2204. Available at: http://lrec-conf.org/proceedings/lrec2010/pdf/769_Paper.pdf (accessed 13.09.2025).
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Proceedings of the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019, Vancouver, BC, Canada, IEEE, 2019. DOI: 10.48550/arXiv.1910.01108.
Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L., Wang, F., & Liu, Q. TinyBERT: Distilling BERT for natural language understanding. Findings of the Association for Computational Linguistics, 2020, pp. 4163–4174. DOI: 10.18653/v1/2020.findings-emnlp.372.
Breiman, L. Random forests. Machine Learning. 2001, vol. 45, no. 1, pp. 5–32. DOI: 10.1023/A:1010933404324.
Prabowo, R., & Thelwall, M. Sentiment analysis: A combined approach. Journal of Informetrics, 2009, vol. 3, no. 2, pp. 143–157. DOI: 10.1016/j.joi.2009.01.003.
Forman, G. An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research. 2003, vol. 3, pp. 1289–1305. Available at: https://www.jmlr.org/papers/volume3/forman03a/forman03a.pdf (accessed 14.09.2025).
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A.M., Nunes, C., & Jatowt, A. YAKE! Keyword extraction from single documents using multiple local features. Information Sciences, 2020, vol. 509, pp. 257–289. DOI: 10.1016/j.ins.2019.09.013.
Mihalcea, R., & Tarau, P. TextRank: Bringing order into texts. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, ACL, 2004, pp. 404–411. Available at: https://aclanthology.org/W04-3252 (accessed 14.05.2025).
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990, vol. 41, no. 6, pp. 391–407. DOI: 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9.
Blei, D. M., Ng, A. Y., & Jordan, M. I. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003, vol. 3, pp. 993–1022. Available at: https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf (accessed 14.05.2025).
Grootendorst, M. KeyBERT: Minimal keyword extraction with BERT. 2020. Available at: http://dx.doi.org/10.5281/zenodo.4461265 (accessed 14.05.2025).
Cohan, A., Feldman, S., Beltagy, I., Downey, D., & Weld, D. S. SPECTER: Document-level representation learning using citation‑informed transformers. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. ACL, 2020, pp. 2270–2282. DOI: 10.18653/v1/2020.acl-main.207.
Wan, X., & Xiao, J. Single document keyphrase extraction using neighborhood knowledge. Proceedings of the 23rd AAAI Conference on Artificial Intelligence, Chicago, USA, AAAI, 2008, pp. 855–860. Available at: https://cdn.aaai.org/AAAI/2008/AAAI08-136.pdf (accessed 12.05.2025).
Papagiannopoulou, E., & Tsoumakas, G. A review of keyphrase extraction. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2020, vol. 10, no. 2, article no. e1339. DOI: 10.1002/widm.1339.
Hardt, M., Price, E., & Srebro, N. Equality of opportunity in supervised learning. NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, Curran Associates Inc., 2016, pp. 3323–3331. DOI: 10.5555/3157382.
Arnold, M., Bellamy, R.K.E., Hind, M., Houde, S., Mehta, S., Nair, R., & Nushi, B. Factsheets: Increasing trust in AI services through supplier’s declarations of conformity. IBM Journal of Research and Development. 2019, vol. 63, no. 4/5, pp. 6:1–6:13. DOI: 10.1147/JRD.2019.2942288.
Dodge, J., Ilharco, G., Schwartz, R., Farhadi, A., Hajishirzi, H., & Smith, N. A. Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping. 2020. DOI: 10.48550/arXiv.2002.06305.
Fort, S., Ren, J., & Lakshminarayanan, B. Exploring the limits of out-of-distribution detection. NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems. NeurIPS, Virtual Conference, Curran Associates Inc., 2021, pp. 1–14. Available at: https://proceedings.neurips.cc/paper_files/paper/2021/file/3941c4358616274ac2436eacf67fae05-Paper.pdf (accessed 14.05.2025).
Ribeiro, M. T., Singh, S., & Guestrin, C. "Why should I trust you?": Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, USA, ACM, 2016, pp. 1135–1144. DOI: 10.1145/2939672.2939778.
Lundberg, S. M., & Lee, S.-I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems. NeurIPS, Long Beach, California, USA, 2017, vol. 30. Available at: https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf (accessed 14.05.2025).
Wachter, S., Mittelstadt, B., & Russell, C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology. 2017, vol. 31, no. 2, pp. 841–887. Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3063289 (accessed 14.05.2025).
Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter. 2017, vol. 19, no. 1, pp. 22–36. DOI: 10.1145/3137597.3137600.
Vidgen, B., Harris, A., Nguyen, D., Tromble, R., Hale, S., & Margetts, H. Challenges and frontiers in abusive content detection. Proceedings of the Third Workshop on Abusive Language, Florence, Italy, ACL, 2020, pp. 6025–6044. Available at: https://ora.ox.ac.uk/objects/uuid:3864e746-88c8-4f99-b912-52f4b4be289a/files/m25d33782b006cec944b22e5d744ed1b7 (accessed 14.05.2025).
Floridi, L., Cowls, J., Beltrametti, M., Chatila, R., Chazerand, P., Dignum, V., Luetge, C., Madelin, R., Pagallo, U., Rossi, F., Schafer, B., Valcke, P., & Vayena, E. AI4People – An ethical framework for a good AI society. Minds and Machines. 2018, vol. 28, no. 4, pp. 689–707. DOI: 10.1007/s11023-018-9482-5.
Sachenko, A., Lendiuk, T., Lipianina-Honcharenko, K., Dobrowolski, M., Boguta, G., & Bytsyura, L. Method of determining the text sentiment by thematic rubrics. Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Systems, Lviv, Ukraine, CEUR, 2024, pp. 404–414. DOI: 10.31110/COLINS/2024-3/026.
Lipianina-Honcharenko, K., Soia, M., Yurkiv, K., & Ivasechko, A. Evaluation of the effectiveness of machine learning methods for detecting disinformation in Ukrainian text data. Proceedings of the 5th International Conference on Computational Methods for Information Security, Zaporizhzhia, Ukraine, CEUR, 2024, pp. 97–109. Available at: https://ceur-ws.org/Vol-3702/paper9.pdf (accessed 14.05.2025).
Lipianina-Honcharenko, K., Lendiuk, D., Melnyk, N., Komar, M., & Lendiuk, T. Evaluation of the keyword selection methods effectiveness for the fake news classification. Proceedings of the 10th International Scientific Conference on Information Technology and Interactions, Kyiv, Ukraine, CEUR, 2024, pp. 109–122. Available at: https://ceur-ws.org/Vol-3909/Paper_9.pdf (accessed 12.05.2025).
Zhang, L. L., Han, S., Wei, J., Zheng, N., Cao, T., Yang, Y., & Liu, Y. NN‑Meter: Towards accurate latency prediction of deep‑learning model inference on diverse edge devices. Proceedings of the 19th Annual International Conference on Mobile Systems, Applications, and Services. MobiSys, Wisconsin, USA, ACM, 2021, pp. 81–93. DOI: 10.1145/3458864.3467882.
Reddi, V.J., Cheng, C., Kanter, D., Mattson, P., Schmuelling, G., Wu, C.J., Coleman, C., Diamos, G., Elibol, M., Hall, D., Hazelwood, K., Hsu, B., Idiculla, N., Kumar, D., Levenberg, J., Tang, H., Warden, P., et al. MLPerf inference benchmark. ISCA'20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture. ISCA, Valencia, Spain, IEEE, 2020, pp. 446–459. DOI: 10.1109/ISCA45697.2020.00045.
Memarian, B., & Doleck, T. Fairness, accountability, transparency, and ethics (FATE) in artificial intelligence (AI) and higher education: A systematic review. Computers and Education: Artificial Intelligence. 2023, vol. 5, article no. 100152. DOI: 10.1016/j.caeai.2023.100152.
Kharchenko, V., Fesenko, H., & Illiashenko, O. Basic model of non-functional characteristics for assessment of artificial intelligence quality. Radioelectronic and Computer Systems. 2022, no. 2, pp. 131–144. DOI: 10.32620/reks.2022.2.11.
Streamlit. Streamlit web application. Available at: https://github.com/streamlit/streamlit (accessed 14.05.2025).
Dotsenko, S., Illiashenko, O., Kharchenko, V., & Morozova, O. Integrated information model of an enterprise and cybersecurity management system: From data to activity. International Journal of Cyber Warfare and Terrorism. 2022, vol. 12, no. 2, pp. 1–21. DOI: 10.4018/IJCWT.305860.
Mygal, V., Mygal, G., & Illiashenko, O. Intelligent decision support – cognitive aspects. In: Digital Transformation, Cyber Security and Resilience of Modern Societies. Studies in Big Data, vol. 84. Cham, Springer, 2021, pp. 395–411. DOI: 10.1007/978-3-030-79934-2_26.
Kharchenko, V., Fesenko, H., & Illiashenko, O. Quality Models for Artificial Intelligence Systems: Characteristic-Based Approach, Development and Application. Sensors, 2022, vol. 22, iss. 13, article no. 4865. DOI: 10.3390/s22134865.
DOI: https://doi.org/10.32620/reks.2025.3.15
