The system for the automated analysis of server logs using NLP methods and text compression algorithms

Vladyslav Chernyshchuk, Olga Morozova

Abstract


The subject of the article is the automated analysis of server logs of software systems using Natural Language Processing (NLP) methods. The goal is to develop an approach that improves the efficiency of log analysis by applying NLP algorithms and text compression methods. The tasks to be solved are: to study the features of logs as a source of technical information; to analyze modern methods of text data processing, including statistical, vector-based, and neural network approaches; to justify the feasibility of using NLP algorithms for automating log analysis; to substantiate the use of text compression methods to reduce data volume; and to develop a concept and architecture of an automated log analysis system. The methods used include: NLP techniques such as TF-IDF, Word2Vec, FastText, and transformer-based models; text preprocessing methods; log compression and log template extraction approaches; and information retrieval methods applied to text corpora. The following results were obtained. A concept of an automated server log analysis system based on NLP methods and text compression algorithms was developed. A generalized system architecture was proposed, including modules for log collection, storage, filtering, preprocessing, text compression, NLP analysis, incident representation, and solution search. An algorithm for system operation was developed that processes logs step by step while taking into account the textual nature and large volume of the data. It was established that the use of NLP methods improves the accuracy of error detection and incident classification, while text compression reduces the computational load and increases system performance. Conclusions. The proposed approach improves the efficiency of software diagnostics, reduces the time required for error detection and resolution, and decreases the workload on developers.
The scientific novelty of the obtained results lies in the following: an approach to automated log analysis combining NLP methods and text compression techniques was developed; a generalized architecture of a log analysis system that accounts for the specifics of unstructured textual data was proposed; existing log processing methods were further developed by integrating modern NLP models with data volume optimization stages, which improves analysis efficiency under large-scale information flows.
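As an illustration only, the preprocessing, compression, and NLP analysis stages named above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the system described in the article: the masking rules, the function names (`to_template`, `compress`, `tfidf`), and the choice to treat each template as one document are all hypothetical, and template deduplication stands in here for the text compression stage.

```python
import math
import re
from collections import Counter

# Hypothetical masking rules: variable fields are replaced by placeholders
# so that lines differing only in parameters share one template.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Preprocessing: mask variable tokens in a single log line."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line.strip()


def compress(lines):
    """Compression stand-in: collapse lines into {template: count}."""
    return Counter(to_template(line) for line in lines)


def tfidf(templates):
    """NLP analysis: TF-IDF weights, one sparse vector per template."""
    docs = [t.lower().split() for t in templates]
    n = len(docs)
    # Document frequency: number of templates containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {tok: (cnt / len(doc)) * math.log(n / df[tok]) for tok, cnt in tf.items()}
        )
    return vectors
```

In this sketch the later stages only see the distinct templates and their counts, so the cost of vectorization depends on the number of templates rather than the number of raw lines. Tokens that occur in every template receive a zero weight, which leaves the distinguishing terms to drive any downstream classification or search.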

Keywords


log analysis; natural language processing; NLP algorithms; text compression; intelligent systems; software diagnostics; data processing; fault tolerance

References


He, P., Zhu, J., He, S., Li, J., & Lyu, M. R. A Survey on Automated Log Analysis for Reliability Engineering. ACM Computing Surveys, 2020, vol. 1, iss. 1, pp. 1–37. DOI: 10.48550/arXiv.2009.07237.

What is Site Reliability Engineering (SRE)? Amazon Web Services. Available at: https://aws.amazon.com/what-is/sre/ (accessed 15.01.2026).

Salz, P. Logging in the Java Platform. Available at: https://docs.oracle.com (accessed 15.01.2026).

Sommerville, I. Software Engineering. Harlow, Pearson, 2016. 816 p.

Tanenbaum, A. S., & Bos, H. Modern Operating Systems. Boston, Pearson, 2015. 1136 p.

Silberschatz, A., Galvin, P. B., & Gagne, G. Operating System Concepts. Hoboken, Wiley, 2018. 976 p.

Gerhards, R. The Syslog Protocol (RFC 5424). IETF, 2009. 37 p.

Log4j Documentation. Apache Software Foundation. Available at: https://logging.apache.org/log4j/ (accessed 15.01.2026).

Jurafsky, D., & Martin, J. H. Speech and Language Processing, 2023. 600 p.

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT, 2019, pp. 4171–4186.

Xu, W., Huang, L., Fox, A., Patterson, D., & Jordan, M. Detecting Large-Scale System Problems by Mining Console Logs. Proceedings of SOSP, 2009, pp. 117–132.

He, P., Zhu, J., Zheng, Z., & Lyu, M. R. Drain: An Online Log Parsing Approach with Fixed Depth Tree. IEEE ICWS, 2017, pp. 33–40.

Du, M., & Li, F. DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning. ACM CCS, 2017, pp. 1285–1298.

Zhang, T., Liu, Y., & Wang, J. Log Analysis Using Large Language Models for Anomaly Detection. Proceedings of the ACM Symposium on Cloud Computing (SoCC), 2024, pp. 98–110.

Chen, X., Li, Y., & Zhang, H. A Survey on Deep Learning for Log Analysis in Distributed Systems. Future Generation Computer Systems, 2023, vol. 140, pp. 452–468.

Wang, Z., Xu, Q., & Chen, L. Log Parsing and Anomaly Detection Using Pre-trained Language Models. IEEE Transactions on Network and Service Management, 2024, vol. 21, iss. 1, pp. 112–125.




DOI: https://doi.org/10.32620/aktt.2026.2.13