Paragraph-oriented methods for determining the coherence and thematic unity of scientific and technical texts

Ihor Shevchenko, Pavlo Andreev, Maiia Dernova, Olena Poddubei

Abstract


The subject of the article is the determination of the degree of connectedness of scientific and technical texts by statistical calculation. The aim of the study is to examine whether the coherence of fluctuations in the relative frequencies of keywords across paragraphs can be used to determine the lexical coherence and thematic unity of scientific and technical texts. The tasks are: to develop a method for determining the thematic unity of a text at the level of a set of paragraphs; to develop a method for determining the coherence of a text at the level of a set of paragraphs; and to test the developed methods on a collection of documents. The methods used are statistical analysis and computational experiment. The following results were obtained. The study shows that, to determine the degree of coherence of a scientific and technical text at the paragraph level, it is advisable to cluster paragraphs as points in the keyword space. This makes it possible to calculate the degree of thematic unity within the clusters and in the entire text. The degree of coherence of text fragments and of the whole text is determined by analyzing the sequence of paragraph numbers in the clusters. This makes it possible to formally assess the quality of the presentation in a scientific and technical article or a textbook. Conclusions. The scientific novelty of the study is as follows: the method for determining the degree of connectedness (coherence and thematic unity) of scientific and technical texts at the paragraph level was refined by clustering paragraphs in the keyword space, calculating the degree of thematic unity within the clusters and in the overall text, and analyzing the sequence of paragraph numbers in the clusters to determine the degree of coherence of text fragments and of the overall text. The methods are language-independent, based on clear hypotheses, and complement each other. The methods have an adjusting element that can be used to adapt them to different thematic and stylistic areas. It has been shown experimentally that the proposed methods for determining the connectedness of scientific and technical texts are efficient and can provide a framework for an information technology for content analysis of such texts. The proposed methods do not use web resources for syntactic and semantic analysis, so they can be used autonomously.
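
The general idea of clustering paragraphs in the keyword space and then inspecting paragraph numbers inside each cluster can be illustrated with a minimal Python sketch. The abstract does not give the clustering algorithm or the formulas, so everything below is an assumption made for illustration only: the greedy cosine-similarity clustering, the hypothetical `threshold` parameter, the centroid-based `thematic_unity` score, and the `sequence_coherence` score (the share of neighbouring paragraph numbers in a cluster that are consecutive in the text). It is not the authors' method.

```python
import math
import re
from collections import Counter


def keyword_vectors(paragraphs, keywords):
    """Represent each paragraph by the relative frequencies of the keywords."""
    vectors = []
    for text in paragraphs:
        tokens = re.findall(r"\w+", text.lower())
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty paragraphs
        vectors.append([counts[k] / total for k in keywords])
    return vectors


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors, members):
    """Component-wise mean of the vectors of the given paragraph indices."""
    return [sum(col) / len(members) for col in zip(*(vectors[j] for j in members))]


def cluster_paragraphs(vectors, threshold=0.5):
    """Assumed greedy clustering: a paragraph joins the first cluster whose
    centroid is at least `threshold` similar, otherwise it starts a new one."""
    clusters = []  # each cluster is a list of paragraph indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, centroid(vectors, members)) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def thematic_unity(vectors, members):
    """Assumed unity score: mean similarity of members to their centroid."""
    c = centroid(vectors, members)
    return sum(cosine(vectors[j], c) for j in members) / len(members)


def sequence_coherence(members):
    """Assumed coherence score: share of neighbouring paragraph numbers
    in the cluster that are consecutive in the text."""
    if len(members) < 2:
        return 1.0
    ordered = sorted(members)
    adjacent = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a == 1)
    return adjacent / (len(ordered) - 1)


if __name__ == "__main__":
    # Toy data, invented for the example.
    keywords = ["cluster", "frequency"]
    paragraphs = [
        "cluster cluster analysis",
        "cluster method here",
        "frequency frequency count",
        "cluster again",
    ]
    vectors = keyword_vectors(paragraphs, keywords)
    clusters = cluster_paragraphs(vectors)
    print(clusters)  # -> [[0, 1, 3], [2]]
    for members in clusters:
        print(members, thematic_unity(vectors, members), sequence_coherence(members))
```

In this toy run, paragraph 3 returns to the topic of paragraphs 0 and 1 after an interruption, so the first cluster gets a sequence score of 0.5 rather than 1.0. The sketch uses no external resources, in keeping with the autonomy noted in the abstract. Whether the hypothetical `threshold` corresponds to the adjusting element mentioned there is not stated in the source.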

Keywords


text coherence; thematic unity; paragraphs; keywords; relative frequencies; clusters





DOI: https://doi.org/10.32620/reks.2023.2.03
