Development of tools for information protection of optical text recognition systems

Konstantin Dergachov, Leonid Krasnov, Vladislav Bilozerskyi, Anatolii Zymovin

Abstract


The subject of research. This work studies a new universal method of information protection in optical text recognition systems for transmitting confidential data over open communication channels. Its aim is to develop the concept of a modern, simple and reliable method for protecting information during transmission over communication channels, to determine objective criteria for the quality of its operation, and to create a set of algorithms implementing the proposed method together with software for conducting experimental studies. Based on the results of these studies, the practical effectiveness of the proposed method is to be evaluated in terms of both the reliability of coding/decoding of the transmitted data and the secrecy of the fact that special information is being transmitted. Results. A universal concept for creating and using modern methods of information protection in optical text recognition systems when transmitting confidential data over open communication channels is described. The main criteria for the performance quality of these systems are determined. A new combined method is proposed for encrypting transmitted messages using QR-codes with subsequent masking of the fact of data transmission by various methods of LSB-steganography. To conduct experimental studies, a text recognition program based on Tesseract OCR version 4.0 was developed. The program is written in Python and uses the OpenCV library. This dedicated software made it possible to assess the efficiency of the algorithms that implement encryption of the transmitted data and thereby the privacy of the communication links.
Examples of system operation are given, along with results of testing the software in message-encoding modes for subsequent hidden transmission. Conclusion. The case studies confirm the high efficiency of the proposed method for protecting confidential data transmitted over open networks. The technique can serve as a basis for developing software aimed at protecting information in OCR systems offered by various manufacturers.
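
The LSB masking step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `embed_lsb` and `extract_lsb`, the 4-byte length header, and the synthetic cover buffer are assumptions made for illustration, and the payload bytes merely stand in for a serialized QR-code image. The sketch writes each payload bit into the least significant bit of one 8-bit cover pixel value and then reads the bits back.

```python
def embed_lsb(cover: bytes, payload: bytes) -> bytearray:
    """Hide payload in the least significant bits of 8-bit cover values."""
    # Assumed framing: a 4-byte big-endian length header, then the payload.
    data = len(payload).to_bytes(4, "big") + payload
    # Unpack every byte into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this payload")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the pixel value and write the payload bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract_lsb(stego: bytes) -> bytes:
    """Recover a payload written by embed_lsb."""

    def read_bytes(start_bit: int, count: int) -> bytes:
        out = bytearray()
        for b in range(count):
            value = 0
            for k in range(8):
                value = (value << 1) | (stego[start_bit + 8 * b + k] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)


# Synthetic 64x64 grayscale "image" as a flat buffer (hypothetical cover data).
cover = bytes((37 * i + 11) % 256 for i in range(64 * 64))
secret = "confidential text".encode("utf-8")

stego = embed_lsb(cover, secret)
recovered = extract_lsb(stego)

# Each cover value changes by at most one intensity level.
max_change = max(abs(a - b) for a, b in zip(cover, stego))
```

Because only the lowest bit of each value is touched, the stego buffer differs from the cover by at most one intensity level per pixel, which is what makes the presence of a hidden message hard to notice visually. A real pipeline would apply the same idea to image pixels loaded with a library such as OpenCV and would need a lossless container format, since lossy compression destroys the low-order bits.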

Keywords


information protection in optical text recognition systems; correct text recognition probability; algorithms for preliminary processing of initial data; text information encoding; QR-code; LSB-steganography algorithms for hidden data transmission






DOI: https://doi.org/10.32620/reks.2022.2.13
