UDC 004.724
A SPECIALIZED METHOD OF TEXT RECOGNITION FOR AUTOMATIC PROCESSING OF PASSPORT DATA
P. S. Drugov, Junior researcher of SPIIRAS, St. Petersburg, Russia; Master’s student of ETU, St. Petersburg, Russia;
orcid.org/0000-0002-0319-0554
E. E. Usina, Junior researcher of SPIIRAS, St. Petersburg, Russia; Master’s student of SUAI, St. Petersburg, Russia;
orcid.org/0000-0001-9745-0216
Smart environments developed for integration into existing enterprise architectures must satisfy a key prerequisite: the identity data of new visitors has to be entered into the enterprise database. This paper develops a method for text recognition on the pages of the internal passport of the Russian Federation. The method processes the page content in the following steps: image magnification, segmentation of the object (the passport page), filtering of the image, and text recognition. To test the method, a vertical stand with a place for the document on top was designed, together with accompanying source code in the Python programming language. The tests showed that the highest text recognition success rate, 88.8 %, is achieved when the distance from the camera to the document is 250 mm; adjusting the camera focus and increasing the distance gave significantly worse results, 66.7 % and 22.2 % respectively. Deploying such a solution in an enterprise would help avoid human error in data transfer (caused by inattention or fatigue of employees) and streamline the entry of customer data into the database.
Key words: text recognition, OCR, Tesseract, optical character recognition, OpenCV, image processing, domestic passport, smart environment.
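The success rates quoted in the abstract can be read as the share of passport fields recognized correctly. The abstract does not state how the rate was computed, so the following is only a minimal sketch under that assumption: the exact-match comparison after normalization, the function names `normalize` and `recognition_rate`, the field names, and the sample values are all hypothetical and are not taken from the authors' code.

```python
import re


def normalize(value: str) -> str:
    """Collapse whitespace and uppercase the text, so that differences in
    spacing or letter case are not counted as recognition errors."""
    return re.sub(r"\s+", " ", value).strip().upper()


def recognition_rate(expected: dict[str, str], recognized: dict[str, str]) -> float:
    """Return the percentage of expected fields whose recognized value
    matches exactly after normalization (assumed metric, see lead-in)."""
    if not expected:
        raise ValueError("expected must contain at least one field")
    correct = sum(
        1
        for name, truth in expected.items()
        if normalize(recognized.get(name, "")) == normalize(truth)
    )
    return 100.0 * correct / len(expected)


# Placeholder values for illustration only, not real passport data.
expected = {"surname": "IVANOV", "name": "IVAN", "series": "0000", "number": "000000"}
# Hypothetical OCR output: the last character of "number" is the letter O.
recognized = {"surname": "ivanov", "name": "IVAN", "series": "0000", "number": "00000O"}

rate = recognition_rate(expected, recognized)
print(f"{rate:.1f} %")
```

A stricter or more lenient metric (for example, character-level accuracy) would give different figures for the same OCR output, which is why the comparison rule should be fixed before results from different camera distances are compared.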