Abstract

Open Access Research Article
Article ID: TCSIT-6-139

    Study of using hybrid deep neural networks in character extraction from images containing text

    P Preethi, HR Mamatha and Hrishikesh Viswanath*

Character segmentation from epigraphical images aids the optical character recognizer (OCR) in the training and recognition of old regional scripts. The characters in such images are often illegible and set against complex, noisy background textures. In this paper, we present an automated way of segmenting and extracting characters from digitized inscriptions. To achieve this, machine learning models are employed to discern correctly segmented characters from partially segmented ones. The proposed method first recursively crops the document by sliding a window across the image from top to bottom and extracting the content within the window, producing a set of small images for classification. Each segment is then classified as character or non-character based on the features it contains. The model was tested on a wide range of input images containing irregular, inconsistently spaced, handwritten, and inscribed characters.
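As a rough illustration of the pipeline the abstract describes, the sketch below slides a fixed-size window over an inscription image and keeps only the crops a classifier accepts as full characters. The window size, stride, and the `is_character` function are illustrative assumptions, not the authors' published parameters; in the paper the classifier is a trained hybrid deep network, whereas here a trivial ink-density heuristic stands in so the sketch stays self-contained and runnable.

```python
# Minimal sketch of sliding-window character segmentation.
# Window size, stride, and the classifier are assumptions for illustration.
import numpy as np

def sliding_window_crops(image: np.ndarray, win_h: int = 64,
                         win_w: int = 64, stride: int = 32):
    """Yield (y, x, crop) for each window slid top-to-bottom, left-to-right."""
    h, w = image.shape[:2]
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            yield y, x, image[y:y + win_h, x:x + win_w]

def is_character(crop: np.ndarray) -> bool:
    """Placeholder for the learned character/non-character classifier.
    The paper uses a trained deep model; this ink-density heuristic
    (dark pixels suggest inscribed strokes) merely keeps the demo runnable."""
    return crop.mean() < 128

def extract_characters(image: np.ndarray):
    """Keep only the windows the classifier accepts as characters."""
    return [(y, x, crop) for y, x, crop in sliding_window_crops(image)
            if is_character(crop)]

# Usage on a synthetic grayscale page (uint8, 0 = ink, 255 = background):
page = np.full((256, 256), 255, dtype=np.uint8)
page[40:90, 40:90] = 0  # a dark square standing in for a glyph
print(len(extract_characters(page)), "candidate character crops")
```

In practice, swapping the heuristic for a trained model turns the same loop into the classification stage described above: each crop is scored, and partially segmented fragments are rejected rather than passed to the OCR.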

    Published on: Aug 4, 2021 Pages: 45-52

DOI: 10.17352/tcsit.000039
