Optical character recognition (OCR) is widely used in various real-world applications, such as digitizing learning resources, assisting visually impaired people, and transforming printed resources into electronic media. As far as the Arabic language is concerned, the need to expand digital Arabic content on the Internet has recently motivated researchers to focus on Arabic text recognition.
Despite the large number of works on Arabic OCR, it still faces numerous challenges due to the special characteristics of the Arabic script. This research aims to develop an effective printed Arabic OCR system, and this work describes its implementation. The system is divided into four stages: pre-processing, feature extraction, character segmentation, and classification. Unlike typical Arabic OCR systems, the developed system performs the feature extraction stage before the character segmentation stage.
In the pre-processing stage, a novel thinning algorithm is applied to produce skeletons of the Arabic text images. In the second stage, a new chain code representation technique based on an agent-based model is proposed for extracting features from non-dotted Arabic text images.
Relying on the extracted features, a character segmentation technique is introduced to segment connected Arabic words into characters. In the classification stage, the prediction by partial matching (PPM) compression-based method is applied as a classifier to recognize the Arabic text.
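The thinning algorithm used in the pre-processing stage is described as novel, and its details are not given in this summary. To illustrate only what "producing a skeleton" means, the sketch below substitutes the classic Zhang-Suen parallel thinning algorithm, which repeatedly peels boundary ink pixels in two alternating sub-iterations until a one-pixel-wide line remains. It assumes a binary image stored as a list of rows (1 = ink, 0 = background) with a background border; the function name `zhang_suen_thin` is hypothetical, and this is not the authors' algorithm.

```python
from typing import List

Image = List[List[int]]  # 1 = ink (foreground), 0 = background


def _neighbours(img: Image, r: int, c: int) -> List[int]:
    """Return P2..P9: the 8 neighbours clockwise, starting directly above."""
    return [
        img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
        img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1],
    ]


def zhang_suen_thin(binary: Image) -> Image:
    """Thin a binary image to a one-pixel-wide skeleton (Zhang-Suen stand-in).

    Border pixels are assumed to be background and are never examined.
    """
    img = [row[:] for row in binary]  # work on a copy; the input is not modified
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for sub_iteration in (0, 1):
            to_delete = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = _neighbours(img, r, c)  # p[0] = P2, ..., p[7] = P9
                    b = sum(p)  # number of ink neighbours
                    # a = number of 0 -> 1 transitions around P2, P3, ..., P9, P2
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if sub_iteration == 0:  # south-east boundary / north-west corner
                        side_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:  # north-west boundary / south-east corner
                        side_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and side_ok:
                        to_delete.append((r, c))
            # Delete in parallel, only after the whole sub-iteration was evaluated.
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img


if __name__ == "__main__":
    bar = [[0] * 12 for _ in range(7)]
    for r in range(2, 5):
        for c in range(2, 10):
            bar[r][c] = 1
    for row in zhang_suen_thin(bar):
        print("".join("#" if v else "." for v in row))
```

For example, a 3-pixel-thick horizontal bar occupying rows 2 to 4 and columns 2 to 9 of a 7 x 12 image thins to a single line along row 3, columns 3 to 7. Deleting pixels only after each sub-iteration has been fully evaluated is what keeps the stroke from being eroded unevenly from one side.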
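The proposed chain code representation relies on an agent-based model that is not detailed in this summary. As a point of reference only, the sketch below computes a conventional Freeman 8-direction chain code by walking a skeleton stroke from one endpoint. It assumes a simple stroke: one pixel wide, with no branches or loops, so every interior pixel has exactly two 8-connected ink neighbours. The function name `trace_chain_code` and the direction numbering (0 = east, counting counter-clockwise to 7 = south-east) are illustrative choices, not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col); rows grow downward

# Freeman 8-direction codes: 0 = east, counting counter-clockwise to 7 = south-east.
FREEMAN: Dict[Tuple[int, int], int] = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def trace_chain_code(skeleton: List[List[int]]) -> Tuple[Pixel, List[int]]:
    """Walk a simple skeleton stroke from one endpoint; return (start, codes).

    Assumes one pixel width and no branches or loops: every interior pixel has
    exactly two 8-connected ink neighbours and each endpoint has exactly one.
    """
    ink: Set[Pixel] = {
        (r, c) for r, row in enumerate(skeleton) for c, v in enumerate(row) if v
    }

    def ink_neighbours(p: Pixel) -> List[Pixel]:
        return [(p[0] + dr, p[1] + dc) for dr, dc in FREEMAN if (p[0] + dr, p[1] + dc) in ink]

    endpoints = sorted(p for p in ink if len(ink_neighbours(p)) == 1)
    if not endpoints:
        raise ValueError("no endpoint found: stroke is empty, a single pixel, or closed")
    start = current = endpoints[0]  # top-most, then left-most endpoint
    visited = {start}
    codes: List[int] = []
    while True:
        unvisited = [q for q in ink_neighbours(current) if q not in visited]
        if not unvisited:
            break
        nxt = unvisited[0]  # exactly one candidate under the simple-stroke assumption
        codes.append(FREEMAN[(nxt[0] - current[0], nxt[1] - current[1])])
        visited.add(nxt)
        current = nxt
    return start, codes


if __name__ == "__main__":
    stroke = [[0] * 7 for _ in range(5)]
    for r, c in [(1, 1), (1, 2), (1, 3), (2, 4), (3, 5)]:
        stroke[r][c] = 1
    print(trace_chain_code(stroke))  # prints ((1, 1), [0, 0, 7, 7])
```

A stroke that runs two steps east and then two steps diagonally down to the right is thus encoded as the digit sequence 0, 0, 7, 7 from its starting pixel. Real Arabic skeletons contain branch points and loops, which this sketch deliberately does not handle.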
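Compression-based classification trains one statistical model per class and assigns an input to the class whose model encodes it in the fewest bits. The sketch below is a simplified, hypothetical PPM variant written only to illustrate that idea: order-2 contexts, method C escape estimates, no symbol exclusions, a static model after training, and a uniform order -1 fallback. Because exclusions are omitted, the score is an upper bound on the true code length rather than an exact one. The names `SimplePPM` and `classify`, the class labels, and the training strings (chain-code digit sequences over an assumed 8-symbol alphabet) are all made up for illustration.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Dict


class SimplePPM:
    """Hypothetical, simplified PPM model used only to score text.

    Order-k contexts, escape method C, no exclusions, static after training,
    and an order -1 fallback that is uniform over `alphabet_size` symbols.
    """

    def __init__(self, order: int = 2, alphabet_size: int = 8) -> None:
        self.order = order
        self.alphabet_size = alphabet_size
        # counts[context][symbol] for every context length 0..order
        self.counts: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))

    def train(self, text: str) -> None:
        for i, symbol in enumerate(text):
            for k in range(min(self.order, i) + 1):
                self.counts[text[i - k:i]][symbol] += 1

    def _probability(self, context: str, symbol: str) -> float:
        escape_mass = 1.0
        # Try the longest context first, escaping to shorter ones on a miss.
        for k in range(min(self.order, len(context)), -1, -1):
            table = self.counts.get(context[len(context) - k:])
            if not table:
                continue  # context never seen during training: skip it
            total, distinct = sum(table.values()), len(table)
            denominator = total + distinct  # method C: escape count = distinct symbols
            if symbol in table:
                return escape_mass * table[symbol] / denominator
            escape_mass *= distinct / denominator
        return escape_mass / self.alphabet_size  # order -1: uniform fallback

    def code_length(self, text: str) -> float:
        """Approximate number of bits needed to encode `text` with this model."""
        bits = 0.0
        for i, symbol in enumerate(text):
            context = text[max(0, i - self.order):i]
            bits -= math.log2(self._probability(context, symbol))
        return bits


def classify(text: str, models: Dict[str, SimplePPM]) -> str:
    """Return the label whose model compresses `text` best (fewest bits)."""
    return min(models, key=lambda label: models[label].code_length(text))


if __name__ == "__main__":
    models = {"A": SimplePPM(), "B": SimplePPM()}
    models["A"].train("00001111" * 5)
    models["B"].train("07" * 20)
    print(classify("00001111", models), classify("070707", models))  # prints A B
```

The design choice behind this kind of classifier is that no explicit feature-to-label rules are needed: a class model that has seen similar sequences predicts the next symbol well, so it assigns the input a shorter code.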