Key words:Arabic characters – Optical character recognition – Segmentation – Cursive script – Degraded text recognition
Springer Online Journal Archives 1860-2000
Abstract. In recognizing cursive scripts, a major undertaking is segmenting cursive words into characters and isolating merged characters. The segmentation is usually the pivotal stage in the system to which a sizable portion of processing is devoted and a considerable share of recognition errors is attributed. The most notable feature of Arabic writing is its cursiveness. Compared to other features, the cursiveness of Arabic words poses the most difficult problem for recognition algorithms. In this work, we describe the design and implementation of an Arabic word recognition system. To recognize a word, the system does not segment it into characters in advance; rather, it recognizes the input word by detecting a set of “shape primitives” on the word. It then matches the regions of the word (represented by the detected primitives) with a set of symbol models. A spatial arrangement of symbol models that are matched to regions of the word, then, becomes the description of the recognized word. Since the number of potential arrangements of all symbol models is combinatorially large, the system imposes a set of constraints that pertain to word structure and spatial consistency. The system searches the space made up of the arrangements that satisfy the constraints, and tries to maximize the a posteriori\/ probability of the arrangement of symbol models. We measure the accuracy of the system not only on words but on isolated characters as well. For isolated characters, it has a recognition rate of 99.7% for synthetically degraded symbols and 94.1% for scanned symbols. For isolated words the system has a recognition rate of 99.4% for noise-free words, 95.6% for synthetically degraded words, and 73% for scanned words.
Type of Medium: