Integrated segmentation and recognition of connected Ottoman script



Yalniz I. Z., Altingovde İ. S., Gudukbay U., Ulusoy O.

OPTICAL ENGINEERING, vol. 48, no. 11, 2009 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 48, Issue: 11
  • Publication Date: 2009
  • DOI: 10.1117/1.3262346
  • Journal: OPTICAL ENGINEERING
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Keywords: optical character recognition, historical document analysis, connected scripts, information retrieval, Arabic character recognition, inverted files, retrieval, documents
  • Middle East Technical University Affiliation: No

Abstract

We propose a novel context-sensitive segmentation and recognition method for connected letters in Ottoman script. The method first extracts a set of segments from a connected script and determines the candidate letters to which the extracted segments are most similar. Next, a function is defined for scoring each syntactically correct sequence of these candidate letters. To find the candidate letter sequence that maximizes the score function, a directed acyclic graph is constructed, and the letters are recognized by computing the longest path in this graph. Experiments on a collection of printed Ottoman documents show that the proposed method achieves over 90% precision and recall in character recognition. In a further set of experiments, we also demonstrate that the framework can be used as a building block for an information retrieval system for digital Ottoman archives. © 2009 Society of Photo-Optical Instrumentation Engineers. [DOI: 10.1117/1.3262346]
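The abstract describes the search only at a high level: candidate letters are matched to segments, syntactically correct sequences are scored, and the best sequence is the longest path in a DAG. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes (hypothetically) that a letter may span one or more consecutive segments, that the score of a sequence is the sum of per-letter similarity values, and that "syntactically correct" can be approximated by an allowed-letter-pair predicate. The names `best_letter_sequence`, `candidates`, and `allowed`, and the placeholder letters, are illustrative only.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical input: candidates[(i, j)] lists (letter, similarity) pairs for a
# letter assumed to span segments i..j-1. Segment boundaries 0..n are positions.
Candidates = Dict[Tuple[int, int], List[Tuple[str, float]]]
Node = Tuple[int, Optional[str]]  # (boundary position, previous letter)


def best_letter_sequence(
    n_segments: int,
    candidates: Candidates,
    allowed: Callable[[Optional[str], str], bool],
) -> Tuple[float, List[str]]:
    """Return the highest-scoring letter sequence covering all segments.

    Every edge moves from boundary i to a larger boundary j, so the graph is
    acyclic and the longest (maximum-score) path can be found by dynamic
    programming in increasing boundary order.
    """
    # best[node] = (best score reaching node, predecessor node)
    best: Dict[Node, Tuple[float, Optional[Node]]] = {(0, None): (0.0, None)}
    for i in range(n_segments):
        # All edges into boundary i come from smaller boundaries, so these
        # scores are final when boundary i is processed.
        states = [(k, v[0]) for k, v in best.items() if k[0] == i]
        for (pos, last), score in states:
            for j in range(i + 1, n_segments + 1):
                for letter, sim in candidates.get((i, j), []):
                    if not allowed(last, letter):
                        continue  # prune syntactically invalid continuations
                    target = (j, letter)
                    new_score = score + sim
                    if target not in best or new_score > best[target][0]:
                        best[target] = (new_score, (pos, last))
    ends = [k for k in best if k[0] == n_segments]
    if not ends:
        return float("-inf"), []  # no valid segmentation exists
    end = max(ends, key=lambda k: best[k][0])
    letters: List[str] = []
    cur: Optional[Node] = end
    while cur is not None and cur[1] is not None:  # walk back to the start node
        letters.append(cur[1])
        cur = best[cur][1]
    return best[end][0], letters[::-1]


# Toy example with placeholder letters: "m" is a candidate spanning two segments.
cands: Candidates = {
    (0, 1): [("b", 0.6)],
    (1, 2): [("a", 0.7)],
    (0, 2): [("m", 0.9)],
    (2, 3): [("n", 0.8)],
}
print(best_letter_sequence(3, cands, lambda prev, cur: True))
```

With no constraints, the sequence b, a, n (total about 2.1) beats m, n (about 1.7). If the predicate forbids "a" after "b", the search falls back to m, n. Keying each node by the previous letter is what lets a pairwise constraint be checked on an edge while keeping the graph acyclic; richer syntactic rules would need a larger state.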