A hierarchical representation of form documents for identification and retrieval


Duygulu P., Atalay V.

International Journal on Document Analysis and Recognition, vol.5, no.1, pp.17-27, 2003 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 5 Issue: 1
  • Publication Date: 2003
  • DOI Number: 10.1007/s100320100077
  • Journal Name: International Journal on Document Analysis and Recognition
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC
  • Page Numbers: pp.17-27
  • Middle East Technical University Affiliated: Yes

Abstract

In this paper, we present a logical representation of form documents for use in identification and retrieval. A hierarchical structure, built from the form's lines using the XY-tree approach, is proposed to represent the structure of a form. The approach is top-down, and it uses no domain knowledge such as preprinted or filled-in data. The representation handles geometric modifications and slight variations, and logically identical forms are associated with the same or similar hierarchical structures. Similar forms are identified and retrieved by computing edit distances between the generated trees. © 2002 Springer-Verlag Berlin Heidelberg.
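The abstract only outlines the method. The Python sketch below illustrates the two steps it names: a top-down XY-tree decomposition driven by ruling lines, followed by tree edit distance for matching. It rests on several assumptions that the source does not state. Lines are taken to be already extracted and merged as `(kind, position, start, end)` tuples. A region is cut only by lines that span it completely, within a hypothetical tolerance `TOL`. Nodes carry only the labels `"H"`, `"V"` or `"cell"`, and every edit operation costs 1. The helper names `build_xy_tree`, `tree_edit_distance` and `retrieve` are hypothetical. The distance is computed with the textbook recursive definition of ordered tree edit distance plus memoization, not with whatever cost model the paper actually uses.

```python
from dataclasses import dataclass
from functools import lru_cache

TOL = 2.0  # hypothetical tolerance: how far short of a region edge a line may stop and still "span" it


@dataclass(frozen=True)
class Node:
    """XY-tree node: 'H' = cut by horizontal lines, 'V' = cut by vertical lines, 'cell' = leaf."""
    label: str
    children: tuple = ()  # ordered tuple of child Nodes (top-to-bottom or left-to-right)


def _cuts(region, lines, kind):
    """Positions of lines of the given kind ('H' or 'V') that cut completely across the region."""
    x0, y0, x1, y1 = region
    cuts = set()
    for k, pos, start, end in lines:
        if k == kind == "H" and y0 < pos < y1 and start <= x0 + TOL and end >= x1 - TOL:
            cuts.add(pos)
        elif k == kind == "V" and x0 < pos < x1 and start <= y0 + TOL and end >= y1 - TOL:
            cuts.add(pos)
    return sorted(cuts)


def build_xy_tree(region, lines, kind="H"):
    """Top-down XY-tree: cut along full-spanning lines, alternating direction; stop when no line cuts."""
    x0, y0, x1, y1 = region
    other = "V" if kind == "H" else "H"
    for k in (kind, other):  # prefer the expected direction, fall back to the other one
        cuts = _cuts(region, lines, k)
        if not cuts:
            continue
        lo, hi = (y0, y1) if k == "H" else (x0, x1)
        bounds = [lo, *cuts, hi]
        nxt = "V" if k == "H" else "H"  # children are cut in the opposite direction
        children = tuple(
            build_xy_tree((x0, a, x1, b) if k == "H" else (a, y0, b, y1), lines, nxt)
            for a, b in zip(bounds, bounds[1:])
        )
        return Node(k, children)
    return Node("cell")  # no line crosses this region: it is a leaf cell


def _size(forest):
    """Number of nodes in a forest (tuple of Nodes)."""
    return sum(1 + _size(n.children) for n in forest)


@lru_cache(maxsize=None)
def _forest_dist(f, g):
    """Unit-cost edit distance between ordered forests f and g (tuples of Nodes)."""
    if not f or not g:
        return _size(f) + _size(g)  # delete everything left in f / insert everything left in g
    v, w = f[-1], g[-1]  # rightmost roots
    return min(
        _forest_dist(f[:-1] + v.children, g) + 1,  # delete v, promoting its children
        _forest_dist(f, g[:-1] + w.children) + 1,  # insert w
        _forest_dist(f[:-1], g[:-1])               # match v with w ...
        + _forest_dist(v.children, w.children)     # ... and their subtrees ...
        + (v.label != w.label),                    # ... relabelling if the labels differ
    )


def tree_edit_distance(t1, t2):
    return _forest_dist((t1,), (t2,))


def retrieve(query, database):
    """Rank stored form names by tree edit distance to the query tree (closest first)."""
    return sorted(database, key=lambda name: tree_edit_distance(query, database[name]))


if __name__ == "__main__":
    page = (0, 0, 100, 100)  # (x0, y0, x1, y1)
    form_a = [("H", 30, 0, 100), ("H", 60, 0, 100), ("V", 50, 30, 60)]
    form_a_shifted = [("H", 32, 0, 100), ("H", 61, 0, 100), ("V", 52, 32, 61)]
    form_b = [("H", 30, 0, 100), ("H", 60, 0, 100)]  # same form without the vertical divider
    ta, ta2, tb = (build_xy_tree(page, f) for f in (form_a, form_a_shifted, form_b))
    print(tree_edit_distance(ta, ta2))  # 0
    print(tree_edit_distance(ta, tb))   # 2
```

In this toy example, shifting every line by a few units leaves the tree unchanged, so the distance is 0. Removing the vertical divider costs 2: one edit deletes the `V` node and a second deletes the surplus cell. The memoized recursion favours clarity over speed. Established algorithms such as Zhang–Shasha compute the same unit-cost distance in polynomial time.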