A hierarchical representation of form documents for identification and retrieval


Duygulu P., Atalay V.

International Journal on Document Analysis and Recognition, vol.5, no.1, pp.17-27, 2003 (Journal Indexed in SCI Expanded)

  • Publication Type: Article
  • Volume: 5 Issue: 1
  • Publication Date: 2003
  • DOI Number: 10.1007/s100320100077
  • Title of Journal: International Journal on Document Analysis and Recognition
  • Page Numbers: pp.17-27

Abstract

In this paper, we present a logical representation for form documents to be used for identification and retrieval. A hierarchical structure is proposed to represent the structure of a form by using lines and the XY-tree approach. The approach is top-down, and no domain knowledge such as the preprinted data or filled-in data is used. Geometrical modifications and slight variations are handled by this representation. Logically identical forms are associated with the same or similar hierarchical structures. Identification and retrieval of similar forms are performed by computing the edit distances between the generated trees. © 2002 Springer-Verlag Berlin Heidelberg.
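
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how lines are extracted, how cuts are chosen, or which tree edit distance is used. The sketch assumes line segments are already available as coordinates, splits a region top-down at lines that span it (a simplified XY-tree), and compares trees with a Selkow-style top-down edit distance in which only whole subtrees are inserted or deleted. The names `Node`, `build_xy_tree`, `tree_distance`, and the tolerance `tol` are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Node:
    # "H": region split by horizontal lines, "V": by vertical lines, "B": leaf block
    label: str
    children: tuple = ()


def size(t):
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def build_xy_tree(box, h_lines, v_lines, tol=2):
    """Top-down split of box = (x0, y0, x1, y1).

    h_lines: list of (y, x_start, x_end); v_lines: list of (x, y_start, y_end).
    Only lines spanning the whole region (within tol) are used as cuts.
    This cut rule is an assumption made for illustration.
    """
    x0, y0, x1, y1 = box
    ys = sorted({y for y, xa, xb in h_lines
                 if y0 + tol < y < y1 - tol and xa <= x0 + tol and xb >= x1 - tol})
    if ys:
        edges = [y0, *ys, y1]
        return Node("H", tuple(build_xy_tree((x0, a, x1, b), h_lines, v_lines, tol)
                               for a, b in zip(edges, edges[1:])))
    xs = sorted({x for x, ya, yb in v_lines
                 if x0 + tol < x < x1 - tol and ya <= y0 + tol and yb >= y1 - tol})
    if xs:
        edges = [x0, *xs, x1]
        return Node("V", tuple(build_xy_tree((a, y0, b, y1), h_lines, v_lines, tol)
                               for a, b in zip(edges, edges[1:])))
    return Node("B")


@lru_cache(maxsize=None)
def tree_distance(a, b):
    """Selkow-style distance: relabel the root, then align the child sequences,
    where inserting or deleting a child costs the size of its subtree."""
    m, n = len(a.children), len(b.children)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a.children[i - 1]),
                d[i][j - 1] + size(b.children[j - 1]),
                d[i - 1][j - 1] + tree_distance(a.children[i - 1], b.children[j - 1]),
            )
    return (0 if a.label == b.label else 1) + d[m][n]


# Form A: one full-width horizontal line, lower part split by a vertical line.
form_a = build_xy_tree((0, 0, 100, 100), [(30, 0, 100)], [(50, 30, 100)])
# Form B: the same layout scaled by 2 (a geometrical modification).
form_b = build_xy_tree((0, 0, 200, 200), [(60, 0, 200)], [(100, 60, 200)])
# Form C: a similar layout with one extra horizontal band at the bottom.
form_c = build_xy_tree((0, 0, 100, 100), [(30, 0, 100), (70, 0, 100)], [(50, 30, 70)])

# Retrieval: rank stored forms by distance to a query tree.
database = {"B": form_b, "C": form_c}
ranking = sorted(database, key=lambda name: tree_distance(form_a, database[name]))
```

In this toy setup the scaled form yields the same tree as the original, so its distance is zero, while the form with the extra band differs by a single inserted leaf. Because the tree records only the order and direction of cuts, not absolute coordinates, uniform scaling does not change it; tolerance to skew or noise would depend on the line extraction step, which is outside this sketch.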