Enhancing content management systems with semantic capabilities


Thesis Type: Master's

Institution Where the Thesis Was Conducted: Middle East Technical University, Faculty of Engineering, Department of Computer Engineering, Turkey

Thesis Approval Date: 2012

Student: SUAT GÖNÜL

Advisor: FEHİME NİHAN ÇİÇEKLİ

Abstract:

Content Management Systems (CMS) generally store content either distributed across several relational database tables or as whole files without any distinguishing characteristics. These storage mechanisms cannot manage semantic information about the data, and they lack semantic retrieval, search, and browsing of the stored content. To enhance non-semantic CMSes with advanced semantic features, both the semantics within the CMS itself and additional semantic information related to the managed content should be taken into account. However, extracting implicit knowledge from legacy CMSes, lifting it to a semantic content management environment, and providing semantic operations on the content is a challenging task that requires adopting several recent advances in information extraction (IE), information retrieval (IR), and the Semantic Web. In this study, we propose an integrative approach that includes automatic lifting of content from legacy systems, automatic annotation of data with information retrieved from the Linked Open Data (LOD) cloud, and several semantic operations on the content in terms of storage and search. We use a simple RDF path language to create custom semantic indexes and to filter the annotations obtained from the LOD cloud so that they suit specific use cases. The filtered annotations are materialized along with the actual content of the documents in dedicated indexes. This semantic indexing infrastructure enables semantically meaningful search facilities on top of it. We realize our approach within the Apache Stanbol project, a subproject developed in the scope of the IKS project, focusing on its document storage and retrieval components. We evaluate our approach in the healthcare domain with different domain ontologies (SNOMED/CT, ART, RXNORM), in addition to DBpedia, as parts of the LOD cloud; these are used to annotate documents and content obtained from different health portals.
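
The idea of filtering annotations with path expressions and materializing the result next to the document content can be illustrated with a minimal sketch. It is not the RDF path language or the indexing code used in the thesis: the triples, the `ex:`/`dbp:`/`dbo:` identifiers, the field names, and the helper functions below are all hypothetical, and a path is simplified to an ordered list of predicates.

```python
from collections import defaultdict

# Toy annotation graph as (subject, predicate, object) triples.
# All identifiers are made up for illustration.
TRIPLES = [
    ("ex:doc1", "ex:mentions", "dbp:Aspirin"),
    ("dbp:Aspirin", "rdfs:label", "Aspirin"),
    ("dbp:Aspirin", "rdf:type", "dbo:Drug"),
    ("dbp:Aspirin", "dbo:abstract", "Aspirin is a medication ..."),
    ("ex:doc1", "ex:mentions", "dbp:Headache"),
    ("dbp:Headache", "rdfs:label", "Headache"),
    ("dbp:Headache", "rdf:type", "dbo:Disease"),
]


def build_graph(triples):
    """Index triples as graph[subject][predicate] -> list of objects."""
    graph = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        graph[subject][predicate].append(obj)
    return graph


def follow_path(graph, start, path):
    """Return every value reached from `start` by following `path` in order."""
    frontier = [start]
    for predicate in path:
        next_frontier = []
        for node in frontier:
            next_frontier.extend(graph.get(node, {}).get(predicate, []))
        frontier = next_frontier
    return frontier


def build_index_document(graph, doc_id, content, field_paths):
    """Materialize only the annotations selected by `field_paths` with the content."""
    document = {"id": doc_id, "content": content}
    for field, path in field_paths.items():
        document[field] = sorted(set(follow_path(graph, doc_id, path)))
    return document


# Hypothetical index configuration: one index field per path.
FIELD_PATHS = {
    "entity_labels": ["ex:mentions", "rdfs:label"],
    "entity_types": ["ex:mentions", "rdf:type"],
}

graph = build_graph(TRIPLES)
doc = build_index_document(graph, "ex:doc1", "Aspirin relieves headache.", FIELD_PATHS)
print(doc)
```

Because no path in this configuration selects `dbo:abstract`, that annotation never reaches the index document; a different use case would simply declare a different set of paths.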