Semantic concept recognition from structured and unstructured inputs within cyber security domain


Thesis Type: Master's

Institution: Middle East Technical University (Orta Doğu Teknik Üniversitesi), Graduate School of Informatics, Department of Information Systems, Türkiye

Thesis Approval Date: 2015

Student: ALP GÖKHAN HOŞSUCU

Advisor: NAZİFE BAYKAL

Abstract:

The Linked Data initiative has been highly successful in publishing and interlinking data over ontological structures, largely because it makes semantically rich queries over highly structured data possible. Linked data structures are widely used in many domains to produce domain-specific knowledge that automated agents can interpret without human intervention. Cyber security is one of the domains that suffer from an excess of raw data and a lack of knowledge. As a result, security analysis and reasoning processes constantly require the involvement of subject matter experts. The principal aim of this study is to propose an automated approach for generating a cyber security knowledge base from scratch, using both structured and unstructured domain-related data. The proposed approach automatically extracts significant phrases and converts them into semantic concepts within the scope of existing cyber security databases: CWE, CPE, VVS and CCE. The system takes this raw data and separates its structured and unstructured parts, which are processed by different modules for knowledge extraction. The extracted concepts are represented in RDF, together with all the relationships between entities, to construct an ontology for the cyber security domain. To enhance the knowledge extraction process, NLP-oriented approaches, including key phrase extraction methods, are used. Data augmentation is applied to the concepts by interlinking them with entities in the Freebase and Wikipedia indexes. The outcome of this series of operations is a modular system capable of extracting knowledge from given cyber security data. The accumulated knowledge forms the basis of a cyber security ontology that can be used for further vulnerability identification and prevention.
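The following minimal Python sketch illustrates the RDF representation step described in the abstract. It serializes one extracted concept as N-Triples. The sketch rests on several assumptions that do not come from the thesis. The namespace (http://example.org/cyber#) and the predicates Concept, extractedFrom and relatedTo are hypothetical. Links to external entries such as Wikipedia articles use rdfs:seeAlso. The CWE entry is only an example source.

```python
# Minimal sketch: serialize an extracted concept and its relationships as RDF
# N-Triples. The namespace and the Concept/extractedFrom/relatedTo names are
# hypothetical illustrations, not the thesis's actual ontology.
CYBER = "http://example.org/cyber#"  # hypothetical ontology namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def concept_to_ntriples(concept_id, label, source_entry, related, external_links):
    """Return N-Triples lines describing one extracted concept.

    concept_id     -- local name of the concept in the hypothetical namespace
    label          -- the key phrase the concept was extracted from
    source_entry   -- IRI of the database entry (e.g. a CWE page) it came from
    related        -- list of (predicate_local_name, object_concept_id) pairs
    external_links -- IRIs of matching external resources (e.g. Wikipedia)
    """
    subject = iri(CYBER + concept_id)
    lines = [
        f"{subject} {iri(RDF_TYPE)} {iri(CYBER + 'Concept')} .",
        f"{subject} {iri(RDFS_LABEL)} {literal(label)} .",
        f"{subject} {iri(CYBER + 'extractedFrom')} {iri(source_entry)} .",
    ]
    # Relationships to other extracted concepts.
    for predicate, obj in related:
        lines.append(f"{subject} {iri(CYBER + predicate)} {iri(CYBER + obj)} .")
    # Interlinks to external resources (the augmentation step).
    for link in external_links:
        lines.append(f"{subject} {iri(RDFS_SEE_ALSO)} {iri(link)} .")
    return lines


triples = concept_to_ntriples(
    "CrossSiteScripting",
    "cross-site scripting",
    "https://cwe.mitre.org/data/definitions/79.html",
    [("relatedTo", "InputValidation")],
    ["https://en.wikipedia.org/wiki/Cross-site_scripting"],
)
print("\n".join(triples))
```

N-Triples was chosen here because it is line-based and needs no library. An RDF toolkit could load the same triples into a graph for querying.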