IEEE 15th International Conference on Data Mining Workshops (ICDMW), New Jersey, United States, 14-17 November 2015, pp. 33-39
The prominence of social media such as Twitter and Facebook has led to a huge collection of data over which event detection provides useful results. An important dimension of event detection is location estimation for detected events. Social media provides a variety of clues for location, such as geographical annotation from smart devices, the location field in the user profile, and the content of the message. Among these clues, message content requires more processing effort, yet it is generally more informative. In this paper, we focus on the extraction of location names, i.e., toponym recognition, from social media messages. We propose a hybrid system that uses both rule-based and machine-learning-based techniques to extract toponyms from tweets. Conditional Random Fields (CRF) are used as the machine learning tool, and features such as Part-of-Speech tags and a conjunction window are defined in order to construct a CRF model for toponym recognition. In the rule-based part, regular expressions are used to define some of the toponym recognition patterns and to provide a simple level of normalization that handles the informality of the text. Experimental results show that the proposed method achieves a higher toponym recognition ratio than previous studies.
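To make the two parts of the hybrid approach more concrete, the following is a minimal Python sketch, not the system described in the paper. The abstract does not list the actual regular expressions or the feature set, so everything here is an assumption chosen for illustration: the normalization rules, the preposition-based pattern, the function names `normalize`, `rule_based_toponyms`, and `token_features`, and the reading of "conjunction window" as neighboring-token features plus a joint Part-of-Speech feature.

```python
import re

# Hypothetical normalization rules (assumed, not taken from the paper).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more of the same character

# Hypothetical rule: a lowercase locative preposition followed by
# one or more capitalized words is treated as a toponym candidate.
PREP_RE = re.compile(
    r"\b(?:in|at|from|near)\s+((?:[A-Z][\w'-]*)(?:\s+[A-Z][\w'-]*)*)"
)


def normalize(text):
    """Apply simple regex normalization to an informal message."""
    text = URL_RE.sub("", text)          # drop links
    text = HASHTAG_RE.sub(r"\1", text)   # "#Istanbul" -> "Istanbul"
    text = REPEAT_RE.sub(r"\1\1", text)  # "sooooo" -> "soo"
    return " ".join(text.split())        # collapse whitespace


def rule_based_toponyms(text):
    """Return toponym candidates matched by the preposition pattern."""
    return [m.group(1) for m in PREP_RE.finditer(text)]


def token_features(tokens, pos_tags, i, window=1):
    """Build a feature dict for token i, including a context window."""
    feats = {
        "word.lower": tokens[i].lower(),
        "word.istitle": tokens[i].istitle(),
        "pos": pos_tags[i],
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"{offset:+d}:word.lower"] = tokens[j].lower()
            feats[f"{offset:+d}:pos"] = pos_tags[j]
            # Joint feature: neighbor POS combined with the current POS.
            feats[f"{offset:+d}:pos|pos"] = pos_tags[j] + "|" + pos_tags[i]
        else:
            feats[f"{offset:+d}:pad"] = True  # position outside the message
    return feats


if __name__ == "__main__":
    tweet = "Stuck in #Istanbul traffic sooooo long http://t.co/x"
    clean = normalize(tweet)
    print(clean)
    print(rule_based_toponyms(clean))
    # POS tags are supplied by hand here; a real pipeline would use a tagger.
    print(token_features(["Stuck", "in", "Istanbul"], ["VBN", "IN", "NNP"], 2))
```

In a setup like this, one feature dict per token would be passed to a CRF toolkit for training and labeling, and the rule-based candidates could be merged with the CRF output. How the paper actually combines the two parts is not stated in the abstract.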