Toponym Recognition in Social Media for Estimating the Location of Events


Sagcan M., KARAGÖZ P.

IEEE 15th International Conference on Data Mining Workshops (ICDMW), New Jersey, United States, 14-17 November 2015, pp. 33-39

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/icdmw.2015.167
  • City of Publication: New Jersey
  • Country of Publication: United States
  • Page Numbers: pp. 33-39
  • Keywords: toponym extraction, social media, location estimation, event
  • Affiliated with Middle East Technical University: Yes

Abstract

The prominence of social media such as Twitter and Facebook has led to a huge collection of data over which event detection provides useful results. An important dimension of event detection is location estimation for detected events. Social media provides a variety of clues for location, such as geographical annotation from smart devices, the location field in the user profile, and the content of the message. Among these clues, message content needs more effort to process, yet it is generally more informative. In this paper, we focus on the extraction of location names, i.e., toponym recognition, from social media messages. We propose a hybrid system that uses both rule-based and machine-learning-based techniques to extract toponyms from tweets. Conditional Random Fields (CRF) is used as the machine learning tool, and features such as Part-of-Speech tags and a conjunction window are defined in order to construct a CRF model for toponym recognition. In the rule-based part, regular expressions are used to define some of the toponym recognition patterns and to provide a simple level of normalization that handles the informality of the text. Experimental results show that the proposed method achieves a higher toponym recognition ratio than previous studies.
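The two parts of the hybrid approach described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the actual regular expressions or CRF features, so every pattern, function name, and feature key below is a hypothetical stand-in. The sketch also assumes that the "conjunction window" can be approximated by the words and Part-of-Speech tags of neighbouring tokens, which is an interpretation rather than something the abstract states.

```python
import re

# Hypothetical normalization rules (assumed; the paper's rules are not given).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Three or more repeats of one character. Collapsing to a single character
# is a crude choice: it would also turn "coool" into "col".
REPEAT_RE = re.compile(r"(\w)\1{2,}")

# Hypothetical rule: a preposition followed by one or more capitalized words.
PREP_TOPONYM_RE = re.compile(
    r"\b(?:in|at|near|from)\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)"
)


def normalize(text):
    """Strip URLs and mentions, unwrap hashtags, collapse character runs."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = HASHTAG_RE.sub(r"\1", text)
    text = REPEAT_RE.sub(r"\1", text)
    return " ".join(text.split())


def rule_based_toponyms(text):
    """Return candidate toponyms matched by the preposition pattern."""
    return PREP_TOPONYM_RE.findall(text)


def token_features(tokens, pos_tags, i, window=1):
    """Build a feature dict for token i, including neighbours in a window."""
    feats = {
        "word.lower": tokens[i].lower(),
        "word.istitle": tokens[i].istitle(),
        "pos": pos_tags[i],
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"{offset:+d}:word.lower"] = tokens[j].lower()
            feats[f"{offset:+d}:pos"] = pos_tags[j]
        else:
            # Mark positions that fall outside the message.
            feats[f"{offset:+d}:pad"] = True
    return feats


clean = normalize("Sooooo much traffic in #Ankara @user http://t.co/abc")
candidates = rule_based_toponyms(clean)
features = token_features(["traffic", "in", "Ankara"], ["NN", "IN", "NNP"], 2)
```

Per-token feature dictionaries of this shape are the kind of input that common CRF toolkits accept for sequence labelling; training the model itself, and the Part-of-Speech tagger that supplies `pos_tags`, are outside the scope of this sketch.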