Points of interest (POI) extraction from social media


Thesis Type: Master's

Institution of the Thesis: Middle East Technical University, Faculty of Engineering, Department of Computer Engineering, Turkey

Approval Date: 2017

Student: İSMAİL TALHA YILMAZ

Supervisor: PINAR KARAGÖZ

Abstract:

A point of interest (POI) is a specific location that people find useful or interesting, such as a restaurant, museum, park or hotel. POIs are mostly used in location-based social media applications, especially for place recommendation. Social media users share the places they like, and discovering such new POIs is important for understanding both the tastes and preferences of a city's residents and the city itself. Therefore, detecting which words in a social media message refer to a POI is an important problem. The process of retrieving POIs from text is called POI extraction. In this work, we propose methods to extract POIs from microblogs. We explore both machine learning and artificial neural network based approaches. As the machine learning approach, we use Conditional Random Fields (CRF) for sequential tagging. We investigate the effect of various additional features, such as the sentiment of tweets and the POI density and population density of the location where the tweet was posted. We also use the built-in features of CRF. As a hybrid approach, we generate word embeddings with Word2vec and apply the K-Nearest Neighbors classification algorithm to the resulting vectors. Finally, we construct a deep, feed-forward neural network to extract POIs from microblog text. These techniques are applied to a collection of tweets in Turkish, posted by users from Ankara. Experimental results show that the CRF constructed with the POI density feature outperforms both the CRFs with other feature sets and the neural network approaches in terms of POI extraction accuracy.
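
To make the sequential tagging formulation concrete, the following is a minimal Python sketch of how a tweet might be turned into per-token feature dictionaries for a CRF tagger, and how predicted labels might be turned back into POI strings. The abstract does not state the exact feature set or label scheme, so the BIO labels (`B-POI`, `I-POI`, `O`), the feature names, the helper functions and the example `poi_density` value are all illustrative assumptions, not the thesis's actual implementation.

```python
from typing import Dict, List, Union

Feature = Union[str, bool, float]


def token_features(tokens: List[str], i: int, poi_density: float) -> Dict[str, Feature]:
    """Build a feature dict for the token at position i (hypothetical feature set)."""
    tok = tokens[i]
    feats: Dict[str, Feature] = {
        "lower": tok.lower(),        # lowercased surface form
        "is_title": tok.istitle(),   # capitalisation often signals a place name
        "suffix3": tok[-3:],         # crude morphological cue
        # Assumed tweet-level feature: POI density around the posting location,
        # copied onto every token of the tweet.
        "poi_density": poi_density,
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True          # beginning of the tweet
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True          # end of the tweet
    return feats


def tweet_features(tokens: List[str], poi_density: float) -> List[Dict[str, Feature]]:
    """Feature dicts for every token of one tweet, in order."""
    return [token_features(tokens, i, poi_density) for i in range(len(tokens))]


def decode_pois(tokens: List[str], labels: List[str]) -> List[str]:
    """Collect POI strings from a BIO label sequence (assumed label scheme)."""
    pois: List[str] = []
    current: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab == "B-POI":
            if current:              # close a POI that directly precedes this one
                pois.append(" ".join(current))
            current = [tok]
        elif lab == "I-POI" and current:
            current.append(tok)
        else:                        # "O", or a stray "I-POI" with no open POI
            if current:
                pois.append(" ".join(current))
            current = []
    if current:
        pois.append(" ".join(current))
    return pois


tokens = ["Kuğulu", "Park", "çok", "güzel"]
features = tweet_features(tokens, poi_density=0.8)
labels = ["B-POI", "I-POI", "O", "O"]  # what a trained tagger might predict
print(decode_pois(tokens, labels))
```

A list of such per-tweet feature sequences, paired with gold label sequences, is the kind of input that CRF toolkits typically train on. Comparing feature sets, as the abstract describes, would then amount to adding or removing keys such as `poi_density` and retraining.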