SPAM DETECTION BY USING NETWORK AND TEXT EMBEDDING APPROACHES


Tezin Türü: Yüksek Lisans

Tezin Yürütüldüğü Kurum: Orta Doğu Teknik Üniversitesi, Mühendislik Fakültesi, Endüstri Mühendisliği Bölümü, Türkiye

Tezin Onay Tarihi: 2019

Tezin Dili: İngilizce

Öğrenci: CENNET MERVE YILMAZ

Eş Danışman: Ahmet Onur Durahim

Danışman: Cem İyigün

Özet:

Authenticity and reliability of the information spread over the cyberspace is becoming increasingly important, especially in e-commerce. This is because potential customers check reviews and customer feedbacks online before making a purchasing decision. Although this information is easily accessible through related websites, lack of verification of the authenticity of these reviews raises concerns about their reliability. Besides, fraudulent users disseminate disinformation to deceive people into acting against their interest. So, detection of fake and unreliable reviews is a crucial problem that must be addressed. In this study, we analyze and compare three different spam review detection approaches, DocRep, NodeRep and SPR2EP, that utilize review text only, network information only and the one that is proposed in this study that incorporates knowledge extracted from the textual content of the reviews with information obtained by exploiting the underlying reviewer-product network structure, respectively. One of the important contributions of this study is the proposed framework, SPR2EP, is that it benefits from both review text and network information. In SPR2EP approach, first, feature vectors are learned for each review, reviewer and product by utilizing state-of-the-art algorithms developed for learning document and node embeddings, and then these are fed into a classifier to identify opinion spam. It minimizes the feature engineering effort. The effectiveness of our framework approaches that utilize network embeddings over existing techniques on detecting spam reviews is demonstrated in three different data sets containing online reviews.