Characterizing, Predicting, and Handling Web Search Queries That Match Very Few or No Results

Sarigil, Erdem; Altingovde, İSMAİL; BLANCO, Roi; Barla Cambazoglu, B.; ÖZCAN, Rifat; Ulusoy, Ozgur

doi:10.1002/asi.23955

Sarigil E., Altingovde I. S., BLANCO R., Barla Cambazoglu B., ÖZCAN R., Ulusoy O.

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, vol.69, no.2, pp.256-270, 2018 (SCI-Expanded, SSCI, Scopus)

Publication Type: Article / Full Article
Volume: 69 Issue: 2
Publication Date: 2018
DOI Number: 10.1002/asi.23955
Journal Name: JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus
Pages: pp.256-270
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Middle East Technical University: Yes

Abstract

A non-negligible fraction of user queries end up with very few or even no matching results in leading commercial web search engines. In this work, we provide a detailed characterization of such queries and show that search engines try to improve such queries by showing the results of related queries. Through a user study, we show that these query suggestions are usually perceived as relevant. Also, through a query log analysis, we show that users are dissatisfied after submitting a query that matches no results at least 88.5% of the time. As a first step towards solving these no-answer queries, we devised a large number of features that can be used to identify such queries and built machine-learning models. These models can be useful for scenarios such as mobile search or meta-search, where identifying a query that will retrieve no results at the client device (i.e., even before submitting it to the search engine) may yield gains in terms of bandwidth usage, power consumption, and/or monetary costs. Experiments over query logs indicate that, despite the heavy skew in class sizes, our models achieve good prediction quality, with accuracy (in terms of area under the curve) up to 0.95.
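The abstract reports prediction quality as area under the curve (AUC) rather than plain accuracy, and notes a heavy skew in class sizes. The following minimal sketch, which is not the authors' code, illustrates why that choice matters. The labels and scores are made-up toy data, and the helper names `auc` and `accuracy` are hypothetical; none of the paper's features or models are reproduced here.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, predictions):
    """Fraction of predictions that equal the true label."""
    return sum(1 for y, p in zip(labels, predictions) if y == p) / len(labels)


# Toy data (assumption): 2 no-answer queries (label 1) among 20 queries.
labels = [1, 1] + [0] * 18

# A trivial classifier that never predicts "no answer" still looks accurate.
always_negative = [0] * len(labels)
acc_trivial = accuracy(labels, always_negative)

# The same trivial classifier, scored by AUC, is no better than chance.
auc_trivial = auc(labels, [0.0] * len(labels))

# Hypothetical model scores: one negative (0.5) outranks one positive (0.4).
scores = [0.9, 0.4] + [0.5] + [0.1] * 17
auc_model = auc(labels, scores)

print(f"trivial accuracy = {acc_trivial:.2f}")
print(f"trivial AUC      = {auc_trivial:.2f}")
print(f"model AUC        = {auc_model:.3f}")
```

On skewed data like this, accuracy rewards always predicting the majority class, while AUC measures how well the scores rank no-answer queries above the rest, which is why it is the more informative figure under class imbalance.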