Diversity-aware strategies for static index pruning


Yigit-Sert S., ALTINGÖVDE İ. S., Ulusoy Ö.

Information Processing and Management, vol.61, no.5, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 61 Issue: 5
  • Publication Date: 2024
  • DOI: 10.1016/j.ipm.2024.103795
  • Journal Name: Information Processing and Management
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus, FRANCIS, Periodicals Index Online, ABI/INFORM, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, EBSCO Education Source, Education Abstracts, Information Science and Technology Abstracts, INSPEC, Library and Information Science Abstracts, Library Literature and Information Science, Library, Information Science & Technology Abstracts (LISTA), Linguistics & Language Behavior Abstracts, MLA - Modern Language Association Database, zbMATH
  • Keywords: Query processing efficiency, Query result diversity, Static index pruning
  • Affiliated with Middle East Technical University: Yes

Abstract

Static index pruning aims to remove redundant parts of an index to reduce the file size and query processing time. Because diversity has emerged as an important quality metric in modern search systems, this paper focuses on the impact of index pruning on the topical diversity of query results obtained over pruned indexes. We hypothesize that typical index pruning strategies are likely to harm result diversity, as this dimension has been largely overlooked in the design and evaluation of such methods. As a remedy, we introduce three novel diversity-aware pruning strategies aimed at maintaining the diversity effectiveness of query results. In addition to other widely used features, our strategies exploit document clustering methods and word embeddings to assess the possible impact of index elements on topical diversity, and to guide the pruning process accordingly. Our thorough experimental evaluations verify that typical index pruning strategies lead to a substantial decline (up to 50% for some metrics) in the diversity of the results obtained over the pruned indexes. Our diversity-aware approaches remedy such losses to a great extent and yield more diverse query results, for which the scores of the various diversity metrics are closer to those obtained over the full index. Specifically, our best-performing strategy provides gains in result diversity of up to 2.9%, 3.0%, 7.5%, and 3.9% with respect to the strongest baseline, in terms of the ERR-IA, α-nDCG, P-IA, and ST-Recall metrics (at the cut-off value of 20), respectively. The proposed strategies also yield better scores in terms of an entropy-based fairness metric, confirming the correlation between topical diversity and fairness in this setup.
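To make one of the reported metrics concrete, the following is a minimal sketch of ST-Recall (subtopic recall) at a cut-off, under its usual definition: the fraction of a query's subtopics covered by at least one of the top-k documents. The function name, the document identifiers, and the toy subtopic judgments are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's exact evaluation setup may differ.

```python
from typing import Dict, List, Set


def st_recall(
    ranking: List[str],
    doc_subtopics: Dict[str, Set[int]],
    n_subtopics: int,
    k: int = 20,
) -> float:
    """Fraction of the query's subtopics covered by the top-k documents."""
    if n_subtopics <= 0:
        raise ValueError("n_subtopics must be positive")
    covered: Set[int] = set()
    for doc_id in ranking[:k]:
        # Documents without judgments contribute no subtopics.
        covered |= doc_subtopics.get(doc_id, set())
    return len(covered) / n_subtopics


# Hypothetical judgments for one query with 4 subtopics (0..3).
doc_subtopics = {"d1": {0}, "d2": {0, 1}, "d3": {2}, "d4": {3}, "d5": set()}

# Hypothetical rankings: one from a full index, one from a pruned index
# in which the postings that retrieved d3 and d4 were removed.
full_ranking = ["d1", "d3", "d4", "d2"]
pruned_ranking = ["d1", "d2", "d5"]

full_score = st_recall(full_ranking, doc_subtopics, n_subtopics=4, k=3)
pruned_score = st_recall(pruned_ranking, doc_subtopics, n_subtopics=4, k=3)
print(full_score, pruned_score)
```

In this toy case the pruned ranking covers fewer subtopics than the full-index ranking at the same cut-off, which is the kind of diversity loss the abstract describes.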