Faster Dynamic Pruning via Reordering of Documents in Inverted Indexes


Yafay E., Altingövde I. S.

46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, 23 - 27 July 2023, pp.2001-2005 identifier identifier

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1145/3539618.3591987
  • City: Taipei
  • Country: Taiwan
  • Page Numbers: pp.2001-2005
  • Keywords: dynamic pruning, inverted index, search engines
  • Middle East Technical University Affiliated: Yes

Abstract

Widely used dynamic pruning algorithms (such as MaxScore, WAND and BMW) keep track of the k-th highest score (i.e., heap threshold) among the documents that are scored so far, to avoid scoring the documents that cannot get into the top-k result list. Obviously, the faster the heap threshold converges to its final value, the larger will be the number of skipped documents and hence, the efficiency gains of the pruning algorithms. In this paper, we tailor approaches that reorder the documents in the inverted index based on their access counts and ranks for previous queries. By storing such frequently retrieved documents at front of the postings lists, we aim to compute the heap threshold earlier during the query processing. Our approach yields substantial speedups (up to 1.33x) for all three dynamic pruning algorithms and outperforms two strong baselines that have been employed for document reordering in the literature.