Document Reordering for Efficient Dynamic Pruning in Traditional and Learned Sparse Retrieval

Yafay, Erman; ALTINGÖVDE, İSMAİL

doi:10.1109/access.2026.3690983

Document Reordering for Efficient Dynamic Pruning in Traditional and Learned Sparse Retrieval

Yafay E., ALTINGÖVDE İ. S.

IEEE Access, cilt.14, ss.70175-70187, 2026 (SCI-Expanded, Scopus)

Yayın Türü: Makale / Tam Makale
Cilt numarası: 14
Basım Tarihi: 2026
Doi Numarası: 10.1109/access.2026.3690983
Dergi Adı: IEEE Access
Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Sayfa Sayıları: ss.70175-70187
Anahtar Kelimeler: dynamic pruning, Information retrieval, inverted index, neural retrieval, search engines
Orta Doğu Teknik Üniversitesi Adresli: Evet

Özet

Large scale Web search engines conduct first-stage retrieval using a traditional inverted index, on which they are expected to execute billions of queries on a very tight budget, i.e., less than 200 milliseconds. During query processing, the postings lists corresponding to each query term are traversed in parallel, i.e., applying document-at-a-time (DaaT) strategy, and a min-heap of top-k documents is employed to store the top-ranked documents. Dynamic pruning algorithms accelerate DaaT query processing by skipping documents that cannot enter the top-k results. Their efficiency, however, depends on how quickly the heap threshold increases during scoring. This paper explores an alternative route to faster pruning: rather than estimating the threshold in advance, we reorder documents within postings lists based on their frequency, position or score in past query results, allowing the threshold to rise more rapidly. We show that the tailored Hit Count (HC) reordering - derived from document frequency in historical query logs - achieves significant speedups for both traditional and learned sparse models. Specifically, HC provides up to a 1.12× speedup for BM25-based scoring on ClueWeb09B and SPLADE-based scoring on MS MARCO, with only a modest increase in index size. This demonstrates that our lightweight document reordering approach yields consistent efficiency gains in both traditional and neural information retrieval settings.