Uniformity and Independence of H3 Hash Functions for Bloom Filters


Koltuk F., SCHMİDT Ş. E.

IEEE Transactions on Computers, cilt.73, sa.8, ss.1913-1923, 2024 (SCI-Expanded) identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 73 Sayı: 8
  • Basım Tarihi: 2024
  • Doi Numarası: 10.1109/tc.2024.3398426
  • Dergi Adı: IEEE Transactions on Computers
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
  • Sayfa Sayıları: ss.1913-1923
  • Anahtar Kelimeler: Bloom filters, hash function uniformity and independence, universal hash functions, vector subspaces in GF(2)
  • Orta Doğu Teknik Üniversitesi Adresli: Evet

Özet

In this paper, we investigate the effects of violating the conditions of hash function uniformity and/or independence on the false positive probability of Bloom Filters (BF). To this end, we focus on hash functions of the H3 family with a partitioned memory organization for fast hardware implementations of BFs. We first introduce a dependence metric that quantifies hash function uniformity and independence. We then state and prove the necessary and sufficient conditions on the BF parameters for constructing uniform and independent hash functions. Finally, we derive an analytical expression for the exact false positive probability of a BF with hash functions that are not necessarily uniform or independent. We verify our expression with a hardware test bench and explore the effects of losing uniformity and independence through an experimental study that systematically sweeps different dependence metric values and numbers of hash functions. We demonstrate the effects of violating hash function uniformity and independence on the stated target false positive probability for selected previous works in the literature. As an important finding, we show that uniformity of individual hash functions is essential, whereas limited dependencies between hash functions can be tolerated without a negative effect on the false positive probability.