A Fast and Accurate Hardware String Matching Module with Bloom Filters

Zengin, Salih; SCHMİDT, ŞENAN

doi:10.1109/tpds.2016.2577033

A Fast and Accurate Hardware String Matching Module with Bloom Filters

Atıf İçin Kopyala

Zengin S., SCHMİDT Ş. E.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, cilt.28, sa.2, ss.305-317, 2017 (SCI-Expanded)

Yayın Türü: Makale / Tam Makale
Cilt numarası: 28 Sayı: 2
Basım Tarihi: 2017
Doi Numarası: 10.1109/tpds.2016.2577033
Dergi Adı: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Sayfa Sayıları: ss.305-317
Orta Doğu Teknik Üniversitesi Adresli: Evet

Özet

Many fields of computing such as Deep Packet Inspection (DPI) employ string matching modules (SMM) that search for a given set of positive strings in their input. An SMM is expected to produce correct outcomes while scanning the input data at high rates. Furthermore the string sets that are searched for are usually large and their sizes increase steadily. Bloom Filters (BFs) are hashing data structures which are fast but their false positive results require further processing. That is, their speed can be exploited for Standard Bloom Filter SMMs (SBFs) as long as the positive probability is low. Multiple BFs in parallel can further increase the throughput. In this paper, we propose the Double Bloom Filter SMM (DBF) which achieves a higher throughput than the SBF and maintains a high throughput even for large positive probabilities. The second Bloom Filter of DBF stores a small enough subset of the positive strings such that its false positive probability is approximately zero. We develop an analytical model of the DBF and show that the throughput advantage of DBF over SBF becomes more prominent if the positive probability and the fraction of matches in the second Bloom Filter increase. Accordingly, we propose a heuristic algorithm that stores the strings that are more frequently matched in the second Bloom Filter according to localities identified in the input. Our numerical results are obtained using realistic values from an FPGA implementation and are validated by SystemC simulations.