[Bro-Dev] bloomfilter_counting_init parameterization ?
Matthias Vallentin
vallentin at icir.org
Tue May 3 07:18:24 PDT 2016
> global bloomfilter_counting_init: function(k: count , cells: count ,
> max: count , name: string &default=""): opaque of bloomfilter ;
The counting Bloomfilter is very similar to a regular Bloom filter,
except that the underlying bit vector now consists of "cells," i.e.,
sequences of bits that represents a counter. With 4 bits per cells, you
can count from 0 to 2^4-1 = 15. Consider this scenario:
+----+----+----+----+----+
|0101|0000|0010|0000|0010|
+----+----+----+----+----+
In terms of variable names of the function signature, we have cells=5
and max=15. The number of hash functions to use, k, is independent of
the cell sizing. The first cell has counter value 5, the third and fifth
a value of 2. If you have k=3 hash functions and your hash function
yields those indexes, the counting Bloom filter computes the minimum
over all 3 cell values (=2) and returns that as the counter value.
Unlike for the basic Bloom filter, there is no closed-form solution to
the Bloom Error, i.e., the probability of a false positive occurring.
That's why the interface is a bit more low-level.
> What should be the length of the cells for storing 65536 IPs ?
This really depends on two things: (1) the maximum counter value you
need, and (2) your memory constrains. The counting Bloom filter is a bit
more wasteful, especially for heavy-tailed distributions, which are very
common in network traffic.
Matthias
More information about the bro-dev
mailing list