[Bro-Dev] bloomfilter_counting_init parameterization ?

Matthias Vallentin vallentin at icir.org
Tue May 3 07:18:24 PDT 2016


> global bloomfilter_counting_init: function(k: count , cells: count ,
> max: count , name: string &default=""): opaque of bloomfilter ;

The counting Bloomfilter is very similar to a regular Bloom filter,
except that the underlying bit vector now consists of "cells," i.e.,
sequences of bits that represents a counter. With 4 bits per cells, you
can count from 0 to 2^4-1 = 15. Consider this scenario:

    +----+----+----+----+----+
    |0101|0000|0010|0000|0010|
    +----+----+----+----+----+

In terms of variable names of the function signature, we have cells=5
and max=15. The number of hash functions to use, k, is independent of
the cell sizing. The first cell has counter value 5, the third and fifth
a value of 2. If you have k=3 hash functions and your hash function
yields those indexes, the counting Bloom filter computes the minimum
over all 3 cell values (=2) and returns that as the counter value.

Unlike for the basic Bloom filter, there is no closed-form solution to
the Bloom Error, i.e., the probability of a false positive occurring.
That's why the interface is a bit more low-level.

> What should be the length of the cells for storing 65536 IPs ? 

This really depends on two things: (1) the maximum counter value you
need, and (2) your memory constrains. The counting Bloom filter is a bit
more wasteful, especially for heavy-tailed distributions, which are very
common in network traffic. 

    Matthias


More information about the bro-dev mailing list