[Bro] strategies for ingesting large number of PCAP files
M. Aaron Bossert
mabossert at gmail.com
Mon Jul 31 13:50:07 PDT 2017
All,
I am working with a Storm topology to process a large number of PCAP files,
which vary in size but tend to be in the range of 100MB to 200MB, give or
take. My current batch contains about 42K files. I am aiming to process them
with as much parallelism as possible while avoiding the issue of sessions
that span more than one file (so you know why I am doing this).
My main constraints/focus:
1. Take advantage of the large number of cores (56) and RAM (~750GB) on my
   node(s).
2. Avoid disk as much as possible. I have relatively slow spinning disks,
   though quite a few of them can be addressed individually, which could
   mitigate the disk I/O bottleneck to some degree.
3. Prioritize completeness above all else: get as many sessions
   reconstructed as possible by stitching the packets back together in one
   of the ways below, or another if you folks have a better idea.
My thinking is below. I'm hoping for suggestions on the best approach, or a
completely different one if you have a better solution:
1. Run mergecap, set up Bro to run as a cluster, and hope for the best.
   1. *Upside*: relatively simple and the lowest level of effort.
   2. *Downside*: I'm not sure it will scale the way I want. I'd prefer to
      isolate Bro to no more than two nodes in my cluster (each node has 56
      cores and ~750GB RAM). Also, it will be one more hack to work into my
      Storm topology.
2. Use a Storm topology (or something else) to rewrite packets to
   individual files based on SIP/DIP/SPORT/DPORT or similar.
   1. *Upside*: this will ensure a certain level of parallelism and keep the
      logic inside my topology, where I can control it to the greatest
      extent.
   2. *Downside*: this seems horribly inefficient, because I will have to
      read the PCAP data twice (once to split and once again when Bro gets
      it), and then again to read the Bro logs (if I don't get the Kafka
      plugins to do what I want). It will also require some sort of load
      balancing to ensure that IPs representing a disproportionate
      percentage of traffic don't gum up the works, and that IPs with
      relatively little traffic don't take up too many resources. My
      thought here is to simply keep track of approximate file sizes and
      send IPs in rough balance (while still always sending any given
      IP/port pair to the same file). Finally, this makes me hit the HDDs at
      least three times (once to read PCAP, next to write PCAP, and again to
      read Bro logs), which is undesirable.
3. Use a Storm topology or tcpreplay (or similar) to read in the PCAP files,
   then write the packets to virtual interfaces (a manually set-up pool) so
   that Bro can simply listen on each interface and process the traffic as
   appropriate.
   1. *Upside*: this seems like it could be the most efficient option, as it
      probably avoids disk the most and could scale very well. It would
      support clustering by simply creating pools of interfaces on multiple
      nodes, and session-ization takes care of itself. I would just need to
      tell Bro to wait longer for packets to show up, so it doesn't think
      the interface went dead if there are lulls in traffic.
   2. *Downside*: the most complex of the bunch, and I am uncertain of my
      ability to preserve timestamps when sending the packets over the
      interface to Bro.
4. Extend Bro not only to write directly to Kafka topics, but also to read
   from them, so that I could use one of the methods above to split and
   load-balance traffic and then have Bro simply emit logs to another topic
   of my choosing.
   1. *Upside*: this could be the most elegant solution, because it would
      allow me to handle failures and hiccups using Kafka offsets.
   2. *Downside*: this is easily the most difficult to implement for me, as
      I have not messed with extending Bro at all.
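A minimal Python sketch of the splitting step in option 2, assuming packets
have already been decoded into (source IP, source port, destination IP,
destination port, length) values by some upstream reader that is not shown.
The names `canonical_flow_key` and `StickyBalancer` are hypothetical, not part
of Storm or Bro. It illustrates two points: the key is made
direction-independent so both halves of a session land in the same file, and
each flow is pinned to whichever output file is lightest when the flow is first
seen, which approximates the "rough balance by file size" idea.

```python
from typing import Dict, List, Tuple

# Hypothetical flow key: (lower endpoint IP, port, higher endpoint IP, port)
FlowKey = Tuple[str, int, str, int]


def canonical_flow_key(sip: str, sport: int, dip: str, dport: int) -> FlowKey:
    """Order the two endpoints so that A->B and B->A packets get the same key.

    String comparison is not numeric IP order, but any consistent total
    order is enough to make the key direction-independent.
    """
    a, b = (sip, sport), (dip, dport)
    lo, hi = (a, b) if a <= b else (b, a)
    return (lo[0], lo[1], hi[0], hi[1])


class StickyBalancer:
    """Pin each flow to one output bucket (e.g. one output PCAP file).

    A new flow goes to the bucket with the fewest bytes routed so far;
    every later packet of that flow, in either direction, follows it.
    """

    def __init__(self, num_buckets: int) -> None:
        if num_buckets < 1:
            raise ValueError("num_buckets must be >= 1")
        self.bytes_per_bucket: List[int] = [0] * num_buckets
        self.assignment: Dict[FlowKey, int] = {}

    def route(self, sip: str, sport: int, dip: str, dport: int, length: int) -> int:
        key = canonical_flow_key(sip, sport, dip, dport)
        bucket = self.assignment.get(key)
        if bucket is None:
            # First packet of this flow: choose the currently lightest bucket.
            bucket = min(range(len(self.bytes_per_bucket)),
                         key=self.bytes_per_bucket.__getitem__)
            self.assignment[key] = bucket
        self.bytes_per_bucket[bucket] += length
        return bucket


if __name__ == "__main__":
    lb = StickyBalancer(num_buckets=2)
    print(lb.route("10.0.0.1", 51000, "10.0.0.2", 80, 1500))  # 0
    print(lb.route("10.0.0.2", 80, "10.0.0.1", 51000, 1500))  # 0 (reply direction, same flow)
    print(lb.route("10.0.0.3", 52000, "10.0.0.2", 443, 60))   # 1 (bucket 1 is lighter)
```

Because the choice is greedy and based only on bytes seen so far, a flow that
later turns out to be very large can still skew one bucket, and the
`assignment` map grows with the number of distinct flows, so a long run would
need some eviction policy. In a Storm topology this state would have to live in
one place (or be partitioned consistently) for the pinning guarantee to hold.
Keying on the full 4-tuple also means analyses that correlate several
connections from the same host only see part of that host's traffic; keying on
the unordered IP pair instead keeps those together at the cost of coarser
balancing.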
Any suggestions or feedback would be greatly appreciated! Thanks in advance...
Aaron
P.S. Sorry for the verbose message, but I was hoping to give as complete a
problem/solution statement as I could.