[Bro] Experiences with Bro and FreeBSD 8.2 Zerocopy BPF
jmellander at lbl.gov
Tue May 3 16:40:15 PDT 2011
We were eager to explore Zerocopy BPF and after making sure bro was fully
functional, we changed to 0-copy via:
We should have known we were in for trouble when tcpdump then immediately
began coredumping on exit. We installed the latest and greatest tcpdump and
libpcap (v 1.1.1) via FreeBSD ports, and had the same user-experience. The
following is offered in the hope that others may avoid the special type of
fun that we enjoyed - keep in mind this fun is only to be had when 0 copy is
1. As previously mentioned, tcpdump coredumps on exit; gdb indicates that it
tried to call free(), presumably on a kernel-owned buffer. We didn't debug
it any further, but it was a portent of things to come.
Later found a patch for this issue at
2. Bro failed to run with 0-copy - quite a bit of dithering indicated
that it was freezing at pcap_next(), which reads the next packet from the
3. Wrote a test program using pcap_next() - it fails under 0-copy after
several hundred packets. Well, since tcpdump does work (except for the
coredump), let's see what it's doing:
4. tcpdump is working using pcap_next_ex() instead of pcap_next(), so I
wrote a replacement pcap_next() in terms of pcap_next_ex(), and it correctly
5. grafted replacement pcap_next() into bro, and the user experience was
the same :-(
6. Lots of debugging using various cutlery on bro, eventually libpcap
came into focus as a potential culprit
7. sliced and diced the 0-copy code inside of libpcap - found a few
places where improvements could be made (but that's a different story),
which gave quite a bit of insight into its innards - here's a presentation
8. Ran bro with a simplified policy of just conn, tcp & vlan - (our
packets at this point in our network are vlan tagged) - it worked!
9. Ran again with our policy, it freezes!
10. After a somewhat binary search of policy, discovered that remote.bro
causes zero-copy to freeze. So, after all that, it turns out that bro works
with libpcap-1.1.1 on 0-copy, but it took a lot to figure that out.
So turning off the remote communication fixes the issue in the short term,
but doesn't solve it for us, since broctl uses the same mechanism :-(
Haven't finished debugging yet, but it appears that broccoli may be causing
the issue on 0-copy - when it becomes clearer, I will send more.
This is written in the hopes that folks won't be tearing their hair out,
like us, as they go forward in this direction. If anyone has any
suggestions, etc. (particularly in going forward with solving this problem),
I would appreciate it.
Hope this helps,
BTW - it appears that on zero copy net.bpf.maxbufsize & net.bpf.bufsize are
limited to 2 megs in size - they can be set larger, but apparently the extra
won't be used, per netstat -B, which is your friend when debugging these issues.
BTW #2: zerocopy seems to be worth doing, especially at the high bandwidths
that we're moving up to, so it's important to us to solve this.
BTW #3: the problem doesn't just manifest on a hi-speed link - I pointed Bro
towards our management port (100M), and it failed in the same way, so it's
not a capacity issue.
BTW #4: There's no special config other than setting the sysctl to turn on
0-copy - libpcap detects that it is running 0-copy and follows a different
code path, but the API is the same - except that the issue we've been having
(and the coredump of tcpdump) indicates that 0-copy is not quite fully