[Bro] Update on using PF_RING/TNAPI with Bro
Sunjeet Singh
sstattla at gmail.com
Wed Dec 15 13:41:36 PST 2010
Hi,
I have managed to get TNAPI/PF_RING configured and working with
PF_RING-aware libpcap. http://www.ntop.org/TNAPI.html
Looks like this will be very well suited to the Multiprocessing version
of Bro.
1. At the device driver level, RSS functionality (also Flow Director on
Intel cards) allows packets to be spread across different receive queues
(and also allows packets belonging to a particular connection to be sent
to the same RX_Queue) on an I/OAT-supporting network card.
2. By virtue of TNAPI, these multiple RX_Queues get polled concurrently
(by one kernel thread per queue), and the packets are passed to PF_RING
(along with information about which queue each packet came from).
3. PF_RING provides a user API which can be used by user applications
like Bro to read directly from the multiple RX_Queues of a network
interface by using notation like eth0@1, eth0@2, etc. for RX_Queues 1
and 2 belonging to interface eth0. By assigning one thread to one
RX_Queue, we ensure that all packets from one connection are
processed by the same core.
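
As a rough illustration of steps 1 and 3, here is a minimal Python
sketch of how a connection could be pinned to one RX_Queue and how the
per-queue device name is formed. The hash below is only a stand-in that
I made up for the example; it is not the hash that RSS or Flow Director
hardware actually computes, and the function names flow_queue and
queue_device are hypothetical. The sketch also assumes that both
directions of a connection should land in the same queue, which it
arranges by sorting the two endpoints before hashing.

```python
import zlib


def flow_queue(src_ip, src_port, dst_ip, dst_port, proto, num_queues):
    """Map a connection to an RX queue index.

    Illustrative hash only (CRC32), NOT the hash real RSS hardware uses.
    """
    # Sort the two endpoints so that both directions of a connection
    # produce the same key and therefore the same queue.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    return zlib.crc32(key) % num_queues


def queue_device(iface, queue):
    """Build a per-queue device name in the interface@queue notation."""
    return f"{iface}@{queue}"


# Both directions of one TCP connection, on a hypothetical 8-queue card.
q_fwd = flow_queue("10.0.0.1", 40000, "10.0.0.2", 80, "tcp", 8)
q_rev = flow_queue("10.0.0.2", 80, "10.0.0.1", 40000, "tcp", 8)
print(q_fwd == q_rev, queue_device("eth0", q_fwd))
```

Whatever the real hash is, the property that matters for Bro is the one
checked on the last line: every packet of a connection maps to the same
queue, so the thread reading that queue sees the whole connection.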
PF_RING and TNAPI can be used to drastically improve the performance of
any multiprocessing application, but they need to be properly tuned and
used by the application. The performance comes from three things: for
Bro, the packets can bypass the kernel's network stack altogether; TNAPI
polls each RX_Queue with its own thread; and PF_RING avoids the mmap from
kernel space to user space by directly copying payloads from the
RX_Queue rings.
Configuration-wise, it took a bit of work to change Bro's configure
files to use a PF_RING-aware libpcap instead of the libpcap that Bro
ships with. When running TNAPI and PF_RING, there is a clear improvement
in the kernel's ability to receive packets at a higher packet rate
(results are on the TNAPI website, and I also verified them). But using
PF_RING with the existing Bro degrades Bro's performance, because Bro
runs on a single user thread: when all these packets reach user space on
different threads, they still need to be processed by the one core that
is running Bro. From my knowledge of TNAPI/PF_RING and my intuition,
though, a multi-threaded Bro can be adapted to PF_RING and should see
huge gains in performance.
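
To make the "one thread per RX_Queue" idea concrete, here is a small
Python sketch of the structure I have in mind. It is not Bro code and
does not touch PF_RING: the RX_Queues are simulated with in-memory
queues, the packets are made-up dictionaries, and NUM_QUEUES and worker
are names invented for the example. Python threads also will not give
real parallelism for CPU-bound work, so treat this purely as a picture
of the layout, not as a benchmark.

```python
import queue
import threading
from collections import Counter

NUM_QUEUES = 4  # hypothetical number of RX queues


def worker(rx_queue, counts):
    """Consume packets from one queue and keep per-connection state locally."""
    while True:
        pkt = rx_queue.get()
        if pkt is None:  # sentinel: no more packets for this queue
            break
        counts[pkt["conn"]] += 1


# One simulated RX queue and one private state table per worker.
rx_queues = [queue.Queue() for _ in range(NUM_QUEUES)]
per_worker_counts = [Counter() for _ in range(NUM_QUEUES)]
threads = [
    threading.Thread(target=worker, args=(q, c))
    for q, c in zip(rx_queues, per_worker_counts)
]
for t in threads:
    t.start()

# Simulated traffic: 10 connections, each always mapped to the same queue.
for i in range(1000):
    conn = i % 10
    rx_queues[conn % NUM_QUEUES].put({"conn": conn})

for q in rx_queues:
    q.put(None)
for t in threads:
    t.join()

total = sum(sum(c.values()) for c in per_worker_counts)
# For each connection, the number of workers that ever saw it.
owners = {
    conn: sum(1 for c in per_worker_counts if conn in c) for conn in range(10)
}
print(total, owners)
```

Because each connection is only ever seen by one worker, the workers
need no locking around connection state. With the existing
single-threaded Bro, by contrast, everything from all queues funnels
back into one consumer.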
Here's a summary of the results of a brief experiment that I performed
on an 8-core Intel Xeon with 32 GB RAM running Linux, with an Intel
82598EB 10 Gbps Ethernet card:
Goal: Compare conventional Bro installation against Bro with TNAPI and
PF_RING (I called it Bro-Ring)
Conclusion: Bro-Ring shows a performance drop.
Observations: The values in the table show, for varying packet rates,
how many packets were accepted by the machine running Bro (the rest were
lost).
Packets/sec    Bro-Ring    Bro
34000          1368791     1368003
50000          1368546     1367707
65000          1368614     -
120000         -           1224761
130000         -           1168734
166000         596667      -
170000         561702      -
171000         681104      -
173000         618100      -
175000         740137      -
178000         864706      -
210000         -           753700
215000         -           728450
230000         494637      -
240000         -           636287
(Note: there was a difference between the packet rate passed to
tcpreplay as an input parameter and the actual packet rate achieved, so
I could not supply exact values for the packet rate.)
Sunjeet Singh