[Bro] Update on using PF_RING/TNAPI with Bro

Wed Dec 15 13:41:36 PST 2010

Hi,

I have managed to get TNAPI/PF_RING configured and working with 
PF_RING-aware libpcap. http://www.ntop.org/TNAPI.html
Looks like this will be very well suited to the Multiprocessing version 
of Bro.

1. At the device driver level, RSS (and Flow Director on Intel cards) 
lets an I/OAT-capable network card distribute incoming packets across 
multiple receive queues (RX_Queues), while sending all packets that 
belong to a particular connection to the same RX_Queue.
2. By virtue of TNAPI, these multiple RX_Queues get polled concurrently 
(by one kernel thread per queue), and sent to PF_RING (along with 
information about which queue the packet came from).
3. PF_RING provides a user API which can be used by user-applications 
like Bro to directly read from the multiple RX_Queues of a network 
interface by using notation like eth0@1, eth0@2, etc. for RX_Queues 1 
and 2 belonging to interface eth0. By assigning one thread to one 
RX_Queue, we ensure that all packets from one connection are being 
processed by the same core.
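
To make steps 1 and 3 concrete, here is a minimal Python sketch of how a 
connection could be pinned to one RX_Queue and how the matching device 
name is built. It is an illustration only: real NICs compute the RSS hash 
in hardware (typically a keyed Toeplitz hash, which is not necessarily 
direction-independent), so the CRC32-based hash, the queue count of 8, and 
the function names rx_queue_for and pf_ring_device are all assumptions 
made for this example.

```python
import zlib

NUM_RX_QUEUES = 8  # assumption: one RX queue per core on an 8-core box


def rx_queue_for(src_ip, src_port, dst_ip, dst_port, num_queues=NUM_RX_QUEUES):
    """Pick an RX queue index for a connection (stand-in for hardware RSS)."""
    # Sort the two endpoints so both directions of a connection hash alike.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}-{b[0]}:{b[1]}".encode()
    # CRC32 is a simple stand-in here, not the hash a real NIC uses.
    return zlib.crc32(key) % num_queues


def pf_ring_device(interface, queue):
    """Build the interface@queue device name described in step 3."""
    return f"{interface}@{queue}"


if __name__ == "__main__":
    q = rx_queue_for("10.0.0.1", 40000, "10.0.0.2", 80)
    print(pf_ring_device("eth0", q))
```

Because the queue index depends only on the connection's endpoints, every 
packet of that connection lands on the same queue, and therefore on the 
same reader thread.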

PF_RING and TNAPI can drastically improve the performance of a 
multiprocessing application, but only if the application is tuned to use 
them properly. For Bro, the gain comes from three things: packets bypass 
the kernel's network stack altogether; TNAPI polls each RX_Queue with 
its own kernel thread; and PF_RING hands packets to user space through a 
memory-mapped ring, which avoids a per-packet copy from kernel space to 
user space.

Configuration-wise, it took a bit of work to change Bro's configure 
files to use a PF_RING-aware libpcap instead of the libpcap that Bro 
ships with. With TNAPI and PF_RING running, the kernel clearly receives 
packets at a higher rate (the TNAPI website reports this, and I verified 
it myself). But using PF_RING with the existing Bro degrades Bro's 
performance: Bro runs as a single user-space thread, so packets that 
arrive in user space from several RX_Queues all still have to be 
processed by the one core that is running Bro. From what I know of 
TNAPI/PF_RING, though, my intuition is that a multi-threaded Bro can be 
adapted to PF_RING and should see large performance gains.
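
The "one thread per RX_Queue" design I have in mind can be sketched 
roughly as follows. This is not Bro code and does not touch PF_RING: the 
RX_Queues are simulated with Python queue.Queue objects, packets are 
assumed to arrive already tagged with their queue index (as TNAPI does 
in step 2), and the names worker, run, and NUM_RX_QUEUES are 
hypothetical.

```python
import queue
import threading
from collections import Counter

NUM_RX_QUEUES = 4  # hypothetical queue count for this sketch
STOP = None        # sentinel telling a worker to exit


def worker(rx, counts):
    """Drain one simulated RX queue and count packets per connection."""
    while True:
        flow = rx.get()
        if flow is STOP:
            break
        counts[flow] += 1


def run(packets, num_queues=NUM_RX_QUEUES):
    """Process (queue_id, flow_id) pairs with one worker thread per queue."""
    rx_queues = [queue.Queue() for _ in range(num_queues)]
    counts = [Counter() for _ in range(num_queues)]
    threads = [
        threading.Thread(target=worker, args=(rx_queues[i], counts[i]))
        for i in range(num_queues)
    ]
    for t in threads:
        t.start()
    # Each packet goes to the queue it was tagged with, so a connection's
    # state is only ever touched by a single worker.
    for queue_id, flow in packets:
        rx_queues[queue_id].put(flow)
    for rx in rx_queues:
        rx.put(STOP)
    for t in threads:
        t.join()
    return counts
```

Since no connection is shared between workers, the per-connection state 
needs no locking, which is the property that makes this layout 
attractive for a multi-threaded Bro.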

Here's the summary of results of a brief experiment that I performed on 
an 8-core Intel Xeon with 32 GB RAM running Linux, with an Intel 
82598EB 10 Gbps Ethernet card:

Goal: Compare conventional Bro installation against Bro with TNAPI and 
PF_RING (I called it Bro-Ring)
Conclusion: Bro-Ring shows a performance drop.
Observations: For each packet rate, the table shows how many packets 
were accepted by the machine running Bro (the rest were lost).

Packets/sec    Bro-Ring    Bro
34000          1368791     1368003
50000          1368546     1367707
65000          1368614     -
120000         -           1224761
130000         -           1168734
166000         596667      -
170000         561702      -
171000         681104      -
173000         618100      -
175000         740137      -
178000         864706      -
210000         -           753700
215000         -           728450
230000         494637      -
240000         -           636287

(- = no measurement at that packet rate)

(Note: there was a difference in tcpreplay's input parameter packet-rate 
and the actual packet rate achieved, so I could not supply exact values 
for packet rate)
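
To read the table as acceptance ratios rather than raw counts, the 
numbers can be normalised as in the sketch below. Note the assumption: 
the trace size is not stated anywhere above, so the sketch uses the 
largest accepted count (1368791) as a stand-in for the total number of 
packets sent. ASSUMED_TRACE_SIZE and accepted_fraction are names made up 
for this example.

```python
# (packets/sec, accepted by Bro-Ring, accepted by Bro); None = not measured
RESULTS = [
    (34000, 1368791, 1368003),
    (50000, 1368546, 1367707),
    (65000, 1368614, None),
    (120000, None, 1224761),
    (130000, None, 1168734),
    (166000, 596667, None),
    (170000, 561702, None),
    (171000, 681104, None),
    (173000, 618100, None),
    (175000, 740137, None),
    (178000, 864706, None),
    (210000, None, 753700),
    (215000, None, 728450),
    (230000, 494637, None),
    (240000, None, 636287),
]

# Assumption: the trace size is not given, so the largest accepted count
# is used as an approximation of the number of packets sent.
ASSUMED_TRACE_SIZE = 1368791


def accepted_fraction(accepted, total=ASSUMED_TRACE_SIZE):
    """Fraction of the (assumed) trace that was accepted."""
    return accepted / total


def fmt(accepted):
    """Format one cell as a percentage, or '-' when not measured."""
    return "-" if accepted is None else f"{100 * accepted_fraction(accepted):.1f}%"


if __name__ == "__main__":
    for rate, ring, bro in RESULTS:
        print(f"{rate:>8}  Bro-Ring {fmt(ring):>7}  Bro {fmt(bro):>7}")
```

Under that assumption, Bro-Ring is already accepting well under half of 
the packets at 166000 packets/sec, whereas plain Bro still accepts more 
than half at 210000 packets/sec.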

Sunjeet Singh