[Bro] Workers fail to process traffic due to a PF_RING problem

Tritium Cat tritium.cat at gmail.com
Thu Mar 21 13:32:26 PDT 2013


Hi all,

I've noticed a problem with the cluster where some workers start but never
process any traffic.  I can tell by looking at the output of the broctl
commands "netstats", "status", and "ps.bro".

It seems there is a problem either with how PF_RING is used or with PF_RING
itself (or maybe with my setup).  Has anyone else encountered this problem
and started trying to isolate it?

Restarting the affected worker seems to make it "work" again.  This happens
almost every time I start the cluster.  Sometimes, on some nodes, more than
one worker is affected.

In case it helps, I am using PF_RING from SVN as of 2013-03-19 and have
experienced the issue with all previous versions.  Bro is 2.1-380.

I searched the bug/problem tracker at http://tracker.bro.org/bro without
result.  If this cannot be resolved on the mailing list and is worth
tracking in a ticket, I will open one.


Thanks,

--TC




Example: in the netstats output below, notice that worker-1-4 has received
no packets (recvd=0).


[BroControl] > netstats
worker-1-1: 1363895818.282736 recvd=29542788 dropped=4 link=29542788
worker-1-10: 1363895818.482747 recvd=20389244 dropped=1 link=20389244
worker-1-11: 1363895818.682289 recvd=24803977 dropped=1 link=24803977
worker-1-12: 1363895818.882953 recvd=28730644 dropped=1 link=28730644
worker-1-13: 1363895819.082850 recvd=19810612 dropped=0 link=19810612
worker-1-14: 1363895819.290962 recvd=22651710 dropped=0 link=22651710
worker-1-15: 1363895819.490876 recvd=27415776 dropped=0 link=27415776
worker-1-16: 1363895819.694541 recvd=21634742 dropped=0 link=21634742
worker-1-17: 1363895819.895422 recvd=20572973 dropped=0 link=20572973
worker-1-18: 1363895820.095018 recvd=25490613 dropped=2 link=25490613
worker-1-19: 1363895820.298648 recvd=19699362 dropped=0 link=19699362
worker-1-2: 1363895820.499099 recvd=23931030 dropped=1 link=23931030
worker-1-20: 1363895820.699632 recvd=21769411 dropped=0 link=21769411
worker-1-3: 1363895820.899525 recvd=21604270 dropped=1 link=21604270
worker-1-4: 1363895821.102857 recvd=0 dropped=0 link=0
worker-1-5: 1363895821.307124 recvd=22320056 dropped=0 link=22320056
(..cut..)
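
Spotting the idle worker by eye gets tedious with 20 workers per node.  As a
rough sketch, the netstats output above could be scanned for workers whose
recvd counter is zero.  The line format is assumed from the output shown in
this post, and the names idle_workers and SAMPLE are made up for
illustration; they are not part of broctl.

```python
import re

# One line of broctl "netstats" output, as it appears in this post.
# The format is an assumption based on the sample, not a documented contract.
NETSTATS_RE = re.compile(
    r"^(?P<worker>\S+): (?P<ts>\d+\.\d+) "
    r"recvd=(?P<recvd>\d+) dropped=(?P<dropped>\d+) link=(?P<link>\d+)$"
)


def idle_workers(netstats_text):
    """Return the names of workers whose recvd counter is zero."""
    idle = []
    for line in netstats_text.splitlines():
        match = NETSTATS_RE.match(line.strip())
        if match and int(match.group("recvd")) == 0:
            idle.append(match.group("worker"))
    return idle


# Three lines copied from the netstats output above.
SAMPLE = """\
worker-1-3: 1363895820.899525 recvd=21604270 dropped=1 link=21604270
worker-1-4: 1363895821.102857 recvd=0 dropped=0 link=0
worker-1-5: 1363895821.307124 recvd=22320056 dropped=0 link=22320056
"""

if __name__ == "__main__":
    print(idle_workers(SAMPLE))
```

Lines that do not match the pattern, such as "(..cut..)", are skipped.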


Find what PID worker-1-4 is using by checking broctl "status".

[BroControl] > status
Name       Type       Host       Status        Pid    Peers  Started
(...cut...)
worker-1-4 worker    10.1.1.1  running       17618  2      21 Mar 12:20:21
(...cut...)



Check the PF_RING stats for PID 17618:


root@bro:/home/bro# cat /proc/net/pf_ring/17618-eth5.9
Bound Device(s)    : eth5
Active             : 1
Breed              : Non-DNA
Sampling Rate      : 1
Capture Direction  : RX+TX
Socket Mode        : RX+TX
Appl. Name         : <unknown>
IP Defragment      : No
BPF Filtering      : Enabled
# Sw Filt. Rules   : 0
# Hw Filt. Rules   : 0
Poll Pkt Watermark : 1
Num Poll Calls     : 16161864
Channel Id Mask    : 0xFFFFFFFF
Cluster Id         : 20
Slot Version       : 15 [5.5.3]
Min Num Slots      : 6966
Bucket Len         : 9600
Slot Len           : 9632 [bucket+header]
Tot Memory         : 67108864
Tot Packets        : 0
Tot Pkt Lost       : 0
Tot Insert         : 0
Tot Read           : 0
Insert Offset      : 0
Remove Offset      : 0
TX: Send Ok        : 0
TX: Send Errors    : 0
Reflect: Fwd Ok    : 0
Reflect: Fwd Errors: 0
Num Free Slots     : 6966
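
The same check can be scripted against the /proc/net/pf_ring entry.  This is
a minimal sketch that assumes the "key : value" layout of the dump above;
parse_pf_ring_stats and ring_is_idle are hypothetical helper names, and the
"idle" rule (poll calls made but no packets ever seen) is only my reading of
these counters, not something taken from the PF_RING documentation.

```python
def parse_pf_ring_stats(text):
    """Parse 'key : value' lines into a dict of strings.

    Splits on the LAST colon so that keys such as 'TX: Send Ok' and
    'Reflect: Fwd Errors' stay intact.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.rpartition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def ring_is_idle(stats):
    """True if the socket has been polled but has never seen a packet."""
    polls = int(stats.get("Num Poll Calls", "0"))
    packets = int(stats.get("Tot Packets", "0"))
    return polls > 0 and packets == 0


# A few lines copied from the dump for PID 17618 above.
SAMPLE = """\
Bound Device(s)    : eth5
Num Poll Calls     : 16161864
Cluster Id         : 20
Tot Packets        : 0
Reflect: Fwd Errors: 0
Num Free Slots     : 6966
"""

if __name__ == "__main__":
    stats = parse_pf_ring_stats(SAMPLE)
    print(stats["Bound Device(s)"], ring_is_idle(stats))
```

On a live system the text would come from reading the file under
/proc/net/pf_ring/ instead of the SAMPLE string.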



No packets, huh.  It must be something with how PF_RING is used or with
PF_RING itself.  What does restarting the worker do?


[BroControl] > restart worker-1-4
stopping ...
stopping worker-1-4 ...
starting ...
starting worker-1-4 ...

[BroControl] > status
Name       Type       Host       Status        Pid    Peers  Started
(...cut...)
worker-1-4 worker     10.1.1.1 running       18854  2      21 Mar 12:58:00
(...cut...)

[BroControl] > netstats
(...cut...)
worker-1-4: 1363896589.166826 recvd=6413632 dropped=112989 link=6413632
(...cut...)



On checking the PF_RING stats again, it looks like things are working now.
Packets were dropped briefly during the restart, but that counter has not
incremented since.


root@bro:/home/bro# cat /proc/net/pf_ring/18854-eth5.21
Bound Device(s)    : eth5
Active             : 1
Breed              : Non-DNA
Sampling Rate      : 1
Capture Direction  : RX+TX
Socket Mode        : RX+TX
Appl. Name         : <unknown>
IP Defragment      : No
BPF Filtering      : Enabled
# Sw Filt. Rules   : 0
# Hw Filt. Rules   : 0
Poll Pkt Watermark : 1
Num Poll Calls     : 6637605
Channel Id Mask    : 0xFFFFFFFF
Cluster Id         : 20
Slot Version       : 15 [5.5.3]
Min Num Slots      : 6966
Bucket Len         : 9600
Slot Len           : 9632 [bucket+header]
Tot Memory         : 67108864
Tot Packets        : 7711193
Tot Pkt Lost       : 112989
Tot Insert         : 7598204
Tot Read           : 7598197
Insert Offset      : 4454256
Remove Offset      : 4446288
TX: Send Ok        : 0
TX: Send Errors    : 0
Reflect: Fwd Ok    : 0
Reflect: Fwd Errors: 0
Num Free Slots     : 6959
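
As a quick sanity check on the counters in this second dump, the arithmetic
can be worked through as below.  The numbers are copied from the dump for
PID 18854; the relationships between them (total = inserted + lost, free
slots = slots minus unread packets) are simply what these particular values
show, not documented PF_RING semantics.

```python
# Counters copied from the /proc/net/pf_ring dump for PID 18854 above.
tot_packets = 7711193
tot_pkt_lost = 112989
tot_insert = 7598204
tot_read = 7598197
min_num_slots = 6966
num_free_slots = 6959

# In this sample, every packet seen was either inserted into the ring or lost.
assert tot_insert + tot_pkt_lost == tot_packets

# Packets inserted but not yet read by the worker.
backlog = tot_insert - tot_read

# In this sample, the backlog also accounts for the slots that are not free.
assert min_num_slots - backlog == num_free_slots

# Share of packets lost since the socket was opened, as a percentage.
loss_pct = 100.0 * tot_pkt_lost / tot_packets

if __name__ == "__main__":
    print("backlog: %d packets" % backlog)
    print("loss: %.2f%%" % loss_pct)
```

The loss works out to roughly 1.5% of all packets seen so far, and it matches
the dropped=112989 figure reported by netstats after the restart.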