[Bro] Debugging high-cpu on a single worker

Dave Crawford bro at pingtrip.com
Thu Feb 23 09:47:19 PST 2017


I’m seeing a repeatable issue where the first worker of ten always has a higher CPU usage.

 11  3014 bro        20   0 2078M 1250M  502M R 97.4  1.0 18h34:48 /opt/bro/bin/bro -i netmap::bro}0 -U .status -p broctl -p broctl-live -p
 14  3005 bro        20   0 1885M 1056M  501M R 83.5  0.8 13h09:44 /opt/bro/bin/bro -i netmap::bro}4 -U .status -p broctl -p broctl-live -p
 18  3013 bro        20   0 1746M  983M  501M R 67.3  0.8 13h25:41 /opt/bro/bin/bro -i netmap::bro}8 -U .status -p broctl -p broctl-live -p
 17  3009 bro        20   0 2139M  977M  502M S 63.5  0.8 13h11:37 /opt/bro/bin/bro -i netmap::bro}7 -U .status -p broctl -p broctl-live -p
 19  3008 bro        20   0 2168M 1333M  501M S 63.0  1.0 13h30:12 /opt/bro/bin/bro -i netmap::bro}9 -U .status -p broctl -p broctl-live -p
 13  3006 bro        20   0 2164M 1329M  502M S 62.5  1.0 13h16:05 /opt/bro/bin/bro -i netmap::bro}3 -U .status -p broctl -p broctl-live -p
 12  3012 bro        20   0 2194M 1287M  501M R 60.6  1.0 13h15:14 /opt/bro/bin/bro -i netmap::bro}2 -U .status -p broctl -p broctl-live -p
 15  3011 bro        20   0 2177M  982M  502M R 60.6  0.8 13h03:26 /opt/bro/bin/bro -i netmap::bro}5 -U .status -p broctl -p broctl-live -p
 16  3007 bro        20   0 2317M 1237M  502M R 60.1  1.0 13h12:35 /opt/bro/bin/bro -i netmap::bro}6 -U .status -p broctl -p broctl-live -p
 20  3010 bro        20   0 2328M  991M  501M R 58.2  0.8 13h14:40 /opt/bro/bin/bro -i netmap::bro}1 -U .status -p broctl -p broctl-live -p

Even after a fresh restart the “bro}0” worker is about double the CPU usage compared to the other workers:

 11  1799 bro        20   0 1422M  668M  501M S 73.5  0.5  0:12.28 /opt/bro/bin/bro -i netmap::bro}0 -U .status -p broctl -p broctl-live -p
 15  1813 bro        20   0 1420M  530M  364M S 37.9  0.4  0:07.04 /opt/bro/bin/bro -i netmap::bro}5 -U .status -p broctl -p broctl-live -p
 17  1831 bro        20   0 1420M  527M  362M R 37.5  0.4  0:06.92 /opt/bro/bin/bro -i netmap::bro}7 -U .status -p broctl -p broctl-live -p
 12  1805 bro        20   0 1424M  525M  356M R 36.5  0.4  0:07.04 /opt/bro/bin/bro -i netmap::bro}2 -U .status -p broctl -p broctl-live -p
 18  1830 bro        20   0 1420M  514M  350M R 35.1  0.4  0:06.67 /opt/bro/bin/bro -i netmap::bro}8 -U .status -p broctl -p broctl-live -p
 19  1826 bro        20   0 1421M  533M  367M S 34.6  0.4  0:07.15 /opt/bro/bin/bro -i netmap::bro}9 -U .status -p broctl -p broctl-live -p
 20  1832 bro        20   0 1421M  529M  363M R 34.6  0.4  0:07.06 /opt/bro/bin/bro -i netmap::bro}1 -U .status -p broctl -p broctl-live -p
 14  1827 bro        20   0 1420M  530M  365M S 34.6  0.4  0:06.87 /opt/bro/bin/bro -i netmap::bro}4 -U .status -p broctl -p broctl-live -p
 13  1802 bro        20   0 1418M  520M  356M S 34.6  0.4  0:06.79 /opt/bro/bin/bro -i netmap::bro}3 -U .status -p broctl -p broctl-live -p
 16  1824 bro        20   0 1417M  502M  339M R 33.1  0.4  0:06.60 /opt/bro/bin/bro -i netmap::bro}6 -U .status -p broctl -p broctl-live -p


A quick spot check of an hours worth of conn logs shows the traffic is balanced as expected:

MID_EXT-1 = 397479
MID_EXT-2 = 409771
...
MID_EXT-10 = 408699


The real issue is that its the only worker thats reporting capture_loss, which I assume is due to hitting 100% CPU usage:

1487866049.759038       600.000033      MID_EXT-2       4713    4710856 0.100046
1487866049.759333       600.000015      MID_EXT-9       5427    5193821 0.10449
1487866049.760596       600.000068      MID_EXT-5       5768    5548436 0.103957
1487866049.762628       600.000107      MID_EXT-8       5190    5224691 0.099336
1487866049.764048       600.000125      MID_EXT-3       5036    4947221 0.101795
1487866111.312071       600.000666      MID_EXT-6       8007    5959706 0.134352
1487866111.313803       600.000803      MID_EXT-4       6581    4981572 0.132107
1487866111.314788       600.000126      MID_EXT-7       6579    4905261 0.134121
1487866111.327390       600.000047      MID_EXT-10      7693    6369836 0.120772
1487866111.584908       600.000566      MID_EXT-1       823616  4227324 19.483153

Any tips on how to debug/profile further would be greatly appreciated.

Thanks!
-Dave
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ICSI.Berkeley.EDU/pipermail/bro/attachments/20170223/7ee2d280/attachment.html 


More information about the Bro mailing list