[Bro] Debugging high-cpu on a single worker
Dave Crawford
bro at pingtrip.com
Thu Feb 23 09:47:19 PST 2017
I’m seeing a repeatable issue where the first worker of ten always has a higher CPU usage.
11 3014 bro 20 0 2078M 1250M 502M R 97.4 1.0 18h34:48 /opt/bro/bin/bro -i netmap::bro}0 -U .status -p broctl -p broctl-live -p
14 3005 bro 20 0 1885M 1056M 501M R 83.5 0.8 13h09:44 /opt/bro/bin/bro -i netmap::bro}4 -U .status -p broctl -p broctl-live -p
18 3013 bro 20 0 1746M 983M 501M R 67.3 0.8 13h25:41 /opt/bro/bin/bro -i netmap::bro}8 -U .status -p broctl -p broctl-live -p
17 3009 bro 20 0 2139M 977M 502M S 63.5 0.8 13h11:37 /opt/bro/bin/bro -i netmap::bro}7 -U .status -p broctl -p broctl-live -p
19 3008 bro 20 0 2168M 1333M 501M S 63.0 1.0 13h30:12 /opt/bro/bin/bro -i netmap::bro}9 -U .status -p broctl -p broctl-live -p
13 3006 bro 20 0 2164M 1329M 502M S 62.5 1.0 13h16:05 /opt/bro/bin/bro -i netmap::bro}3 -U .status -p broctl -p broctl-live -p
12 3012 bro 20 0 2194M 1287M 501M R 60.6 1.0 13h15:14 /opt/bro/bin/bro -i netmap::bro}2 -U .status -p broctl -p broctl-live -p
15 3011 bro 20 0 2177M 982M 502M R 60.6 0.8 13h03:26 /opt/bro/bin/bro -i netmap::bro}5 -U .status -p broctl -p broctl-live -p
16 3007 bro 20 0 2317M 1237M 502M R 60.1 1.0 13h12:35 /opt/bro/bin/bro -i netmap::bro}6 -U .status -p broctl -p broctl-live -p
20 3010 bro 20 0 2328M 991M 501M R 58.2 0.8 13h14:40 /opt/bro/bin/bro -i netmap::bro}1 -U .status -p broctl -p broctl-live -p
Even after a fresh restart the “bro}0” worker is about double the CPU usage compared to the other workers:
11 1799 bro 20 0 1422M 668M 501M S 73.5 0.5 0:12.28 /opt/bro/bin/bro -i netmap::bro}0 -U .status -p broctl -p broctl-live -p
15 1813 bro 20 0 1420M 530M 364M S 37.9 0.4 0:07.04 /opt/bro/bin/bro -i netmap::bro}5 -U .status -p broctl -p broctl-live -p
17 1831 bro 20 0 1420M 527M 362M R 37.5 0.4 0:06.92 /opt/bro/bin/bro -i netmap::bro}7 -U .status -p broctl -p broctl-live -p
12 1805 bro 20 0 1424M 525M 356M R 36.5 0.4 0:07.04 /opt/bro/bin/bro -i netmap::bro}2 -U .status -p broctl -p broctl-live -p
18 1830 bro 20 0 1420M 514M 350M R 35.1 0.4 0:06.67 /opt/bro/bin/bro -i netmap::bro}8 -U .status -p broctl -p broctl-live -p
19 1826 bro 20 0 1421M 533M 367M S 34.6 0.4 0:07.15 /opt/bro/bin/bro -i netmap::bro}9 -U .status -p broctl -p broctl-live -p
20 1832 bro 20 0 1421M 529M 363M R 34.6 0.4 0:07.06 /opt/bro/bin/bro -i netmap::bro}1 -U .status -p broctl -p broctl-live -p
14 1827 bro 20 0 1420M 530M 365M S 34.6 0.4 0:06.87 /opt/bro/bin/bro -i netmap::bro}4 -U .status -p broctl -p broctl-live -p
13 1802 bro 20 0 1418M 520M 356M S 34.6 0.4 0:06.79 /opt/bro/bin/bro -i netmap::bro}3 -U .status -p broctl -p broctl-live -p
16 1824 bro 20 0 1417M 502M 339M R 33.1 0.4 0:06.60 /opt/bro/bin/bro -i netmap::bro}6 -U .status -p broctl -p broctl-live -p
A quick spot check of an hours worth of conn logs shows the traffic is balanced as expected:
MID_EXT-1 = 397479
MID_EXT-2 = 409771
...
MID_EXT-10 = 408699
The real issue is that its the only worker thats reporting capture_loss, which I assume is due to hitting 100% CPU usage:
1487866049.759038 600.000033 MID_EXT-2 4713 4710856 0.100046
1487866049.759333 600.000015 MID_EXT-9 5427 5193821 0.10449
1487866049.760596 600.000068 MID_EXT-5 5768 5548436 0.103957
1487866049.762628 600.000107 MID_EXT-8 5190 5224691 0.099336
1487866049.764048 600.000125 MID_EXT-3 5036 4947221 0.101795
1487866111.312071 600.000666 MID_EXT-6 8007 5959706 0.134352
1487866111.313803 600.000803 MID_EXT-4 6581 4981572 0.132107
1487866111.314788 600.000126 MID_EXT-7 6579 4905261 0.134121
1487866111.327390 600.000047 MID_EXT-10 7693 6369836 0.120772
1487866111.584908 600.000566 MID_EXT-1 823616 4227324 19.483153
Any tips on how to debug/profile further would be greatly appreciated.
Thanks!
-Dave
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ICSI.Berkeley.EDU/pipermail/bro/attachments/20170223/7ee2d280/attachment.html
More information about the Bro
mailing list