[Bro] BRO Logger crashing due to large DNS log files
Azoff, Justin S
jazoff at illinois.edu
Wed Aug 22 07:58:35 PDT 2018
> On Aug 22, 2018, at 10:48 AM, Ron McClellan <Ron_McClellan at ao.uscourts.gov> wrote:
>
> Justin,
>
> Got good news and solid progress with your help. BRO is running on both boxes and hasn't crashed since 10pm last night. If I'm reading the NUMA data from my systems correctly, I don't really need to split the load between 2 workers as you did, right?
If you can get another NIC so each box has 2, then you could split the workers between the two NICs, with each group pinned to the NUMA node its NIC is attached to. Otherwise it doesn't matter so much.
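As a rough sketch of that split, the worker stanzas in broctl's node.cfg could be generated like this. It assumes two NICs named eth0 and eth1 (hypothetical names), each attached to a different NUMA node, and lb_method=pf_ring (also an assumption; use whatever load-balancing method the boxes actually run). The CPU lists are made up; check `lscpu` and /sys/class/net/<nic>/device/numa_node for the real ones.

```python
# Hypothetical NIC names and NUMA-local CPU lists. Replace them with the
# values reported by `lscpu` and /sys/class/net/<nic>/device/numa_node.
NICS = {
    "eth0": [2, 3, 4, 5, 6, 7, 8],         # CPUs on NUMA node 0 (assumed)
    "eth1": [18, 19, 20, 21, 22, 23, 24],  # CPUs on NUMA node 1 (assumed)
}


def worker_stanzas(host, nics, lb_method="pf_ring"):
    """Build one node.cfg [worker-N] stanza per NIC, running lb_procs
    processes for that NIC and pinning them to the NIC's local CPUs."""
    stanzas = []
    for i, (nic, cpus) in enumerate(nics.items(), start=1):
        stanzas.append("\n".join([
            f"[worker-{i}]",
            "type=worker",
            f"host={host}",
            f"interface={nic}",
            f"lb_method={lb_method}",
            f"lb_procs={len(cpus)}",  # one worker process per pinned CPU
            "pin_cpus=" + ",".join(str(c) for c in cpus),
        ]))
    return "\n\n".join(stanzas)


print(worker_stanzas("localhost", NICS))
```

The point of the per-NIC stanza is that each group of worker processes only reads from the NIC that sits on its own NUMA node, so packets don't have to cross the interconnect between sockets.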
> I'm working on tuning some now and also trying to address the really high lag (500) that I'm still seeing. Currently seeing some loss on it, but will continue to tune and see if I can get that under control. Let me know if you need help testing the doctor script.
>
> Ron
>
> 1534948572.682908 900.000005 worker-1-8 60904 532647 11.434214
> 1534948572.692674 900.000072 worker-1-13 67152 216975 30.949188
> 1534948572.688750 900.000028 worker-1-18 70383 235710 29.859997
> 1534948572.705484 900.000037 worker-1-24 57008 201189 28.335545
> 1534948572.682147 900.000099 worker-1-5 61878 194825 31.760811
> 1534948572.699536 900.000061 worker-1-16 76385 256671 29.759887
> 1534948572.682829 900.000080 worker-1-29 52464 188150 27.884135
> 1534948572.683536 900.000049 worker-1-4 110222 314119 35.08925
>
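For reading the capture_loss.log lines above: the columns are ts, ts_delta, peer, gaps, acks and percent_lost, where gaps is roughly the number of ACKs for data the worker never saw, and percent_lost is 100 * gaps / acks (for worker-1-8, 100 * 60904 / 532647 ≈ 11.43). A minimal sketch that recomputes it per worker and sorts the worst first, using three of the lines above as sample input:

```python
# Columns as they appear in capture_loss.log:
# ts, ts_delta, peer, gaps, acks, percent_lost
LINES = """\
1534948572.682908 900.000005 worker-1-8 60904 532647 11.434214
1534948572.692674 900.000072 worker-1-13 67152 216975 30.949188
1534948572.683536 900.000049 worker-1-4 110222 314119 35.08925
"""


def loss_by_peer(text):
    """Return {peer: percent_lost}, recomputed as 100 * gaps / acks."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        peer, gaps, acks = fields[2], int(fields[3]), int(fields[4])
        result[peer] = 100.0 * gaps / acks if acks else 0.0
    return result


# Print the workers with the highest estimated loss first.
for peer, pct in sorted(loss_by_peer(LINES).items(), key=lambda kv: -kv[1]):
    print(f"{peer:12s} {pct:6.2f}%")
```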
> [root at aosoc current]# broctl netstats
> worker-1-1: 1534949053.166850 recvd=813997 dropped=0 link=813997
> worker-1-2: 1534949053.366803 recvd=873351 dropped=0 link=873353
> worker-1-3: 1534949053.567778 recvd=1770808 dropped=0 link=1770810
> worker-1-4: 1534949053.767852 recvd=865443 dropped=0 link=865449
> worker-1-5: 1534949053.968873 recvd=349355 dropped=0 link=349361
> worker-1-6: 1534949054.168785 recvd=1152160 dropped=0 link=1152161
> worker-1-7: 1534949054.368825 recvd=1358553 dropped=0 link=1358553
> worker-1-8: 1534949054.569808 recvd=345267 dropped=0 link=345272
> worker-1-9: 1534949054.769982 recvd=856725 dropped=0 link=856732
> worker-1-10: 1534949054.969811 recvd=351148 dropped=0 link=351148
> worker-1-11: 1534949055.170855 recvd=883897 dropped=0 link=883897
> worker-1-12: 1534949055.370950 recvd=820117 dropped=0 link=820125
> worker-1-13: 1534949055.571899 recvd=1132465 dropped=0 link=1132473
> worker-1-14: 1534949055.771751 recvd=823249 dropped=0 link=823249
> worker-1-15: 1534949055.972921 recvd=754342 dropped=0 link=754343
> worker-1-16: 1534949056.173778 recvd=822102 dropped=0 link=822106
> worker-1-17: 1534949056.373806 recvd=570905 dropped=0 link=570911
> worker-1-18: 1534949056.573815 recvd=1033845 dropped=0 link=1033846
> worker-1-19: 1534949056.774737 recvd=648977 dropped=0 link=649001
> worker-1-20: 1534949056.974823 recvd=816836 dropped=0 link=816838
> worker-1-21: 1534949057.175858 recvd=423896 dropped=0 link=423901
> worker-1-22: 1534949057.375894 recvd=761794 dropped=0 link=761796
> worker-1-23: 1534949057.576737 recvd=415151 dropped=0 link=415153
> worker-1-24: 1534949057.776887 recvd=604342 dropped=0 link=604349
> worker-1-25: 1534949057.978046 recvd=911772 dropped=0 link=911785
> worker-1-26: 1534949058.177749 recvd=358386 dropped=0 link=358395
> worker-1-27: 1534949058.379062 recvd=1283463 dropped=0 link=1283465
> worker-1-28: 1534949058.578751 recvd=364801 dropped=0 link=364807
> worker-1-29: 1534949058.778735 recvd=930041 dropped=0 link=930042
> worker-1-30: 1534949058.979938 recvd=857963 dropped=0 link=857967
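To total the netstats counters across all 30 workers rather than eyeballing them, something like the following works on the output shown above with the "> " quote markers stripped. The regex is written against these lines, not against any documented format, so treat it as an assumption; only three workers are included as sample input.

```python
import re

# A few lines of `broctl netstats` output, copied from above.
NETSTATS = """\
worker-1-1: 1534949053.166850 recvd=813997 dropped=0 link=813997
worker-1-19: 1534949056.774737 recvd=648977 dropped=0 link=649001
worker-1-25: 1534949057.978046 recvd=911772 dropped=0 link=911785
"""

# name: timestamp recvd=N dropped=N link=N
PATTERN = re.compile(r"^(\S+): \S+ recvd=(\d+) dropped=(\d+) link=(\d+)$")


def summarize(text):
    """Total the recvd/dropped/link counters across all workers."""
    totals = {"recvd": 0, "dropped": 0, "link": 0}
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if not m:
            continue  # ignore anything that isn't a worker line
        totals["recvd"] += int(m.group(2))
        totals["dropped"] += int(m.group(3))
        totals["link"] += int(m.group(4))
    return totals


print(summarize(NETSTATS))
```

If the dropped total stays at 0 while capture_loss.log reports double-digit loss, the workers aren't falling behind the capture layer, which points elsewhere.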
If you're seeing a high percentage of capture loss but netstats is showing 0 dropped packets, that means one of two things:
* Something still isn't right with the load balancing. It could be that your NIC isn't doing symmetric hashing properly.
* There's an issue with the traffic upstream of bro.
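To make the symmetric hashing point concrete: the load balancer has to send both directions of a connection to the same worker. If it doesn't, each worker sees only half of the conversation, ACKs for data it never saw pile up, and capture loss climbs even though nothing is dropped. The toy below is not the Toeplitz/RSS hash a NIC actually implements; it only illustrates the property. Hashing the 4-tuple in packet order can split the two directions across workers, while sorting the endpoints first guarantees they match.

```python
import hashlib


def _bucket(key, n_workers):
    """Map a string key to a worker index in [0, n_workers)."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_workers


def asymmetric_hash(src, sport, dst, dport, n_workers):
    """Hash the 4-tuple in packet order: the two directions of one
    connection can land on different workers."""
    return _bucket(f"{src}:{sport}-{dst}:{dport}", n_workers)


def symmetric_hash(src, sport, dst, dport, n_workers):
    """Sort the endpoints first so both directions produce the same key."""
    a, b = sorted([(src, sport), (dst, dport)])
    return _bucket(f"{a[0]}:{a[1]}-{b[0]}:{b[1]}", n_workers)


fwd = ("10.0.0.1", 51000, "10.0.0.2", 53)
rev = ("10.0.0.2", 53, "10.0.0.1", 51000)
print("asymmetric:", asymmetric_hash(*fwd, 30), asymmetric_hash(*rev, 30))
print("symmetric: ", symmetric_hash(*fwd, 30), symmetric_hash(*rev, 30))
```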
A bunch of the checks that bro-doctor does can help diagnose this, but you'd need to re-enable the conn.log.
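Once conn.log is back on, one rough way to spot broken load balancing is to count connections whose history shows packets from only one side: in conn.log's history field, upper-case letters are originator events and lower-case letters are responder events. This is a hypothetical check written for illustration, not bro-doctor's actual code. The function names are made up, plain "S" histories (unanswered SYNs) are skipped because scans legitimately look one-sided, and the sample log lines in the usage are invented.

```python
def histories_from_conn_log(lines):
    """Yield the history column from a tab-separated Bro ASCII conn.log,
    locating the column via the '#fields' header line."""
    idx = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            idx = line.split("\t")[1:].index("history")
        elif line and not line.startswith("#") and idx is not None:
            yield line.split("\t")[idx]


def is_one_sided(history):
    """True if the history has events from only one side of the connection.
    Non-letters such as '^' are ignored."""
    letters = [c for c in history if c.isalpha()]
    if not letters:
        return False
    return all(c.isupper() for c in letters) or all(c.islower() for c in letters)


def one_sided_fraction(histories):
    """Fraction of connections with one-sided history, skipping empty
    ('-') histories and bare unanswered SYNs ('S')."""
    kept = [h for h in histories if h and h not in ("-", "S")]
    if not kept:
        return 0.0
    return sum(is_one_sided(h) for h in kept) / len(kept)


# Usage (path assumed):
# with open("conn.log") as f:
#     print(one_sided_fraction(histories_from_conn_log(f)))
```

A large fraction of "SAD"-style (originator only) and "hd"-style (responder only) histories would be consistent with the two directions of a flow being sent to different workers.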
—
Justin Azoff
More information about the Bro mailing list