[Bro] BRO Logger crashing due to large DNS log files
Ron McClellan
Ron_McClellan at ao.uscourts.gov
Wed Aug 22 07:48:01 PDT 2018
Justin,
Got good news and solid progress with your help. BRO is running on both boxes and hasn't crashed since 10pm last night. If I'm reading the NUMA data from my systems correctly, I don't really need to split the load between 2 workers as you did, right? I'm doing some tuning now and also trying to address the really high lag (500) that I'm still seeing. I'm currently seeing some loss, but will continue to tune and see if I can get that under control. Let me know if you need help testing the doctor script.
Ron
# cat capture_loss.log
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path capture_loss
#open 2018-08-22-10-01-21
#fields ts ts_delta peer gaps acks percent_lost
#types time interval string count count double
1534946481.938006 900.000084 worker-1-20 33 696 4.741379
1534946481.941548 900.000000 worker-1-24 20 2722 0.734754
1534946481.938533 900.000059 worker-1-21 630 40222 1.566307
1534946481.938396 900.000070 worker-1-9 89 1470 6.054422
1534946481.941452 900.000044 worker-1-8 156 1821 8.566722
1534946481.941323 900.000017 worker-1-12 1062 232679 0.456423
1534946481.939547 900.000037 worker-1-27 1023 216063 0.473473
1534946481.937269 900.000040 worker-1-10 749 5465 13.705398
1534946481.937517 900.000111 worker-1-3 87 15720 0.553435
1534946481.941367 900.000649 worker-1-16 117 2187 5.349794
1534946481.939451 900.000079 worker-1-7 870 195358 0.445336
1534946481.940450 900.000041 worker-1-5 111 626 17.731629
1534946481.931345 900.000019 worker-1-4 44 885 4.971751
1534946481.941268 900.000074 worker-1-17 131 1641 7.982937
1534946481.946945 900.000039 worker-1-18 189 1350 14.0
1534946481.941532 900.000083 worker-1-25 118 9414 1.253452
1534946481.942680 900.000094 worker-1-30 1375 2635 52.182163
1534946481.937385 900.000074 worker-1-1 1050 232183 0.452229
1534946481.939621 900.000062 worker-1-26 20 1973 1.013685
1534946481.942331 900.000127 worker-1-2 1236 240350 0.51425
1534946481.938535 900.000003 worker-1-29 133 2923 4.55012
1534946481.938737 900.000077 worker-1-13 1463 223976 0.653195
1534946481.937868 900.000121 worker-1-15 278 2360 11.779661
1534946481.937738 900.000006 worker-1-28 36 765 4.705882
1534946481.940076 900.000039 worker-1-23 43 3749 1.146973
1534946481.940530 900.000008 worker-1-22 1151 4798 23.989162
1534946481.944632 900.000030 worker-1-19 510 88481 0.576395
1534946481.937329 900.000045 worker-1-6 891 146039 0.610111
1534946481.938533 900.000095 worker-1-14 206 2276 9.050967
1534946481.937384 900.000074 worker-1-11 222 2176 10.202206
1534947381.938548 900.000013 worker-1-29 1135 241449 0.470079
1534947381.942682 900.000002 worker-1-30 399 13150 3.034221
1534947381.939458 900.000007 worker-1-7 332 66504 0.499218
1534947381.937742 900.000004 worker-1-28 31 711 4.360056
1534947381.940622 900.000092 worker-1-22 77 1728 4.456019
1534947381.938073 900.000067 worker-1-20 103 2343 4.396073
1534947381.941622 900.000074 worker-1-24 90 7394 1.217203
1534947381.941549 900.000017 worker-1-25 1259 235553 0.534487
1534947381.941454 900.000087 worker-1-16 231 5455 4.234647
1534947381.942399 900.000068 worker-1-2 69 1293 5.336427
1534947381.941324 900.000056 worker-1-17 152 759 20.02635
1534947381.931395 900.000050 worker-1-4 1310 240018 0.545792
1534947381.938810 900.000073 worker-1-13 109 17301 0.630021
1534947381.938606 900.000073 worker-1-14 305 2184 13.965201
1534947381.937398 900.000069 worker-1-6 67 3465 1.933622
1534947381.940457 900.000007 worker-1-5 118 1280 9.21875
1534947381.937470 900.000085 worker-1-1 24 1581 1.518027
1534947381.940195 900.000119 worker-1-23 189 20872 0.905519
1534947381.937614 900.000097 worker-1-3 1167 213001 0.547885
1534947381.944751 900.000119 worker-1-19 160 4249 3.765592
1534947381.937943 900.000075 worker-1-15 593 2541 23.337269
1534947381.947066 900.000121 worker-1-18 809 160344 0.50454
1534947381.939548 900.000001 worker-1-27 219 2612 8.38438
1534947381.938628 900.000095 worker-1-21 302 1627 18.56177
1534947381.937326 900.000057 worker-1-10 107 1763 6.0692
1534947381.938497 900.000101 worker-1-9 1599 238664 0.66998
1534947381.941398 900.000075 worker-1-12 201 2936 6.846049
1534947381.937399 900.000015 worker-1-11 1382 236433 0.584521
1534947381.939677 900.000056 worker-1-26 52 1100 4.727273
1534947381.941453 900.000001 worker-1-8 224 1601 13.991255
1534948281.939548 900.000090 worker-1-7 1088 235524 0.461949
1534948281.941678 900.000129 worker-1-25 202 32683 0.618058
1534948281.947198 900.000132 worker-1-18 284 6208 4.574742
1534948281.937477 900.000079 worker-1-6 70 14679 0.476872
1534948281.937532 900.000062 worker-1-1 57 1621 3.516348
1534948281.937477 900.000078 worker-1-11 71 24940 0.284683
1534948281.938938 900.000128 worker-1-13 111 12288 0.90332
1534948281.941679 900.000057 worker-1-24 731 121315 0.602564
1534948281.938621 900.000015 worker-1-14 1056 230109 0.458913
1534948281.942751 900.000069 worker-1-30 34 448 7.589286
1534948281.938548 900.000000 worker-1-29 219 1033 21.200387
1534948281.941325 900.000001 worker-1-17 671 111097 0.603977
1534948281.937348 900.000022 worker-1-10 145 1917 7.563902
1534948281.938055 900.000112 worker-1-15 859 187429 0.458307
1534948281.939622 900.000074 worker-1-27 50 3453 1.448016
1534948281.931396 900.000001 worker-1-4 193 3759 5.134344
1534948281.937780 900.000038 worker-1-28 230 6086 3.779165
1534948281.938109 900.000036 worker-1-20 1135 230316 0.492801
1534948281.938512 900.000015 worker-1-9 44 3888 1.131687
1534948281.940323 900.000128 worker-1-23 30 1212 2.475248
1534948281.939677 900.000000 worker-1-26 165 6336 2.604167
1534948281.940527 900.000070 worker-1-5 96 5162 1.859744
1534948281.937736 900.000122 worker-1-3 1123 249305 0.450452
1534948281.941454 900.000001 worker-1-8 67 1910 3.507853
1534948281.940679 900.000057 worker-1-22 115 4310 2.668213
1534948281.938677 900.000049 worker-1-21 25 2141 1.167679
1534948281.944879 900.000128 worker-1-19 29 1637 1.771533
1534948281.942454 900.000055 worker-1-2 36 2033 1.770782
1534948281.941453 900.000055 worker-1-12 26 991 2.623613
1534948281.941454 900.000000 worker-1-16 1127 230791 0.488321
# cat capture_loss.log
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path capture_loss
#open 2018-08-22-10-06-13
#fields ts ts_delta peer gaps acks percent_lost
#types time interval string count count double
1534946772.685666 900.000108 worker-1-9 71276 209039 34.096987
1534946772.682117 900.000110 worker-1-20 43286 430827 10.047188
1534946772.686758 900.000020 worker-1-22 58337 172653 33.788582
1534946772.689750 900.000013 worker-1-17 61579 422200 14.585268
1534946772.683422 900.000599 worker-1-4 62846 224500 27.993764
1534946772.692533 900.000076 worker-1-13 56519 190555 29.660203
1534946772.684749 900.000086 worker-1-15 41612 129870 32.041272
1534946772.684889 900.000230 worker-1-27 76559 187163 40.904987
1534946772.683731 900.000001 worker-1-25 74450 188407 39.515517
1534946772.681934 900.000111 worker-1-5 50253 153355 32.769065
1534946772.682021 900.000012 worker-1-28 52191 151854 34.369197
1534946772.682825 900.000074 worker-1-8 52037 190660 27.293087
1534946772.699409 900.000084 worker-1-16 88137 266670 33.050962
1534946772.685734 900.000100 worker-1-30 51271 238600 21.488265
1534946772.682739 900.000022 worker-1-6 66273 250566 26.449319
1534946772.682741 900.000063 worker-1-26 49902 153687 32.46989
1534946772.681960 900.000006 worker-1-1 89188 255018 34.973218
1534946772.682631 900.000622 worker-1-29 60705 210476 28.841768
1534946772.681953 900.000075 worker-1-2 38281 125211 30.573192
1534946772.682673 900.000007 worker-1-3 67450 187531 35.967387
1534946772.686732 900.000060 worker-1-23 55932 191885 29.148709
1534946772.681828 900.000005 worker-1-7 66947 445007 15.044033
1534946772.681886 900.000007 worker-1-11 48944 138084 35.445091
1534946772.693528 900.000000 worker-1-14 65762 188557 34.876456
1534946772.681885 900.000006 worker-1-10 62149 428124 14.516589
1534946772.685697 900.000017 worker-1-21 48039 147640 32.53793
1534946772.683753 900.000022 worker-1-19 59660 157172 37.958415
1534946772.705397 900.000127 worker-1-24 71820 223813 32.089289
1534946772.688718 900.000117 worker-1-18 48410 452562 10.696877
1534946772.685511 900.000137 worker-1-12 46673 145455 32.087587
1534947672.682048 900.000114 worker-1-5 68107 180382 37.757093
1534947672.683025 900.000286 worker-1-6 45761 183027 25.002322
1534947672.685750 900.000053 worker-1-21 50836 422213 12.040368
1534947672.683879 900.000126 worker-1-19 53010 178899 29.631244
1534947672.693643 900.000115 worker-1-14 92038 425392 21.636044
1534947672.682825 900.000084 worker-1-26 55076 176437 31.215675
1534947672.682008 900.000123 worker-1-10 73148 207138 35.313656
1534947672.699475 900.000066 worker-1-16 72461 223957 32.354872
1534947672.684952 900.000063 worker-1-27 47858 167864 28.509984
1534947672.686884 900.000126 worker-1-22 65305 192727 33.884718
1534947672.681973 900.000020 worker-1-2 60511 181325 33.37157
1534947672.682136 900.000176 worker-1-1 109592 280275 39.101597
1534947672.682749 900.000118 worker-1-29 64164 192112 33.399267
1534947672.689756 900.000006 worker-1-17 61667 166246 37.093825
1534947672.683803 900.000072 worker-1-25 56366 464877 12.124928
1534947672.682152 900.000035 worker-1-20 49701 148229 33.529876
1534947672.685826 900.000092 worker-1-30 54071 160228 33.746287
1534947672.684823 900.000074 worker-1-15 60758 204305 29.738871
1534947672.685527 900.000016 worker-1-12 51410 166297 30.914569
1534947672.688722 900.000004 worker-1-18 73693 218226 33.76912
1534947672.682082 900.000061 worker-1-28 62184 198747 31.288019
1534947672.686826 900.000094 worker-1-23 57861 221752 26.092662
1534947672.682903 900.000078 worker-1-8 48482 219779 22.059432
1534947672.685711 900.000045 worker-1-9 53372 172244 30.986275
1534947672.692602 900.000069 worker-1-13 62358 502957 12.398277
1534947672.682167 900.000281 worker-1-11 48767 198101 24.617241
1534947672.705447 900.000050 worker-1-24 55112 186729 29.51443
1534947672.682731 900.000058 worker-1-3 56891 162845 34.935675
1534947672.683487 900.000065 worker-1-4 78602 255868 30.719746
1534947672.681880 900.000052 worker-1-7 51099 541967 9.428434
1534948572.682094 900.000086 worker-1-10 82032 524780 15.631693
1534948572.693667 900.000024 worker-1-14 85369 297217 28.722785
1534948572.682472 900.000499 worker-1-2 53654 221056 24.271678
1534948572.686886 900.000002 worker-1-22 55666 467706 11.901921
1534948572.685008 900.000056 worker-1-27 86916 263647 32.966808
1534948572.682279 900.000127 worker-1-20 89828 256003 35.088651
1534948572.682223 900.000087 worker-1-1 62337 344970 18.070267
1534948572.685750 900.000000 worker-1-21 70389 510644 13.784359
1534948572.684880 900.000057 worker-1-15 67459 206447 32.676183
1534948572.685740 900.000029 worker-1-9 57163 227031 25.1785
1534948572.682752 900.000021 worker-1-3 61958 204039 30.365763
1534948572.682835 900.000010 worker-1-26 54506 196350 27.759613
1534948572.683153 900.000128 worker-1-6 60501 190365 31.781577
1534948572.682183 900.000016 worker-1-11 63835 191625 33.312459
1534948572.682208 900.000126 worker-1-28 91876 284589 32.28375
1534948572.683828 900.000025 worker-1-25 44239 139128 31.797338
1534948572.685880 900.000054 worker-1-30 55616 172434 32.2535
1534948572.689884 900.000128 worker-1-17 69725 178142 39.140124
1534948572.681961 900.000081 worker-1-7 53776 220472 24.391306
1534948572.683937 900.000058 worker-1-19 50184 163270 30.736816
1534948572.685538 900.000011 worker-1-12 60185 260306 23.120865
1534948572.686889 900.000063 worker-1-23 59788 194439 30.748975
1534948572.682908 900.000005 worker-1-8 60904 532647 11.434214
1534948572.692674 900.000072 worker-1-13 67152 216975 30.949188
1534948572.688750 900.000028 worker-1-18 70383 235710 29.859997
1534948572.705484 900.000037 worker-1-24 57008 201189 28.335545
1534948572.682147 900.000099 worker-1-5 61878 194825 31.760811
1534948572.699536 900.000061 worker-1-16 76385 256671 29.759887
1534948572.682829 900.000080 worker-1-29 52464 188150 27.884135
1534948572.683536 900.000049 worker-1-4 110222 314119 35.08925
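The per-interval percentages above swing a lot on workers with few acks (worker-1-30 shows 52% in one interval on only 2635 acks), so summing gaps and acks per worker across intervals may give a steadier picture. In these rows percent_lost equals 100 * gaps / acks. A minimal Python sketch, assuming the whitespace-separated column order shown in the #fields line; the function name summarize_capture_loss is hypothetical and the sample rows are copied from the first log:

```python
from collections import defaultdict

# Rows copied from the first box's log (worker-1-12 and worker-1-30).
SAMPLE = """\
#fields ts ts_delta peer gaps acks percent_lost
1534946481.941323 900.000017 worker-1-12 1062 232679 0.456423
1534946481.942680 900.000094 worker-1-30 1375 2635 52.182163
1534947381.942682 900.000002 worker-1-30 399 13150 3.034221
1534947381.941398 900.000075 worker-1-12 201 2936 6.846049
1534948281.942751 900.000069 worker-1-30 34 448 7.589286
1534948281.941453 900.000055 worker-1-12 26 991 2.623613
"""


def summarize_capture_loss(text):
    """Sum gaps and acks per worker and recompute the loss percentage."""
    totals = defaultdict(lambda: [0, 0])  # peer -> [gaps, acks]
    for line in text.splitlines():
        fields = line.split()
        # Skip header lines (#...) and anything that is not a 6-column row.
        if line.startswith("#") or len(fields) != 6:
            continue
        peer, gaps, acks = fields[2], int(fields[3]), int(fields[4])
        totals[peer][0] += gaps
        totals[peer][1] += acks
    return {
        peer: {
            "gaps": g,
            "acks": a,
            "percent_lost": 100.0 * g / a if a else 0.0,
        }
        for peer, (g, a) in totals.items()
    }


summary = summarize_capture_loss(SAMPLE)
for peer, row in sorted(summary.items()):
    print(f"{peer}: gaps={row['gaps']} acks={row['acks']} "
          f"lost={row['percent_lost']:.2f}%")
```

Run against the full log from each box, this gives one line per worker, which makes the two boxes easier to compare.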
[root at aosoc current]# broctl netstats
worker-1-1: 1534949053.166850 recvd=813997 dropped=0 link=813997
worker-1-2: 1534949053.366803 recvd=873351 dropped=0 link=873353
worker-1-3: 1534949053.567778 recvd=1770808 dropped=0 link=1770810
worker-1-4: 1534949053.767852 recvd=865443 dropped=0 link=865449
worker-1-5: 1534949053.968873 recvd=349355 dropped=0 link=349361
worker-1-6: 1534949054.168785 recvd=1152160 dropped=0 link=1152161
worker-1-7: 1534949054.368825 recvd=1358553 dropped=0 link=1358553
worker-1-8: 1534949054.569808 recvd=345267 dropped=0 link=345272
worker-1-9: 1534949054.769982 recvd=856725 dropped=0 link=856732
worker-1-10: 1534949054.969811 recvd=351148 dropped=0 link=351148
worker-1-11: 1534949055.170855 recvd=883897 dropped=0 link=883897
worker-1-12: 1534949055.370950 recvd=820117 dropped=0 link=820125
worker-1-13: 1534949055.571899 recvd=1132465 dropped=0 link=1132473
worker-1-14: 1534949055.771751 recvd=823249 dropped=0 link=823249
worker-1-15: 1534949055.972921 recvd=754342 dropped=0 link=754343
worker-1-16: 1534949056.173778 recvd=822102 dropped=0 link=822106
worker-1-17: 1534949056.373806 recvd=570905 dropped=0 link=570911
worker-1-18: 1534949056.573815 recvd=1033845 dropped=0 link=1033846
worker-1-19: 1534949056.774737 recvd=648977 dropped=0 link=649001
worker-1-20: 1534949056.974823 recvd=816836 dropped=0 link=816838
worker-1-21: 1534949057.175858 recvd=423896 dropped=0 link=423901
worker-1-22: 1534949057.375894 recvd=761794 dropped=0 link=761796
worker-1-23: 1534949057.576737 recvd=415151 dropped=0 link=415153
worker-1-24: 1534949057.776887 recvd=604342 dropped=0 link=604349
worker-1-25: 1534949057.978046 recvd=911772 dropped=0 link=911785
worker-1-26: 1534949058.177749 recvd=358386 dropped=0 link=358395
worker-1-27: 1534949058.379062 recvd=1283463 dropped=0 link=1283465
worker-1-28: 1534949058.578751 recvd=364801 dropped=0 link=364807
worker-1-29: 1534949058.778735 recvd=930041 dropped=0 link=930042
worker-1-30: 1534949058.979938 recvd=857963 dropped=0 link=857967
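The netstats output shows dropped=0 on every worker, but the recvd counts are uneven (worker-1-3 has about five times the packets of worker-1-8). A rough Python sketch for pulling those numbers out of the text, assuming the "name: ts recvd=N dropped=N link=N" layout shown above; parse_netstats is a hypothetical helper, and the counters are cumulative and sampled at slightly different times, so the ratio is only approximate:

```python
import re

# Three lines copied from the netstats output above.
SAMPLE = """\
worker-1-1: 1534949053.166850 recvd=813997 dropped=0 link=813997
worker-1-3: 1534949053.567778 recvd=1770808 dropped=0 link=1770810
worker-1-8: 1534949054.569808 recvd=345267 dropped=0 link=345272
"""

NETSTATS_RE = re.compile(
    r"^(?P<worker>[\w-]+):\s+(?P<ts>[\d.]+)\s+"
    r"recvd=(?P<recvd>\d+)\s+dropped=(?P<dropped>\d+)\s+link=(?P<link>\d+)"
)


def parse_netstats(text):
    """Return {worker: {"recvd": int, "dropped": int, "link": int}}."""
    stats = {}
    for line in text.splitlines():
        m = NETSTATS_RE.match(line.strip())
        if m:
            stats[m.group("worker")] = {
                key: int(m.group(key)) for key in ("recvd", "dropped", "link")
            }
    return stats


stats = parse_netstats(SAMPLE)
total_dropped = sum(s["dropped"] for s in stats.values())
recvd = [s["recvd"] for s in stats.values()]
imbalance = max(recvd) / min(recvd)  # busiest worker vs. quietest worker

print(f"total dropped: {total_dropped}")
print(f"max/min recvd ratio: {imbalance:.2f}")
```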
-----Original Message-----
From: Azoff, Justin S <jazoff at illinois.edu>
Sent: Tuesday, August 21, 2018 9:46 PM
To: Ron McClellan <Ron_McClellan at ao.uscourts.gov>
Cc: bro at bro.org
Subject: Re: [Bro] BRO Logger crashing due to large DNS log files
> On Aug 21, 2018, at 6:10 PM, Ron McClellan <Ron_McClellan at ao.uscourts.gov> wrote:
> I finished most of your recommendations, just need to rebuild bro, but was going to let it run overnight and see how it is running now. I really appreciate all the help.
Great! There may be more things to fix, but once that load balancing is working properly things will be in a lot better shape.
This was really helpful to see as well:
> ]# hwloc-ls -p
> Machine (256GB total)
> NUMANode P#0 (128GB)
> Package P#0 + L3 (25MB)
> L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#0
> PU P#0 <----
> PU P#36
> L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#1
> PU P#1 <---
> PU P#37
You have CPUs 0, 1, 2, 3... on the same NUMA node, but every box I have puts 0, 2, 4... on one and 1, 3, 5... on the other.
Machine (64GB total)
NUMANode P#0 (32GB)
Package P#0 + L3 (14MB)
L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#0
PU P#0 <---
PU P#20
L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#4
PU P#2 <---
PU P#22
All the more reason for me to get bro-doctor to do this analysis and confirm the proper pin_cpus values are being used.
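To illustrate the kind of check described here, the following Python sketch groups PU numbers by NUMA node from hwloc-ls -p style text and separates the first PU of each core from its hyperthread sibling. This is not how bro-doctor works; pus_by_numa_node is a hypothetical helper, and it assumes each PU line follows the NUMANode and Core lines it belongs to, as in the two listings above.

```python
import re

# The two listings quoted above, with the "<---" markers removed.
RON_BOX = """\
Machine (256GB total)
NUMANode P#0 (128GB)
Package P#0 + L3 (25MB)
L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#0
PU P#0
PU P#36
L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#1
PU P#1
PU P#37
"""

JUSTIN_BOX = """\
Machine (64GB total)
NUMANode P#0 (32GB)
Package P#0 + L3 (14MB)
L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#0
PU P#0
PU P#20
L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#4
PU P#2
PU P#22
"""


def pus_by_numa_node(hwloc_text):
    """Map NUMA node -> {"primary": [...], "siblings": [...]} PU numbers."""
    nodes = {}
    current = None          # dict for the NUMA node we are inside
    first_in_core = False   # True until the first PU of a core is seen
    for line in hwloc_text.splitlines():
        numa = re.search(r"\bNUMANode P#(\d+)", line)
        if numa:
            current = nodes.setdefault(
                int(numa.group(1)), {"primary": [], "siblings": []}
            )
        if re.search(r"\bCore P#\d+", line):
            first_in_core = True
        for pu in re.findall(r"\bPU P#(\d+)", line):
            if current is None:
                continue  # PU listed before any NUMANode line; ignore it
            key = "primary" if first_in_core else "siblings"
            current[key].append(int(pu))
            first_in_core = False
    return nodes


ron = pus_by_numa_node(RON_BOX)
justin = pus_by_numa_node(JUSTIN_BOX)
print("Ron:   ", ron)
print("Justin:", justin)
```

On the excerpts above, the first box lists consecutive CPUs (0, 1) on node 0 while the second lists alternating ones (0, 2), which is the difference that would matter when choosing pin_cpus values.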
--
Justin Azoff