[Bro] BRO Logger crashing due to large DNS log files

Ron McClellan Ron_McClellan at ao.uscourts.gov
Wed Aug 22 07:48:01 PDT 2018


Justin,

	Good news: with your help I've made solid progress.  BRO is running on both boxes and hasn't crashed since 10pm last night.  If I'm reading the NUMA data from my systems correctly, I don't really need to split the load between 2 workers as you did, right?  I'm doing some tuning now and also trying to address the really high lag (500) that I'm still seeing.  I'm currently seeing some loss, but will continue to tune and see if I can get that under control.  Let me know if you need help testing the doctor script.

Ron 


# cat capture_loss.log
#separator \x09
#set_separator  ,
#empty_field    (empty)
#unset_field    -
#path   capture_loss
#open   2018-08-22-10-01-21
#fields ts      ts_delta        peer    gaps    acks    percent_lost
#types  time    interval        string  count   count   double
1534946481.938006       900.000084      worker-1-20     33      696     4.741379
1534946481.941548       900.000000      worker-1-24     20      2722    0.734754
1534946481.938533       900.000059      worker-1-21     630     40222   1.566307
1534946481.938396       900.000070      worker-1-9      89      1470    6.054422
1534946481.941452       900.000044      worker-1-8      156     1821    8.566722
1534946481.941323       900.000017      worker-1-12     1062    232679  0.456423
1534946481.939547       900.000037      worker-1-27     1023    216063  0.473473
1534946481.937269       900.000040      worker-1-10     749     5465    13.705398
1534946481.937517       900.000111      worker-1-3      87      15720   0.553435
1534946481.941367       900.000649      worker-1-16     117     2187    5.349794
1534946481.939451       900.000079      worker-1-7      870     195358  0.445336
1534946481.940450       900.000041      worker-1-5      111     626     17.731629
1534946481.931345       900.000019      worker-1-4      44      885     4.971751
1534946481.941268       900.000074      worker-1-17     131     1641    7.982937
1534946481.946945       900.000039      worker-1-18     189     1350    14.0
1534946481.941532       900.000083      worker-1-25     118     9414    1.253452
1534946481.942680       900.000094      worker-1-30     1375    2635    52.182163
1534946481.937385       900.000074      worker-1-1      1050    232183  0.452229
1534946481.939621       900.000062      worker-1-26     20      1973    1.013685
1534946481.942331       900.000127      worker-1-2      1236    240350  0.51425
1534946481.938535       900.000003      worker-1-29     133     2923    4.55012
1534946481.938737       900.000077      worker-1-13     1463    223976  0.653195
1534946481.937868       900.000121      worker-1-15     278     2360    11.779661
1534946481.937738       900.000006      worker-1-28     36      765     4.705882
1534946481.940076       900.000039      worker-1-23     43      3749    1.146973
1534946481.940530       900.000008      worker-1-22     1151    4798    23.989162
1534946481.944632       900.000030      worker-1-19     510     88481   0.576395
1534946481.937329       900.000045      worker-1-6      891     146039  0.610111
1534946481.938533       900.000095      worker-1-14     206     2276    9.050967
1534946481.937384       900.000074      worker-1-11     222     2176    10.202206
1534947381.938548       900.000013      worker-1-29     1135    241449  0.470079
1534947381.942682       900.000002      worker-1-30     399     13150   3.034221
1534947381.939458       900.000007      worker-1-7      332     66504   0.499218
1534947381.937742       900.000004      worker-1-28     31      711     4.360056
1534947381.940622       900.000092      worker-1-22     77      1728    4.456019
1534947381.938073       900.000067      worker-1-20     103     2343    4.396073
1534947381.941622       900.000074      worker-1-24     90      7394    1.217203
1534947381.941549       900.000017      worker-1-25     1259    235553  0.534487
1534947381.941454       900.000087      worker-1-16     231     5455    4.234647
1534947381.942399       900.000068      worker-1-2      69      1293    5.336427
1534947381.941324       900.000056      worker-1-17     152     759     20.02635
1534947381.931395       900.000050      worker-1-4      1310    240018  0.545792
1534947381.938810       900.000073      worker-1-13     109     17301   0.630021
1534947381.938606       900.000073      worker-1-14     305     2184    13.965201
1534947381.937398       900.000069      worker-1-6      67      3465    1.933622
1534947381.940457       900.000007      worker-1-5      118     1280    9.21875
1534947381.937470       900.000085      worker-1-1      24      1581    1.518027
1534947381.940195       900.000119      worker-1-23     189     20872   0.905519
1534947381.937614       900.000097      worker-1-3      1167    213001  0.547885
1534947381.944751       900.000119      worker-1-19     160     4249    3.765592
1534947381.937943       900.000075      worker-1-15     593     2541    23.337269
1534947381.947066       900.000121      worker-1-18     809     160344  0.50454
1534947381.939548       900.000001      worker-1-27     219     2612    8.38438
1534947381.938628       900.000095      worker-1-21     302     1627    18.56177
1534947381.937326       900.000057      worker-1-10     107     1763    6.0692
1534947381.938497       900.000101      worker-1-9      1599    238664  0.66998
1534947381.941398       900.000075      worker-1-12     201     2936    6.846049
1534947381.937399       900.000015      worker-1-11     1382    236433  0.584521
1534947381.939677       900.000056      worker-1-26     52      1100    4.727273
1534947381.941453       900.000001      worker-1-8      224     1601    13.991255
1534948281.939548       900.000090      worker-1-7      1088    235524  0.461949
1534948281.941678       900.000129      worker-1-25     202     32683   0.618058
1534948281.947198       900.000132      worker-1-18     284     6208    4.574742
1534948281.937477       900.000079      worker-1-6      70      14679   0.476872
1534948281.937532       900.000062      worker-1-1      57      1621    3.516348
1534948281.937477       900.000078      worker-1-11     71      24940   0.284683
1534948281.938938       900.000128      worker-1-13     111     12288   0.90332
1534948281.941679       900.000057      worker-1-24     731     121315  0.602564
1534948281.938621       900.000015      worker-1-14     1056    230109  0.458913
1534948281.942751       900.000069      worker-1-30     34      448     7.589286
1534948281.938548       900.000000      worker-1-29     219     1033    21.200387
1534948281.941325       900.000001      worker-1-17     671     111097  0.603977
1534948281.937348       900.000022      worker-1-10     145     1917    7.563902
1534948281.938055       900.000112      worker-1-15     859     187429  0.458307
1534948281.939622       900.000074      worker-1-27     50      3453    1.448016
1534948281.931396       900.000001      worker-1-4      193     3759    5.134344
1534948281.937780       900.000038      worker-1-28     230     6086    3.779165
1534948281.938109       900.000036      worker-1-20     1135    230316  0.492801
1534948281.938512       900.000015      worker-1-9      44      3888    1.131687
1534948281.940323       900.000128      worker-1-23     30      1212    2.475248
1534948281.939677       900.000000      worker-1-26     165     6336    2.604167
1534948281.940527       900.000070      worker-1-5      96      5162    1.859744
1534948281.937736       900.000122      worker-1-3      1123    249305  0.450452
1534948281.941454       900.000001      worker-1-8      67      1910    3.507853
1534948281.940679       900.000057      worker-1-22     115     4310    2.668213
1534948281.938677       900.000049      worker-1-21     25      2141    1.167679
1534948281.944879       900.000128      worker-1-19     29      1637    1.771533
1534948281.942454       900.000055      worker-1-2      36      2033    1.770782
1534948281.941453       900.000055      worker-1-12     26      991     2.623613
1534948281.941454       900.000000      worker-1-16     1127    230791  0.488321

# cat capture_loss.log
#separator \x09
#set_separator  ,
#empty_field    (empty)
#unset_field    -
#path   capture_loss
#open   2018-08-22-10-06-13
#fields ts      ts_delta        peer    gaps    acks    percent_lost
#types  time    interval        string  count   count   double
1534946772.685666       900.000108      worker-1-9      71276   209039  34.096987
1534946772.682117       900.000110      worker-1-20     43286   430827  10.047188
1534946772.686758       900.000020      worker-1-22     58337   172653  33.788582
1534946772.689750       900.000013      worker-1-17     61579   422200  14.585268
1534946772.683422       900.000599      worker-1-4      62846   224500  27.993764
1534946772.692533       900.000076      worker-1-13     56519   190555  29.660203
1534946772.684749       900.000086      worker-1-15     41612   129870  32.041272
1534946772.684889       900.000230      worker-1-27     76559   187163  40.904987
1534946772.683731       900.000001      worker-1-25     74450   188407  39.515517
1534946772.681934       900.000111      worker-1-5      50253   153355  32.769065
1534946772.682021       900.000012      worker-1-28     52191   151854  34.369197
1534946772.682825       900.000074      worker-1-8      52037   190660  27.293087
1534946772.699409       900.000084      worker-1-16     88137   266670  33.050962
1534946772.685734       900.000100      worker-1-30     51271   238600  21.488265
1534946772.682739       900.000022      worker-1-6      66273   250566  26.449319
1534946772.682741       900.000063      worker-1-26     49902   153687  32.46989
1534946772.681960       900.000006      worker-1-1      89188   255018  34.973218
1534946772.682631       900.000622      worker-1-29     60705   210476  28.841768
1534946772.681953       900.000075      worker-1-2      38281   125211  30.573192
1534946772.682673       900.000007      worker-1-3      67450   187531  35.967387
1534946772.686732       900.000060      worker-1-23     55932   191885  29.148709
1534946772.681828       900.000005      worker-1-7      66947   445007  15.044033
1534946772.681886       900.000007      worker-1-11     48944   138084  35.445091
1534946772.693528       900.000000      worker-1-14     65762   188557  34.876456
1534946772.681885       900.000006      worker-1-10     62149   428124  14.516589
1534946772.685697       900.000017      worker-1-21     48039   147640  32.53793
1534946772.683753       900.000022      worker-1-19     59660   157172  37.958415
1534946772.705397       900.000127      worker-1-24     71820   223813  32.089289
1534946772.688718       900.000117      worker-1-18     48410   452562  10.696877
1534946772.685511       900.000137      worker-1-12     46673   145455  32.087587
1534947672.682048       900.000114      worker-1-5      68107   180382  37.757093
1534947672.683025       900.000286      worker-1-6      45761   183027  25.002322
1534947672.685750       900.000053      worker-1-21     50836   422213  12.040368
1534947672.683879       900.000126      worker-1-19     53010   178899  29.631244
1534947672.693643       900.000115      worker-1-14     92038   425392  21.636044
1534947672.682825       900.000084      worker-1-26     55076   176437  31.215675
1534947672.682008       900.000123      worker-1-10     73148   207138  35.313656
1534947672.699475       900.000066      worker-1-16     72461   223957  32.354872
1534947672.684952       900.000063      worker-1-27     47858   167864  28.509984
1534947672.686884       900.000126      worker-1-22     65305   192727  33.884718
1534947672.681973       900.000020      worker-1-2      60511   181325  33.37157
1534947672.682136       900.000176      worker-1-1      109592  280275  39.101597
1534947672.682749       900.000118      worker-1-29     64164   192112  33.399267
1534947672.689756       900.000006      worker-1-17     61667   166246  37.093825
1534947672.683803       900.000072      worker-1-25     56366   464877  12.124928
1534947672.682152       900.000035      worker-1-20     49701   148229  33.529876
1534947672.685826       900.000092      worker-1-30     54071   160228  33.746287
1534947672.684823       900.000074      worker-1-15     60758   204305  29.738871
1534947672.685527       900.000016      worker-1-12     51410   166297  30.914569
1534947672.688722       900.000004      worker-1-18     73693   218226  33.76912
1534947672.682082       900.000061      worker-1-28     62184   198747  31.288019
1534947672.686826       900.000094      worker-1-23     57861   221752  26.092662
1534947672.682903       900.000078      worker-1-8      48482   219779  22.059432
1534947672.685711       900.000045      worker-1-9      53372   172244  30.986275
1534947672.692602       900.000069      worker-1-13     62358   502957  12.398277
1534947672.682167       900.000281      worker-1-11     48767   198101  24.617241
1534947672.705447       900.000050      worker-1-24     55112   186729  29.51443
1534947672.682731       900.000058      worker-1-3      56891   162845  34.935675
1534947672.683487       900.000065      worker-1-4      78602   255868  30.719746
1534947672.681880       900.000052      worker-1-7      51099   541967  9.428434
1534948572.682094       900.000086      worker-1-10     82032   524780  15.631693
1534948572.693667       900.000024      worker-1-14     85369   297217  28.722785
1534948572.682472       900.000499      worker-1-2      53654   221056  24.271678
1534948572.686886       900.000002      worker-1-22     55666   467706  11.901921
1534948572.685008       900.000056      worker-1-27     86916   263647  32.966808
1534948572.682279       900.000127      worker-1-20     89828   256003  35.088651
1534948572.682223       900.000087      worker-1-1      62337   344970  18.070267
1534948572.685750       900.000000      worker-1-21     70389   510644  13.784359
1534948572.684880       900.000057      worker-1-15     67459   206447  32.676183
1534948572.685740       900.000029      worker-1-9      57163   227031  25.1785
1534948572.682752       900.000021      worker-1-3      61958   204039  30.365763
1534948572.682835       900.000010      worker-1-26     54506   196350  27.759613
1534948572.683153       900.000128      worker-1-6      60501   190365  31.781577
1534948572.682183       900.000016      worker-1-11     63835   191625  33.312459
1534948572.682208       900.000126      worker-1-28     91876   284589  32.28375
1534948572.683828       900.000025      worker-1-25     44239   139128  31.797338
1534948572.685880       900.000054      worker-1-30     55616   172434  32.2535
1534948572.689884       900.000128      worker-1-17     69725   178142  39.140124
1534948572.681961       900.000081      worker-1-7      53776   220472  24.391306
1534948572.683937       900.000058      worker-1-19     50184   163270  30.736816
1534948572.685538       900.000011      worker-1-12     60185   260306  23.120865
1534948572.686889       900.000063      worker-1-23     59788   194439  30.748975
1534948572.682908       900.000005      worker-1-8      60904   532647  11.434214
1534948572.692674       900.000072      worker-1-13     67152   216975  30.949188
1534948572.688750       900.000028      worker-1-18     70383   235710  29.859997
1534948572.705484       900.000037      worker-1-24     57008   201189  28.335545
1534948572.682147       900.000099      worker-1-5      61878   194825  31.760811
1534948572.699536       900.000061      worker-1-16     76385   256671  29.759887
1534948572.682829       900.000080      worker-1-29     52464   188150  27.884135
1534948572.683536       900.000049      worker-1-4      110222  314119  35.08925
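
The two capture_loss.log excerpts above are easier to compare once they are rolled up per worker. Below is a minimal Python sketch, not part of any Bro tooling, that sums gaps and acks per peer and recomputes the loss percentage as 100 * gaps / acks, which matches the percent_lost column in the rows above. The function name summarize_capture_loss is made up for illustration, and the parser splits on any whitespace so it works on both the tab-separated file and text pasted into an email.

```python
from collections import defaultdict


def summarize_capture_loss(text):
    """Return {peer: (gaps, acks, percent_lost)} summed over all intervals."""
    fields = []
    totals = defaultdict(lambda: [0, 0])  # peer -> [gaps, acks]
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#fields"):
            # Column names come from the log header itself.
            fields = line.split()[1:]
            continue
        if line.startswith("#"):
            continue  # other header lines (#separator, #types, ...)
        parts = line.split()
        if not fields or len(parts) != len(fields):
            continue  # not a data row (for example a pasted shell command)
        row = dict(zip(fields, parts))
        totals[row["peer"]][0] += int(row["gaps"])
        totals[row["peer"]][1] += int(row["acks"])
    return {
        peer: (gaps, acks, 100.0 * gaps / acks if acks else 0.0)
        for peer, (gaps, acks) in totals.items()
    }


def worst_peers(summary, limit=5):
    """Peers sorted by aggregated loss percentage, highest first."""
    ranked = sorted(summary.items(), key=lambda item: item[1][2], reverse=True)
    return ranked[:limit]
```

Summing before dividing weights each interval by its ack count, so a worker that saw only a few hundred acks in one interval does not dominate the picture the way it does in the raw per-interval rows.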

[root at aosoc current]# broctl netstats
 worker-1-1: 1534949053.166850 recvd=813997 dropped=0 link=813997
 worker-1-2: 1534949053.366803 recvd=873351 dropped=0 link=873353
 worker-1-3: 1534949053.567778 recvd=1770808 dropped=0 link=1770810
 worker-1-4: 1534949053.767852 recvd=865443 dropped=0 link=865449
 worker-1-5: 1534949053.968873 recvd=349355 dropped=0 link=349361
 worker-1-6: 1534949054.168785 recvd=1152160 dropped=0 link=1152161
 worker-1-7: 1534949054.368825 recvd=1358553 dropped=0 link=1358553
 worker-1-8: 1534949054.569808 recvd=345267 dropped=0 link=345272
 worker-1-9: 1534949054.769982 recvd=856725 dropped=0 link=856732
worker-1-10: 1534949054.969811 recvd=351148 dropped=0 link=351148
worker-1-11: 1534949055.170855 recvd=883897 dropped=0 link=883897
worker-1-12: 1534949055.370950 recvd=820117 dropped=0 link=820125
worker-1-13: 1534949055.571899 recvd=1132465 dropped=0 link=1132473
worker-1-14: 1534949055.771751 recvd=823249 dropped=0 link=823249
worker-1-15: 1534949055.972921 recvd=754342 dropped=0 link=754343
worker-1-16: 1534949056.173778 recvd=822102 dropped=0 link=822106
worker-1-17: 1534949056.373806 recvd=570905 dropped=0 link=570911
worker-1-18: 1534949056.573815 recvd=1033845 dropped=0 link=1033846
worker-1-19: 1534949056.774737 recvd=648977 dropped=0 link=649001
worker-1-20: 1534949056.974823 recvd=816836 dropped=0 link=816838
worker-1-21: 1534949057.175858 recvd=423896 dropped=0 link=423901
worker-1-22: 1534949057.375894 recvd=761794 dropped=0 link=761796
worker-1-23: 1534949057.576737 recvd=415151 dropped=0 link=415153
worker-1-24: 1534949057.776887 recvd=604342 dropped=0 link=604349
worker-1-25: 1534949057.978046 recvd=911772 dropped=0 link=911785
worker-1-26: 1534949058.177749 recvd=358386 dropped=0 link=358395
worker-1-27: 1534949058.379062 recvd=1283463 dropped=0 link=1283465
worker-1-28: 1534949058.578751 recvd=364801 dropped=0 link=364807
worker-1-29: 1534949058.778735 recvd=930041 dropped=0 link=930042
worker-1-30: 1534949058.979938 recvd=857963 dropped=0 link=857967
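
One rough way to check how evenly packets are being spread across workers is to compare the recvd counters in the netstats output above. The Python sketch below parses lines of that shape and reports the ratio between the busiest and the quietest worker. The helper names are hypothetical, the regular expression is based only on the output format shown here, and the counters are cumulative, so the ratio is only meaningful if all workers were started at about the same time.

```python
import re

# Matches lines such as:
#  worker-1-1: 1534949053.166850 recvd=813997 dropped=0 link=813997
LINE_RE = re.compile(
    r"^\s*(?P<worker>\S+):\s+(?P<ts>\d+\.\d+)\s+recvd=(?P<recvd>\d+)\s+"
    r"dropped=(?P<dropped>\d+)\s+link=(?P<link>\d+)\s*$"
)


def parse_netstats(text):
    """Return {worker: {"recvd": int, "dropped": int, "link": int}}."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stats[match["worker"]] = {
                key: int(match[key]) for key in ("recvd", "dropped", "link")
            }
    return stats


def recvd_spread(stats):
    """Ratio of the largest to the smallest recvd count, or None if undefined."""
    counts = [entry["recvd"] for entry in stats.values()]
    if not counts or min(counts) == 0:
        return None
    return max(counts) / min(counts)
```

A ratio close to 1 would suggest an even split. Some imbalance is expected with flow-based load balancing, so this is a hint to look closer rather than a pass/fail test.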



-----Original Message-----
From: Azoff, Justin S <jazoff at illinois.edu> 
Sent: Tuesday, August 21, 2018 9:46 PM
To: Ron McClellan <Ron_McClellan at ao.uscourts.gov>
Cc: bro at bro.org
Subject: Re: [Bro] BRO Logger crashing due to large DNS log files


> On Aug 21, 2018, at 6:10 PM, Ron McClellan <Ron_McClellan at ao.uscourts.gov> wrote:
> 	I finished most of your recommendations, just need to rebuild bro, but was going to let it run overnight and see how it is running now.  I really appreciate all the help.

Great!  There may be more things to fix, but once that load balancing is working properly things will be in a lot better shape.

This was really helpful to see as well:

> ]# hwloc-ls -p
> Machine (256GB total)
>  NUMANode P#0 (128GB)
>    Package P#0 + L3 (25MB)
>      L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#0
>        PU P#0  <----
>        PU P#36
>      L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#1
>        PU P#1  <---
>        PU P#37

You have CPUs 0,1,2,3... on the same NUMA node, but every box I have puts 0,2,4... on one node and 1,3,5... on the other.

Machine (64GB total)
NUMANode P#0 (32GB)
  Package P#0 + L3 (14MB)
    L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#0
      PU P#0  <---
      PU P#20
    L2 (1024KB) + L1d (32KB) + L1i (32KB) + Core P#4
      PU P#2  <---
      PU P#22

All the more reason for me to get bro-doctor to do this analysis and confirm the proper pin_cpus values are being used.
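
As a rough illustration of that kind of check (this is not the bro-doctor code, just a sketch), the Python below groups the PU numbers from `hwloc-ls -p` text output by NUMA node so they can be compared against the pin_cpus list. The function name is hypothetical, and it assumes each PU line belongs to the most recent NUMANode line printed before it, which holds for both outputs quoted above but may not hold for every hwloc version or topology.

```python
import re
from collections import defaultdict

NODE_RE = re.compile(r"NUMANode\s+P#(\d+)")
PU_RE = re.compile(r"\bPU\s+P#(\d+)")


def pus_by_numa_node(hwloc_text):
    """Return {numa_node: sorted list of PU numbers} from `hwloc-ls -p` text."""
    nodes = defaultdict(list)
    current = None
    for line in hwloc_text.splitlines():
        node = NODE_RE.search(line)
        if node:
            current = int(node.group(1))
            continue
        pu = PU_RE.search(line)
        if pu and current is not None:
            # Assumption: this PU sits under the last NUMANode seen.
            nodes[current].append(int(pu.group(1)))
    return {node_id: sorted(pus) for node_id, pus in nodes.items()}


def pinned_nodes(pin_cpus, nodes):
    """Map each pinned CPU to its NUMA node, or None if it was not found."""
    lookup = {pu: node_id for node_id, pus in nodes.items() for pu in pus}
    return {cpu: lookup.get(cpu) for cpu in pin_cpus}
```

Because the regular expressions search anywhere in the line, quoted email text with a leading "> " or a trailing "<---" marker parses the same way as raw command output.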

-- 
Justin Azoff



