[Zeek] Why does my logger keep crashing - bro version 2.6.3

william de ping bill.de.ping at gmail.com
Wed Oct 2 04:02:02 PDT 2019


Hi,

Can you please share your entire node.cfg file?

It looks like you've added 3 more workers. I would check whether the CPUs you
are pinning have a direct PCI lane to the NIC you are listening on.
Check which NUMA node the NIC is attached to, and make sure you are pinning
the correct CPUs first.
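
As a rough way to do that check, here is a small Python sketch. It assumes
the NUMA layout from the lscpu output and the pin_cpus line from the
23-worker node.cfg, both quoted further down, and it assumes the NIC's node
can be read from the usual Linux sysfs file
/sys/class/net/<iface>/device/numa_node. The function names are only
illustrative, not part of broctl.

```python
from collections import defaultdict

# NUMA layout copied from the lscpu output quoted in this thread.
NUMA_NODES = {
    0: [0, 2, 4, 6, 8, 10, 12, 14],
    1: [16, 18, 20, 22, 24, 26, 28, 30],
    2: [1, 3, 5, 7, 9, 11, 13, 15],
    3: [17, 19, 21, 23, 25, 27, 29, 31],
}

# pin_cpus=5,...,27 from the 23-worker node.cfg quoted in this thread.
PIN_CPUS = list(range(5, 28))


def group_by_node(pin_cpus, numa_nodes):
    """Return {numa_node: [pinned CPUs on that node]}."""
    cpu_to_node = {cpu: node for node, cpus in numa_nodes.items() for cpu in cpus}
    grouped = defaultdict(list)
    for cpu in pin_cpus:
        grouped[cpu_to_node[cpu]].append(cpu)
    return dict(grouped)


def nic_numa_node(iface):
    """Read the NIC's NUMA node from sysfs (assumed path).

    Returns None if the file is missing or unreadable. The kernel may
    report -1 when it has no NUMA information for the device.
    """
    try:
        with open(f"/sys/class/net/{iface}/device/numa_node") as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for node, cpus in sorted(group_by_node(PIN_CPUS, NUMA_NODES).items()):
        print(f"NUMA node {node}: pinned CPUs {cpus}")
    print("NIC NUMA node:", nic_numa_node("ens2f0"))
```

With the values above, the pinned CPUs are spread over all four NUMA nodes
(5 on node 0 and 6 on each of the others), so whichever node the NIC sits
on, most of the workers would be running on a different node.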

B


On Fri, Sep 27, 2019 at 7:27 PM Kayode Enwerem <
Kayode_Enwerem at ao.uscourts.gov> wrote:

> It looks like setting up 2 loggers resolved the issue of my logger crashing,
> but dropped packets are still pretty high on my workers. Can someone help me
> reduce them?
>
>
>
> cat capture_loss.log
> #separator \x09
> #set_separator  ,
> #empty_field    (empty)
> #unset_field    -
> #path   capture_loss
> #open   2019-09-27-12-05-05
> #fields ts                  ts_delta     peer          gaps    acks     percent_lost
> #types  time                interval     string        count   count    double
> 1569600304.774215           900.000013   worker-1-1    126463  3246542  3.895314
> 1569600304.783703           900.000064   worker-1-3    106904  4465333  2.394088
> 1569600304.785983           900.000212   worker-1-11   123729  3768503  3.28324
> 1569600304.802244           900.000098   worker-1-14   144154  3584013  4.022139
> 1569600304.823378           900.000095   worker-1-18   137507  3503583  3.924754
> 1569600304.892559           900.000470   worker-1-13   148904  3448544  4.31788
> 1569600305.010986           900.000030   worker-1-8    174213  3409819  5.109157
> 1569600305.938686           901.043465   worker-1-15   509268  1072199  47.497526
> 1569600304.806850           900.000047   worker-1-22   591232  1234893  47.877185
> 1569601204.762382           900.000786   worker-1-16   120086  4491072  2.673883
> 1569601204.774220           900.000005   worker-1-1    127257  3461349  3.676515
> 1569601204.802447           900.000203   worker-1-14   125481  3171663  3.956316
> 1569601204.884438           900.000029   worker-1-19   125037  3566663  3.505714
> 1569601204.891746           900.000015   worker-1-23   120553  3078889  3.915471
> 1569601205.110098           900.000139   worker-1-10   108016  3442813  3.137434
> 1569601205.938906           900.000220   worker-1-15   565536  1156759  48.8897
> 1569601218.120290           900.000047   worker-1-6    456312  753749   60.538986
>
>
>
> Below are some of my settings:
>
>
>
> I have 23 workers defined and I pinned their CPUs.
>
> [worker-1]
>
> type=worker
>
> host=localhost
>
> interface=af_packet::ens2f0
>
> lb_method=custom
>
> #lb_method=pf_ring
>
> lb_procs=23
>
> pin_cpus=5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
>
> af_packet_fanout_id=25
>
> af_packet_fanout_mode=AF_Packet::FANOUT_HASH
>
>
>
> Can someone assist me with this?
>
>
>
> Thanks.
>
>
>
>
>
> *From:* william de ping <bill.de.ping at gmail.com>
> *Sent:* Wednesday, September 25, 2019 4:00 AM
> *To:* Kayode Enwerem <Kayode_Enwerem at ao.uscourts.gov>
> *Cc:* zeek at zeek.org
> *Subject:* Re: [Zeek] Why does my logger keep crashing - bro version 2.6.3
>
>
>
> Hi
>
>
>
> Try using the None writer instead of the ASCII one.
>
> In local.bro add:
>
> redef Log::default_writer=Log::WRITER_NONE;
>
>
>
> If the logger instance still crashes then the issue is not related to an
> IO bottleneck.
>
>
>
> B
>
>
>
> On Tue, Sep 24, 2019 at 7:49 PM Kayode Enwerem <
> Kayode_Enwerem at ao.uscourts.gov> wrote:
>
> Thanks for your response.
>
> I do see the following OOM message in my system logs for the logger
> process ID:
> Sep 23 18:48:00 kernel: Out of memory: Kill process 10439 (bro) score 787 or sacrifice child
> Sep 23 18:48:00 kernel: Killed process 10439 (bro), UID 0, total-vm:301983900kB, anon-rss:195261772kB, file-rss:2592kB, shmem-rss:0kB
>
> I wonder why it's taking so much memory. I have 251G of RAM and 99G of swap
> on this server.
>               total        used        free      shared  buff/cache   available
> Mem:           251G         66G        185G        4.2M        488M        184G
> Swap:           99G        1.1G         98G
>
> Below is the output of "broctl diag logger", run after the logger crashed.
>
>  [logger]
>
> No core file found.
>
> Bro 2.6.3
> Linux 3.10.0-1062.1.1.el7.x86_64
>
> Bro plugins:
> Bro::AF_Packet - Packet acquisition via AF_Packet (dynamic, version 1.4)
>
> ==== No reporter.log
>
> ==== stderr.log
> /usr/local/bro/share/broctl/scripts/run-bro: line 110: 10439 Killed
>           nohup "$mybro" "$@"
>
> ==== stdout.log
> max memory size         (kbytes, -m) unlimited
> data seg size           (kbytes, -d) unlimited
> virtual memory          (kbytes, -v) unlimited
> core file size          (blocks, -c) unlimited
>
> ==== .cmdline
> -U .status -p broctl -p broctl-live -p local -p logger local.bro broctl
> base/frameworks/cluster broctl/auto
>
> ==== .env_vars
>
> PATH=/usr/local/bro/bin:/usr/local/bro/share/broctl/scripts:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bro/bin
>
> BROPATH=/logs/bro/spool/installed-scripts-do-not-touch/site::/logs/bro/spool/installed-scripts-do-not-touch/auto:/usr/local/bro/share/bro:/usr/local/bro/share/bro/policy:/usr/local/bro/share/bro/site
> CLUSTER_NODE=logger
>
> ==== .status
> RUNNING [net_run]
>
> ==== No prof.log
>
> ==== No packet_filter.log
>
> ==== No loaded_scripts.log
>
> Thoughts? Any suggestions?
>
> -----Original Message-----
> From: Vlad Grigorescu <vlad at es.net>
> Sent: Monday, September 23, 2019 10:20 AM
> To: Kayode Enwerem <Kayode_Enwerem at ao.uscourts.gov>
> Cc: william de ping <bill.de.ping at gmail.com>; zeek at zeek.org
> Subject: Re: [Zeek] Why does my logger keep crashing - bro version 2.6.3
>
> The logger is threaded, so seeing CPU > 100% is not necessarily a problem.
>
> Have you tried running "broctl diag logger" to see why the logger is
> crashing? Do you have any messages in your system logs about processes
> being killed for out of memory (OOM)?
>
>   --Vlad
>
> On Mon, Sep 23, 2019 at 1:32 PM Kayode Enwerem <
> Kayode_Enwerem at ao.uscourts.gov> wrote:
> >
> > Thanks for your response. The CPU usage for the logger is at 311% (see
> > below).
> >
> >
> >
> > broctl top
> >
> > Name         Type    Host             Pid     VSize  Rss  Cpu   Cmd
> >
> > logger       logger  localhost        22867    12G     9G 311%  bro
> >
> >
> >
> > I wasn’t aware that you could set up multiple loggers. I tried checking
> > the docs to see if that was an option. Does anyone know how to do this?
> >
> >
> >
> > From: william de ping <bill.de.ping at gmail.com>
> > Sent: Sunday, September 22, 2019 6:42 AM
> > To: Kayode Enwerem <Kayode_Enwerem at ao.uscourts.gov>
> > Cc: zeek at zeek.org
> > Subject: Re: [Zeek] Why does my logger keep crashing - bro version
> > 2.6.3
> >
> >
> >
> > Hi,
> >
> >
> >
> > I would try to monitor the CPU and memory usage of the logger instance.
> >
> > Try running broctl top. My guess is that you will see that the logger
> > process has very high CPU usage.
> >
> >
> >
> > I know of an option to have multiple loggers, but I am not sure how to
> > set it up.
> >
> >
> >
> > Are you writing to a file?
> >
> >
> >
> > B
> >
> >
> >
> > On Thu, Sep 19, 2019 at 7:14 PM Kayode Enwerem <
> Kayode_Enwerem at ao.uscourts.gov> wrote:
> >
> > Hello,
> >
> >
> >
> > Why does my logger keep crashing? Can someone please help me with this?
> > I have provided some system information below:
> >
> >
> >
> > I am running bro version 2.6.3
> >
> >
> >
> > Output of broctl status. The logger has crashed, but the manager, proxy
> > and workers are still running.
> >
> > broctl status
> >
> > Name         Type    Host             Status    Pid    Started
> >
> > logger       logger  localhost        crashed
> >
> > manager      manager localhost        running   17356  09 Sep 15:42:24
> >
> > proxy-1      proxy   localhost        running   17401  09 Sep 15:42:25
> >
> > worker-1-1   worker  localhost        running   17573  09 Sep 15:42:27
> >
> > worker-1-2   worker  localhost        running   17569  09 Sep 15:42:27
> >
> > worker-1-3   worker  localhost        running   17572  09 Sep 15:42:27
> >
> > worker-1-4   worker  localhost        running   17587  09 Sep 15:42:27
> >
> > worker-1-5   worker  localhost        running   17619  09 Sep 15:42:27
> >
> > worker-1-6   worker  localhost        running   17614  09 Sep 15:42:27
> >
> > worker-1-7   worker  localhost        running   17625  09 Sep 15:42:27
> >
> > worker-1-8   worker  localhost        running   17646  09 Sep 15:42:27
> >
> > worker-1-9   worker  localhost        running   17671  09 Sep 15:42:27
> >
> > worker-1-10  worker  localhost        running   17663  09 Sep 15:42:27
> >
> > worker-1-11  worker  localhost        running   17679  09 Sep 15:42:27
> >
> > worker-1-12  worker  localhost        running   17685  09 Sep 15:42:27
> >
> > worker-1-13  worker  localhost        running   17698  09 Sep 15:42:27
> >
> > worker-1-14  worker  localhost        running   17703  09 Sep 15:42:27
> >
> > worker-1-15  worker  localhost        running   17710  09 Sep 15:42:27
> >
> > worker-1-16  worker  localhost        running   17717  09 Sep 15:42:27
> >
> > worker-1-17  worker  localhost        running   17720  09 Sep 15:42:27
> >
> > worker-1-18  worker  localhost        running   17727  09 Sep 15:42:27
> >
> > worker-1-19  worker  localhost        running   17728  09 Sep 15:42:27
> >
> > worker-1-20  worker  localhost        running   17731  09 Sep 15:42:27
> >
> >
> >
> > Here’s my node.cfg settings
> >
> > [logger]
> >
> > type=logger
> >
> > host=localhost
> >
> >
> >
> > [manager]
> >
> > type=manager
> >
> > host=localhost
> >
> >
> >
> > [proxy-1]
> >
> > type=proxy
> >
> > host=localhost
> >
> >
> >
> > [worker-1]
> >
> > type=worker
> >
> > host=localhost
> >
> > interface=af_packet::ens2f0
> >
> > lb_method=custom
> >
> > #lb_method=pf_ring
> >
> > lb_procs=20
> >
> > pin_cpus=6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25
> >
> > af_packet_fanout_id=25
> >
> > af_packet_fanout_mode=AF_Packet::FANOUT_HASH
> >
> >
> >
> > Here's more information on my CPU: 32 CPUs, model name AMD, CPU max
> > MHz 2800.0000.
> >
> > Architecture:          x86_64
> >
> > CPU op-mode(s):        32-bit, 64-bit
> >
> > Byte Order:            Little Endian
> >
> > CPU(s):                32
> >
> > On-line CPU(s) list:   0-31
> >
> > Thread(s) per core:    2
> >
> > Core(s) per socket:    8
> >
> > Socket(s):             2
> >
> > NUMA node(s):          4
> >
> > Vendor ID:             AuthenticAMD
> >
> > CPU family:            21
> >
> > Model:                 2
> >
> > Model name:            AMD Opteron(tm) Processor 6386 SE
> >
> > Stepping:              0
> >
> > CPU MHz:               1960.000
> >
> > CPU max MHz:           2800.0000
> >
> > CPU min MHz:           1400.0000
> >
> > BogoMIPS:              5585.93
> >
> > Virtualization:        AMD-V
> >
> > L1d cache:             16K
> >
> > L1i cache:             64K
> >
> > L2 cache:              2048K
> >
> > L3 cache:              6144K
> >
> > NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
> >
> > NUMA node1 CPU(s):     16,18,20,22,24,26,28,30
> >
> > NUMA node2 CPU(s):     1,3,5,7,9,11,13,15
> >
> > NUMA node3 CPU(s):     17,19,21,23,25,27,29,31
> >
> >
> >
> > Would also like to know how I can reduce my packet loss. Below is the
> > output of broctl netstats
> >
> > broctl netstats
> >
> > worker-1-1: 1568908277.861813 recvd=12248845422 dropped=5171188999 link=17420313882
> > worker-1-2: 1568908298.313954 recvd=8636707266 dropped=971489 link=8637678939
> > worker-1-3: 1568908278.425888 recvd=11684808853 dropped=5617381647 link=17302473791
> > worker-1-4: 1568908285.731130 recvd=12567242226 dropped=4339688288 link=16907212802
> > worker-1-5: 1568908298.363911 recvd=8620499351 dropped=24595149 link=8645095758
> > worker-1-6: 1568908298.372892 recvd=8710565757 dropped=1731022 link=8712297432
> > worker-1-7: 1568908298.266010 recvd=9065207444 dropped=53523232 link=9118737229
> > worker-1-8: 1568908286.935607 recvd=11377790124 dropped=3680887247 link=15058934491
> > worker-1-9: 1568908298.419657 recvd=8931903322 dropped=39696184 link=8971604219
> > worker-1-10: 1568908298.478576 recvd=8842874030 dropped=2501252 link=8845376352
> > worker-1-11: 1568908298.506649 recvd=8692769329 dropped=2253413 link=8695025626
> > worker-1-12: 1568908298.520830 recvd=8749977028 dropped=2314733 link=8752293714
> > worker-1-13: 1568908298.544573 recvd=9101243757 dropped=1779460 link=9103025399
> > worker-1-14: 1568908291.370011 recvd=10876925726 dropped=775722632 link=11652810353
> > worker-1-15: 1568908298.579721 recvd=8503097394 dropped=1420699 link=8504520066
> > worker-1-16: 1568908298.594942 recvd=8515164266 dropped=1840977 link=8517006779
> > worker-1-17: 1568908298.646966 recvd=10666567717 dropped=466489754 link=11133059283
> > worker-1-18: 1568908298.671246 recvd=9023603573 dropped=2037607 link=9025642263
> > worker-1-19: 1568908298.704675 recvd=8907784186 dropped=1164594 link=8908950238
> > worker-1-20: 1568908298.718084 recvd=9140525444 dropped=2028593 link=9142555259
> >
> >
> >
> > Thanks,
> >
> > _______________________________________________
> > Zeek mailing list
> > zeek at zeek.org
> > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/zeek
>
>