[Bro] Logging and memory leak

Hovsep Levi hovsep.sanjay.levi at gmail.com
Fri Feb 3 16:48:39 PST 2017


No, no custom scripts.  I found the cluster overwhelmed again today with
massive virtual memory usage and an hour after restarting it the same
condition returns.

I'm using a single logger since the last time.  It seems when using a Kafka
only export a single logger works fine as the event timestamps arriving at
Kafka are near realtime.  (the "ts" for conn, http, etc.)

The logger is using 47761, manager is 47762.  I took a sample of 5000
packets for each during the high memory usage and it looks like the manager
is still receiving logs of some sort dealing with x509 certificates.


tcpdump -A -r Bro__Manager_port_47662.pcap | egrep -io '[A-Za-z_:-]{10,}' |
sort | uniq -c | sort -rn | head -10
reading from file Bro__Manager_port_47662.pcap, link-type EN10MB (Ethernet)
2852 certificate
2064 Notice::begin_suppressionA
1800 Conn::IN_ORIG
 739 Authentication
 685 Identifier
 681 Encipherment
 660 validation
 646 Corporation



Config:

event bro_init()
        Analyzer::disable_analyzer(Analyzer::ANALYZER_SYSLOG);
@load misc/loaded-scripts
@load tuning/defaults
@load misc/stats
@load misc/capture-loss
@load frameworks/software/vulnerable
@load frameworks/software/version-changes
@load-sigs frameworks/signatures/detect-windows-shells
@load protocols/ftp/software
@load protocols/smtp/software
@load protocols/ssh/software
@load protocols/http/software
@load protocols/dns/detect-external-names
@load protocols/ftp/detect
@load protocols/conn/known-hosts
@load protocols/conn/known-services
@load protocols/ssl/known-certs
@load protocols/ssl/validate-certs
@load protocols/ssl/log-hostcerts-only
@load protocols/ssh/geo-data
@load protocols/ssh/detect-bruteforcing
@load protocols/ssh/interesting-hostnames
@load protocols/http/detect-sqli
@load frameworks/files/hash-all-files
@load frameworks/files/detect-MHR
@load frameworks/intel/seen
@load frameworks/intel/do_notice
@load local-intel.bro
@load Bro/Kafka/logs-to-kafka.bro






Snapshots before the first restart at


@ Wed Feb  1 19:57:28 UTC 2017

[bro at mgr /opt/bro]$ bin/broctl top manager logger-1 && echo "" && date
Name         Type    Host             Pid     Proc    VSize  Rss  Cpu   Cmd
logger-1     logger  10.1.1.1   25469   parent  849M   330M 171%  bro
logger-1     logger  10.1.1.1   25523   child   458M    69M  38%  bro
manager      manager 10.1.1.1   25685   child   494M   261M 100%  bro
manager      manager 10.1.1.1   25543   parent    9G     1G  27%  bro


@ Fri Feb  3 04:05:04 UTC 2017

[bro at mgr /opt/bro]$ bin/broctl top manager logger-1 && echo "" && date
Name         Type    Host             Pid     Proc    VSize  Rss  Cpu   Cmd
logger-1     logger  10.1.1.1   25469   parent  793M   284M  99%  bro
logger-1     logger  10.1.1.1   25523   child   466M    80M  25%  bro
manager      manager 10.1.1.1   25685   child   494M   261M 100%  bro
manager      manager 10.1.1.1   25543   parent    9G     1G  35%  bro


@ Fri Feb  3 21:15:51 UTC 2017

[bro at mgr /opt/bro]$ bin/broctl top manager logger-1 && echo "" && date
Name         Type    Host             Pid     Proc    VSize  Rss  Cpu   Cmd
logger-1     logger  10.1.1.1   25469   parent  813M   316M 178%  bro
logger-1     logger  10.1.1.1   25523   child   466M    83M  38%  bro
manager      manager 10.1.1.1   25685   child     8G     8G 100%  bro
manager      manager 10.1.1.1   25543   parent  1222G   87G  99%  bro





last pid: 33713;  load averages:  7.35,  6.19,
5.69
up 6+20:45:54  21:32:43
49 processes:  4 running, 45 sleeping
CPU:  8.0% user,  0.3% nice,  5.7% system,  0.2% interrupt, 85.8% idle
Mem: 31G Active, 69G Inact, 20G Wired, 1884K Cache, 5167M Free
ARC: 14G Total, 2327M MFU, 11G MRU, 155K Anon, 58M Header, 555M Other
Swap: 12G Total, 17M Used, 12G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU
COMMAND
25469 bro         172  20    0  1089M   591M select 11 118.3H 258.45% bro
25685 bro           1 108    5  9174M  9035M CPU22  22  63.0H 100.00% bro
25543 bro           7  20    0  1275G 93626M uwait   2  21.1H 100.00% bro
25523 bro           1  92    5   466M 85900K CPU3    3  18.0H  51.27% bro
33713 bro           1  78    0 97192K 17284K CPU32  32   0:02  21.09%
python2.7
33709 bro           1  52    0 97192K 17300K piperd 31   0:01   2.69%
python2.7





I restarted the cluster at 21:42 and see the same behavior almost
immediately, within an hour:

@ Fri Feb  3 22:27:38 UTC 2017

[bro at mgr /opt/bro]$ bin/broctl top manager logger-1 && echo "" && date
Name         Type    Host             Pid     Proc    VSize  Rss  Cpu   Cmd
logger-1     logger  10.1.1.1   37525   parent  647M   124M  15%  bro
logger-1     logger  10.1.1.1   37576   child   458M    65M   2%  bro
manager      manager 10.1.1.1   37653   child   506M   199M 100%  bro
manager      manager 10.1.1.1   37600   parent  250G    18G 100%  bro



@ Sat Feb  4 00:22:01 UTC 2017

[bro at mgr /opt/bro]$ bin/broctl top manager logger-1 && echo "" && date
Name         Type    Host             Pid     Proc    VSize  Rss  Cpu   Cmd
logger-1     logger  10.1.1.1   37525   parent  663M   134M  14%  bro
logger-1     logger  10.1.1.1   37576   child   458M    65M   2%  bro
manager      manager 10.1.1.1   37653   child   506M   237M 100%  bro
manager      manager 10.1.1.1   37600   parent  640G    46G  38%  bro



last pid: 75833;  load averages:  2.31,  2.47,
2.57
up 6+23:49:38  00:36:27
47 processes:  2 running, 45 sleeping
CPU:  1.2% user,  0.2% nice,  2.4% system,  0.1% interrupt, 96.1% idle
Mem: 20G Active, 26G Inact, 22G Wired, 1752K Cache, 57G Free
ARC: 8159M Total, 2242M MFU, 5323M MRU, 64K Anon, 49M Header, 545M Other
Swap: 12G Total, 16M Used, 12G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU
COMMAND
37653 bro           1 108    5   506M   237M CPU33  33 172:00 100.00% bro
37600 bro           7  20    0   640G 47116M uwait  25 143:23  27.88% bro
37525 bro         162  20    0   663M   134M select 45 106:45  12.60% bro
37576 bro           1  25    5   458M 67404K select  6   5:03   1.46% bro



@ Sat Feb  4 00:38:50 UTC 2017

Name         Type    Host             Pid     Proc    VSize  Rss  Cpu   Cmd
logger-1     logger  10.1.1.1   37525   parent  663M   134M  13%  bro
logger-1     logger  10.1.1.1   37576   child   458M    65M   1%  bro
manager      manager 10.1.1.1   37653   child   506M   237M  99%  bro
manager      manager 10.1.1.1   37600   parent  640G    46G  37%  bro




With pure Kafka export there's very few logs on the manager filesystem.  I
don't think this is related to the manager issue but fwiw there's a steady
stream of kafka errors even though logging seems to be working:

>From /opt/bro_data/logs/current/stderr.log -> (/opt/bro_data/spool/logger-1)

%3|1486168762.913|ERROR|rdkafka#producer-15| 10.1.1.5:9092/bootstrap:
Receive failed: Disconnected
%3|1486168773.922|FAIL|rdkafka#producer-18| 10.1.1.5:9092/bootstrap:
Receive failed: Disconnected
%3|1486168773.922|ERROR|rdkafka#producer-18| 10.1.1.5:9092/bootstrap:
Receive failed: Disconnected
%3|1486168775.662|FAIL|rdkafka#producer-19| 10.1.1.5:9092/bootstrap:
Receive failed: Disconnected
%3|1486168775.663|ERROR|rdkafka#producer-19| 10.1.1.5:9092/bootstrap:
Receive failed: Disconnected
%3|1486168778.818|FAIL|rdkafka#producer-21| 10.1.1.5:9092/bootstrap:
Receive failed: Disconnected
%3|1486168778.819|ERROR|rdkafka#producer-21| 10.1.1.5:9092/bootstrap:
Receive failed: Disconnected
(...)
%3|1486169053.779|ERROR|rdkafka#producer-8| 10.1.1.5:9092/bootstrap:
Receive failed: Disconnected
%3|1486169082.658|FAIL|rdkafka#producer-23| 10.1.1.5:9092/bootstrap:
Receive failed: Disconnected
%3|1486169082.659|ERROR|rdkafka#producer-23| 10.1.1.5:9092/bootstrap:
Receive failed: Disconnected

[bro at mgr /opt/bro_data/logs]$ date -r 1486169082
Sat Feb  4 00:44:42 UTC 2017

[bro at mgr /opt/bro_data/logs]$ date
Sat Feb  4 00:45:08 UTC 2017
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ICSI.Berkeley.EDU/pipermail/bro/attachments/20170204/21b01db4/attachment-0001.html 


More information about the Bro mailing list