[Bro] BRO Logger crashing due to large DNS log files

Azoff, Justin S jazoff at illinois.edu
Mon Aug 20 07:30:35 PDT 2018


> On Aug 19, 2018, at 11:12 AM, Ron McClellan <Ron_McClellan at ao.uscourts.gov> wrote:
> 
> All,
>  
>                 Having an issue with the bro logger crashing due to large volumes of DNS log traffic, 20-30GB an hour.

Is it actually crashing?  Are you getting a crash report at all?  From the filenames you listed it looks more like log rotation is failing.
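
If you want to double-check, broctl can tell you whether the logger process itself died.  A quick sketch, assuming a standard install under /usr/local/bro (the exact node name depends on your node.cfg):

# is the logger node actually running?
/usr/local/bro/bin/broctl status

# dump diagnostics, including any crash report, for the logger node
/usr/local/bro/bin/broctl diag logger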

>  This is completely a local configuration, on a system with super-fast flash storage, 64 cores, 256GB RAM running BRO 2.5.4.  If I disable DNS logging, everything works fine without issue.  When I enable it, I get the results below.  I thought it might be an issue with gzipping the old logs, so I replaced the standard gzip with pigz and I can manually compress the 30+ gig files in seconds, so don’t think that is the issue.

It could be related to the gzipping.  The way log rotation works is not great: all of the rotated log files get compressed at the same time, which can cause I/O thrashing.

If you set

    compresslogs = 0

in broctl.cfg so that broctl does not gzip the logs at all, does the problem go away?
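
If you test that, the change should take effect after the usual broctl cycle; roughly (assuming the standard /usr/local/bro layout):

/usr/local/bro/bin/broctl install
/usr/local/bro/bin/broctl restart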

You could leave compression disabled like that, and then run a script like this:

while true; do
    for f in /usr/local/bro/logs/201*/*.log ; do
        gzip "$f"
    done
    sleep 60
done

to compress the logs in the background serially.
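
If you'd rather not keep a shell loop running, a one-shot version you could drop into cron might look like this (just a sketch; the -mmin guard is only there to skip files that rotation may still be working on, and you could substitute pigz since you already have it installed):

find /usr/local/bro/logs/20* -name '*.log' -mmin +5 -exec gzip {} \;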

Another thing to keep an eye on is whether your logger is able to keep up with the volume of data.  This script is a plugin for munin, but you can run it directly:

#!/usr/bin/env python
import os
import sys
import time

DEFAULT_LOG = "/usr/local/bro/logs/current/dns.log"

def config():
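    # Emit the munin graph definition and the warning/critical thresholds for the lag value.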
    print """
graph_category network

graph_title Bro log lag
graph_vlabel lag
graph_args --base 1000 --vertical-label seconds --lower-limit 0
graph_info The bro log lag

lag.label lag
lag.info log message lag in seconds
lag.min 0
lag.warning 0:15
lag.critical 0:60
""".strip()

    return 0

def get_latest_time(fn):
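    # Read roughly the last 4KB of the log and return the newest timestamp from the first column.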
    f = open(fn)

    f.seek(-4096, os.SEEK_END)
    end = f.read().splitlines()[1:-1] #ignore possibly incomplete first and last lines
    times = [line.split()[0] for line in end]
    timestamps = map(float, times)
    latest = max(timestamps)
    return latest

def lag(fn):
    lag = 500
    for x in range(3):
        try :
            latest = get_latest_time(fn)
            now = time.time()
            lag = now - latest
            break
        except (IOError, ValueError):
            #File could be rotating, wait and try again
            time.sleep(5)
    print "lag.value %f" % lag

if __name__ == "__main__":

    filename = os.getenv("BRO_LAG_FILENAME", DEFAULT_LOG)

    if sys.argv[1:] and sys.argv[1] == 'config':
        config()
    else:
        lag(filename)

It will output something like

lag.value 2.919352

A normal value is about 5 seconds; anything under 20 is probably OK.  If it's 500 and climbing, that's a problem.
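
To run it by hand, save it somewhere (call it bro_log_lag here, the name doesn't matter), make it executable, and point it at whatever log you like via the BRO_LAG_FILENAME environment variable the script reads:

chmod +x bro_log_lag

# check the current dns.log (the default path in the script)
./bro_log_lag

# check a different log instead
BRO_LAG_FILENAME=/usr/local/bro/logs/current/conn.log ./bro_log_lag

# print the munin graph configuration
./bro_log_lag config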

Also:

> -rw-r--r--. 1 root root 6.8G Aug 18 12:00 weird-18-08-18_11.00.00.log
> -rw-r--r--. 1 root root 2.5G Aug 18 12:18 weird-18-08-18_12.00.00.log

That's a LOT of weird.log, what's going on there?
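
If you want to see what is filling it up, something like this should show the top weird names (assuming bro-cut is on your PATH, as it is in a standard install):

bro-cut name < /usr/local/bro/logs/current/weird.log | sort | uniq -c | sort -rn | head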

--
Justin Azoff




