[Bro] Bro cluster requirements and manager logging backlog bug

Azoff, Justin S jazoff at illinois.edu
Tue Dec 20 13:46:25 PST 2016


> On Dec 20, 2016, at 4:09 PM, Hovsep Levi <hovsep.sanjay.levi at gmail.com> wrote:
> 
> 
>  
> Can you check the following:
> 
> after bro has been running for a bit:
> 
>     wc -l *.log | sort -n
> 
> to show which log files are the largest
> 
> [bro at mgr /opt/bro_data/logs/current]$ wc -l *.log | sort -n
>   101981 known_hosts.log
>   106543 software.log
>   492146 x509.log
>   714576 ssl.log
>   795818 dns.log
>   886360 http.log
>   985519 weird.log
>  1936147 files.log
>  5121874 conn.log
>  11277601 total

Your weird.log is 1/5 the size of your conn.log and larger than your http.log. You should look into which weird name is showing up so often; you may have a serious problem with your tap configuration.  That is not normal at all.

>  
> the output of this command:
> 
>     top -b -n 1 -H -o TIME |grep bro:|head -n 20
> 
> or just run top and press H.  That should show all the bro logging threads (it works on linux at least)  They may show up truncated but it's enough to tell them apart.
> 
> 
> 
> [bro at mgr /opt/bro_data/logs/current]$ top -n -H -o time | grep bro
>  5672 bro        100   10 21076K  2796K RUN    19   6:35  62.79% gzip
>  5858 bro         95    5   510M   257M RUN     8   3:26  61.38% bro
>  5785 bro         95    5  2373M  2058M CPU11  11   3:20  59.08% bro
>  5743 bro         87    0  7897M  7743M RUN    14   3:19  52.78% bro{bro}
>  5743 bro         88    0  7897M  7743M CPU0    0   3:18  55.18% bro{bro}
>  5816 bro         40    0  5298M  1158M nanslp 23   1:59  23.29% bro{bro}
>  5743 bro         37    0  7897M  7743M uwait   4   1:32  23.58% bro{bro}

Ah, I guess that doesn't work on FreeBSD. On Linux it would have output thread names like

bro: conn/Log
bro: dns/Log

It looks like the bulk of your logging load is coming from files, conn, and weird though, so what you can do is cut down the volume of those logs to keep your CPU happy.

>  
> What model/count CPU does your manager have?
> 
> 
> Four of these with 32 GB per NUMA node.
> 
> Processor Information
>         Socket Designation: CPU1
>         Type: Central Processor
>         Family: Opteron 6200

Ah!!! This is part of your problem.  Every site we have worked with in the past year or so that was having serious manager performance issues was using the crazy high core count AMD systems.  While they perform well when you have a heavily threaded task (and I bet they do, we have an entire supercomputer filled with 40,000 of them), the bro logger only has a few heavyweight threads and just does not work well on these processors.

That said, you can probably get this working acceptably.  There are two options for this:

* filter some noisy log lines, which will cause them to not be logged at all.
* split heavy streams into multiple log files, which will let the logger process dedicate a logging thread to each part.

I would start by trying some of these config fragments that split log files apart:

# Split files log into files and files_certs log
event bro_init()
{
    Log::remove_default_filter(Files::LOG);
    Log::add_filter(Files::LOG, [
        $name = "files-split",
        $path_func(id: Log::ID, path: string, rec: Files::Info) = {
            if (rec?$mime_type && rec$mime_type == "application/pkix-cert")
                return "files_certs";
            return "files";
        }
    ]);
}

(you can probably just ignore those lines completely since the x509 log is more useful)
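
If you do want to ignore them, here is a rough, untested sketch of an alternative to the fragment above (use one or the other, not both, since each replaces the default filter).  The filter name "files-no-certs" is made up for illustration, and it assumes the same mime type check as the split version:

# Drop certificate entries from files.log instead of splitting them out
event bro_init()
{
    Log::remove_default_filter(Files::LOG);
    Log::add_filter(Files::LOG, [
        $name = "files-no-certs",
        $path = "files",
        # pred returns T for records that should be written
        $pred(rec: Files::Info) = {
            return !(rec?$mime_type && rec$mime_type == "application/pkix-cert");
        }
    ]);
}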

#Split conn into conn and conn_dns
event bro_init()
{

    Log::remove_default_filter(Conn::LOG);
    Log::add_filter(Conn::LOG, [
        $name = "conn-split",
        $path_func(id: Log::ID, path: string, rec: Conn::Info) = {
            if (rec?$service && "dns" in rec$service)
                return "conn_dns";
       
            return "conn";
        }
    ]);
}

#Split http.log into directions
event bro_init()
{
    Log::remove_default_filter(HTTP::LOG);
    Log::add_filter(HTTP::LOG, [
        $name = "http-directions",
        $path_func(id: Log::ID, path: string, rec: HTTP::Info) = {
            local l = Site::is_local_addr(rec$id$orig_h);
            local r = Site::is_local_addr(rec$id$resp_h);

            if(l && r)
                return "http_internal";
            if (l)
                return "http_outbound";
            else
                return "http_inbound";
        }
    ]);
}

You could do the same directions split for the conn.log as well.
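
Something like this untested sketch, which just mirrors the http fragment.  The filter name and the conn_* paths are made up, and it would replace the conn/conn_dns split above rather than run alongside it (two filters on the same stream would each write the record):

#Split conn.log into directions (alternative to the conn/conn_dns split)
event bro_init()
{
    Log::remove_default_filter(Conn::LOG);
    Log::add_filter(Conn::LOG, [
        $name = "conn-directions",
        $path_func(id: Log::ID, path: string, rec: Conn::Info) = {
            local l = Site::is_local_addr(rec$id$orig_h);
            local r = Site::is_local_addr(rec$id$resp_h);

            if (l && r)
                return "conn_internal";
            if (l)
                return "conn_outbound";
            else
                return "conn_inbound";
        }
    ]);
}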

If your network is anything like ours, your conn.log is 90% scan attempts to tcp port 23 from IoT devices. Splitting that out to a separate log file or filtering it entirely would probably help more than anything.
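
A rough, untested sketch of the filtering option follows.  It assumes that a "scan attempt" means a connection to 23/tcp that never got established (conn_state S0 or REJ); that definition and the filter name are my guesses, so adjust them to what you actually see.  Like the other fragments it replaces the default conn filter, so it would need to be merged with any conn split you use rather than loaded next to it:

#Drop unanswered or rejected telnet attempts from conn.log
event bro_init()
{
    Log::remove_default_filter(Conn::LOG);
    Log::add_filter(Conn::LOG, [
        $name = "conn-no-telnet-scans",
        $path = "conn",
        # pred returns F for records that should not be logged at all
        $pred(rec: Conn::Info) = {
            if (rec$id$resp_p == 23/tcp && rec?$conn_state &&
                (rec$conn_state == "S0" || rec$conn_state == "REJ"))
                return F;
            return T;
        }
    ]);
}

To split them out instead of dropping them, the same condition could go in a $path_func that returns something like "conn_telnet_scans" for those records.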

You can also take a look at the filter* scripts that are at https://github.com/michalpurzynski/bro-gramming



-- 
- Justin Azoff





