[Bro] Best Way to Grab Unique Domains and IPs for Rotation Interval

Jason Batchelor jxbatchelor at gmail.com
Mon Jan 26 07:35:03 PST 2015

Hello all:

I would like to have a rotated pair of files that simply log the unique IPs
and domains seen during a given interval (in my case, 30 minutes). While
evaluating ways to do this, I found the known_hosts.bro script to be a
great reference point, so I cooked up something that I thought would do
the job.

My script is as follows (it is heavily based on the known_hosts script):
# Jason Batchelor
# 1/26/2015
# Log unique IPs and domains for a given interval

@load base/utils/directions-and-hosts

module Unique;

export {
        ## The logging stream identifiers.
        redef enum Log::ID += { IPS_LOG };
        redef enum Log::ID += { DOMAINS_LOG };

        ## The record type which contains the column fields of the
        ## unique_ips log.
        type UniqueIpInfo: record {
                ## The timestamp at which the host was detected.
                ts:      time &log;
                ## The address that was detected.
                host:    addr &log;
        };

        ## The record type which contains the column fields of the
        ## unique_domains log.
        type UniqueDomainInfo: record {
                ts:     time &log;
                domain: string &log;
        };

        ## When the expire interval refreshes.
        const UNIQUE_EXPIRE_INTERVAL = 30min &redef;

        ## The set of all known addresses/domains to store for preventing
        ## logging of addresses.  It can also be used from other scripts to
        ## inspect if an address has been seen in use.
        ## Maintain the list of known domains/IPs for 30 mins so that each
        ## individual address is logged once each time logs are rotated.
        global unique_ips: set[addr] &create_expire=UNIQUE_EXPIRE_INTERVAL
                &synchronized &redef;
        global unique_domains: set[string]
                &create_expire=UNIQUE_EXPIRE_INTERVAL &synchronized &redef;

        ## An event that can be handled to access the
        ## record as it is sent on to the logging framework.
        global log_unique_ips: event(rec: UniqueIpInfo);
        global log_unique_domains: event(rec: UniqueDomainInfo);
}

event bro_init()
        {
        Log::create_stream(Unique::IPS_LOG, [$columns=UniqueIpInfo,
                $ev=log_unique_ips]);
        Log::create_stream(Unique::DOMAINS_LOG, [$columns=UniqueDomainInfo,
                $ev=log_unique_domains]);
        }

event new_connection(c: connection) &priority=5
        {
        local id = c$id;
        for ( host in set(id$orig_h, id$resp_h) )
                {
                if ( host !in unique_ips )
                        {
                        add unique_ips[host];
                        Log::write(Unique::IPS_LOG, [$ts=network_time(),
                                $host=host]);
                        }
                }
        }

event dns_request(c: connection, msg: dns_msg, query: string, qtype: count,
    qclass: count) &priority=5
        {
        if ( query !in unique_domains )
                {
                add unique_domains[query];
                Log::write(Unique::DOMAINS_LOG,
                        [$ts=network_time(), $domain=query]);
                }
        }

My efforts appear to be in vain, however: while this works as designed
against a standalone pcap file, it does not produce 100% reliable results
when run on the wire.

I encountered the following issues:

* Duplicate IP addresses and domains are being logged to the same file
despite the logic designed to prevent this. I confirmed this by simply
grepping for certain IPs and domains in the generated log file. Two entries
appeared where in reality there should be one:

grep login.yahoo.com unique_domains.log | awk 'BEGIN {FS=OFS="\t"}
{$1=strftime("%D %T",$1)} {print}'
01/26/15 15:01:23       login.yahoo.com
01/26/15 15:09:04       login.yahoo.com

FWIW, there were 10 requests for that domain in dns.log, so some dupes are
being suppressed, but clearly not all. Similar behavior was present for IPs
as well.

* Some IPs/domains do not appear to be getting logged at all. I parsed
the unique IPs out of conn.log, sorted and uniqued them, then compared
them against the output generated by my script, and there were
significant differences in file sizes. The output from the sorted/uniqued
conn.log was much larger, and I can only assume that represents data gaps.
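For reference, the comparison described above can be sketched with plain
awk/comm (column positions assume Bro's default tab-separated conn.log
layout, with id.orig_h in field 3 and id.resp_h in field 5, and the
two-column unique_ips.log that my script produces; the sample log lines
here are purely illustrative):

```shell
# Illustrative sample logs (tab-separated, no header lines).
printf '1422284400\tCuid1\t10.0.0.1\t49152\t10.0.0.2\t443\n'  >  conn.log
printf '1422284401\tCuid2\t10.0.0.1\t49153\t10.0.0.3\t80\n'   >> conn.log
printf '1422284400\t10.0.0.1\n'                               >  unique_ips.log
printf '1422284400\t10.0.0.2\n'                               >> unique_ips.log

# Unique endpoint addresses from conn.log (fields 3 and 5).
awk -F'\t' '!/^#/ { print $3; print $5 }' conn.log | sort -u > conn_ips.txt
# Unique addresses from the script's own log (field 2).
awk -F'\t' '!/^#/ { print $2 }' unique_ips.log | sort -u > script_ips.txt

# Addresses seen in conn.log but missing from the script's output.
comm -23 conn_ips.txt script_ips.txt    # -> 10.0.0.3
```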

I was curious and decided to check out the log file for known_hosts. There
I noted the same kind of issues I was experiencing: duplicate host records
were being written.

 grep '$' known_hosts.log | awk 'BEGIN {FS=OFS="\t"}
{$1=strftime("%D %T",$1)} {print}'
01/26/15 15:00:17
01/26/15 15:00:17
01/26/15 15:00:17
01/26/15 15:00:17
01/26/15 15:00:17

Is there a better way for me to be doing this? My criteria are simple: I
want a list of unique IPs and domains for a given time interval. I don't
care whether the connection was established or not. Ideally, if something
is in conn.log/dns.log at some point during the log interval, it should
appear in one of my logs exactly once as well.
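One thing worth noting about the semantics: &create_expire expires each
entry 30 minutes after that entry was inserted, rather than aligning all
entries to a shared interval boundary, so set membership and log rotation
can drift apart. A hedged sketch of interval-aligned behavior, clearing
both sets on a schedule instead (the event name is illustrative, and I
have not verified this against a live cluster, where &synchronized sets
may still race between workers):

```bro
module Unique;

global reset_state: event();

event reset_state()
        {
        # Start the next interval from an empty state so each
        # address/domain is logged at most once per interval.
        unique_ips = set();
        unique_domains = set();
        schedule UNIQUE_EXPIRE_INTERVAL { reset_state() };
        }

event bro_init()
        {
        schedule UNIQUE_EXPIRE_INTERVAL { reset_state() };
        }
```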

I would ideally like to be doing this with Bro. Right now I have a
horrendous cron/bash command that is executed against archived conn/dns
logs to get me the data I need. I really don't like that approach and would
even be willing to settle for kicking off my horrendous bash command from
within a Bro script if there were some 'log complete' event for my two log
streams.
Any assistance is much appreciated!
