[Bro] minor documentation error

Castle, Shane scastle at bouldercounty.org
Mon Dec 31 10:46:50 PST 2012

I think I may have this script working correctly now. There were several errors in the original script: in the first sort, in the last sort, and in the awk script.

Here is the final and, I believe, correct version:

bro-cut id.orig_h orig_bytes < conn.log             \
    | sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n   \
    | awk 'BEGIN { size=0;host="" }                 \
           { if (host != $1) {                      \
                 if (size != 0)                     \
                     print host, size;              \
                  host=$1;                          \
                  size=0;                           \
                  if ($2 != "-")                    \
                     size=$2                        \
              } else                                \
                  if ($2 != "-")                    \
                     size += $2                     \
            }                                       \
            END {                                   \
                if (size != 0)                      \
                     print host, size               \
                }'                                  \
    | sort -rnk 2                                   \
    | head -n 10

Note the "print" command in the awk script. Originally, it was "print $1, size". This is incorrect since $1 is the host of the *current* record and not of the *previous* one, causing the sum to be associated with the next address rather than the one it was accumulated for. The first sort has been changed so that it sorts the addresses numerically, octet by octet, and the last sort has been changed to sort reverse numerically. I also added a test for the bytes field being "-", but that might be superfluous.
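
The effect of the "print" change can be seen in isolation with a small sketch that runs both variants of the awk loop over three already-sorted records. The addresses, byte counts, and function names below are invented for illustration and are not part of Bro:

```shell
# Made-up, pre-sorted input: host <TAB> orig_bytes
sample() {
    printf '10.0.0.1\t100\n10.0.0.1\t50\n10.0.0.2\t7\n'
}

# Variant that prints $1, the host of the record that was just read.
sum_print_current() {
    awk '{
        if (host != $1) {
            if (size != 0) print $1, size
            host = $1; size = $2
        } else {
            size += $2
        }
    }
    END { if (size != 0) print host, size }'
}

# Variant that prints the saved host the sum was accumulated for.
sum_print_saved() {
    awk '{
        if (host != $1) {
            if (size != 0) print host, size
            host = $1; size = $2
        } else {
            size += $2
        }
    }
    END { if (size != 0) print host, size }'
}

sample | sum_print_current   # 10.0.0.2 150, then 10.0.0.2 7
sample | sum_print_saved     # 10.0.0.1 150, then 10.0.0.2 7
```

With "print $1", the 150 bytes sent by 10.0.0.1 are reported against 10.0.0.2, and 10.0.0.1 never appears in the output at all.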

My old PA senses were tweaked by the lack of variable initialization, and the first assignment to size glared at me as well. As it was originally written, the first time the IP address changed, the size would be set to zero and the first value of orig_bytes would be thrown away. Testing has shown that the above script works correctly.
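
For anyone who wants to repeat the test without a live conn.log, here is a minimal sketch that feeds invented records through the same sort, awk, sort, head stages. The function name top_talkers and the sample data are assumptions made for illustration, and bro-cut is replaced by printf producing the same two tab-separated columns. Note that the sketch resets size to 0 whenever the host changes, so that a "-" on a host's first record cannot inherit the previous host's total.

```shell
# top_talkers: read "host<TAB>orig_bytes" lines on stdin and print the
# ten largest per-host totals, largest first.
top_talkers() {
    sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n |
    awk '{
        if (host != $1) {
            if (size != 0) print host, size   # flush the previous host
            host = $1
            size = 0
        }
        if ($2 != "-") size += $2             # skip non-numeric byte counts
    }
    END { if (size != 0) print host, size }' |
    sort -rnk 2 |
    head -n 10
}

# Invented records, deliberately unsorted and including a "-" byte count.
printf '10.0.0.10\t5\n10.0.0.2\t-\n10.0.0.1\t100\n10.0.0.2\t7\n10.0.0.1\t50\n' |
    top_talkers
```

On this input the expected result is 10.0.0.1 with 150, then 10.0.0.2 with 7, then 10.0.0.10 with 5.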

Shane Castle
Data Security Mgr, Boulder County IT

-----Original Message-----
From: bro-bounces at bro-ids.org [mailto:bro-bounces at bro-ids.org] On Behalf Of Castle, Shane
Sent: Monday, December 31, 2012 10:33
To: Liam Randall; bro at bro-ids.org
Subject: Re: [Bro] minor documentation error

I found another issue with this script. The Unix/POSIX sort command will not sort IP addresses correctly unless it is told to explicitly: 
"sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n". This defect causes the script to lie about who is using how many bytes.

If you want a nice example, just access a reasonably busy Bro system, go to one of the compressed log directories, and try:

"zcat conn.*.gz | bro-cut id.orig_h orig_bytes | sort | less"

You will see it sorting addresses like and the same. This causes the subsequent awk script to fail rather badly.
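
The difference in ordering shows up even on four invented addresses. In this sketch the function names are hypothetical, and LC_ALL=C is set on the plain sort only to make its output reproducible; the locale on a given system may order the lines differently again.

```shell
# Four made-up addresses in no particular order.
addrs() { printf '10.0.0.10\n10.0.0.9\n9.0.0.1\n10.0.0.2\n'; }

# Plain sort compares the lines as text.
plain_order() { addrs | LC_ALL=C sort; }

# Keyed sort compares each dot-separated octet as a number.
octet_order() { addrs | sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n; }

plain_order   # 10.0.0.10, 10.0.0.2, 10.0.0.9, 9.0.0.1
octet_order   # 9.0.0.1, 10.0.0.2, 10.0.0.9, 10.0.0.10
```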

And that brings up another point: many times the orig_bytes field will be nonnumeric, containing a "-" or a blank instead of a number. I don't know how the awk script deals with these, offhand. I am trying to find out, and create a true toptalkers script that really works.
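
One way to probe this: awk's string-to-number conversion treats a string with no numeric prefix as 0, so adding a "-" to a running total should leave the total unchanged, whereas a plain assignment stores the "-" itself. A minimal sketch on invented records (the function names are hypothetical):

```shell
# Sum the second column; a "-" converts to 0 in arithmetic context.
sum_bytes() { awk '{ sum += $2 } END { print sum }'; }

# Plain assignment copies the field as-is, without any conversion.
assign_bytes() { awk '{ size = $2; print size }'; }

printf '10.0.0.1\t100\n10.0.0.1\t-\n10.0.0.1\t50\n' | sum_bytes   # prints 150
printf '10.0.0.1\t-\n' | assign_bytes                             # prints -
```

So a "-" is harmless in a "+=" but not in a bare "size=$2", which is where an explicit test for "-" earns its keep.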

Shane Castle
Data Security Mgr, Boulder County IT

-----Original Message-----
From: bro-bounces at bro-ids.org [mailto:bro-bounces at bro-ids.org] On Behalf Of Liam Randall
Sent: Friday, December 28, 2012 18:11
To: bro at bro-ids.org
Subject: [Bro] minor documentation error

Came up on the SO list.




Solution for:


What are the top 10 hosts (originators) that send the most traffic?


The final sort should be "sort -rnk 2"
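
As a quick illustration of the flags: -k 2 keys the sort on the second field, -n compares that key as a number, and -r reverses the order so the largest total comes first. The totals and the function names below are invented:

```shell
# Made-up "host total" lines as the awk stage would emit them.
totals() { printf '10.0.0.1 150\n10.0.0.2 7\n10.0.0.10 90\n'; }

# Largest byte counts first, keeping only the top two.
top_two() { totals | sort -rnk 2 | head -n 2; }

top_two   # 10.0.0.1 150, then 10.0.0.10 90
```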


Credits Shane Castle


Happy Holidays All,


