[Bro] Help with searching logs

Castle, Shane scastle at bouldercounty.org
Wed Apr 3 13:15:18 PDT 2013


Just for grins, I worked on the conversation angle a bit. Here's what I came up with as a first hack.

$ bro-cut id.orig_h id.resp_h orig_bytes resp_bytes </nsm/bro/logs/current/conn.log | awk -f convlist.awk | sort -rnk 2 | head -20

convlist.awk:
{
tot=0;
if ($3 != "-") {tot=$3};
if ($4 != "-") {tot+=$4};
print $1".."$2, tot;
}

Output:
166.213.168.9..192.168.11.30 6369575986
166.213.168.8..192.168.11.30 4304514612
192.168.61.21..193.120.199.16 4294967294
70.208.5.91..192.168.13.45 3532074861
192.168.56.166..23.7.65.224 3067067778
166.213.168.22..192.168.11.30 1981760165
174.232.193.204..172.31.251.32 1934543198
166.213.168.22..192.168.11.30 1853700164
166.213.168.22..192.168.11.30 1848753206
166.213.168.22..192.168.11.30 1757683563
166.213.168.22..192.168.11.30 1706006024
166.213.168.22..192.168.11.30 1657117082
166.213.168.22..192.168.11.30 1612028622
166.213.168.22..192.168.11.30 1607600859
166.213.168.22..192.168.11.30 1584767941
166.213.168.22..192.168.11.30 1543389093
166.213.168.22..192.168.11.30 1533513278
166.213.168.22..192.168.11.30 1467372828
166.213.168.22..192.168.11.30 1254028676
192.168.56.166..23.7.65.224 1214734460

I used ".." to join the two IP addresses so that the pair can still be split and sorted on "." without changing the original sort command very much. Because ".." leaves an empty fifth field when splitting on ".", the responder's octets land in fields 6 through 9. Essentially, this shows the top 20 separate conversations since the current conn.log was started. Notice that several of them have the same endpoints. If I sort with "sort -t '.' <etc>" and then summarize, I get

$ bro-cut id.orig_h id.resp_h orig_bytes resp_bytes </nsm/bro/logs/current/conn.log | awk -f convlist.awk | sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n -k 6,6n -k 7,7n -k 8,8n -k 9,9n | awk -f toptalk.awk | sort -rnk 2 | head -20
166.213.168.22..192.168.11.30 45782918899
166.213.168.9..192.168.11.30 14376535154
192.168.56.166..23.7.65.224 9097513513
172.31.251.13..218.30.26.68 5826064097
172.31.251.171..192.168.14.81 4321601956
166.213.168.8..192.168.11.30 4304514612
192.168.61.21..193.120.199.16 4294967294
70.208.5.91..192.168.13.45 3532111771
192.168.21.103..23.7.65.224 3059490250
166.213.168.30..192.168.11.30 2285013919
166.137.181.202..192.168.13.45 2083142285
174.232.193.204..172.31.251.32 1934845607
172.31.251.13..218.30.26.70 1198383801
192.168.226.96..72.21.81.253 1175467069
161.97.154.10..172.31.251.32 1137167682
174.239.96.134..192.168.13.45 920925068
192.168.171.25..204.107.64.58 488202352
172.31.251.71..192.168.12.116 477085260
192.168.6.97..74.125.225.161 459930229
172.31.251.13..192.168.12.73 412551956

So, it can be done. Interesting exercise. Note that the "toptalk.awk" script referenced in the command is simply the awk script from before (see below in this email thread), created as a separate file.
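
For comparison, the same per-conversation totals can probably be had in one pass with an awk associative array, which removes the need for the multi-key sort. This is only a sketch: the function name top_conversations is made up for illustration, and it assumes the four whitespace-separated columns that the bro-cut command above emits.

```shell
# Hypothetical helper: sum bytes per conversation in a single pass.
# Input columns: orig_h resp_h orig_bytes resp_bytes ("-" means unset).
top_conversations() {
    awk '{
        tot = 0
        if ($3 != "-") tot += $3
        if ($4 != "-") tot += $4
        sum[$1 ".." $2] += tot      # key is the joined address pair
    }
    END {
        # %.0f keeps large totals out of exponent notation in any awk
        for (conv in sum) printf "%s %.0f\n", conv, sum[conv]
    }' | sort -rnk 2 | head -n 20
}
```

It would slot in as "bro-cut id.orig_h id.resp_h orig_bytes resp_bytes < conn.log | top_conversations". The trade-off is memory: awk holds one array entry per distinct address pair.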

Just so you know, the third line from the top in the output is to a Twitter address. I need to find out wtf is going on there. 9GB from/to Twitter? Really?

-- 
Shane Castle
Data Security Mgr, Boulder County IT

-----Original Message-----
From: Michael Bower [mailto:mbower2 at gmail.com] 
Sent: Wednesday, April 03, 2013 09:35
To: Castle, Shane
Cc: Bro Mailing List
Subject: RE: [Bro] Help with searching logs

Thanks! This will give me something to go on.

On Apr 3, 2013 11:30 AM, "Castle, Shane" <scastle at bouldercounty.org> wrote:


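	Seems as though an awk update tightened some of the syntax restrictions. Presumably the backslash-newlines join the whole program into one logical line, so a bare statement directly before an "else" now needs a semicolon or braces. This script works: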
	
	bro-cut id.orig_h orig_bytes < conn.log             \
	    | sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n   \
	    | awk 'BEGIN { size=0;host="" }                 \
	           { if (host != $1) {                      \
	                 if (size != 0)                     \
	                     print host, size;              \
	                  host=$1;                          \
	                  if ($2 != "-") {                  \
	                     size=$2 }                      \
	                  else {                            \
	                     size=0; }                      \
	              } else {                              \
	                  if ($2 != "-")                    \
	                     size += $2 }                   \
	            }                                       \
	            END {                                   \
	                if (size != 0)                      \
	                     print host, size               \
	                }'                                  \
	    | sort -rnk 2                                   \
	    | head -n 10
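
One way to sidestep that class of syntax error is to drop the backslashes inside the awk program: within single quotes the shell passes real newlines through, and awk accepts a newline as a statement terminator. A minimal sketch of the same sort-then-scan logic follows. The function name top_talkers is an assumption for illustration, and it reads "host bytes" lines on stdin instead of calling bro-cut itself.

```shell
# Hypothetical helper: total bytes per host from sorted "host bytes" lines.
top_talkers() {
    sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n |
    awk '
        host != $1 {                    # first line of a new host
            if (size != 0) printf "%s %.0f\n", host, size
            host = $1
            size = 0
        }
        $2 != "-" { size += $2 }        # skip unset byte counts
        END { if (size != 0) printf "%s %.0f\n", host, size }
    ' | sort -rnk 2 | head -n 10
}
```

Usage would be "bro-cut id.orig_h orig_bytes < conn.log | top_talkers".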
	
	The conversations could be done by a script that takes conn.log as input, merges orig_h and resp_h into one field while adding together their bytes, sorts on that field, then puts the result through the same awk script. The output might look something like this:
	
	1.2.3.4|5.6.7.8 123456789
	...
	
	depending on what you choose for your conjoining character. Also, note that if there are multiple conversations between two IP addresses then you will be adding up all those conversations and presenting the result as one line - that might not be exactly what you are looking for.
	
	Welcome to the world of IDS data mining. Sharpen your awk/sort/sql/perl/bash skills - they will come in very handy.
	
	--
	Shane Castle
	Data Security Mgr, Boulder County IT
	
	
	-----Original Message-----
	From: bro-bounces at bro.org [mailto:bro-bounces at bro.org] On Behalf Of Castle, Shane
	Sent: Wednesday, April 03, 2013 08:59
	To: 'Michael Bower'; 'bro at bro.org'
	Subject: Re: [Bro] Help with searching logs
	
	Hm, I get a syntax error in that script now. Let me figure this out...
	
	--
	Shane Castle
	Data Security Mgr, Boulder County IT
	
	
	-----Original Message-----
	From: bro-bounces at bro.org [mailto:bro-bounces at bro.org] On Behalf Of Castle, Shane
	Sent: Wednesday, April 03, 2013 08:31
	To: 'Michael Bower'; 'bro at bro.org'
	Subject: Re: [Bro] Help with searching logs
	
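	The script is lying to you: when the host changes it prints the new host ($1) next to the previous host's total, and it resets size to 0 instead of to $2, so the first record for each host is never counted. Here's the correct script: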
	
	bro-cut id.orig_h orig_bytes < conn.log             \
	    | sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n   \
	    | awk 'BEGIN { size=0;host="" }                 \
	           { if (host != $1) {                      \
	                 if (size != 0)                     \
	                     print host, size;              \
	                  host=$1;                          \
	                  if ($2 != "-")                    \
	                     size=$2                        \
	                  else                              \
	                     size=0                         \
	              } else                                \
	                  if ($2 != "-")                    \
	                     size += $2                     \
	            }                                       \
	            END {                                   \
	                if (size != 0)                      \
	                     print host, size               \
	                }'                                  \
	    | sort -rnk 2                                   \
	    | head -n 10
	
	Since this script summarizes, having timestamps in there would not be useful. And, if you want to change the logic to responder, change "orig" to "resp" in the first line. Yes, it would be nice to have the top conversations, not just the top talkers, which would combine both orig and resp, but I'm not sure the result would justify the work.
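
If a time column and resp_bytes are still wanted in a summary, one option is to keep the first-seen and last-seen timestamps per host alongside both byte totals. The following is a rough sketch, not a tested recipe against a live conn.log: the function name summarize_hosts is invented here, and it assumes bro-cut is asked for the columns ts, host, orig_bytes, resp_bytes in that order.

```shell
# Hypothetical helper. Input columns: ts host orig_bytes resp_bytes.
# Output: host, orig_bytes total, resp_bytes total, first ts, last ts.
summarize_hosts() {
    awk '{
        h = $2
        if ($3 != "-") sent[h] += $3
        if ($4 != "-") rcvd[h] += $4
        # "+ 0" forces a numeric comparison of the epoch timestamps
        if (!(h in first) || $1 + 0 < first[h] + 0) first[h] = $1
        if (!(h in last)  || $1 + 0 > last[h] + 0)  last[h]  = $1
    }
    END {
        for (h in first)
            printf "%s %.0f %.0f %s %s\n", h, sent[h], rcvd[h], first[h], last[h]
    }' | sort -rnk 2 | head -n 10
}
```

Usage would be "bro-cut ts id.orig_h orig_bytes resp_bytes < conn.log | summarize_hosts". With id.resp_h in place of id.orig_h the grouping is by responder, and the two byte columns are then from the other side's point of view.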
	
	A few months ago I went through this and we were hoping that the doc would have been changed to show a correct script, but it has not been, apparently.
	
	--
	Shane Castle
	Data Security Mgr, Boulder County IT
	
	-----Original Message-----
	From: bro-bounces at bro.org [mailto:bro-bounces at bro.org] On Behalf Of Michael Bower
	Sent: Tuesday, April 02, 2013 18:19
	To: bro at bro.org
	Subject: [Bro] Help with searching logs
	
	-----BEGIN PGP SIGNED MESSAGE-----
	Hash: SHA512
	
	
	I'm still learning, so bear with me.  I ran the following command:
	
	bro-cut id.orig_h orig_bytes < conn.log             \
	    | sort                                          \
	    | awk '{ if (host != $1) {                      \
	                 if (size != 0)                     \
	                     print $1, size;                \
	                  host=$1;                          \
	                  size=0                            \
	              } else                                \
	                  size += $2                        \
	            }                                       \
	            END {                                   \
	                if (size != 0)                      \
	                     print $1, size                 \
	                }'                                  \
	    | sort -rnk 2                                   \
	    | head -n 10
	
	
	This worked well to show me the top 10 hosts (originators).  What I'm
	trying to do is show the top 10 hosts and the time (ts).  Maybe show
	the resp_bytes field too, if that is possible.  Any help would be
	greatly appreciated.
	
	Thanks!
	- --
	
	Mike
	
	
	-----BEGIN PGP SIGNATURE-----
	Comment: GPGTools - http://gpgtools.org
	
	iQEcBAEBCgAGBQJRW3WBAAoJEIAKCPjZh/yXUF4H/RhFuVQy6bT3Z8Z1k2oMDBGt
	TYFAfsyeXcnf9dOl3NFGEIlifjDMZ/gK5kBVWo/FYSHGWHrCT0+ICcsjwLroRP/E
	rn1StPS7ozlSiY2ZJSG0UAUCZX0HZ0ujvmNo8UvnoAR75cORq8Y08cU4XpLjqhxc
	d4xu3G+HnhyzjKAiG6xtqDpK2Z3bnjJzyWEqZCoYDzNqtcYnrxYjcKa0kX9rBhUr
	uV6upZ9OHIdf25EYCVfjDrKPSUaRhSAnTVtYE0+OQRA0OPpnG3rLWFSK2yjkTbNG
	AzKXfhJZ0PWmUWkeD6Bzf2TCNqfyLchNSScm2atA/dhTRBV3JhHIhwIcejXr6sk=
	=23Kd
	-----END PGP SIGNATURE-----
	_______________________________________________
	Bro mailing list
	bro at bro-ids.org
	http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro
	