[Bro] Quick smtp-url-extraction question

Aashish Sharma asharma at lbl.gov
Thu Aug 14 07:30:52 PDT 2014


OK. The smtp-url-extraction scripts are attached to this email. I apologize for the delay in sending them.

These scripts have been running for more than 1 1/2 years, so I can say they are fairly stable and should not cause any issues.

1) Please configure site.bro (attached) as per your site specifics and add it to your site/local.bro file.

2) If you are running bro-2.2 or below, please use smtp-url-extraction.bro.

3) If you are running bro-2.3, use smtp-url-extraction-bloom.bro. It uses bloom filters to check against URLs in the HTTP stream, so it is less taxing on memory compared to (2).
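
For anyone wondering what the bloom-filter variant in (3) changes, here is a minimal stand-alone sketch, assuming the built-in bloom filter functions available in bro-2.3. The variable name seen_urls and the example URLs are made up for illustration; the sizing values are the ones used in the attached bloom script.

```bro
# Hypothetical stand-alone sketch, not part of the attached scripts.
# A bloom filter records membership only: it cannot return the mail
# details that the table-based version stores per URL.
global seen_urls = bloomfilter_basic_init(0.00000001, 10000000);

event bro_init()
	{
	# Remember a URL seen in an email (example value).
	bloomfilter_add(seen_urls, "http://www.example.com/index.html");

	# bloomfilter_lookup() returns a count: 0 means definitely not seen,
	# anything greater than 0 means probably seen (false positives are possible).
	if ( bloomfilter_lookup(seen_urls, "http://www.example.com/index.html") > 0 )
		print "URL was seen in mail";

	if ( bloomfilter_lookup(seen_urls, "http://www.example.net/other") == 0 )
		print "URL was not seen in mail";
	}
```

The trade-off shows up in the attached bloom script: the click notice can only report the URL, because the filter keeps no uid/from/to/subject string for it.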

This script should log URLs embedded in SMTP traffic into a file called smtpurl_links.log. There are also configuration variables such as suspicious_text_in_url, suspicious_text_in_body, etc. You can look into smtp-embedded-url.bro (and -bloom.bro) to see the kinds of notices it generates.
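
As a rough sketch of step (1) and of the configuration variables mentioned above, a local.bro addition might look like the lines below. The directory name smtp-url is an assumption (any directory next to local.bro would do), and the pattern values are placeholders, not recommendations.

```bro
# Hypothetical addition to site/local.bro; assumes the attached files
# were copied into a directory named smtp-url next to local.bro.
@load ./smtp-url/site.bro

# Override the tunables after loading. Placeholder values only.
redef SMTPurl::suspicious_text_in_url = /badhost\.example\.com|freeprizes/;
redef SMTPurl::suspicious_file_types = /\.exe$|\.scr$|\.zip$/;
```

Note that the attached site.bro itself loads ./smtp-embedded-url-bloom.bro, so the script in that directory needs to carry that file name, or the @load line in site.bro needs adjusting (for example, to point at the non-bloom script on bro-2.2 or below).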

This script is part of a bigger smtp suite. I will try to collect other scripts and send those out as well. 

Please let me know if you have any questions or have issues running these scripts. 

Thanks, 
Aashish 
LBNL 

On Thu, Aug 14, 2014 at 01:51:30PM +0000, Hosom, Stephen M wrote:
> 
>    All,
> 
> 
>    I submitted a pull request last week for this. You could technically grab
>    the script and run it. Since I’m not part of the Bro team though, I can’t
>    promise that this will continue to work.
> 
> 
>    [1]https://github.com/bro/bro/pull/10
> 
> 
>    I run a variation of this script in my production environment right now.
>    Keep  in mind that it is normally a bad plan to extend an internal Bro
>    module. Since there’s a pretty high demand for it, if you’d like to modify
>    this  to not extend the internal SMTP modules and be separate, it is a
>    relatively short task (about 15 minutes).
> 
> 
>    Lastly, this is provided as-is with no warranty, etc. etc.
> 
> 
>    Thanks,
> 
>    Stephen
> 
> 
>    From: bro-bounces at bro.org [mailto:bro-bounces at bro.org] On Behalf Of Lankau,
>    John
>    Sent: Thursday, August 14, 2014 8:58 AM
>    To: James Lay; bro at bro-ids.org
>    Subject: Re: [Bro] Quick smtp-url-extraction question
> 
> 
>    Seth,
> 
> 
>    +100
> 
> 
>    I just wanted to add that I think that script that logs SMTP URLs would get
>    a lot of use in our environment as well.  It’s been an elusive data point,
>    but  one  we  really would like to have.  We’ve been having high-level
>    discussions on how to implement something that does this exact process in
>    our office, so I’d be very interested in using this script once it’s ready
>    as well.
> 
> 
>    Thanks!
> 
>    --John
> 
> 
>    From: [2]bro-bounces at bro.org [[3]mailto:bro-bounces at bro.org] On Behalf Of
>    James Lay
>    Sent: Thursday, August 07, 2014 7:50 PM
>    To: [4]bro at bro-ids.org
>    Subject: Re: [Bro] Quick smtp-url-extraction question
> 
> 
>    On Thu, 2014-08-07 at 13:39 -0400, Seth Hall wrote:
> 
> On Aug 7, 2014, at 1:30 PM, James Lay <[5]jlay at slave-tothe-box.net> wrote:
> 
> > I would absolutely love a script that would log urls....we all know that
> > quoted-printable and base64 shenanigans may get missed
> 
> Much of that should be handled automatically by the mime analyzer (I'm not
> sure of the limits of that offhand).
> 
> > , but every little bit helps..thanks a bunch Seth.
> 
> I'll see if I can get to it soon.
> 
>   .Seth
> 
> --
> Seth Hall
> International Computer Science Institute
> (Bro) because everyone has a network
> [6]http://www.bro.org/
> 
> 
>    Thanks again Seth.
>    James
> 
> References
> 
>    1. https://github.com/bro/bro/pull/10
>    2. mailto:bro-bounces at bro.org
>    3. mailto:bro-bounces at bro.org
>    4. mailto:bro at bro-ids.org
>    5. mailto:jlay at slave-tothe-box.net
>    6. http://www.bro.org/



-- 
Aashish Sharma	(asharma at lbl.gov) 				 
Cyber Security, 
Lawrence Berkeley National Laboratory  
http://go.lbl.gov/pgp-aashish 
Office: (510)-495-2680  Cell: (510)-612-7971
-------------- next part --------------
module SMTPurl;

export {

	    redef enum Log::ID += { Links_LOG };

	    type Info: record {
                ## When the email was seen.
                ts:   time    &log;
                ## Unique ID for the connection.
                uid:  string  &log;
                ## Connection details.
                id:   conn_id &log;
                ## Host portion of the discovered URL.
                host: string  &log &optional;
                ## URL that was discovered.
                url:  string  &log &optional;

        };


        redef enum Notice::Type += {
                ## A URL embedded in an email matched suspicious_text_in_url.
                SMTP_Embeded_Malicious_URL,
		SMTP_Link_in_EMAIL_Clicked, 
		SMTP_Link_REFERRER_Clicked, 
		SMTP_Linked_BINARY_Download, 
		SMTP_Dotted_URL, 	
		SMTP_Suspicious_File_URL, 
		SMTP_Suspicious_Embedded_Text, 
		SMTP_WatchedFileType, 
		SMTP_Click_Here_Seen
	}; 
        

#		global url_dotted_pattern: pattern = /href.*http:\/\/([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}.*\"/ ; 
		global url_dotted_pattern: pattern = /([^"#]+)/; 

		const url_regex = /^([a-zA-Z\-]{3,5})(:\/\/[^\/?#"'\r\n><]*)([^?#"'\r\n><]*)([^[:blank:]\r\n"'><]*|\??[^"'\r\n><]*)/ &redef;

		global mail_links: table [string] of string &synchronized &create_expire=12 hrs &redef ; 
		global link_already_seen: set[string] &redef ; 
		global referrer_link_already_seen: set[string] ; 
		
		const suspicious_file_types: pattern = /\.rar$|\.exe$|\.zip$/ &redef; 
		const ignore_file_types: pattern = /\.gif$|\.png$|\.jpg$|\.xml$|\.PNG$|\.jpeg$|\.css$/ &redef; 

		redef link_already_seen += { "example.com", }; 
		
		const ignore_mail_originators: set[subnet] += { 1.2.3.4/24} &redef; 
		
		const ignore_mailfroms : pattern += /bro@|alerts/ &redef ; 
		const ignore_mails_to: set[string] = {"reports at example.com", } &redef ; 
		const ignore_site_links: pattern = /http:\/\/.*\.example\.gov\/|http:\/\/.*\.example\.net/ &redef ; 

		
		const suspicious_text_in_url = /googledoc|googledocs|wrait\.ru|webs\.com|jimdo\.com|yolasite\.com\// &redef ; 
		const suspicious_text_in_body = /[Pp][Ee][Rr][Ss][Oo][Nn][Aa][Ll] [Ee][Mm][Aa][Ii][Ll]|[Pp][Aa][Ss][Ss][Ww][Oo][Rr][Dd]|[Uu][Ss][Ee][Rr] [Nn][Aa][Mm][Ee]|[Uu][Ss][Ee][Rr][Nn][Aa][Mm][Ee]/ &redef ; 

	#redef Notice::policy += {
		  #[$pred(n: Notice::Info) = { return n$note == SMTPurl::SMTP_Embeded_Malicious_URL; }, $action = Notice::ACTION_EMAIL],  
		  #####[$pred(n: Notice::Info) = { return n$note == SMTPurl::SMTP_Click_Here_Seen; }, $action = Notice::ACTION_EMAIL],   ## too many false +ve
	#} ; 
} 

redef record connection += {
        smtp_url: Info &optional;
};


event bro_init() &priority=5
{
        Log::create_stream(SMTPurl::Links_LOG, [$columns=Info]);

} 


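function extract_host(name: string): string
{
        # For "scheme://host/path", splitting on "/" yields
        # [1]="scheme:", [2]="", [3]="host" (split() tables are 1-indexed).
        local split_on_slash = split(name, /\//);

        # Guard against input without a "scheme://" prefix.
        if ( |split_on_slash| < 3 )
                return "";

        return split_on_slash[3];
}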



## Extracts URLs discovered in arbitrary text.
function find_all_urls(s: string): string_set
    {
    return find_all(s, url_regex);
    }


## Extracts URLs discovered in arbitrary text without
## the URL scheme included.
function find_all_urls_without_scheme(s: string): string_set
{
	local urls = find_all_urls(s);
	local return_urls: set[string] = set();
	for ( url in urls )
		{
		local no_scheme = sub(url, /^([a-zA-Z\-]{3,5})(:\/\/)/, "");
		add return_urls[no_scheme];
		}

	return return_urls;
}



function log_smtp_urls(c:connection, url:string)
{
		local info: Info; 

		info$ts = c$smtp$ts;
               	info$uid = c$smtp$uid ;
                info$id = c$id ;
               	info$url = url;
		info$host = extract_host(url) ;  

              	c$smtp_url = info;
               
		Log::write(SMTPurl::Links_LOG, c$smtp_url);

} 


event mime_segment_data(c: connection, length: count, data: string) &priority=-5
{

	# Bail out early if this MIME data is not part of an SMTP session:
	# c$smtp must exist before any of its fields are read.
	if ( ! c?$smtp )
		return;

	if ( c$smtp?$mailfrom && ignore_mailfroms in c$smtp$mailfrom )
		return;

	if ( c$smtp?$to )
		{
		for ( to in c$smtp$to )
			{
			if ( to in ignore_mails_to )
				return;
			}
		}

	if ( c$id$orig_h in ignore_mail_originators )
		return;

	# "from" is an optional field; fall back to a placeholder when unset.
	local mail_from = c$smtp?$from ? c$smtp$from : "-";
	local mail_info: string;

	if ( c$smtp?$to && c$smtp?$subject )
		mail_info = fmt("uid=%s from=%s to=%s subject=%s", c$smtp$uid, mail_from, c$smtp$to, c$smtp$subject);
	else
		mail_info = fmt("uid=%s from=%s", c$smtp$uid, mail_from);

	local urls = find_all_urls(data) ; 

	for (link in urls){
#		local link =  sub(a,/(http|https):\/\//,"");
		if (link !in mail_links && ignore_file_types !in link )
		  { 
			mail_links[link] = mail_info ; 
			log_smtp_urls(c, link); 
			
			if ( suspicious_file_types in link)
			{ 
				NOTICE([$note=SMTP_WatchedFileType, $msg=fmt("Suspicious filetype embeded in URL %s from  %s", link, c$id$orig_h), $conn=c]); 
			} 
			
			if ( suspicious_text_in_url in link)
			{ 
				NOTICE([$note=SMTP_Embeded_Malicious_URL, $msg=fmt("Suspicious text embeded in URL %s from  %s", link, c$smtp$uid), $conn=c]); 
			} 
			
			if ( suspicious_text_in_body in data && /[Cc][Ll][Ii][Cc][Kk] [Hh][Ee][Rr][Ee]/ in data)
			{ 
				NOTICE([$note=SMTP_Click_Here_Seen, $msg=fmt("Click Here seen in the email %s from  %s", link, c$smtp$uid), $conn=c]); 
			} 

			if (/([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}.*/ in link )
			{ 
				#local url = split_all(data, /href.*\"http:\/\/([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}.*\"/); 
				#url[2]= sub(url[2], /^href=3D\"|href=\"/, "");
				#url[2]= sub(url[2], /\"$/, "");
				NOTICE([$note=SMTP_Dotted_URL, $msg=fmt("Embeded IP in URL %s from  %s", link, c$id$orig_h), $conn=c]);
			} 

		 } ## check link in mail_links 
	} 	## for  
}
 

event http_message_done(c: connection, is_orig: bool, stat: http_message_stat) &priority=-3
{ 
	local str = HTTP::build_url_http(c$http); 

	if (str in SMTPurl::mail_links && str !in SMTPurl::link_already_seen && ignore_file_types !in str && ignore_site_links !in str)
	{ 		
		NOTICE([$note=SMTPurl::SMTP_Link_in_EMAIL_Clicked, $msg=fmt("URL %s [%s]", str, SMTPurl::mail_links[str]), $conn=c]);
		add SMTPurl::link_already_seen[str] ; 
	} 	
	
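	if ( c$http?$referrer )
		{
		local ref = c$http$referrer;

		if ( ref in SMTPurl::mail_links && ref !in SMTPurl::referrer_link_already_seen && ignore_file_types !in ref && ignore_site_links !in ref )
			{
			# NOTE: the result of fmt() is discarded, so this branch is
			# currently a no-op: no SMTP_Link_REFERRER_Clicked notice is raised.
			fmt("Added %s from %s", SMTPurl::mail_links[ref], ref);
			}
		}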

	## aashish

#                if (c$http?$md5 && str in SMTPurl::mail_links )
#                {
#                	NOTICE([$note=SMTP_Linked_BINARY_Download, $msg=fmt("%s %s %s", c$id$orig_h, c$http$md5, str),
#				$sub=c$http$md5, $conn=c, $URL=str]);
#                }

} 
-------------- next part --------------
module SMTPurl;

export {

	    redef enum Log::ID += { Links_LOG };

	    type Info: record {
                ## When the email was seen.
                ts:   time    &log;
                ## Unique ID for the connection.
                uid:  string  &log;
                ## Connection details.
                id:   conn_id &log;
                ## Host portion of the discovered URL.
                host: string  &log &optional;
                ## URL that was discovered.
                url:  string  &log &optional;

        };


        redef enum Notice::Type += {
                ## A URL embedded in an email matched suspicious_text_in_url.
                SMTP_Embeded_Malicious_URL,
		SMTP_Link_in_EMAIL_Clicked, 
		SMTP_Link_REFERRER_Clicked, 
		SMTP_Linked_BINARY_Download, 
		SMTP_Dotted_URL, 	
		SMTP_Suspicious_File_URL, 
		SMTP_Suspicious_Embedded_Text, 
		SMTP_WatchedFileType, 
		SMTP_Click_Here_Seen
	}; 
        

		global url_dotted_pattern: pattern = /([^"#]+)/; 

		const url_regex = /^([a-zA-Z\-]{3,5})(:\/\/[^\/?#"'\r\n><]*)([^?#"'\r\n><]*)([^[:blank:]\r\n"'><]*|\??[^"'\r\n><]*)/ &redef;


		global mail_links = bloomfilter_basic_init(0.00000001, 10000000) ; 
	
		global link_already_seen: set[string] &redef ; 
		global referrer_link_already_seen: set[string] ; 
		
		const suspicious_file_types: pattern = /\.rar$|\.exe$|\.zip$/ &redef; 
		const ignore_file_types: pattern = /\.gif$|\.png$|\.jpg$|\.xml$|\.PNG$|\.jpeg$|\.css$/ &redef; 

		redef link_already_seen += { "example.net"} ;
                const ignore_mail_originators: set[subnet] += { 1.2.3.4/24, 2.3.4.0/24} &redef;
                const ignore_mailfroms : pattern += /bro@|alerts|reports/ &redef ;
                const ignore_mails_to: set[string] = {"alerts at example.com", "notices at example.com",} &redef ;
                const ignore_site_links: pattern = /http:\/\/.*\.example\.com\/|http:\/\/.*\.example\.net/ &redef ;

		const suspicious_text_in_url = /googledoc|googledocs|wrait\.ru|webs\.com|jimdo\.com|yolasite\.com\// &redef ; 
		const suspicious_text_in_body = /[Pp][Ee][Rr][Ss][Oo][Nn][Aa][Ll] [Ee][Mm][Aa][Ii][Ll]|[Pp][Aa][Ss][Ss][Ww][Oo][Rr][Dd]|[Uu][Ss][Ee][Rr] [Nn][Aa][Mm][Ee]|[Uu][Ss][Ee][Rr][Nn][Aa][Mm][Ee]/ &redef ; 

} 

redef record connection += {
        smtp_url: Info &optional;
};


event bro_init() &priority=5
{
        Log::create_stream(SMTPurl::Links_LOG, [$columns=Info]);

} 


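function extract_host(name: string): string
{
        # For "scheme://host/path", splitting on "/" yields
        # [1]="scheme:", [2]="", [3]="host" (split() tables are 1-indexed).
        local split_on_slash = split(name, /\//);

        # Guard against input without a "scheme://" prefix.
        if ( |split_on_slash| < 3 )
                return "";

        return split_on_slash[3];
}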



## Extracts URLs discovered in arbitrary text.
function find_all_urls(s: string): string_set
    {
    return find_all(s, url_regex);
    }


## Extracts URLs discovered in arbitrary text without
## the URL scheme included.
function find_all_urls_without_scheme(s: string): string_set
{
	local urls = find_all_urls(s);
	local return_urls: set[string] = set();
	for ( url in urls )
		{
		local no_scheme = sub(url, /^([a-zA-Z\-]{3,5})(:\/\/)/, "");
		add return_urls[no_scheme];
		}

	return return_urls;
}



function log_smtp_urls(c:connection, url:string)
{
		local info: Info; 

		info$ts = c$smtp$ts;
               	info$uid = c$smtp$uid ;
                info$id = c$id ;
               	info$url = url;
		info$host = extract_host(url) ;  

              	c$smtp_url = info;
               
		Log::write(SMTPurl::Links_LOG, c$smtp_url);

} 


event mime_segment_data(c: connection, length: count, data: string) &priority=-5
{

	# Bail out early if this MIME data is not part of an SMTP session:
	# c$smtp must exist before any of its fields are read.
	if ( ! c?$smtp )
		return;

	if ( c$smtp?$mailfrom && ignore_mailfroms in c$smtp$mailfrom )
		return;

	if ( c$smtp?$to )
		{
		for ( to in c$smtp$to )
			{
			if ( to in ignore_mails_to )
				return;
			}
		}

	if ( c$id$orig_h in ignore_mail_originators )
		return;

	# "from" is an optional field; fall back to a placeholder when unset.
	local mail_from = c$smtp?$from ? c$smtp$from : "-";
	local mail_info: string;

	if ( c$smtp?$to && c$smtp?$subject )
		mail_info = fmt("uid=%s from=%s to=%s subject=%s", c$smtp$uid, mail_from, c$smtp$to, c$smtp$subject);
	else
		mail_info = fmt("uid=%s from=%s", c$smtp$uid, mail_from);

	local urls = find_all_urls(data) ; 

	for (link in urls){
#		local link =  sub(a,/(http|https):\/\//,"");


		#local _bf_lookup = bloomfilter_lookup(mail_links, link);

		#if (link !in mail_links && ignore_file_types !in link )
		#if ((_bf_lookup ==  0) && ignore_file_types !in link )
		if ( ignore_file_types !in link )
		  { 
		#	mail_links[link] = mail_info ; 
			
			bloomfilter_add(mail_links, link); 
			log_smtp_urls(c, link); 
			
			if ( suspicious_file_types in link)
			{ 
				NOTICE([$note=SMTP_WatchedFileType, $msg=fmt("Suspicious filetype embeded in URL %s from  %s", link, c$id$orig_h), $conn=c]); 
			} 
			
			if ( suspicious_text_in_url in link)
			{ 
				NOTICE([$note=SMTP_Embeded_Malicious_URL, $msg=fmt("Suspicious text embeded in URL %s from  %s", link, c$smtp$uid), $conn=c]); 
			} 
			
			if ( suspicious_text_in_body in data && /[Cc][Ll][Ii][Cc][Kk] [Hh][Ee][Rr][Ee]/ in data)
			{ 
				NOTICE([$note=SMTP_Click_Here_Seen, $msg=fmt("Click Here seen in the email %s from  %s", link, c$smtp$uid), $conn=c]); 
			} 

			if (/([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}.*/ in link )
			{ 
				#local url = split_all(data, /href.*\"http:\/\/([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}.*\"/); 
				#url[2]= sub(url[2], /^href=3D\"|href=\"/, "");
				#url[2]= sub(url[2], /\"$/, "");
				NOTICE([$note=SMTP_Dotted_URL, $msg=fmt("Embeded IP in URL %s from  %s", link, c$id$orig_h), $conn=c]);
			} 

		 } ## check link in mail_links 
	} 	## for  
}
 

event log_smtp(rec: Info)
{ 
	#print fmt ("log_smtp: INfo: %s", Info); 
} 

#event SMTP::log_mime (rec: SMTP::EntityInfo)
#{
##	print fmt ("log_mine Log_mime: %s", rec); 
#} 

event mime_begin_entity(c: connection) 
{
#print fmt ("mime_begin_entity: %s %s %s %s", c$smtp$from, c$smtp$to, c$smtp$subject, c$smtp$reply_to);
} 

event http_message_done(c: connection, is_orig: bool, stat: http_message_stat) &priority=-3
{ 
	local str = HTTP::build_url_http(c$http); 

	local _bf_lookup_http= bloomfilter_lookup(SMTPurl::mail_links, str); 



	if ((_bf_lookup_http >0) && str !in SMTPurl::link_already_seen && ignore_file_types !in str && ignore_site_links !in str)
	{ 		
		#NOTICE([$note=SMTPurl::SMTP_Link_in_EMAIL_Clicked, $msg=fmt("URL %s [%s]", str, SMTPurl::mail_links[str]), $conn=c]);
	
		NOTICE([$note=SMTPurl::SMTP_Link_in_EMAIL_Clicked, $msg=fmt("URL %s ", str), $conn=c]);
		add SMTPurl::link_already_seen[str] ; 
	} 	
	
	if ( c$http?$referrer )
		{
		local ref = c$http$referrer;
		local _bf_lookup_ref = bloomfilter_lookup(SMTPurl::mail_links, ref);

		#if (ref in SMTPurl::mail_links && ref !in SMTPurl::referrer_link_already_seen && ignore_file_types !in ref && ignore_site_links !in ref)
		if ( (_bf_lookup_ref > 0) && ref !in SMTPurl::referrer_link_already_seen && ignore_file_types !in ref && ignore_site_links !in ref )
			{
			# NOTE: the result of fmt() is discarded, so this branch is
			# currently a no-op: no SMTP_Link_REFERRER_Clicked notice is raised.
			fmt("Added from %s", ref);
			}
		}

	## aashish: need to port to file analysis framework 

#                if (c$http?$md5 && str in SMTPurl::mail_links )
#                {
#                	NOTICE([$note=SMTP_Linked_BINARY_Download, $msg=fmt("%s %s %s", c$id$orig_h, c$http$md5, str),
#				$sub=c$http$md5, $conn=c, $URL=str]);
#		} 	

} 
-------------- next part --------------
@load ./smtp-embedded-url-bloom.bro 

### smtp-embedded-url analysis

## Ignore HTTP tracking if the links from these domains are seen/clicked

redef SMTPurl::link_already_seen += { "example.com","example.org", };
redef SMTPurl::ignore_site_links: pattern = /.*\.example\.com\/|.*\.example\.net/ ;

## Careful: since Bro watches all emails (including the alerts it sends), this
## can create an email storm, because an alert that includes a malicious URL can
## trigger another alert email. Ignore email going to these addresses.

redef SMTPurl::ignore_mails_to: set[string] = {"bro-alerts at example.com", "alerts at example.com", "reports at example.com"}; 

# Ignore emails from the following sender
redef SMTPurl::ignore_mailfroms += /bro@|alerts@|security@|reports/;

### Ignore emails originating from these subnets
## For a single IP address, please use x.y.w.z/32

redef SMTPurl::ignore_mail_originators: set[subnet] += { 1.2.3.0/24, 2.3.4.5/32, } &redef;

### ignore further processing on the following file types embedded in the url - too much volume not useful dataset
redef SMTPurl::ignore_file_types: pattern = /\.gif$|\.png$|\.jpg$|\.xml$|\.PNG$|\.jpeg$|\.css$/ ;

## alert on these file types: generates SMTP_WatchedFileType
redef SMTPurl::suspicious_file_types: pattern = /\.doc$|\.docx|\.xlsx|\.xls|\.rar$|\.exe$|\.zip$/ ;


### Alert on text in URI : generates SMTP_Embeded_Malicious_URL
redef SMTPurl::suspicious_text_in_url = /googledoc|googledocs|ph\.ly\/|webs\.com\/|jimdo\.com/ &redef ;
#redef SMTPurl::suspicious_text_in_url = /googledoc|googledocs|ph\.ly\/|webs\.com\/|jimdo\.com|http(s)?:\/\/.*\/.*(\.edu|\.gov|\.com).*/ &redef ;

## Alert on text in the body of the message: generates SMTP_Click_Here_Seen
## (only when the body also contains "click here")
redef SMTPurl::suspicious_text_in_body = /[Pp][Ee][Rr][Ss][Oo][Nn][Aa][Ll] [Ee][Mm][Aa][Ii][Ll]|[Pp][Aa][Ss][Ss][Ww][Oo][Rr][Dd]|[Uu][Ss][Ee][Rr] [Nn][Aa][Mm][Ee]|[Uu][Ss][Ee][Rr][Nn][Aa][Mm][Ee]/ &redef ;