[Bro] Patterns and Word Boundaries

Thu Oct 22 08:05:18 PDT 2015

Hopefully this isn't too simplistic a question, but I'm just getting
started with Bro.

In the text pattern syntax for Bro [1], is there an easy way to define
word boundaries, similar to how some of the RegEx dialects use '\b',
'\<', '\>', etc.? [2]

I'm trying to match specific strings in a data stream, for example the
word "nmap".  I've tried several approaches based on past RegEx
knowledge, but I'm having trouble coming up with a single pattern that
handles every case.  An example Bro test script is attached; hopefully
it's clear.

Fundamentally, is there a syntax reference for pattern matching, or does
it conform to a commonly known dialect (e.g. POSIX-style RegEx or
PCRE)?

[1] https://www.bro.org/sphinx/scripting/index.html#pattern
[2] http://www.regular-expressions.info/wordboundaries.html

-- 
Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu
-------------- next part --------------
event bro_init() {
	local testcases = set( 
		"nmap", 		#Should match something
		"test nmap", 		#Should match something
		"nmap test", 		#Should match something
		"test nmap test",	#should match something
		"unmapped_entries",	#Should NOT match any of the patterns
		"test\tnmap",		#Should match something
		"nmap\ttest",		#Should match something
		"test\tnmap\ttest"	#Should match something
		);
	local nmap_patterns = vector( 
		/ nmap /, 		#Works, but what if the whitespace isn't a space, e.g. '\t'?
		/^nmap /, 		
		/ nmap$/, 
		/^nmap$/, 
		/\bnmap\b/, 		#doesn't seem to match word boundaries as expected
		/\<nmap\>/, 		#doesn't seem to match word boundaries as expected
		/[ \t]nmap$/,		#this works, but I have to anticipate which whitespace chars will be used
		/^nmap[ \t]/,		#this works, but I have to anticipate which whitespace chars will be used
		/[ \t]nmap[ \t]/	#this works, but I have to anticipate which whitespace chars will be used

		#I wanted to try this one, using negative lookbehind and negative lookahead, but it won't even compile
		#/(?<!\s)nmap(?!\s)/	#probably won't work; not sure if \s means what I think, and negative lookarounds are hard to get right...
		);

	for (testcase in testcases) {
		print fmt("Testcase: \"%s\"", testcase);
		for (pi in nmap_patterns) {
			if ( nmap_patterns[pi] in testcase ) {
				print fmt("     Pattern: %s - Matched", nmap_patterns[pi]);
			} else {
				print fmt("     Pattern: %s - Did NOT match", nmap_patterns[pi]);
			}
		}
	}

}
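
# Possible workaround (a sketch only, not verified against every Bro release):
# since '\b' and '\<' '\>' don't seem to act as word boundaries here, emulate
# one by padding the input with a space on each side and requiring a non-word
# character on both sides of "nmap".  The helper name has_word_nmap and the
# choice of [a-zA-Z0-9_] as "word characters" are assumptions of this sketch.
function has_word_nmap(s: string): bool {
	# The padding lets a single pattern also cover start and end of string,
	# without relying on '^' or '$'.
	return /[^a-zA-Z0-9_]nmap[^a-zA-Z0-9_]/ in (" " + s + " ");
}

# Second bro_init handler, only to exercise the helper on two of the testcases.
event bro_init() {
	print fmt("test\\tnmap        -> %s", has_word_nmap("test\tnmap"));
	print fmt("unmapped_entries  -> %s", has_word_nmap("unmapped_entries"));
}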