Getting matched substrings ???

Yohann Thomas yohann.thomas at rd.francetelecom.com
Wed Apr 7 00:17:50 PDT 2004


Robin Sommer wrote:

>On Tue, Apr 06, 2004 at 16:38 +0200, Yohann Thomas wrote:
>
>>text...". I thought I could get the matched substring by the signatures, 
>>but unfortunately I can't get out of it...
>
>event signature_match(state: signature_state, msg: string, data: string)
>
>The 'data' parameter of the signature_match event contains the
>payload that led to the match. (More precisely, it contains the
>last chunk of payload that eventually triggered the match; for TCP,
>it depends on the reassembly what exactly is passed).
>
>Is this what you're looking for?
>
>Robin

Vern Paxson wrote:

> > I read in the paper "Bro: A
> > System for Detecting Network Intruders in Real-Time" this phrase about
> > REGEX implementation : "Second, we anticipate matching sets of patterns
> > and wanting to know which subset were matched by a given set of
> > text...". I thought I could get the matched substring by the signatures,
> > but unfortunately I can't get out of it...
>
> (That text refers to regular-expression matching on general strings,
>  rather than the context-based signature analyzer that Robin added to
>  Bro, by the way.)
>
> Since writing that, Bro's style has moved more towards pushing extraction
> of elements into either the event engine itself, or into built-in
> functions, rather than trying to do it using regular expressions over
> strings.  If it were easy to add subexpressions to Bro's RE matcher, I'd
> be happy to do so, but it's quite a bit of work.
>
> If you give an example of where you want to do this, perhaps we can
> suggest alternate ways of structuring your analyzer.
>
>                 Vern
>

In fact, I use the "data" parameter at the moment to get the whole 
payload, but the real idea was to get only the part that matched.
Here is a simple example of what I'd like to do:

signature apache-server {
    ip-proto == tcp
    src-port == 80
    payload /Server: [aA][pP][aA][cC][hH][eE].*/
    event "Apache"
    tcp-state responder
}

Then, in a policy script, I thought I could get "Apache/version" 
using the function sub_bytes(), and associate it with the IP address of 
the host (contained in the signature_state). It seemed an easy way, 
knowing that the information I need starts 8 characters ("Server: ") 
after the beginning of the matched substring.
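
To make the idea concrete outside Bro, here is a minimal Python sketch of 
the extraction described above: locate the match, skip the 8 bytes of 
"Server: ", and keep the rest of the header line. The helper name 
apache_banner and the sample payload are hypothetical, and stopping at 
the end of the line is an assumption; this is not Bro code.

```python
import re

PREFIX = b"Server: "  # 8 bytes, the offset mentioned in the message
# Same character classes as the signature; stopping at CR/LF is an
# assumption made here so that only the header line is captured.
PATTERN = re.compile(rb"Server: [aA][pP][aA][cC][hH][eE][^\r\n]*")


def apache_banner(payload):
    """Return the text after 'Server: ' if the payload names Apache, else None.

    Hypothetical helper for illustration only.
    """
    m = PATTERN.search(payload)
    if m is None:
        return None
    matched = m.group(0)  # the matched substring
    return matched[len(PREFIX):].decode("latin-1")


if __name__ == "__main__":
    sample = b"HTTP/1.1 200 OK\r\nServer: Apache/1.3.29 (Unix)\r\n\r\n"
    print(apache_banner(sample))
```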

To sum up, I'd like to collect some host characteristics, such as: this 
host (IP address W.X.Y.Z) is now running Apache 1.3.29.

But first, I noticed that it's not possible to match this regex as 
written. In fact, I have to add ".*" at the beginning of the pattern 
(which is not necessary with PCRE). Then, since you explain that it's 
not possible to get the matched substring, there's no reason for me to 
pursue this approach. So I have to find something else...
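
Needing a leading ".*" is what one would expect if the pattern is 
anchored at the start of the payload; that is an assumption about the 
signature matcher, not something confirmed in this thread. Python's re 
module shows the same contrast between anchored and unanchored matching 
(the sample payload is made up):

```python
import re

payload = "HTTP/1.1 200 OK\r\nServer: Apache/1.3.29\r\n"
pattern = r"Server: [aA][pP][aA][cC][hH][eE].*"

# re.match only succeeds if the pattern matches at position 0.
anchored = re.match(pattern, payload)

# re.search scans the whole string, as an unanchored PCRE search does.
unanchored = re.search(pattern, payload)

# Prefixing ".*" lets the anchored form succeed; DOTALL is needed here
# because the header sits after a line break.
prefixed = re.match(r".*" + pattern, payload, re.DOTALL)

print(anchored is None, unanchored is not None, prefixed is not None)
```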

Any ideas are welcome!

Yohann.






