[Bro] Email Link Extraction

Thu Apr 18 09:24:25 PDT 2013

On 2013-04-18 09:58, Castle, Shane wrote:
> At first glance this seems like all it needs is an appropriate regex.
> But then consider: any string containing both "." and "/" might be a
> candidate. (Actually, just a string containing "." with no space
> around it.)
>
> So, this might range from the full regex to detect '<a
> href=".+">.+</a>' to just '\s.+\..+\s' (Perl regex used).
>
> I'd welcome attempts to work on this. And, even if the result does
> not catch everything, if it gets anything at all it'd be better than
> what we have now.
>
> --
> Shane Castle
> Data Security Mgr, Boulder County IT
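
To make the two ends of that range concrete, here is a rough Python sketch (not Bro script, and not anything Bro ships). The names ANCHOR_RE, DOTTED_RE, strict_links and loose_candidates are made up for illustration. The loose pattern uses \S rather than . so a match stays inside one whitespace-delimited token, which is a deliberate tightening of the regex quoted above.

```python
import re

# Strict end of the range: only HTML anchors, capturing the href value.
ANCHOR_RE = re.compile(
    r'<a\b[^>]*\bhref\s*=\s*"([^"]+)"[^>]*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)

# Loose end of the range: any whitespace-delimited token with a dot
# that has at least one non-space character on each side of it.
DOTTED_RE = re.compile(r'(?<!\S)\S+\.\S+(?!\S)')


def strict_links(text):
    """Return href values of well-formed <a href="...">...</a> anchors."""
    return ANCHOR_RE.findall(text)


def loose_candidates(text):
    """Return every token that merely looks like it could be a link."""
    return DOTTED_RE.findall(text)
```

The strict version misses bare URLs in plain-text parts; the loose one will also flag things like file names and version numbers, so it probably needs a filtering step afterwards.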

Here's a special just from this morning (xx's added):

Hello,

Please view the document i uploaded for you using Google docs.
*VIEW  <hxxp://mensmentis.hu/godocs/index.htm> HERE *just sign in with your
email to view the document its very important

Regards

And the quoted-printable content (it's a hoot):

<a rel=3D"nofollow" href=3D"hxxp://mensmenti=
s.hu/godocs/index.htm" target=3D"_blank" 
style=3D"color:rgb(40,98,197);outl=
ine-width:0px">VIEW=A0</a>

I'm guessing that some normalization will be needed to decode the =3D
sequences (quoted-printable for "=") and drop any soft line-break ='s
that land inside links, or we could just match on "http://" and call it
good. Hope the above shows up right.
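
For what it's worth, the normalization idea can be tried outside Bro with a few lines of Python. This is only a sketch under assumptions: it uses the standard quopri module to undo the quoted-printable encoding, the names HREF_RE and extract_links are invented here, and latin-1 is a guessed charset (a real message would state its own in the Content-Type header).

```python
import quopri
import re

# Match href="..." or href='...' after decoding; the scheme is not
# checked, so defanged "hxxp://" links are captured as well.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(qp_body, charset="latin-1"):
    """Decode a quoted-printable body and return the href targets in it."""
    # decodestring turns =3D back into "=" and removes soft line
    # breaks (an "=" at the end of a line), rejoining split URLs.
    decoded = quopri.decodestring(qp_body).decode(charset, errors="replace")
    return HREF_RE.findall(decoded)


SAMPLE = (
    b'<a rel=3D"nofollow" href=3D"hxxp://mensmenti=\n'
    b's.hu/godocs/index.htm" target=3D"_blank">VIEW=A0</a>'
)
```

Calling extract_links(SAMPLE) on the fragment above yields the rejoined link, which a plain match on the raw body would miss because the soft line break splits the hostname.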

James