[Bro] Email Link Extraction
James Lay
jlay at slave-tothe-box.net
Thu Apr 18 09:24:25 PDT 2013
On 2013-04-18 09:58, Castle, Shane wrote:
> At first glance this seems like all it needs is an appropriate regex.
> But then consider: any string containing both "." and "/" might be a
> candidate. (Actually, just a string containing "." with no space
> around it.)
>
> So, this might range from the full regex to detect '<a
> href=".+">.+</a>' to just '\s.+\..+\s' (Perl regex used).
>
> I'd welcome attempts to work on this. And, even if the result does
> not catch everything, if it gets anything at all it'd be better than
> what we have now.
>
> --
> Shane Castle
> Data Security Mgr, Boulder County IT
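To make the two ends of that range concrete, here is a rough Python sketch. It is not anything Bro ships; the function names are made up for illustration. One deliberate change: the loose pattern uses \S+ rather than .+ so a single match cannot swallow a whole line, which matches the "no space around it" description above.

```python
import re

# Strict end of the range: an explicit HTML anchor with a quoted href.
ANCHOR_RE = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>', re.IGNORECASE)

# Loose end: any whitespace-delimited token with a "." somewhere inside it.
# (Assumption: \S+ is substituted for the .+ in the quoted pattern.)
LOOSE_RE = re.compile(r'(?<!\S)\S+\.\S+(?!\S)')

def anchor_links(text):
    """Return href values found in <a ...> tags (hypothetical helper)."""
    return ANCHOR_RE.findall(text)

def loose_candidates(text):
    """Return every dotted, space-free token (hypothetical helper)."""
    return LOOSE_RE.findall(text)
```

The strict pattern only fires on well-formed HTML, while the loose one also picks up bare hostnames in plain text, along with a fair amount of noise that would need filtering afterwards.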
Here's a special just from this morning (xx's added):
Hello,
Please view the document i uploaded for you using Google docs.
*VIEW <hxxp://mensmentis.hu/godocs/index.htm> HERE *just sign in with your
email to view the document its very important
Regards
And the quoted-printable content (it's a hoot):
<a rel=3D"nofollow" href=3D"hxxp://mensmenti=
s.hu/godocs/index.htm" target=3D"_blank"
style=3D"color:rgb(40,98,197);outl=
ine-width:0px">VIEW=A0</a>
I'm guessing some normalization will be needed to nuke the =3D's (the
quoted-printable encoding of "=") and any soft line-break ='s that land
inside links; the alternative is to just match on "http://" and call it good.
Hope the above shows up right.
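
For what it's worth, the normalization step could look something like this outside of Bro. This is only a sketch using Python's standard quopri module; the extract_links name and the href pattern are assumptions of mine, and the pattern accepts "hxxp" only because the sample above is defanged.

```python
import quopri
import re

# Match a quoted href whose value starts with http(s):// or the
# defanged hxxp(s):// form used in the sample (assumption).
HREF_RE = re.compile(rb'href="(h(?:tt|xx)ps?://[^"]+)"', re.IGNORECASE)

def extract_links(qp_body):
    """Decode a quoted-printable body (bytes) and return href URLs.

    Hypothetical helper: quopri.decodestring turns =3D back into "="
    and removes the "=" soft line breaks, so URLs split across lines
    are rejoined before the regex runs.
    """
    decoded = quopri.decodestring(qp_body)
    return [m.decode("latin-1") for m in HREF_RE.findall(decoded)]
```

Decoding first means the regex never has to know about =3D or soft breaks. Matching on "http://" alone against the raw body would miss or truncate any URL that the encoder happened to wrap mid-hostname, as it did here.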
James