[Bro] Email Link Extraction
James Lay
jlay at slave-tothe-box.net
Thu Apr 18 12:13:18 PDT 2013
On 2013-04-18 12:27, Seth Hall wrote:
> On Apr 18, 2013, at 11:31 AM, James Lay <jlay at slave-tothe-box.net>
> wrote:
>
>> Yeah, I'll second that. Email packet captures make finding links a
>> challenge, as quoted emails split the link.
>
> This is far from perfect due to the reason you pointed out, but it's
> a start and this code snippet is from the next release of Bro (you
> just call find_all_urls_without_scheme with the string that you want
> to extract urls from):
>
>
> const url_regex = /^([a-zA-Z\-]{3,5})(:\/\/[^\/?#"'\r\n><]*)([^?#"'\r\n><]*)([^[:blank:]\r\n"'><]*|\??[^"'\r\n><]*)/ &redef;
>
> ## Extracts URLs discovered in arbitrary text.
> function find_all_urls(s: string): string_set
>     {
>     return find_all(s, url_regex);
>     }
>
> ## Extracts URLs discovered in arbitrary text without
> ## the URL scheme included.
> function find_all_urls_without_scheme(s: string): string_set
>     {
>     local urls = find_all_urls(s);
>     local return_urls: set[string] = set();
>     for ( url in urls )
>         {
>         local no_scheme = sub(url, /^([a-zA-Z\-]{3,5})(:\/\/)/, "");
>         add return_urls[no_scheme];
>         }
>
>     return return_urls;
>     }
>
>
>
>
> .Seth
>
> --
> Seth Hall
> International Computer Science Institute
> (Bro) because everyone has a network
> http://www.bro.org/
Thanks, Seth. As I'm still a horrific newb with Bro, I'm guessing the
above can go in local.bro? Thank you.
James
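
A minimal sketch of how the quoted snippet might be exercised from
local.bro. It assumes that url_regex and both functions above have been
pasted earlier in the same file, and that the installed Bro release does
not already ship them (Seth notes they come from the next release). The
sample string and the use of bro_init are illustrative assumptions, not
part of Seth's code.

```bro
# Hypothetical test harness; assumes url_regex, find_all_urls and
# find_all_urls_without_scheme are defined earlier in this file.
event bro_init()
    {
    # Sample text is made up for illustration.
    local text = "Docs at http://www.bro.org/ and https://example.com/a/b?id=1";
    local urls = find_all_urls_without_scheme(text);

    # Print each match with its leading scheme removed.
    for ( u in urls )
        print u;
    }
```

Starting Bro with this file loaded should print whatever the regex
matches in the sample text, minus the scheme. For real mail traffic the
same call would be made from whichever event handler hands you the
message text.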