[Zeek] Some issues with find_all_urls() function

Jonah Burgess jburgess03 at qub.ac.uk
Tue Aug 13 06:57:40 PDT 2019


Hi Everyone,

I'm using the "find_all_urls()" function from urls.zeek to extract all URLs from HTTP bodies. I occasionally get errors such as these:

1485557634.826679 error in /usr/local/zeek/share/zeek/base/utils/urls.zeek, line 122: bad conversion to count (to_count(parts[1]) and answers:PersonalBing:EZBubbleClose) no-repeat center;width:11px;height:11px;background-position-y:-10px}#hp_bottomCell #ezp_notification #ezp_bubble .ezp_bubble_close:hover{background-position-y:0}.ezp_location{font:14px)

1485557634.826679 error in /usr/local/zeek/share/zeek/base/utils/urls.zeek, line 122: bad conversion to count (to_count(parts[1]) and answers:PersonalBing:EZPanelClose) no-repeat center;width:11px;height:11px}.ezp_module{float:left;height:269px;width:255px;margin:25px 0;padding:0 42px}.ezp_module.ezp_module_narrow{width:122px}.ezp_module_leftseparator{border-left:1px solid #222}.ezp_module_title{font-size:20px;line-height:24px;margin-bottom:11px}.ezp_module_desc{font-size:16px;line-height:20px;margin-bottom:20px}.ezp_interests_icon{vertical-align:middle}.ezp_option_control{background:url(rms:)

1485557634.826679 error in /usr/local/zeek/share/zeek/base/utils/urls.zeek, line 122: bad conversion to count (to_count(parts[1]) and answers:PersonalBing:EZPanelClose) no-repeat center;width:11px;height:11px;position:relative;top:-22px;left:-10px}#hp_tbar.ezp_signin_message{background-image:-webkit-gradient(linear,left top,left bottom,from(rgba(0,0,0,.55)),to(rgba(0,0,0,.85)));background-image:-moz-linear-gradient(rgba(0,0,0,.55) 0,rgba(0,0,0,.85) 80%);background-image:-ms-linear-gradient(rgba(0,0,0,.55) 0,rgba(0,0,0,.85) 80%);background-image:-o-linear-gradient(rgba(0,0,0,.55) 0,rgba(0,0,0,.85) 80%);background-image:linear-gradient(rgba(0,0,0,.55) 0,rgba(0,0,0,.85) 80%)}.ezp_opened .ezp_barrier{display:block;background-color:#000;height:111px;margin:0 40px;position:relative;top:-185px;opacity:0}#sc_mdc.loading+.ezp_panelopened{margin-top:-46px}.ezp_icon{position:relative;top:-5px;left:0;cursor:pointer;background-color:rgba(34,34,34,.75);margin-right:1px;margin-bottom:-7px;-webkit-margin-after:-5px}#ezp_bubble_message{position:absolute;left:30px;background-color:rgba(0,0,0,.8);color:#fff;border:1px solid #333;padding:0 12px;font-size:13px;line-height:40px;height:40px;opacity:0}#ezp_bubble_message .ezp_info{vertical-align:middle;margin-right:12px}#ezp_bubble_message .ezp_bubble_down{background:url(rms:)

1378597102.912603 error in /usr/local/zeek/share/zeek/base/utils/urls.zeek, line 122: bad conversion to count (to_count(parts[1]) and )
www.iec.ch\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x16IEC http
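Each of the messages above shows `to_count()` being applied to text that is clearly not a number: CSS fragments in the first three, and apparently an empty string in the last. This is an inference from `to_count(parts[1])` and has not been checked against urls.zeek, but the failing line may be splitting a host on ":" and converting the remainder to a port. If so, a guard like the one below would skip such values instead of raising an error. This is a minimal Python sketch of the logic only, not Zeek code, and `split_host_port` is a hypothetical name.

```python
from typing import Optional, Tuple


def split_host_port(netloc: str) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: split 'host:port' into (host, port).

    The text after the first ':' is only converted when it consists
    entirely of ASCII digits. Anything else (a CSS fragment, an empty
    string) is left unconverted instead of triggering a conversion error.
    """
    if ":" not in netloc:
        return netloc, None
    host, rest = netloc.split(":", 1)
    if rest.isascii() and rest.isdigit():
        return host, int(rest)
    # Not a numeric port: keep the host and report no port.
    return host, None


print(split_host_port("www.example.com:8080"))                 # ('www.example.com', 8080)
print(split_host_port("answers:PersonalBing:EZBubbleClose)"))  # ('answers', None)
print(split_host_port("www.example.com:"))                     # ('www.example.com', None)
```

Because the split happens at the first ":", a value such as "answers:PersonalBing:..." keeps "answers" as the host and simply gets no port.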

I have a few questions about this:

1) When trying to resolve some of these issues, should I modify urls.zeek directly, or will that have unintended consequences for other scripts/functionality in Zeek? I ask because, when printing the URLs extracted with find_all_urls(), I get some results that are clearly not valid URLs, e.g. "http://www.yootheme.com/license) */". This should have been cut off before the ")", so I believe these are bugs in urls.zeek rather than intended behaviour that I'd simply like to change.

2) Assuming I don't manage to fix all of these errors and choose to accept some, how can I stop them from being printed to the console each time I process a PCAP?

3) While trying to fix some of these errors with regex, I ran into the example "www.iec.ch\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x16IEC http". I've tried to strip everything after the first "\", but this doesn't work, presumably because the "\x00" sequences are hex escapes for non-printable bytes rather than actual "\" characters. Any ideas for this specific case?

4) Finally, a regex-related question I've been meaning to ask for a while. Because I'm trying to extract URLs from HTML/JS, I need to handle cases where whitespace and several different quote characters may be used. When I've written projects in Python, I would create a variable holding all of the possible characters and then use that variable in the regex, e.g.

import re

q = r"[\‘\’\'\"\s]*(?:&quot;?|')*"
pattern = q + r"userTokens" + q + r"(?::|=)" + q + r"(\w+)" + q

match = re.search(pattern, data)  # data holds the HTTP body
if match:
    token = match.group(1)  # do something..

I can't work out how to do this with regex in Bro/Zeek scripts, so I'm having to create incredibly long patterns to cover all possible cases. If anybody can recommend a better way (like how I did it in Python), that would be awesome!
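Regarding question 3: "\x00" in the printed output is most likely an escaped rendering of a single non-printable byte (value 0, a NUL), so the string contains no actual backslash for a regex to find. The Python sketch below demonstrates this and cuts the value at the first control character instead. `strip_at_control_char` is a hypothetical helper, and treating every code point below 0x20 (plus 0x7f) as a terminator is an assumption that may be too aggressive for some inputs.

```python
import re

# A value similar to the logged one: each "\x00" here is ONE character (NUL).
raw = "www.iec.ch" + "\x00" * 11 + "\x16IEC http"

print("\\" in raw)  # False: there is no literal backslash to split on

# Any ASCII control character: 0x00-0x1f, plus DEL (0x7f).
_CONTROL = re.compile(r"[\x00-\x1f\x7f]")


def strip_at_control_char(s: str) -> str:
    """Hypothetical helper: return the text before the first control character."""
    m = _CONTROL.search(s)
    return s if m is None else s[: m.start()]


print(strip_at_control_char(raw))  # www.iec.ch
```

The logic is the same in any regex engine that accepts \x escapes: match the byte values themselves, not the backslash that only appears when they are printed.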

Thanks in advance,
Jonah (CryptoCat)

