[Bro-Dev] [JIRA] (BIT-1215) bro-cut should be rewritten in C for speed and to not depend on gawk
Justin Azoff (JIRA)
jira at bro-tracker.atlassian.net
Thu Jul 10 14:14:07 PDT 2014
[ https://bro-tracker.atlassian.net/browse/BIT-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104#comment-17104 ]
Justin Azoff commented on BIT-1215:
-----------------------------------
So, MAX_LINE_LEN needs a closer look.
I did a quick check against some of our http logs. Across 351536081 lines, 21 were longer than about a megabyte. All of these were requests for webcams that use MJPEG / multipart HTTP responses. 350,000 responses in a single connection produce a very large log line of almost 10 megabytes.
I think we should look into realloc'ing the line buffer. The following check needs to exist either way, so resizing the array and re-reading instead of exiting shouldn't be much more work, and shouldn't noticeably affect performance.
{code}
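linelen = strlen(line);
if (linelen == MAX_LINE_LEN - 1) {
    /* assuming line was filled by fgets(line, MAX_LINE_LEN, ...): the buffer is full, so the line may be truncated */
}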
{code}
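For illustration, here is a rough sketch of what growing the buffer could look like. It is not the bro-cut code: read_long_line and INITIAL_LINE_LEN are hypothetical names, and it assumes lines are read with fgets and contain no embedded NUL bytes. Instead of re-reading the line from the start, it keeps what was already read and appends the rest after each realloc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical starting size; not bro-cut's actual MAX_LINE_LEN. */
#define INITIAL_LINE_LEN 1024

/*
 * Read one full line from fp into *buf, growing *buf with realloc
 * whenever fgets fills it without reaching a newline.
 * Start with *buf == NULL; the caller frees *buf when done.
 * Returns the line length (including any trailing '\n'),
 * or -1 at EOF with no data or on allocation failure.
 */
long read_long_line(FILE *fp, char **buf, size_t *cap)
{
    size_t len = 0;

    assert(fp != NULL && buf != NULL && cap != NULL);

    if (*buf == NULL) {
        *cap = INITIAL_LINE_LEN;
        *buf = malloc(*cap);
        if (*buf == NULL)
            return -1;
    }

    /* Each fgets call appends at offset len, into the free space. */
    while (fgets(*buf + len, (int)(*cap - len), fp) != NULL) {
        len += strlen(*buf + len);

        /* Complete line: we reached the newline. */
        if (len > 0 && (*buf)[len - 1] == '\n')
            return (long)len;

        /* Buffer not full and no newline: final line at EOF. */
        if (len < *cap - 1)
            return (long)len;

        /* Same condition as the check above: buffer is full.
           Double it and keep reading the rest of the line. */
        size_t new_cap = *cap * 2;
        char *tmp = realloc(*buf, new_cap);
        if (tmp == NULL)
            return -1;
        *buf = tmp;
        *cap = new_cap;
    }

    /* fgets hit EOF; return whatever was collected, if anything. */
    return len > 0 ? (long)len : -1;
}
```

The buffer is reused across calls, so after the first very long line the common case costs one fgets and one strlen per line, as it does with a fixed-size buffer. A real implementation would probably also want an upper bound on the buffer size.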
> bro-cut should be rewritten in C for speed and to not depend on gawk
> --------------------------------------------------------------------
>
> Key: BIT-1215
> URL: https://bro-tracker.atlassian.net/browse/BIT-1215
> Project: Bro Issue Tracker
> Issue Type: Improvement
> Components: Bro, bro-aux
> Reporter: Daniel Thayer
> Fix For: 2.4
>
>
> The current implementation of bro-cut is too slow when processing large log files (takes more than a minute to process a single log file a few hundred MB in size). Justin Azoff rewrote bro-cut in C and found that it runs an order of magnitude faster. Another benefit of a C version of bro-cut is that we will no longer depend on gawk for anything (and some of Bro's supported platforms do not include gawk by default).
--
This message was sent by Atlassian JIRA
(v6.3-OD-08-005-WN#6328)