[Bro-Dev] [JIRA] (BIT-1215) bro-cut should be rewritten in C for speed and to not depend on gawk

Justin Azoff (JIRA) jira at bro-tracker.atlassian.net
Thu Jul 10 14:14:07 PDT 2014

    [ https://bro-tracker.atlassian.net/browse/BIT-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104#comment-17104 ] 

Justin Azoff commented on BIT-1215:

So, MAX_LINE_LEN needs a closer look.

I did a quick check against some of our http logs.  Across 351,536,081 lines, 21 were longer than about a megabyte.  All of these were requests for webcams that use MJPEG / multipart HTTP responses.  350,000 responses in a single connection produce a very large log line of almost 10 megabytes.

I think we should look into realloc'ing the line buffer. The following check needs to exist either way, so resizing the buffer and continuing to read instead of exiting shouldn't be much more work or noticeably affect performance.

    linelen = strlen(line);
    if (linelen == MAX_LINE_LEN - 1) {
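
A rough sketch of what the resizing could look like, assuming the line is read with fgets() into a heap-allocated buffer. The names read_full_line and INITIAL_LINE_LEN are made up for illustration and are not taken from the bro-cut source; the "buffer is full" test is the same condition as the check above, just against a capacity that can grow.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical starting size; not bro-cut's actual MAX_LINE_LEN. */
#define INITIAL_LINE_LEN 4096

/*
 * Read one full line from fp into *buf, doubling *cap whenever fgets()
 * fills the buffer without reaching a newline.
 * Returns the line length (including the newline, if any), or -1 on
 * EOF with no data or on allocation failure.
 */
static long read_full_line(FILE *fp, char **buf, size_t *cap)
{
    size_t len = 0;

    if (*buf == NULL) {
        *cap = INITIAL_LINE_LEN;
        *buf = malloc(*cap);
        if (*buf == NULL)
            return -1;
    }

    while (fgets(*buf + len, (int)(*cap - len), fp) != NULL) {
        len += strlen(*buf + len);

        /* A trailing newline means the whole line has been read. */
        if (len > 0 && (*buf)[len - 1] == '\n')
            return (long)len;

        /* Buffer is full and no newline yet: grow it and keep reading. */
        if (len == *cap - 1) {
            char *tmp = realloc(*buf, *cap * 2);
            if (tmp == NULL)
                return -1;
            *buf = tmp;
            *cap *= 2;
        }
        /* Otherwise this is a final line without a newline; the next
         * fgets() call returns NULL at EOF and ends the loop. */
    }

    return len > 0 ? (long)len : -1;
}
```

The caller starts with buf = NULL and cap = 0, reuses the same buffer for every line, and frees it once at the end, so the common case of short lines pays for no extra allocations. On POSIX systems getline() grows its buffer in the same way and might be an alternative where portability allows.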

> bro-cut should be rewritten in C for speed and to not depend on gawk
> --------------------------------------------------------------------
>                 Key: BIT-1215
>                 URL: https://bro-tracker.atlassian.net/browse/BIT-1215
>             Project: Bro Issue Tracker
>          Issue Type: Improvement
>          Components: Bro, bro-aux
>            Reporter: Daniel Thayer
>             Fix For: 2.4
> The current implementation of bro-cut is too slow when processing large log files (takes more than a minute to process a single log file a few hundred MB in size).  Justin Azoff rewrote bro-cut in C and found that it runs an order of magnitude faster.  Another benefit of a C version of bro-cut is that we will no longer depend on gawk for anything (and some of Bro's supported platforms do not include gawk by default).

