[Bro-Dev] Flare removal

Azoff, Justin S jazoff at illinois.edu
Fri May 20 15:02:14 PDT 2016


> On May 20, 2016, at 4:52 PM, Seth Hall <seth at icir.org> wrote:
>
> For the 2.5 release, we were hoping to understand why the topic/seth/remove-flare fixes some issues that people have been seeing with the communication code.  Perhaps even more to the point we are aiming to understand why that branch fixes the problem, but Robin's branch topic/robin/no-flares-2.4.1 doesn't work.
>
> The problem that we've seen will exhibit on Linux (for some reason FreeBSD doesn't seem to be affected) and you will see high memory use on the child of your manager process.  People will tend to notice it in two ways.
> 1. Memory exhaustion
> 2. Logs being written that are seconds to minutes old.
>
> This isn't exactly a request for anyone to do anything, but more a call for anyone that would like to dig around in the core to figure out what is going on here so we can get a fix merged into master.
>
> Thanks!
>  .Seth

I looked into this a while ago. I don't think the differences between your branches have anything to do with flares:

$ git diff  origin/topic/robin/no-flares-2.4.1  origin/topic/seth/remove-flare src/iosource/Manager.cc
diff --git a/src/iosource/Manager.cc b/src/iosource/Manager.cc
index 80fa5fe..5ad8cca 100644
--- a/src/iosource/Manager.cc
+++ b/src/iosource/Manager.cc
@@ -96,8 +96,8 @@ IOSource* Manager::FindSoonest(double* ts)
        // return it.
        int maxx = 0;

-       if ( soonest_src && (call_count % SELECT_FREQUENCY) != 0 )
-               goto finished;
+//     if ( soonest_src && (call_count % SELECT_FREQUENCY) != 0 )
+//             goto finished;

        // Select on the join of all file descriptors.
        fd_set fd_read, fd_write, fd_except;


$ git diff  origin/topic/robin/no-flares-2.4.1  origin/topic/seth/remove-flare src/RemoteSerializer.cc
[snip]

-               // FIXME: Fine-tune this (timeouts, flush, etc.)
-               struct timeval small_timeout;
-               small_timeout.tv_sec = 0;
-               small_timeout.tv_usec =
-                       io->CanWrite() || io->CanRead() ? 1 : 10;
-
-#if 0
-               if ( ! io->CanWrite() )
-                       usleep(10);
-#endif
-
-               int a = select(max_fd + 1, &fd_read, &fd_write, &fd_except,
-                               &small_timeout);
+               struct timeval timeout;
+               timeout.tv_sec = 1;
+               timeout.tv_usec = 0;

-               if ( a == 0 )
-                       ++timeouts;
+               int a = select(max_fd + 1, &fd_read, &fd_write, &fd_except, &timeout);


Seth's branch removes the SELECT_FREQUENCY check and sets the serializer's select timeout to 1 full second.  Robin's branch still has the SELECT_FREQUENCY check and keeps the 'small timeout' at 1 or 10 microseconds.  I think the two extra changes in Seth's branch combine to make Bro spend more time in the RemoteSerializer code.
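
To make the effect of the SELECT_FREQUENCY guard concrete, here is a minimal sketch that counts how many calls would actually reach select() with and without it.  The function name, the parameters, and the assumption that the counter is incremented once per call before the check are illustrative only; this is not code from either branch, and the frequency is passed in rather than assuming Bro's actual value.

```cpp
#include <cstdint>

// Counts how many of `calls` invocations would reach select(), mirroring the
// guard shown in the Manager.cc diff: select() is skipped when a source is
// already known and the call counter is not a multiple of the frequency.
// Hypothetical helper; `frequency` must be nonzero.
std::uint64_t CountSelectCalls(std::uint64_t calls, std::uint64_t frequency,
                               bool have_soonest_src, bool guard_enabled) {
    std::uint64_t selects = 0;
    for (std::uint64_t call_count = 1; call_count <= calls; ++call_count) {
        if (guard_enabled && have_soonest_src && (call_count % frequency) != 0)
            continue;  // corresponds to "goto finished"
        ++selects;     // this call would go on to select()
    }
    return selects;
}
```

With the guard on and a source always pending, only one call in every `frequency` reaches select(); with the guard commented out, every call does.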

When I was trying to figure some of this out, I believed that many of these constants were part of the issue: all the different places calling select with different timeouts and different frequencies were causing Bro to spend more time calling select than actually moving bytes around.  The only thing I ever really found wrong with the flare code was that repeated fires/extinguishes were not no-ops, and I had a small patch that improved that without changing anything else (attached).
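
As an illustration of what "repeated fires/extinguishes as no-ops" can look like, here is a minimal sketch of a pipe-backed flare that remembers its state and skips the system calls when nothing would change.  This is not the attached flare_fix.patch and not Bro's Flare class; the class name SketchFlare, its methods, and the single-threaded use are all assumptions made for the example.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in for a pipe-backed flare (POSIX only).
// Assumes a single thread fires and extinguishes it.
class SketchFlare {
public:
    SketchFlare() {
        if (pipe(fds_) != 0) {
            fds_[0] = fds_[1] = -1;
            return;
        }
        // Non-blocking so neither end can stall the caller.
        fcntl(fds_[0], F_SETFL, fcntl(fds_[0], F_GETFL) | O_NONBLOCK);
        fcntl(fds_[1], F_SETFL, fcntl(fds_[1], F_GETFL) | O_NONBLOCK);
    }
    ~SketchFlare() {
        if (fds_[0] >= 0) close(fds_[0]);
        if (fds_[1] >= 0) close(fds_[1]);
    }
    SketchFlare(const SketchFlare&) = delete;
    SketchFlare& operator=(const SketchFlare&) = delete;

    // Descriptor a caller would add to its select() read set.
    int ReadFd() const { return fds_[0]; }
    bool Fired() const { return fired_; }

    // Returns true only if a byte was actually written to the pipe.
    bool Fire() {
        if (fired_) return false;  // already signalled: no write()
        char byte = 0;
        if (write(fds_[1], &byte, 1) != 1) return false;
        fired_ = true;
        return true;
    }

    // Returns true only if the pipe was actually drained.
    bool Extinguish() {
        if (!fired_) return false;  // nothing pending: no read()
        char buf[16];
        while (read(fds_[0], buf, sizeof(buf)) > 0) {
            // keep draining until the pipe is empty
        }
        fired_ = false;
        return true;
    }

private:
    int fds_[2];
    bool fired_ = false;
};
```

The point of the state flag is that a second Fire() or Extinguish() in a row costs a branch instead of a write() or read().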

I think Robin's branch doesn't fix the problem because the flares were never really the issue.  I think Bro started having issues because, between 2.3 and 2.4, traffic volumes increased, cluster sizes increased, and we added a ton of new analyzers and log files, which put even more strain on the communication system.






--
- Justin Azoff

-------------- next part --------------
A non-text attachment was scrubbed...
Name: flare_fix.patch
Type: application/octet-stream
Size: 839 bytes
Desc: flare_fix.patch
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160520/ca61cdc5/attachment.obj 

