[Bro-Dev] manager-child cpu utilization?

Scott Campbell scampbell at lbl.gov
Sat Sep 17 21:37:56 PDT 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I ran strace on the manager parent and child process during a cycle of
relatively high use and came up with the following sets of system calls:

  98337 kill
 143871 read
 128700 select
   8058 write

70 sec

Child:

  56193 read
      1 rt_sigreturn
 684399 select
 333803 write

75 sec

For the parent the kill() call looks like:

1316290215.456133 kill(10062, SIG_0)    = 0

with 10062 being the child manager process.

There is a select loop in RemoteSerializer.cc which may be a candidate
to tune

> // FIXME: Fine-tune this (timeouts, flush, etc.) struct timeval
> small_timeout; small_timeout.tv_sec = 0; small_timeout.tv_usec = 
> io->CanWrite() || io->CanRead() ? 1 : 10;
> 
> int a = select(max_fd + 1, &fd_read, &fd_write, &fd_except, 
> &small_timeout);
> 
> if ( a == 0 ) ++timeouts;
> 
> if ( selects % 100000 == 0 ) Log(fmt("selects=%ld canwrites=%ld
> timeouts=%ld", selects, canwrites, timeouts));
> 
> if ( a < 0 ) // Ignore errors for now. continue;

or ChunkedIO.cc :		small_timeout.tv_usec = 50;

This is just a (pair of) guesses - current version is 1.6-dev-1261
running on Linux golumn.nersc.gov 2.6.38.6-26.rc1.fc15.x86_64 #1 SMP
Mon May 9 20:45:15 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux via PF_RING
5.01 and ixgbe+DNA driver.

scott


On 9/17/11 9:17 PM, Martin Holste wrote:
> Yep, this is definitely happening.  Our cluster's manager and its 
> child, as well as the proxy and its child chew up about 20-25% CPU 
> each, so that's almost a full CPU for the manager/child and 
> proxy/child.
> 
> On Fri, Sep 16, 2011 at 7:52 PM, Seth Hall <seth at icir.org> wrote:
>> I'm seeing what appears to be extremely high CPU utilization by
>> the manager's child process in a rather large (23 worker?)
>> cluster.  I believe the deserialization is being done by the
>> child process and I'm guessing that the child process is being
>> overwhelmed by all of the deserialization it's having to do for
>> the logging framework's log records.
>> 
>> Any one have thoughts about if I'm correct or ways this could be
>> optimized?  I'm just worried this could become a problem because
>> logs are written much faster at other sites and I can see other
>> sites getting communication framework overload really easily.  I
>> think it's something that we need to keep a close eye on at
>> least.
>> 
>> .Seth
>> 
>> -- Seth Hall International Computer Science Institute (Bro)
>> because everyone has a network http://www.bro-ids.org/
>> 
>> 
>> _______________________________________________ bro-dev mailing
>> list bro-dev at bro-ids.org 
>> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
>> 
> 
> _______________________________________________ bro-dev mailing
> list bro-dev at bro-ids.org 
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iD8DBQFOdXWkK2Plq8B7ZBwRApFUAKDYqQtgpCtbOYmum2pCghxNwdHfEQCgm/D4
OhNCPaIEdX+G2HfBC5SROPM=
=TJ2l
-----END PGP SIGNATURE-----


More information about the bro-dev mailing list