[Bro] high cpu usage and strange select(2) behavior

sridhar basam sri at basam.org
Thu Feb 9 09:31:29 PST 2012


On Thu, Feb 9, 2012 at 9:43 AM, Stephane Chazelas
<stephane.chazelas at gmail.com> wrote:
> Hiya,
>
> here with a standalone bro-2.0, I'm seeing 2 bro processes
> using about 30% of the time of a cpu each regardless of whether
> there's traffic or not to watch.
>
> Doing a strace on them I see, for one process, only select system
> calls in a loop with null or ridiculously low timeouts:
>
> select(12, [0 11], [], [], {0, 10})     = 0 (Timeout)
> select(12, [11], NULL, NULL, {0, 0})    = 0 (Timeout)
> select(12, [11], NULL, NULL, {0, 0})    = 0 (Timeout)
> select(12, [0 11], [], [], {0, 10})     = 0 (Timeout)
> select(12, [11], NULL, NULL, {0, 0})    = 0 (Timeout)
> select(12, [11], NULL, NULL, {0, 0})    = 0 (Timeout)
> select(12, [0 11], [], [], {0, 10})     = 0 (Timeout)
> select(12, [11], NULL, NULL, {0, 0})    = 0 (Timeout)
>
> 0 being tcp *:47760 (LISTEN), and 11 a unix domain socket, probably
> a socket pair to communicate with the other bro process. Why
> those 3 selects in an infinite loop, why a timeout? Can't the
> select just sit on [0 11] if it's the only inputs it gets? Here
> it seems it is just wasting 30% of the time of the CPU.
>
> The other process behavior is even weirder:
>
> select(0, NULL, NULL, NULL, {0, 20})    = 0 (Timeout)
> select(11, [4 7 10], [0], [0], {0, 0})  = 1 (out [0], left {0, 0})
> kill(15287, SIG_0)                      = 0
> read(10, 0xb6ebc008, 1048576)           = -1 EAGAIN (Resource temporarily unavailable)
> gettimeofday({1328798101, 327305}, NULL) = 0
> select(0, NULL, NULL, NULL, {0, 20})    = 0 (Timeout)
> select(11, [4 7 10], [0], [0], {0, 0})  = 1 (out [0], left {0, 0})
> kill(15287, SIG_0)                      = 0
> read(10, 0xb6ebc008, 1048576)           = -1 EAGAIN (Resource temporarily unavailable)
> gettimeofday({1328798101, 328513}, NULL) = 0
> select(0, NULL, NULL, NULL, {0, 20})    = 0 (Timeout)
> select(11, [4 7 10], [0], [0], {0, 0})  = 1 (out [0], left {0, 0})
> kill(15287, SIG_0)                      = 0
> read(10, 0xb6ebc008, 1048576)           = -1 EAGAIN (Resource temporarily unavailable)
>
>
> A select with no fd to watch (why not use nanosleep?), then a read()
> on a non-blocking fd following a select that returns on timeout
> (null timeout with nothing for that fd). And again, why not have
> the select() sit waiting for input?
>
> That doesn't seem quite right.
>
> Is that the expected behavior, or is there any way to configure
> it so that it behaves itself?

IMO looping through select isn't an issue. On an idle or lightly loaded
system, select is going to be the dominant consumer of CPU, but as load
increases, the rest of the system picks up in usage and select takes a
smaller and smaller portion of the CPU. Is using 30% CPU on an
otherwise idle system really an issue?

About using nanosleep vs select, I don't really see any difference in
terms of what you want to achieve. In either case the process calling
select or nanosleep is put to sleep until the timer expires, so both
end up achieving the same thing. It isn't uncommon to use select/poll
that way to sleep.

 Sridhar
