[Bro-Dev] [JIRA] (BIT-1376) method to reproduce "internal error: unknown msg type 115 in Poll()"

Jon Siwek (JIRA) jira at bro-tracker.atlassian.net
Thu Apr 16 15:36:00 PDT 2015


    [ https://bro-tracker.atlassian.net/browse/BIT-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20309#comment-20309 ] 

Jon Siwek commented on BIT-1376:
--------------------------------

Maybe consider the patch in topic/jsiwek/bit-1376

Basically, what I think happened goes like this:

child->parent sends MSG_LOG "selects=X canwrites=Y"
- first chunk, announcing that a MSG_LOG is coming, gets accepted
- second chunk, containing "selects=X canwrites=Y", gets rejected because the hard cap has been reached

The queues drain a bit and then a similar message gets sent:

child->parent sends MSG_LOG "selects=X canwrites=Y"
- first chunk, announcing that a MSG_LOG is coming, gets accepted
- second chunk, containing "selects=X canwrites=Y", gets accepted because we're under the hard cap again

Now on the parent side, it reads a chunk that says MSG_LOG, then reads another chunk that also says MSG_LOG and misinterprets it as the data that goes with the first MSG_LOG (which ends up being something like \x13\x00\x00\x00...). It then reads a chunk with contents "selects=X canwrites=Y" and interprets that as the chunk containing the message type information. The message type is taken from the first byte, which is 's' (ASCII value 115), and that is not a valid message type.
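
To make the sequence above concrete, here is a small toy model in Python. It is only a sketch of the framing problem, not Bro's actual ChunkedIO code: the class and function names, the 4-byte header layout, and the cap of 3 are all made up for illustration, and the MSG_LOG value 0x13 is an assumption inferred from the \x13\x00\x00\x00... bytes mentioned above.

```python
from collections import deque

MSG_LOG = 0x13           # assumed value, inferred from the "\x13\x00\x00\x00..." bytes
VALID_TYPES = {MSG_LOG}  # hypothetical: the only type this toy parent knows about
HARD_CAP = 3             # hypothetical stand-in for MAX_BUFFERED_CHUNKS


class ToyChunkQueue:
    """Bounded queue that rejects individual chunks once the hard cap is reached."""

    def __init__(self, cap):
        self.cap = cap
        self.chunks = deque()

    def write(self, chunk):
        if len(self.chunks) >= self.cap:
            return False  # this chunk is silently dropped
        self.chunks.append(chunk)
        return True

    def drain(self):
        """Hand all buffered chunks to the reader and empty the queue."""
        out = list(self.chunks)
        self.chunks.clear()
        return out


def send_log(queue, text):
    """Send a two-chunk message: a header chunk, then a data chunk.

    Each chunk is accepted or rejected on its own, which is the flaw.
    """
    header = bytes([MSG_LOG, 0, 0, 0])  # hypothetical header layout
    return queue.write(header), queue.write(text.encode())


def parent_poll(wire):
    """Read chunks as (header, data) pairs; the type is the header's first byte."""
    chunks = deque(wire)
    results = []
    while chunks:
        header = chunks.popleft()
        msg_type = header[0]
        if msg_type not in VALID_TYPES:
            results.append("unknown msg type %d" % msg_type)
            break
        data = chunks.popleft() if chunks else b""
        results.append((msg_type, data))
    return results


def simulate():
    queue = ToyChunkQueue(HARD_CAP)
    send_log(queue, "earlier message")                 # fills 2 of 3 slots
    first = send_log(queue, "selects=X canwrites=Y")   # header fits, data does not
    wire = queue.drain()                               # the queues drain a bit
    second = send_log(queue, "selects=X canwrites=Y")  # both chunks fit now
    wire += queue.drain()
    return first, second, parent_poll(wire)


if __name__ == "__main__":
    first, second, results = simulate()
    print(first, second)
    for r in results:
        print(r)
```

In this model the parent pairs the orphaned header with the next header chunk, and then treats the text chunk as a header, so the first byte 's' becomes the message type.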

Rejecting arbitrary chunks on the child-parent path seems like asking for things to get into a weird state, so the patch now relies only on shutting down child-child (remote peer) connections to try to deal with overload.  In the test case, the memory situation looked stable, but the peers end up thrashing, so again the user would probably need to intervene and put a higher-level solution in place (e.g. more proxies). The difference is that the signal for overload problems is no longer a crash, just messages in communication.log (maybe something better, like a notice, can be done).
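
As a rough illustration of that policy (and not the code in topic/jsiwek/bit-1376), the idea could look something like the Python sketch below. ToyLink, write_chunk, the cap value and the log text are all hypothetical names chosen for this example.

```python
HARD_CAP = 3  # hypothetical stand-in for MAX_BUFFERED_CHUNKS


class ToyLink:
    """A toy connection with a list of pending chunks."""

    def __init__(self, name, is_parent_path):
        self.name = name
        self.is_parent_path = is_parent_path
        self.pending = []
        self.open = True


def write_chunk(link, chunk, log):
    """Queue a chunk, applying the overload policy described above.

    Chunks on the child-parent path are never rejected, so a header chunk
    can not be separated from its data chunk. A remote peer link that hits
    the cap is shut down as a whole instead of dropping single chunks.
    """
    if not link.open:
        return False
    if link.is_parent_path:
        link.pending.append(chunk)
        return True
    if len(link.pending) >= HARD_CAP:
        link.open = False
        link.pending.clear()
        log.append("%s: overloaded, closing connection" % link.name)
        return False
    link.pending.append(chunk)
    return True


def demo():
    log = []
    parent = ToyLink("parent", is_parent_path=True)
    peer = ToyLink("peer-1", is_parent_path=False)
    parent_ok = [write_chunk(parent, b"x", log) for _ in range(10)]
    peer_ok = [write_chunk(peer, b"x", log) for _ in range(5)]
    return parent_ok, peer_ok, peer.open, log


if __name__ == "__main__":
    print(demo())
```

The trade-off matches what the test case showed: the parent never sees a half-delivered message, but an overloaded peer gets disconnected and will likely reconnect and thrash until the load is reduced some other way.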

> method to reproduce "internal error: unknown msg type 115 in Poll()"
> --------------------------------------------------------------------
>
>                 Key: BIT-1376
>                 URL: https://bro-tracker.atlassian.net/browse/BIT-1376
>             Project: Bro Issue Tracker
>          Issue Type: Problem
>          Components: Bro
>            Reporter: Jon Siwek
>
> Justin found a modification to Bro and a script that trigger the "unknown msg type 115" bug.  This method seems to reproduce the problem fairly reliably, using just two bro processes started from the command line.
> Patch:
> {code}
> diff --git a/src/ChunkedIO.h b/src/ChunkedIO.h
> index b590453..39af9b1 100644
> --- a/src/ChunkedIO.h
> +++ b/src/ChunkedIO.h
> @@ -223,10 +223,10 @@ private:
>  
>         // We report that we're filling up when there are more than this number
>         // of pending chunks.
> -       static const uint32 MAX_BUFFERED_CHUNKS_SOFT = 400000;
> +       static const uint32 MAX_BUFFERED_CHUNKS_SOFT = 40;
>  
>         // Maximum number of chunks we store in memory before rejecting writes.
> -       static const uint32 MAX_BUFFERED_CHUNKS = 500000;
> +       static const uint32 MAX_BUFFERED_CHUNKS = 50;
>  
>         char* read_buffer;
>         uint32 read_len;
> {code}
> Start a bro process like this:
> {code}
> $ cat test.bro 
> @load frameworks/communication/listen
> redef Communication::nodes += {
>     ["foo"] = [$host = 127.0.0.1, $sync=T]
> };
> global counters: table[string] of count &synchronized &default=0;
> event do_some (n:count)
> {
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters[peer_description];
>     if(counters["thecounter"] % 10000 == 0 ) {
>         Reporter::warning(fmt("I am %s and The counter is %d. my counter is %d", peer_description, counters["thecounter"], counters[peer_description]));
>     }
>     if(n != 0) {
>         schedule 1msec { do_some(n-1) };
>     } else {
>         Reporter::warning(fmt("The counter is %d", counters["thecounter"]));
>     }
> }
> event bro_init()
> {
>     schedule 1sec { do_some(1000000) };
>     schedule 2sec { do_some(1000000) };
>     schedule 3sec { do_some(1000000) };
> }
> $ bro -b ./test.bro
> {code}
> Then start another like this:
> {code}
> $ cat test.bro 
> @load base/frameworks/communication
> redef Communication::nodes += {
>     ["foo"] = [$host = 127.0.0.1, $events = /.*/, $connect=T, $sync=T,
>                $retry=5sec]
> };
> global counters: table[string] of count &synchronized &default=0;
> event check ()
> 	{
> 	print counters["thecounter"];
>         schedule 5sec { check() };
> 	}
> event bro_init()
> 	{
>         schedule 5sec { check() };
> 	}
> $ bro -b ./test.bro 
> processing suspended
> processing continued
> 55069
> 58963
> 62831
> 66636
> internal error: unknown msg type 115 in Poll()
> Abort trap: 6
> {code}



--
This message was sent by Atlassian JIRA
(v6.4-OD-16-006#64014)

