[Bro] internal error: unknown msg type 101 in Poll()

Scott Campbell scampbell at lbl.gov
Sat Feb 20 09:19:28 PST 2010


Just as a data point, we are seeing the same thing here at NERSC in a
two-machine cluster.  The manager's diag output looks like:

> %broctl diag manager
> [manager]
> Could not find the frame base for "RemoteSerializer::InternalCommError(char const*)".
> Cannot access memory at address 0x5
> ==== stderr.log
> pcap bufsize = 2097152
> listening on em1
> 1266607176.728276 internal error: unknown msg type 101 in Poll()
> /bro/share/broctl/scripts/run-bro: line 73:  5695 Abort trap: 6           (core dumped) nohup $tmpbro $@
> ==== stdout.log
> 
> ==== .status
> TERMINATED [internal_error]
> 
> ==== No prof.log.
> 
> bro.core
> Core was generated by `bro'.
> Program terminated with signal 6, Aborted.
> #0  0x2870c017 in kill () from /lib/libc.so.7
> #0  0x2870c017 in kill () from /lib/libc.so.7
> #1  0x2870bf76 in raise () from /lib/libc.so.7
> #2  0x2870ab8a in abort () from /lib/libc.so.7
> #3  0x08051894 in internal_error () at SSLInterpreter.cc:30
> #4  0x08164d61 in RemoteSerializer::InternalCommError (this=) at RemoteSerializer.cc:2714
> #5  0x0816c68b in RemoteSerializer::Poll (this=0xbfbfe57c, may_block=116) at RemoteSerializer.cc:1478
> #6  0x0816c87b in RemoteSerializer::NextTimestamp (this=0x82df3c8, local_network_time=0xbfbfe7f8) at RemoteSerializer.cc:1294
> #7  0x08129a7b in IOSourceRegistry::FindSoonest (this=0x82b8f58, ts=0xbfbfe838) at IOSource.cc:61
> #8  0x081465ce in net_run () at Net.cc:509
> #9  0x0804fcef in main (argc=) at main.cc:999

The memory address is consistent across crashes.
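
To check that traces from different sites go through the same code path,
something like the following could be used. It is only a sketch:
frame_functions is a hypothetical helper, not part of Bro or broctl, and it
assumes gdb frame lines of the form "#N  0xADDR in FUNC (...)". The sample
input is the top of the backtrace above.

```python
import re

# Matches gdb frame lines such as:
#   #5  0x0816c68b in RemoteSerializer::Poll (this=...) at RemoteSerializer.cc:1478
# Leading quote markers ("> ") are tolerated because search() is used.
FRAME = re.compile(r"#(\d+)\s+0x[0-9a-fA-F]+ in (\S+) \(")

def frame_functions(trace):
    """Return the function name of each frame, ordered by frame number.

    gdb prints frame #0 twice in these dumps; keying on the frame number
    collapses the repeat.
    """
    frames = {}
    for line in trace.splitlines():
        m = FRAME.search(line)
        if m:
            frames[int(m.group(1))] = m.group(2)
    return [frames[n] for n in sorted(frames)]

nersc = """\
> #0  0x2870c017 in kill () from /lib/libc.so.7
> #0  0x2870c017 in kill () from /lib/libc.so.7
> #1  0x2870bf76 in raise () from /lib/libc.so.7
> #2  0x2870ab8a in abort () from /lib/libc.so.7
> #3  0x08051894 in internal_error () at SSLInterpreter.cc:30
> #4  0x08164d61 in RemoteSerializer::InternalCommError (this=) at RemoteSerializer.cc:2714
> #5  0x0816c68b in RemoteSerializer::Poll (this=0xbfbfe57c, may_block=116) at RemoteSerializer.cc:1478
"""

print(frame_functions(nersc))
```

Run against both traces in this thread, the two lists should differ only in
the extra raise() frame that shows up in the NERSC trace.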

This is stock Bro 1.5.1; the only "unusual" thing running on the
system is Seth's policy scripts for DNS, SMTP and HTTP logging.

thanks,
scott


On 2/20/10 9:17 AM, Sean McCreary wrote:
> I have been seeing several crashes per day due to 'internal error:
> unknown msg type 101 in Poll()' in the manager process of a bro cluster
> handling ~2.5 Gb/s of traffic.  Here is a typical stack trace:
> 
>> Program terminated with signal 6, Aborted.
>> #0  0x000000080158ef6c in kill () from /lib/libc.so.6
>> #0  0x000000080158ef6c in kill () from /lib/libc.so.6
>> #1  0x000000080158ddfd in abort () from /lib/libc.so.6
>> #2  0x000000000040b329 in internal_error () at SSLInterpreter.cc:31
>> #3  0x000000000050efde in RemoteSerializer::InternalCommError (this=0x8fd3,
>> msg=0x8fd3 <Address 0x8fd3 out of bounds>) at RemoteSerializer.cc:2714
>> #4  0x000000000051668b in RemoteSerializer::Poll (this=0x7cb7e0,
>> may_block=false) at RemoteSerializer.cc:1477
>> #5  0x0000000000516c83 in RemoteSerializer::NextTimestamp (this=0x7cb7e0,
>> local_network_time=0x7fffffffe330) at RemoteSerializer.cc:1294
>> #6  0x00000000004d6575 in IOSourceRegistry::FindSoonest (this=0x79a310,
>> ts=0x7fffffffe518) at stl_list.h:131
>> #7  0x00000000004f2df3 in net_run () at Net.cc:509
>> #8  0x0000000000408938 in main (argc=36152552, argv=0x0) at main.cc:999
> 
> This seems to be the same problem as ticket #203.  Robin's comment (see
> <http://tracker.icir.org/bro/ticket/203#comment:1>) suggests this may be
> caused by high system load, but that doesn't seem to be the case.
> 
> To check this, I have set up two clusters fed by the same input traffic.
>  The first is a cluster of seven machines with a single bro instance
> running on each.  The cluster has four workers, two proxies, and the
> manager node.  In broctl, 'top' rarely reports CPU utilization over 10%
> for any node, and memory consumption is typically < 250 MB per process.
>  The manager process in this cluster crashes several times per day.
> 
> The second cluster is just one machine: a dual quad-core Xeon system
> with 16 GB of RAM.  It is running six instances of bro: four workers
> each listening to a different network interface, one proxy, and one
> manager.  CPU utilization is often ~50% on the workers, and as high as
> 20% on the manager.  Although 'netstats' reports more packet loss for
> this cluster, the manager does not crash.
> 
> Is there some other line of investigation I should pursue?  A
> single-machine Bro cluster won't handle much more traffic, so this isn't
> a useful workaround for the long term.
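
Since the quoted report puts the crash rate at several per day, it may help to
tally crashes by date when comparing cluster layouts. The sketch below assumes
the stderr.log lines from each crashed run have been collected somewhere (how
is left open), and that the leading number on the error line is a Unix
timestamp, as it appears to be in the diag output above. crashes_per_day is a
hypothetical helper, not a broctl command.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches lines such as:
#   1266607176.728276 internal error: unknown msg type 101 in Poll()
CRASH = re.compile(r"(\d+\.\d+) internal error: unknown msg type \d+ in Poll\(\)")

def crashes_per_day(lines):
    """Count matching crash lines per UTC calendar day (ISO date -> count)."""
    counts = Counter()
    for line in lines:
        m = CRASH.search(line)
        if m:
            # Assumption: the leading number is seconds since the Unix epoch.
            when = datetime.fromtimestamp(float(m.group(1)), tz=timezone.utc)
            counts[when.date().isoformat()] += 1
    return counts

sample = [
    "pcap bufsize = 2097152",
    "listening on em1",
    "1266607176.728276 internal error: unknown msg type 101 in Poll()",
]

for day, n in sorted(crashes_per_day(sample).items()):
    print(day, n)
```

Dates are grouped in UTC here; a site that prefers local time would need to
swap in its own timezone.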

-- 
We must be careful not to confuse data
with the abstractions we use to
analyze them.

William James (1842-1910)
