[Bro] Troubleshooting crashes

Tritium Cat tritium.cat at gmail.com
Mon Sep 10 17:51:39 PDT 2012


Finally getting back to this.



On Fri, Aug 31, 2012 at 1:18 AM, Seth Hall <seth at icir.org> wrote:

>
> On Aug 30, 2012, at 5:46 PM, Tritium Cat <tritium.cat at gmail.com> wrote:
>
> > What's the best way to disable Bro in a systematic way to isolate
> > crashes?
>
> Sending us the diag output from broctl is best since it will include a
> back trace.



==== No reporter.log

==== stderr.log
listening on eth5, capture length 8192 bytes

/usr/local/3rd-party/bro/share/broctl/scripts/run-bro: line 60: 15452 Segmentation fault      nohup $mybro $@

==== stdout.log
unlimited
unlimited
unlimited

==== .cmdline
-i eth5 -U .status -p broctl -p broctl-live -p local -p worker-5-9 local.bro broctl base/frameworks/cluster local-worker.bro broctl/auto

==== .env_vars
PATH=/usr/local/3rd-party/bro/bin:/usr/local/3rd-party/bro/share/broctl/scripts:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
BROPATH=/usr/local/3rd-party/bro/spool/installed-scripts-do-not-touch/site::/usr/local/3rd-party/bro/spool/installed-scripts-do-not-touch/auto:/usr/local/3rd-party/bro/share/bro:/usr/local/3rd-party/bro/share/bro/policy:/usr/local/3rd-party/bro/share/bro/site
CLUSTER_NODE=worker-5-9

==== .status
RUNNING [net_run]

==== No prof.log

==== No packet_filter.log

==== No loaded_scripts.log

-- [Automatically generated.]





> > After ~12 hours I returned to find many of the worker nodes had
> > crashed.  I forgot to look at the diag for the crashed workers before
> > stopping the cluster.
>
> Do you have the cron command set up correctly?  The workers should have
> been restarted automatically after they crashed and a diagnostic email sent
> to you.  It's mentioned in this section:
> http://bro-ids.org/documentation/quickstart.html#a-minimal-starting-configuration


I did not have it set up; it's working properly now.
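For anyone reading this thread later: the quickstart's fix is to run "broctl cron" periodically from the Bro user's crontab, which restarts crashed nodes and mails the diag output. The line below is a sketch; the install prefix is inferred from the PATH in the diag output above, and the five-minute interval is an assumed, commonly used schedule, so adjust both to your installation.

```
# crontab entry for the Bro user (prefix and interval are assumptions; adjust as needed)
*/5 * * * * /usr/local/3rd-party/bro/bin/broctl cron
```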

 (...)


> Total rings         : 10
>
> How many CPU cores do you have?


48 per server.



> > -rw-r--r--  1 bro  bro     10323 Aug 30 21:15 reporter.log
> > -rw-r--r--  1 bro  bro  52846117 Aug 30 21:27 weird.log
>
> I'm curious about what's in reporter.log, normally that shouldn't have too
> much in it.  Also, that's an astonishingly large weird.log.  Is there
> anything that stands out in those two?
>


reporter.log -- looks like I need to set up the GeoIPv6 database:
/usr/share/GeoIP/GeoIPCityv6.dat (empty)

  50 Reporter::INFO processing continued (empty)  <cut..>
  50 Reporter::INFO Failed to open GeoIP database: <cut..>
  29 Reporter::INFO processing suspended (empty) <cut..>
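The reporter.log entries are INFO messages about a GeoIP database that could not be opened. A quick size check before restarting confirms which database files are actually in place. This is a plain POSIX shell sketch; check_geoip_dbs is a hypothetical helper name, not part of broctl, and the path in the usage line is the one named in reporter.log.

```shell
# check_geoip_dbs (hypothetical helper): report whether each GeoIP database
# file passed as an argument exists and is non-empty.
# Returns 0 if every file is present and non-empty, 1 otherwise.
check_geoip_dbs() {
    status=0
    for db in "$@"; do
        if [ -s "$db" ]; then
            echo "ok: $db"
        else
            echo "missing or empty: $db"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_geoip_dbs /usr/share/GeoIP/GeoIPCityv6.dat
```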


weird.log --


bro@bc : [12:33am] : 2012-08-30 : ls -l weird.* | tail -5
-rw-r--r--  1 bro  bro  16757363 Aug 30 21:00 weird.20:00:00-21:00:00.log.gz
-rw-r--r--  1 bro  bro    304697 Aug 30 21:02 weird.21:00:00-21:02:10.log.gz
-rw-r--r--  1 bro  bro  39351508 Aug 30 22:00 weird.21:12:53-22:00:00.log.gz
-rw-r--r--  1 bro  bro  55141105 Aug 30 23:00 weird.22:00:00-23:00:00.log.gz
-rw-r--r--  1 bro  bro  38190282 Aug 31 00:00 weird.23:00:00-00:00:00.log.gz


bro@bc : [12:33am] : 2012-08-30 : gzcat weird.23:00:00-00:00:00.log.gz | awk '{print $7}' | sort | uniq -c | sort -rn | head -10
614589 data_before_established
585445 possible_split_routing
260703 window_recision
190652 SYN_seq_jump
100211 inappropriate_FIN
64533 above_hole_data_without_any_acks
37882 connection_originator_SYN_ack
33611 data_after_reset
19106 Teredo_bubble_with_payload
11510 SYN_after_reset

bro@bc : [12:34am] : current : awk '{print $7}' weird.log | sort | uniq -c | sort -rn | head -10
51561 window_recision
49218 possible_split_routing
47776 data_before_established
24526 Teredo_bubble_with_payload
19894 connection_originator_SYN_ack
11718 SYN_seq_jump
8938 inappropriate_FIN
8701 data_after_reset
7523 above_hole_data_without_any_acks
5765 inner_IP_payload_length_mismatch
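The same tally can be wrapped in a small function that also skips the "#" header lines Bro writes at the top of each log. This sketch assumes the default tab-separated ASCII log format, in which the weird name is the 7th column of weird.log (matching the $7 used above); top_weirds is a hypothetical name.

```shell
# top_weirds (hypothetical helper): count the weird names in a Bro weird.log
# read from stdin and print the N most frequent (default 10).
# Assumes tab-separated ASCII logs with the name in column 7; lines starting
# with '#' are log metadata (#fields, #types, ...) and are skipped.
top_weirds() {
    n="${1:-10}"
    awk -F'\t' '!/^#/ { print $7 }' | sort | uniq -c | sort -rn | head -n "$n"
}

# Usage:
#   top_weirds 10 < weird.log
#   gzip -dc weird.23:00:00-00:00:00.log.gz | top_weirds 10
```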




> Could you show me your node.cfg configuration too?
>

bro@bc : [12:41am] : bro : cat etc/node.cfg
[manager]
type=manager
host=z.z.z.M

[proxy-1]
type=proxy
host=z.z.z.M

[worker-1]
type=worker
host=z.z.z.A
interface=eth5
lb_procs=10
lb_method=pf_ring

[worker-2]
type=worker
host=z.z.z.B
interface=eth5
lb_procs=10
lb_method=pf_ring

[worker-3]
type=worker
host=z.z.z.C
interface=eth5
lb_procs=10
lb_method=pf_ring

[worker-4]
type=worker
host=z.z.z.D
interface=eth5
lb_procs=10
lb_method=pf_ring

[worker-5]
type=worker
host=z.z.z.E
interface=eth5
lb_procs=10
lb_method=pf_ring




> Oh, and one last thing, have you made sure to disable all of the special NIC
> features?
>
> http://securityonion.blogspot.com/2011/10/when-is-full-packet-capture-not-full.html



Yeah, I've followed those recommendations from the start, with one exception:
the Intel X520-DA2 cards I'm using do not support disabling "ufo" (UDP
fragmentation offload).

# Adjust interface features
#
# Disable features on network card that may deliver super packets
#
# http://securityonion.blogspot.com/2011/10/when-is-full-packet-capture-not-full.html
#
ethtool -K eth5 rx off
ethtool -K eth5 tx off
ethtool -K eth5 sg off
ethtool -K eth5 tso off
#ethtool -K eth5 ufo off
ethtool -K eth5 gso off
ethtool -K eth5 gro off
ethtool -K eth5 lro off
ethtool -K eth5 rxvlan off
ethtool -K eth5 txvlan off
ethtool -K eth5 ntuple on
#
ethtool -s eth5 speed 10000 duplex full
ifconfig eth5 mtu 9600
ifconfig eth5 up
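Because "ethtool -K" can fail on individual features (as with ufo here), it is worth confirming the result with the lowercase "ethtool -k eth5", which reports the current state of each offload. The function below is a sketch that scans that output for features the script above tries to disable but that are still reported as "on". offloads_still_on is a hypothetical name; the long feature names are the ones printed by recent ethtool versions and may be worded differently in older ones. ntuple-filters is deliberately excluded because the script turns it on.

```shell
# offloads_still_on (hypothetical helper): read `ethtool -k <iface>` output on
# stdin and print each listed offload feature whose state is still "on".
# Names map to the -K flags above: rx tx sg tso ufo gso gro lro rxvlan txvlan.
offloads_still_on() {
    feats="rx-checksumming tx-checksumming scatter-gather"
    feats="$feats tcp-segmentation-offload udp-fragmentation-offload"
    feats="$feats generic-segmentation-offload generic-receive-offload"
    feats="$feats large-receive-offload rx-vlan-offload tx-vlan-offload"
    awk -v names="$feats" '
        # Build a lookup table keyed by "feature-name:" as ethtool prints it.
        BEGIN { n = split(names, f, " "); for (i = 1; i <= n; i++) want[f[i] ":"] = 1 }
        # "on [fixed]" still has "on" as the second field.
        ($1 in want) && $2 == "on" { name = $1; sub(/:$/, "", name); print name }
    '
}

# Usage:
#   ethtool -k eth5 | offloads_still_on
```

Any feature the card refuses to turn off would show up in this list, which makes it easy to spot the ones that need a different workaround.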

