[Bro] Troubleshooting crashes

Tritium Cat tritium.cat at gmail.com
Thu Aug 30 14:46:07 PDT 2012


Hello all,

What's the best way to systematically disable parts of Bro in order to
isolate crashes?

I disabled all the protocols except SSH and a few default scripts that
utilize it.  After ~12 hours I returned to find many of the worker nodes
had crashed.  I forgot to look at the diag for the crashed workers before
stopping the cluster.

Suggestions greatly appreciated.

Thanks,

-TC


local.bro configuration
=======================================
bro@bc : [9:20pm] : bro : grep -v "^#" site/local.bro | grep "[a-z]"
@load misc/loaded-scripts
@load tuning/defaults
@load protocols/ssh/software
@load protocols/ssh/geo-data
@load protocols/ssh/detect-bruteforcing
@load protocols/ssh/interesting-hostnames


base/init-default.bro configuration
=======================================
bro@bc : [9:20pm] : bro : grep -v "^#" base/init-default.bro | grep "[a-z]"
@load base/utils/site
@load base/utils/addrs
@load base/utils/conn-ids
@load base/utils/directions-and-hosts
@load base/utils/files
@load base/utils/numbers
@load base/utils/paths
@load base/utils/patterns
@load base/utils/strings
@load base/utils/thresholds
@load base/frameworks/notice
@load base/frameworks/dpd
@load base/frameworks/signatures
@load base/frameworks/packet-filter
@load base/frameworks/software
@load base/frameworks/communication
@load base/frameworks/control
@load base/frameworks/cluster
@load base/frameworks/metrics
@load base/frameworks/intel
@load base/frameworks/reporter
@load base/frameworks/tunnels
@load base/protocols/ssh




Here's the PF_RING info:
=======================================
PF_RING Version     : 5.4.6 ($Revision: 5658$)
Ring slots          : 4096
Slot version        : 14
Capture TX          : No [RX only]
IP Defragment       : No
Socket Mode         : Standard
Transparent mode    : No (mode 2)
Total rings         : 10
Total plugins       : 0



Here's a top from one of the physical servers running workers; all five
servers have a similar profile.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15423 bro       20   0 1059m 957m  69m R   95  1.5   2:21.79 bro
15429 bro       20   0 1041m 939m  69m R   80  1.5   1:49.08 bro
15426 bro       20   0 1041m 939m  69m S   72  1.5   1:56.46 bro
15422 bro       20   0 1041m 939m  69m R   70  1.5   1:58.51 bro
15424 bro       20   0 1041m 939m  69m S   70  1.5   1:51.85 bro
15427 bro       20   0 1041m 939m  69m R   64  1.5   1:59.86 bro
15420 bro       20   0 1041m 939m  69m S   62  1.5   1:52.02 bro
15421 bro       20   0 1041m 939m  69m R   60  1.5   1:43.13 bro
15425 bro       20   0 1041m 939m  69m R   58  1.5   1:39.75 bro
15428 bro       20   0 1041m 939m  69m S   56  1.5   1:39.05 bro
15430 bro       25   5  186m  76m  64m S   19  0.1   0:28.42 bro
15437 bro       25   5  186m  76m  64m S   18  0.1   0:28.08 bro
15431 bro       25   5  186m  76m  64m S   14  0.1   0:27.35 bro
15432 bro       25   5  186m  76m  64m S   14  0.1   0:26.63 bro
15433 bro       25   5  186m  76m  64m S   14  0.1   0:26.72 bro
15436 bro       25   5  186m  76m  64m S   14  0.1   0:23.36 bro
15434 bro       25   5  186m  76m  64m S   12  0.1   0:25.94 bro
15438 bro       25   5  186m  76m  64m S   12  0.1   0:22.98 bro
15435 bro       25   5  186m  76m  64m S   10  0.1   0:20.59 bro
15439 bro       25   5  186m  76m  64m S   10  0.1   0:20.54 bro


Here's a snapshot from the server running the manager, no problem here...

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
62903 bro          1  28    5   192M 20616K select  4   0:23  0.29% bro
62937 bro          1  28    5   188M 19916K select  0   0:24  0.20% bro


Results in logs so far:
=======================================
-rw-r--r--  1 bro  bro   1409892 Aug 30 21:27 communication.log
-rw-r--r--  1 bro  bro   5611960 Aug 30 21:27 dpd.log
-rw-r--r--  1 bro  bro      6844 Aug 30 21:12 loaded_scripts.log
-rw-r--r--  1 bro  bro    214670 Aug 30 21:27 notice.log
-rw-r--r--  1 bro  bro      1101 Aug 30 21:12 notice_policy.log
-rw-r--r--  1 bro  bro       187 Aug 30 21:12 packet_filter.log
-rw-r--r--  1 bro  bro     10323 Aug 30 21:15 reporter.log
-rw-r--r--  1 bro  bro    115243 Aug 30 21:27 software.log
-rw-r--r--  1 bro  bro   3640436 Aug 30 21:27 ssh.log
-rw-r--r--  1 bro  bro         0 Aug 30 21:12 stderr.log
-rw-r--r--  1 bro  bro        29 Aug 30 21:12 stdout.log
-rw-r--r--  1 bro  bro   5575087 Aug 30 21:27 tunnel.log
-rw-r--r--  1 bro  bro  52846117 Aug 30 21:27 weird.log

+ communication.log
   -- nothing but info on workers

+ dpd.log
   -- UDP 53 Teredo payload length messages

+ notice.log
   -- SSH messages and errors on workers.

1346362754.085749  -  -  -  -  -  -  PacketFilter::Dropped_Packets  876266 packets dropped after filtering, 876266 received  -  -  -  -  -  worker-3-1  Notice::ACTION_LOG  6  3600.000000  F  -  -  -  -  -  -  -  -
1346362752.730723  -  -  -  -  -  -  PacketFilter::Dropped_Packets  387600 packets dropped after filtering, 1248522 received, 860922 on link  -  -  -  -  -  worker-1-3  Notice::ACTION_LOG  6  3600.000000  F  -  -  -  -  -  -  -  -
1346362755.198122  -  -  -  -  -  -  PacketFilter::Dropped_Packets  958912 packets dropped after filtering, 958912 received  -  -  -  -  -  worker-2-6  Notice::ACTION_LOG  6  3600.000000  F  -  -  -  -  -  -  -  -
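
To put those notices in perspective, here is a minimal Python sketch that pulls the counters out of the message text and works out a drop percentage. The regular expression is only inferred from the three messages quoted above, and the function name drop_stats is made up for illustration; it is not part of Bro.

```python
import re

# Pattern inferred from the PacketFilter::Dropped_Packets messages quoted
# above; the ", N on link" part is optional because only one notice has it.
DROP_RE = re.compile(
    r"(?P<dropped>\d+) packets dropped after filtering,\s+"
    r"(?P<received>\d+) received(?:,\s+(?P<link>\d+) on link)?"
)


def drop_stats(message):
    """Return (dropped, received, on_link, percent_dropped) or None.

    drop_stats is a hypothetical helper, not a Bro or BroControl function.
    on_link is None when the message does not report it.
    """
    m = DROP_RE.search(message)
    if m is None:
        return None
    dropped = int(m.group("dropped"))
    received = int(m.group("received"))
    link = int(m.group("link")) if m.group("link") else None
    percent = 100.0 * dropped / received if received else 0.0
    return dropped, received, link, percent


if __name__ == "__main__":
    messages = [
        "876266 packets dropped after filtering, 876266 received",
        "387600 packets dropped after filtering, 1248522 received, 860922 on link",
        "958912 packets dropped after filtering, 958912 received",
    ]
    for msg in messages:
        dropped, received, link, percent = drop_stats(msg)
        print("dropped=%d received=%d on_link=%s (%.1f%%)"
              % (dropped, received, link, percent))
```

By this arithmetic worker-3-1 and worker-2-6 report a dropped count equal to the received count, while worker-1-3 reports roughly 31% dropped.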

worker 3-1 is PID 15278

>>> 10.1.1.1
   (+) bro      15278 15189 59.4  1.6 1143708 1105120 ?     R 14:12:52 00:16:29 bro
   (+) bro      15279 15188 40.5  1.6 1106704 1067676 ?     S 14:12:52 00:11:15 bro



Look at PF_RING for this PID:
====================================
user@bro_server:~$ cat /proc/net/pf_ring/15278-eth5.24
Bound Device(s)    : eth5
Active             : 1
Breed              : Non-DNA
Sampling Rate      : 1
Capture Direction  : RX+TX
Socket Mode        : RX+TX
Appl. Name         : <unknown>
IP Defragment      : No
BPF Filtering      : Enabled
# Sw Filt. Rules   : 0
# Hw Filt. Rules   : 0
Poll Pkt Watermark : 1
Num Poll Calls     : 13786320
Channel Id         : -1
Cluster Id         : 0
Slot Version       : 14 [5.4.6]
Min Num Slots      : 8159
Bucket Len         : 8192
Slot Len           : 8224 [bucket+header]
Tot Memory         : 67108864
Tot Packets        : 169050905
Tot Pkt Lost       : 0
Tot Insert         : 169050905
Tot Read           : 169050895
Insert Offset      : 24725033
Remove Offset      : 24713556
TX: Send Ok        : 0
TX: Send Errors    : 0
Reflect: Fwd Ok    : 0
Reflect: Fwd Errors: 0
Num Free Slots     : 8150
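
For comparing these counters across workers, a small Python sketch along the following lines could parse the "Key : value" output above and report loss and backlog. It assumes every per-ring file has the same layout as the one shown; parse_pf_ring and ring_summary are names invented for this example and are not part of PF_RING.

```python
def parse_pf_ring(text):
    """Parse 'Key : value' lines into a dict of strings.

    Splits on the LAST colon so keys such as 'TX: Send Ok' stay intact.
    Hypothetical helper; assumes the layout shown in the output above.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.rpartition(":")
        if sep:  # skip lines without any colon
            stats[key.strip()] = value.strip()
    return stats


def ring_summary(stats):
    """Return packets lost by the ring and packets inserted but not yet read."""
    lost = int(stats["Tot Pkt Lost"])
    inserted = int(stats["Tot Insert"])
    read = int(stats["Tot Read"])
    return {"lost": lost, "backlog": inserted - read}


if __name__ == "__main__":
    # Excerpt copied from the output above.
    sample = """\
Tot Packets        : 169050905
Tot Pkt Lost       : 0
Tot Insert         : 169050905
Tot Read           : 169050895
TX: Send Ok        : 0
Num Free Slots     : 8150
"""
    print(ring_summary(parse_pf_ring(sample)))
```

With the numbers above this gives zero packets lost by the ring and a backlog of 10 packets (Tot Insert minus Tot Read).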