[Bro] Troubleshooting crashes
Tritium Cat
tritium.cat at gmail.com
Thu Aug 30 14:46:07 PDT 2012
Hello all,
What's the best way to systematically disable parts of Bro in order to isolate crashes?
I disabled all the protocols except SSH and a few default scripts that
utilize it. After ~12 hours I returned to find many of the worker nodes
had crashed. I forgot to look at the diag for the crashed workers before
stopping the cluster.
Suggestions greatly appreciated.
Thanks,
-TC
local.bro configuration
=======================================
bro@bc : [9:20pm] : bro : grep -v "^#" site/local.bro | grep "[a-z]"
@load misc/loaded-scripts
@load tuning/defaults
@load protocols/ssh/software
@load protocols/ssh/geo-data
@load protocols/ssh/detect-bruteforcing
@load protocols/ssh/interesting-hostnames
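One systematic option would be to bisect the @load lines above: run with half of them enabled, see whether the workers still crash, and keep halving. The following is only a minimal sketch in Python, not an established Bro procedure. It assumes a single script is responsible and that the crash reproduces reliably. The `crashes` callback is hypothetical; in practice it would mean rewriting local.bro with the given subset, restarting the cluster, and waiting long enough (roughly 12 hours here) to see whether workers die.

```python
def bisect_scripts(scripts, crashes):
    """Narrow a list of @load targets down to one suspect by repeated halving.

    Assumes exactly one script triggers the crash and that crashes(subset)
    returns True whenever that script is in the subset (hypothetical callback).
    """
    candidates = list(scripts)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Keep whichever half still reproduces the crash.
        candidates = first_half if crashes(first_half) else candidates[mid:]
    return candidates[0]


# The four protocol scripts from local.bro above.
SSH_SCRIPTS = [
    "protocols/ssh/software",
    "protocols/ssh/geo-data",
    "protocols/ssh/detect-bruteforcing",
    "protocols/ssh/interesting-hostnames",
]

if __name__ == "__main__":
    # Stand-in for a real test run: pretend geo-data is the culprit.
    pretend_culprit = "protocols/ssh/geo-data"
    print(bisect_scripts(SSH_SCRIPTS, lambda subset: pretend_culprit in subset))
```

With four scripts this takes two test runs instead of four, but it will mislead if the crash depends on a combination of scripts or does not reproduce consistently.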
base/init-default.bro configuration
=======================================
bro@bc : [9:20pm] : bro : grep -v "^#" base/init-default.bro | grep "[a-z]"
@load base/utils/site
@load base/utils/addrs
@load base/utils/conn-ids
@load base/utils/directions-and-hosts
@load base/utils/files
@load base/utils/numbers
@load base/utils/paths
@load base/utils/patterns
@load base/utils/strings
@load base/utils/thresholds
@load base/frameworks/notice
@load base/frameworks/dpd
@load base/frameworks/signatures
@load base/frameworks/packet-filter
@load base/frameworks/software
@load base/frameworks/communication
@load base/frameworks/control
@load base/frameworks/cluster
@load base/frameworks/metrics
@load base/frameworks/intel
@load base/frameworks/reporter
@load base/frameworks/tunnels
@load base/protocols/ssh
Here's the PF_RING info:
=======================================
PF_RING Version : 5.4.6 ($Revision: 5658$)
Ring slots : 4096
Slot version : 14
Capture TX : No [RX only]
IP Defragment : No
Socket Mode : Standard
Transparent mode : No (mode 2)
Total rings : 10
Total plugins : 0
Here's a top from one of the physical servers running workers; all five
servers have a similar profile.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15423 bro 20 0 1059m 957m 69m R 95 1.5 2:21.79 bro
15429 bro 20 0 1041m 939m 69m R 80 1.5 1:49.08 bro
15426 bro 20 0 1041m 939m 69m S 72 1.5 1:56.46 bro
15422 bro 20 0 1041m 939m 69m R 70 1.5 1:58.51 bro
15424 bro 20 0 1041m 939m 69m S 70 1.5 1:51.85 bro
15427 bro 20 0 1041m 939m 69m R 64 1.5 1:59.86 bro
15420 bro 20 0 1041m 939m 69m S 62 1.5 1:52.02 bro
15421 bro 20 0 1041m 939m 69m R 60 1.5 1:43.13 bro
15425 bro 20 0 1041m 939m 69m R 58 1.5 1:39.75 bro
15428 bro 20 0 1041m 939m 69m S 56 1.5 1:39.05 bro
15430 bro 25 5 186m 76m 64m S 19 0.1 0:28.42 bro
15437 bro 25 5 186m 76m 64m S 18 0.1 0:28.08 bro
15431 bro 25 5 186m 76m 64m S 14 0.1 0:27.35 bro
15432 bro 25 5 186m 76m 64m S 14 0.1 0:26.63 bro
15433 bro 25 5 186m 76m 64m S 14 0.1 0:26.72 bro
15436 bro 25 5 186m 76m 64m S 14 0.1 0:23.36 bro
15434 bro 25 5 186m 76m 64m S 12 0.1 0:25.94 bro
15438 bro 25 5 186m 76m 64m S 12 0.1 0:22.98 bro
15435 bro 25 5 186m 76m 64m S 10 0.1 0:20.59 bro
15439 bro 25 5 186m 76m 64m S 10 0.1 0:20.54 bro
Here's a snapshot from the server running the manager, no problem here...
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
62903 bro 1 28 5 192M 20616K select 4 0:23 0.29% bro
62937 bro 1 28 5 188M 19916K select 0 0:24 0.20% bro
Results in logs so far:
=======================================
-rw-r--r-- 1 bro bro 1409892 Aug 30 21:27 communication.log
-rw-r--r-- 1 bro bro 5611960 Aug 30 21:27 dpd.log
-rw-r--r-- 1 bro bro 6844 Aug 30 21:12 loaded_scripts.log
-rw-r--r-- 1 bro bro 214670 Aug 30 21:27 notice.log
-rw-r--r-- 1 bro bro 1101 Aug 30 21:12 notice_policy.log
-rw-r--r-- 1 bro bro 187 Aug 30 21:12 packet_filter.log
-rw-r--r-- 1 bro bro 10323 Aug 30 21:15 reporter.log
-rw-r--r-- 1 bro bro 115243 Aug 30 21:27 software.log
-rw-r--r-- 1 bro bro 3640436 Aug 30 21:27 ssh.log
-rw-r--r-- 1 bro bro 0 Aug 30 21:12 stderr.log
-rw-r--r-- 1 bro bro 29 Aug 30 21:12 stdout.log
-rw-r--r-- 1 bro bro 5575087 Aug 30 21:27 tunnel.log
-rw-r--r-- 1 bro bro 52846117 Aug 30 21:27 weird.log
+ communication.log
-- nothing but info on workers
+ dpd.log
-- UDP 53 Teredo payload length messages
+ notice.log
-- SSH messages and errors on workers.
1346362754.085749 - - - - - - PacketFilter::Dropped_Packets 876266 packets dropped after filtering, 876266 received - - - - -worker-3-1 Notice::ACTION_LOG 6 3600.000000 F - - - - - - - -
1346362752.730723 - - - - - - PacketFilter::Dropped_Packets 387600 packets dropped after filtering, 1248522 received, 860922 on link - -- - - worker-1-3 Notice::ACTION_LOG 6 3600.000000 F - - - - - - - -
1346362755.198122 - - - - - - PacketFilter::Dropped_Packets 958912 packets dropped after filtering, 958912 received - - - - -worker-2-6 Notice::ACTION_LOG 6 3600.000000 F - - - - - - - -
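To compare workers, the counters in these Dropped_Packets messages can be pulled out and turned into a ratio. This is a rough sketch in Python that assumes the message text has exactly the form shown in the entries above, with the "on link" part being optional. Dividing dropped by received is my own choice of metric for comparison, not something Bro defines.

```python
import re

# Matches the message text as it appears in the notice.log entries above.
NOTICE_RE = re.compile(
    r"(?P<dropped>\d+) packets dropped after filtering, "
    r"(?P<received>\d+) received(?:, (?P<link>\d+) on link)?"
)


def drop_ratio(message):
    """Return dropped / received for one Dropped_Packets message string."""
    match = NOTICE_RE.search(message)
    if match is None:
        raise ValueError("not a Dropped_Packets message: %r" % message)
    dropped = int(match.group("dropped"))
    received = int(match.group("received"))
    # Guard against a zero denominator on an idle interface.
    return dropped / received if received else 0.0


if __name__ == "__main__":
    samples = {
        "worker-3-1": "876266 packets dropped after filtering, 876266 received",
        "worker-1-3": "387600 packets dropped after filtering, "
                      "1248522 received, 860922 on link",
    }
    for worker, message in samples.items():
        print(worker, round(drop_ratio(message), 3))
```

For worker-3-1 and worker-2-6 the dropped and received counts are identical, which looks suspicious in itself and may say more about how the counters are reported than about real loss.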
worker 3-1 is PID 15278
>>> 10.1.1.1
(+) bro 15278 15189 59.4 1.6 1143708 1105120 ? R 14:12:52 00:16:29 bro
(+) bro 15279 15188 40.5 1.6 1106704 1067676 ? S 14:12:52 00:11:15 bro
Look at PF_RING for this PID:
====================================
user@bro_server:~$ cat /proc/net/pf_ring/15278-eth5.24
Bound Device(s) : eth5
Active : 1
Breed : Non-DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX+TX
Appl. Name : <unknown>
IP Defragment : No
BPF Filtering : Enabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 1
Num Poll Calls : 13786320
Channel Id : -1
Cluster Id : 0
Slot Version : 14 [5.4.6]
Min Num Slots : 8159
Bucket Len : 8192
Slot Len : 8224 [bucket+header]
Tot Memory : 67108864
Tot Packets : 169050905
Tot Pkt Lost : 0
Tot Insert : 169050905
Tot Read : 169050895
Insert Offset : 24725033
Remove Offset : 24713556
TX: Send Ok : 0
TX: Send Errors : 0
Reflect: Fwd Ok : 0
Reflect: Fwd Errors: 0
Num Free Slots : 8150
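To check every ring the same way, the "Key : Value" lines of such a file can be parsed and the loss counters compared. This is a small sketch in Python that assumes the layout shown above. The helper names are mine, and "backlog" is simply Tot Insert minus Tot Read, taking the counters at face value.

```python
def parse_pf_ring(text):
    """Turn 'Key : Value' lines from a /proc/net/pf_ring entry into a dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the LAST colon so keys such as 'TX: Send Ok' stay intact.
        key, sep, value = line.rpartition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def ring_summary(stats):
    """Return (packets lost, packets inserted but not yet read)."""
    lost = int(stats["Tot Pkt Lost"])
    backlog = int(stats["Tot Insert"]) - int(stats["Tot Read"])
    return lost, backlog


# Counters copied from the output above for PID 15278.
SAMPLE = """\
Tot Packets : 169050905
Tot Pkt Lost : 0
Tot Insert : 169050905
Tot Read : 169050895
TX: Send Ok : 0
Reflect: Fwd Errors: 0
"""

if __name__ == "__main__":
    print(ring_summary(parse_pf_ring(SAMPLE)))  # (0, 10)
```

For this PID the ring reports no lost packets and only 10 packets waiting to be read, so the ring itself does not appear to be where the drops reported in notice.log occur.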