[Bro-Dev] [JIRA] (BIT-1306) bro process would get stuck/freeze with myricom drivers
Jon Siwek (JIRA)
jira at bro-tracker.atlassian.net
Mon Mar 23 12:41:01 PDT 2015
[ https://bro-tracker.atlassian.net/browse/BIT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20105#comment-20105 ]
Jon Siwek commented on BIT-1306:
Can you check if this small patch helps?
diff --git a/src/main.cc b/src/main.cc
index fb48bdc..7827302 100644
@@ -391,6 +391,7 @@ void terminate_bro()
+ delete remote_serializer;
I'm not sure why that got removed (it still exists in 2.3.2), but its absence might cause the main Bro process to not reap its child. The main Bro process is the one that opened the network interface; the child is the one doing remote communication, and it inherits the parent's open file descriptors. So a total guess is that the process forked for remote communication became a zombie (due to the lack of what's in the patch above) and holds an open file descriptor on the network device.
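To make the fork/reap relationship described above concrete, here is a minimal, generic POSIX sketch. It is not Bro code: fork_and_reap() is a hypothetical helper, and a pipe stands in for the network device descriptor. It only illustrates that a forked child keeps its own copy of descriptors the parent had open, and that the parent has to call waitpid() to reap the child once it exits. What delete remote_serializer actually does internally is not shown in this thread, so nothing here should be read as its implementation.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (illustration only, not part of Bro).
// Forks a child that writes one byte through a pipe descriptor it
// inherited from the parent, then reaps the child.
// Returns the child's exit status, or -1 on any error.
int fork_and_reap()
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        // Child: both pipe ends were inherited across fork().
        close(fds[0]);
        char c = 'x';
        ssize_t n = write(fds[1], &c, 1);
        _exit(n == 1 ? 7 : 1);
    }

    // Parent: closing its copy of the write end does not close the
    // child's copy; the child's descriptor stays open until the child
    // closes it or exits.
    close(fds[1]);

    char c = 0;
    ssize_t n = read(fds[0], &c, 1);
    close(fds[0]);

    // Reap the child. If the parent never waits, the exited child
    // stays in the process table as a zombie.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (n != 1 || c != 'x')
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_and_reap() should return 7, the exit code chosen arbitrarily for the child in this sketch.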
> bro process would get stuck/freeze with myricom drivers
> Key: BIT-1306
> URL: https://bro-tracker.atlassian.net/browse/BIT-1306
> Project: Bro Issue Tracker
> Issue Type: Problem
> Components: Bro
> Affects Versions: git/master
> Environment: OS: FreeBSD 9.3-RELEASE-p5
> bro version 2.3-328
> git log -1 --format="%H"
> Reporter: Aashish Sharma
> Labels: bro-git, myricom
> Fix For: 2.4
> When I stop bro (in cluster mode), one of the bro worker processes (at random) gets stuck and won't shut down, stop, or even be killed using kill -s 9.
> The system ultimately has to be rebooted to remove the stuck bro process.
> On running myri_start_stop I see:
> # /usr/local/opt/snf/sbin/myri_start_stop stop
> Removing myri_snf.ko
> kldunload: can't unload file: Device busy
> It appears that the myri_snf.ko driver cannot be unloaded because of the stuck bro process. That process still has an open descriptor on the Sniffer device/driver, and the bro process freezes.
> More details:
> The bro process is stuck in the RNE state:
> R Marks a runnable process.
> N The process has reduced CPU scheduling priority (see setpriority(2)).
> E The process is trying to exit.
> Here is an example:
> ### stuck process:
> [bro@01 ~]$ ps auxwww | fgrep 1616
> bro 1616 100.0 0.0 758040 60480 ?? RNE 2:57PM 53:50.04 /usr/local/bro-git/bin/bro -i myri0 -U .status -p broctl -p broctl-live -p local -p worker-1-1 mgr.bro broctl base/frameworks/cluster local-worker.bro broctl/auto
> ### when checking for the process in /proc:
> [bro@c ~]$ ls -l /proc/1616
> ls: /proc/1616: No such file or directory
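A side note on the /proc check quoted above: on FreeBSD, procfs is not mounted by default, so a missing /proc/1616 does not by itself say whether the process is gone. A rough alternative is to probe the PID with signal 0, which performs the existence and permission checks without delivering anything. The sketch below is generic POSIX; process_exists() is a hypothetical helper name, not something from Bro or broctl.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (illustration only).
// Returns true if a process with this PID still has an entry in the
// process table, whether or not we are allowed to signal it.
bool process_exists(pid_t pid)
{
    // Non-positive values address process groups in kill(), so
    // reject them here rather than probing a whole group.
    if (pid <= 0)
        return false;

    // Signal 0: error checking only, no signal is actually sent.
    if (kill(pid, 0) == 0)
        return true;

    // EPERM means the process exists but belongs to someone else;
    // ESRCH means there is no such process.
    return errno == EPERM;
}
```

For example, process_exists(getpid()) is true for the calling process; for the case in this report one would pass the stuck worker's PID (1616 in the ps output above).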