[Bro-Dev] [JIRA] (BIT-1306) bro process would get stuck/freeze with myricom drivers

Jon Siwek (JIRA) jira at bro-tracker.atlassian.net
Mon Mar 23 12:41:01 PDT 2015


    [ https://bro-tracker.atlassian.net/browse/BIT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20105#comment-20105 ] 

Jon Siwek commented on BIT-1306:
--------------------------------

Can you check if this small patch helps?

{code}
diff --git a/src/main.cc b/src/main.cc
index fb48bdc..7827302 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -391,6 +391,7 @@ void terminate_bro()
        delete event_serializer;
        delete state_serializer;
        delete event_registry;
+       delete remote_serializer;
        delete analyzer_mgr;
        delete file_mgr;
        delete log_mgr;
{code}

I'm not sure why that got removed (it still exists in 2.3.2), but its absence might cause the main Bro process to not reap its child.  The main Bro process is the one that opened the network interface, and the child is the one doing remote communication, which inherits the parent's open file descriptors.  So a total guess is that the process forked for remote communication became a zombie (due to the lack of what's in the patch above) and holds an open file descriptor on the network device.
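
To illustrate the reasoning, here is a minimal sketch of an object that owns a forked helper process and only reaps it in its destructor. The class name ChildOwner and everything inside it are hypothetical and are not taken from Bro's RemoteSerializer; the sketch only assumes a POSIX system. If the owning object is never deleted, the destructor never runs, so the child is neither signaled nor waited on and keeps its inherited copies of the parent's descriptors.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for an object that owns a forked helper process.
// Illustrative only: this is NOT Bro's RemoteSerializer.
class ChildOwner {
public:
    ChildOwner() : pid_(fork()) {
        if (pid_ == 0) {
            // Child: it inherited every descriptor the parent had open
            // at fork() time. Sleep until a signal arrives.
            pause();
            _exit(0);
        }
    }

    ChildOwner(const ChildOwner&) = delete;
    ChildOwner& operator=(const ChildOwner&) = delete;

    // Skipping this destructor (i.e. never deleting the object) leaves
    // the child unsignaled and unreaped.
    ~ChildOwner() {
        if (pid_ > 0) {
            kill(pid_, SIGTERM);        // ask the child to terminate
            int status = 0;
            waitpid(pid_, &status, 0);  // reap it so no zombie remains
        }
    }

    pid_t pid() const {
        assert(pid_ != 0);  // only meaningful in the parent
        return pid_;
    }

private:
    pid_t pid_;
};
```

With this shape, a `delete owner;` during shutdown is what triggers the kill/waitpid pair, which is the role the one-line patch above is guessed to play.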

> bro process would get stuck/freeze with myricom drivers
> -------------------------------------------------------
>
>                 Key: BIT-1306
>                 URL: https://bro-tracker.atlassian.net/browse/BIT-1306
>             Project: Bro Issue Tracker
>          Issue Type: Problem
>          Components: Bro
>    Affects Versions: git/master
>         Environment:  OS: FreeBSD 9.3-RELEASE-p5
> bro version 2.3-328
> git log -1 --format="%H"
> 379593c7fded0f9791ae71a52dd78a4c9d5a2c1f
>            Reporter: Aashish Sharma
>              Labels: bro-git, myricom
>             Fix For: 2.4
>
>
> When I stop bro (in cluster mode), one of the bro worker processes (chosen at random) gets stuck and won't shut down, stop, or even be killed using kill -s 9.
> The system ultimately has to be rebooted to remove the stuck bro process.
> On running myri_start_stop I see:
> # /usr/local/opt/snf/sbin/myri_start_stop stop
> Removing myri_snf.ko
> kldunload: can't unload file: Device busy
> It appears that the myri_snf.ko driver cannot be unloaded because of the stuck bro process.  That process still has an open descriptor on the Sniffer device/driver, and the bro process freezes.
> More details:
> The bro process is stuck in the RNE state:
> R       Marks a runnable process.
> N       The process has reduced CPU scheduling priority (see setpriority(2)).
> E       The process is trying to exit.
> Here is an example:
> ### stuck process:
> [bro at 01 ~]$ ps auxwww | fgrep 1616
> bro    1616  100.0  0.0 758040 60480 ??  RNE   2:57PM   53:50.04 /usr/local/bro-git/bin/bro -i myri0 -U .status -p broctl -p broctl-live -p local -p worker-1-1 mgr.bro broctl base/frameworks/cluster local-worker.bro broctl/auto
> ### when checking for the process in /proc:
> [bro at c ~]$ ls -l /proc/1616
> ls: /proc/1616: No such file or directory




