[Bro] 5 node cluster

Darrain Waters dwaters at bioteam.net
Fri Oct 7 14:18:03 PDT 2016


Thanks for the quick reply. I put a proxy on every node because I was
grasping at straws. I originally had only 1 proxy, on the manager, with
the same results.


> Why are you using 7,8,9,10,11,18,19,20,21,22 in particular?  What CPUs do
> you have?  This is potentially not doing what you intend.  Most likely 7/19
> 8/20 9/21 10/22 are the same cpu.

Those are the cores that belong to NUMA node 1, and node 1 is the one
associated with the myricom card.

[bromgr at bromgr 2016-10-07]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz
Stepping:              2
CPU MHz:               1200.000
BogoMIPS:              6799.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-5,12-17
NUMA node1 CPU(s):     6-11,18-23
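
With 2 threads per core, some of the CPU numbers in NUMA node 1 are hyperthread siblings of each other, which is the concern raised about pairs like 7/19. One way to check is to read each CPU's sibling list from Linux sysfs. The following is a minimal Python sketch, not something taken from Bro or BroControl; the helper names (parse_cpu_list, read_siblings, shared_cores) are made up for illustration, and which CPUs actually pair up on this machine has to be read from sysfs rather than assumed.

```python
from collections import defaultdict
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as '6-11,18-23' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_siblings(cpu):
    """Return the hardware threads that share a physical core with `cpu`.

    Reads the standard Linux sysfs topology file for that CPU.
    """
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    return parse_cpu_list(path.read_text())


def shared_cores(pinned, siblings_of=read_siblings):
    """Group pinned CPUs by physical core.

    Returns only the groups where more than one pinned CPU lands on the
    same core, i.e. where two workers would compete for one core.
    """
    groups = defaultdict(list)
    for cpu in pinned:
        core = tuple(siblings_of(cpu))
        groups[core].append(cpu)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Example, to be run on a worker host with the CPU list from node.cfg:
#   shared_cores([7, 8, 9, 10, 11, 18, 19, 20, 21, 22])
```

If the call returns any groups, those CPU numbers are two threads of one physical core, and pinning a worker to each of them gives less than two cores' worth of capacity.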

> Your underlying problem is probably that a firewall is enabled on your
> hosts and the worker processes can't reach the manager.

I have both ip6tables and iptables turned off.
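
Even with the host firewalls off, it can be worth confirming from a worker host that a TCP connection to the manager's Bro port actually succeeds. The following is a minimal Python sketch using only the standard socket module; the function name can_connect and the host name in the usage comment are placeholders, not anything from BroControl.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example, run from a worker host ("manager-host" is a placeholder; 47761 is
# the documented default manager port when no logger is defined):
#   can_connect("manager-host", 47761)
```

A False result only says the connection could not be opened. It does not distinguish a firewall drop from Bro not listening on that port.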




peerstatus


[BroControl] > peerstatus
    manager
1475875039.738664 peer=worker-2-2 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-1-3 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=proxy-2 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=proxiy-5 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-3-4 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-3-3 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-2-4 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-3-8 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-2-9 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-3-1 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-4-1 host=10.0.40.15 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-3-9 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-5-8 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-3-6 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-4-9 host=10.0.40.15 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-2-3 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=proxy-3 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-2-7 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-5-7 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=proxy-4 host=10.0.40.15 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-1-8 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=proxy-1 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-3-2 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-5-4 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-1-6 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-5-1 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-1-10 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-5-9 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-3-10 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-5-2 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-5-3 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-1-1 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-2-8 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-4-5 host=10.0.40.15 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-4-6 host=10.0.40.15 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-2-6 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-4-8 host=10.0.40.15 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-3-7 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-4-7 host=10.0.40.15 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-5-6 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-2-1 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-5-5 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-4-10 host=10.0.40.15 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-2-10 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-1-7 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-4-3 host=10.0.40.15 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-1-9 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-1-5 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-4-2 host=10.0.40.15 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer= host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=proxy-5 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-3-5 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-1-4 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-5-10 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-2-5 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-4-4 host=10.0.40.15 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
1475875039.738664 peer=worker-1-2 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?



On Fri, Oct 7, 2016 at 3:58 PM, Azoff, Justin S <jazoff at illinois.edu> wrote:

>
> > On Oct 7, 2016, at 4:40 PM, Darrain Waters <dwaters at bioteam.net> wrote:
> >
> > Hello
> >
> > The myricom cards in my cluster nodes are dropping packets, and I am not
> > getting any log information in prefix/logs. Did I miss something during the
> > setup process? Please see below for initial info and please let me know
> > what else is needed. Thank you.
> >
> > Darrain
> >
> > I compiled bro using the option below.
> > --with-pcap=/opt/snf/
> >
> > [bromgr at bromgr ~]$ ldd /usr/local/bro/bin/bro | grep pcap
> >
> > libpcap.so.1 => /opt/snf/lib/libpcap.so.1 (0x00007faf9c3d5000)
> >
>
> Looks good
>
> >
> > I get the following when I run capstats
> >
> > [BroControl] > capstats
> >
> > Interface             kpps       mbps       (10s average)
> > ----------------------------------------
> > worker-1-1: capstats failed (error: eth2: snf_ring_open_id(ring=-1) failed: Device or resource busy)
> > worker-2-1: capstats failed (error: eth2: snf_ring_open_id(ring=-1) failed: Device or resource busy)
> > worker-3-1: capstats failed (error: eth2: snf_ring_open_id(ring=-1) failed: Device or resource busy)
> > worker-4-1: capstats failed (error: eth2: snf_ring_open_id(ring=-1) failed: Device or resource busy)
> > worker-5-1: capstats failed (error: eth2: snf_ring_open_id(ring=-1) failed: Device or resource busy)
>
> This is normal. capstats for snf never worked right: it could never work
> with snfv2, and with snfv3 it needs to set a different app id than bro,
> otherwise it can't capture at the same time as bro.  As long as bro is
> running and not failing with the same error, you're ok.  There are better
> ways to get data out of a myricom card using the myricom tools as well.
>
> Your node.cfg looks mostly ok.  I would switch to only running 1 or 2
> proxies and just run them on the manager node.
>
> Why are you using 7,8,9,10,11,18,19,20,21,22 in particular?  What CPUs do
> you have?  This is potentially not doing what you intend.  Most likely 7/19
> 8/20 9/21 10/22 are the same cpu.
>
> Your underlying problem is probably that a firewall is enabled on your
> hosts and the worker processes can't reach the manager.  Daniel just wrote
> a good section on this for the manual:
>
>
> This section summarizes the network communication between Bro and
> BroControl, which is useful to understand if you need to reconfigure your
> firewall.  If your firewall is preventing Bro communication, then either
> the "deploy" command or the "peerstatus" command will fail.
>
> For a cluster setup, BroControl uses ssh to run commands on other hosts in
> the cluster, so the manager host needs to connect to TCP port 22 on each
> of the other hosts in the cluster.  Note that BroControl never attempts
> to ssh to the localhost, so in a standalone setup BroControl does not use
> ssh.
>
> Each instance of Bro in a cluster needs to communicate directly with other
> instances of Bro regardless of whether these instances are running on the
> same host or not.  Each proxy and worker needs to connect to the manager,
> and each worker needs to connect to one proxy.  If a logger node is
> defined, then each of the other nodes needs to connect to the logger.
>
> Note that you can change the port that Bro listens on by changing the value
> of the "BroPort" option in your ``broctl.cfg`` file (this should be needed
> only if your system has another process that listens on the same port).  By
> default, a standalone Bro listens on TCP port 47760.  For a cluster setup,
> the logger listens on TCP port 47761, and the manager listens on TCP port
> 47762 (or 47761 if no logger is defined).  Each proxy is assigned its own
> port number, starting with one number greater than the manager's port.
> Likewise, each worker is assigned its own port starting one number greater
> than the highest port number assigned to a proxy.
>
> Finally, a few BroControl commands (such as "print" and "peerstatus") rely
> on broccoli to communicate with Bro.  This means that for those commands to
> function, BroControl needs to connect to each Bro instance.
>
>
> --
> - Justin Azoff
>
>
>
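
The port numbering described in the quoted manual section can be worked out for a given cluster layout, which helps when checking which ports need to be reachable between hosts. The following is a minimal Python sketch of that arithmetic, not BroControl's own code; the function name assign_ports is made up for illustration. Two things here are assumptions rather than statements from the manual: that changing BroPort shifts the whole range, and that workers start right after the manager when there are no proxies.

```python
def assign_ports(num_proxies, num_workers, has_logger=False, base_port=47760):
    """Compute default listening ports for a cluster, per the quoted manual text.

    base_port is the standalone default (47760). Treating it as the start of
    the whole cluster range is an assumption of this sketch.
    """
    ports = {}
    next_port = base_port + 1

    # The logger, if defined, takes the first cluster port.
    if has_logger:
        ports["logger"] = next_port
        next_port += 1

    # The manager follows the logger, or takes the first port without one.
    ports["manager"] = next_port
    next_port += 1

    # Proxies start one above the manager's port.
    ports["proxies"] = list(range(next_port, next_port + num_proxies))
    next_port += num_proxies

    # Workers start one above the highest proxy port.
    ports["workers"] = list(range(next_port, next_port + num_workers))
    return ports


if __name__ == "__main__":
    # Layout counted from the peerstatus output: 5 proxies, 50 workers,
    # and no logger listed.
    layout = assign_ports(num_proxies=5, num_workers=50)
    print("manager:", layout["manager"])
    print("proxies:", layout["proxies"][0], "-", layout["proxies"][-1])
    print("workers:", layout["workers"][0], "-", layout["workers"][-1])
```

Under those assumptions, the 5-proxy, 50-worker layout without a logger puts the manager on 47761, the proxies on 47762-47766, and the workers on 47767-47816. The actual port assigned to each node should still be confirmed on the hosts themselves.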