[Bro] 5 node cluster

Darrain Waters dwaters at bioteam.net
Fri Oct 7 15:27:47 PDT 2016


Sorry, yeah, I am getting communication logs and stderr on the manager. I do
have two NICs enabled on each system: one for management, with an IP, and the
other is the Myricom, with no IP, in sniffer mode.

Each of the workers does have the spool worker directories, but they are empty.

I used to be able to run this on the manager, but now it fails:

[bromgr at bromgr etc]$ sudo tcpdump -i eth2

tcpdump: snf_ring_open_id(ring=-1) failed: Device or resource busy


[BroControl] > netstats
 worker-1-1: 1475878452.092051 recvd=1 dropped=17260812 link=17260813
worker-1-10: 1475878452.292009 recvd=1 dropped=17260812 link=17260813
 worker-1-2: 1475878452.493003 recvd=1 dropped=17260812 link=17260813
 worker-1-3: 1475878452.693975 recvd=1 dropped=17260812 link=17260813
 worker-1-4: 1475878452.895009 recvd=1 dropped=17260812 link=17260813
 worker-1-5: 1475878453.095000 recvd=1 dropped=17260812 link=17260813
 worker-1-6: 1475878453.296049 recvd=1 dropped=17260812 link=17260813
 worker-1-7: 1475878453.497139 recvd=1 dropped=17260812 link=17260813
 worker-1-8: 1475878453.697990 recvd=1 dropped=17260812 link=17260813
 worker-1-9: 1475878453.897974 recvd=1 dropped=17260812 link=17260813
 worker-2-1: 1475878450.084311 recvd=1 dropped=43750502 link=43750503
worker-2-10: 1475878450.285335 recvd=1 dropped=43750502 link=43750503
 worker-2-2: 1475878450.485317 recvd=1 dropped=43750502 link=43750503
 worker-2-3: 1475878450.686430 recvd=1 dropped=43750502 link=43750503
 worker-2-4: 1475878450.887373 recvd=1 dropped=43750502 link=43750503
 worker-2-5: 1475878451.088348 recvd=1 dropped=43750502 link=43750503
 worker-2-6: 1475878451.288262 recvd=1 dropped=43750502 link=43750503
 worker-2-7: 1475878451.489370 recvd=1 dropped=43750502 link=43750503
 worker-2-8: 1475878451.689311 recvd=1 dropped=43750502 link=43750503
 worker-2-9: 1475878451.890323 recvd=1 dropped=43750502 link=43750503
 worker-3-1: 1475878448.077118 recvd=1 dropped=9847880 link=9847881
worker-3-10: 1475878448.278158 recvd=1 dropped=9847880 link=9847881
 worker-3-2: 1475878448.479115 recvd=1 dropped=9847880 link=9847881
 worker-3-3: 1475878448.679110 recvd=1 dropped=9847880 link=9847881
 worker-3-4: 1475878448.880134 recvd=1 dropped=9847880 link=9847881
 worker-3-5: 1475878449.081098 recvd=1 dropped=9847880 link=9847881
 worker-3-6: 1475878449.281137 recvd=1 dropped=9847880 link=9847881
 worker-3-7: 1475878449.482134 recvd=1 dropped=9847880 link=9847881
 worker-3-8: 1475878449.683136 recvd=1 dropped=9847880 link=9847881
 worker-3-9: 1475878449.884120 recvd=1 dropped=9847880 link=9847881
 worker-4-1: 1475878446.070765 recvd=1 dropped=14367380 link=14367381
worker-4-10: 1475878446.271782 recvd=1 dropped=14367380 link=14367381
 worker-4-2: 1475878446.472749 recvd=1 dropped=14367380 link=14367381
 worker-4-3: 1475878446.672736 recvd=1 dropped=14367380 link=14367381
 worker-4-4: 1475878446.873773 recvd=1 dropped=14367380 link=14367381
 worker-4-5: 1475878447.074779 recvd=1 dropped=14367380 link=14367381
 worker-4-6: 1475878447.274758 recvd=1 dropped=14367380 link=14367381
 worker-4-7: 1475878447.475787 recvd=1 dropped=14367380 link=14367381
 worker-4-8: 1475878447.676719 recvd=1 dropped=14367380 link=14367381
 worker-4-9: 1475878447.876731 recvd=1 dropped=14367380 link=14367381
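In the netstats output above, every worker reports recvd=1 while dropped is exactly one less than link, so essentially every packet the NIC saw was counted as dropped, and all ten workers on a given host show identical counters. For larger clusters it can help to reduce this output to a per-worker drop ratio. The Python sketch below does that; it assumes the line layout shown in this thread (`name: timestamp recvd=N dropped=N link=N`), which may differ between BroControl versions.

```python
import re

# Assumed layout, taken from the netstats output pasted in this thread:
# " worker-1-1: 1475878452.092051 recvd=1 dropped=17260812 link=17260813"
NETSTATS_RE = re.compile(
    r"^\s*(?P<name>\S+):\s+(?P<ts>[\d.]+)\s+"
    r"recvd=(?P<recvd>\d+)\s+dropped=(?P<dropped>\d+)\s+link=(?P<link>\d+)"
)


def drop_ratios(text):
    """Return {worker_name: dropped / link} for every parseable netstats line."""
    ratios = {}
    for line in text.splitlines():
        m = NETSTATS_RE.match(line)
        if not m:
            continue  # skip blank lines and the "[BroControl] > netstats" prompt
        dropped = int(m.group("dropped"))
        link = int(m.group("link"))
        ratios[m.group("name")] = dropped / link if link else 0.0
    return ratios


sample = """
 worker-1-1: 1475878452.092051 recvd=1 dropped=17260812 link=17260813
 worker-2-1: 1475878450.084311 recvd=1 dropped=43750502 link=43750503
"""

for name, ratio in sorted(drop_ratios(sample).items()):
    print(f"{name}: {ratio:.6%} dropped")
```

Feeding it the full netstats output (e.g. pasted into a string or read from a file) lists every worker; any ratio close to 100% means that worker is not actually seeing traffic.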

On Fri, Oct 7, 2016 at 4:35 PM, Azoff, Justin S <jazoff at illinois.edu> wrote:

>
> > On Oct 7, 2016, at 5:18 PM, Darrain Waters <dwaters at bioteam.net> wrote:
> >
> > Thanks for the quick reply. I put a proxy on everything because I was
> > grasping at straws. Before that I had only 1 proxy, on the manager, with
> > the same results.
> >
> >
> > Why are you using 7,8,9,10,11,18,19,20,21,22 in particular?  What CPUs
> > do you have?  This is potentially not doing what you intend.  Most likely
> > each of 7/19, 8/20, 9/21 and 10/22 is a pair of hyper-threads on the same
> > physical core.
> >
> > Those are the cores on NUMA node 1, and node 1 is associated with
> > the Myricom card.
> >
> > [bromgr at bromgr 2016-10-07]$ lscpu
> > Architecture:          x86_64
> > CPU op-mode(s):        32-bit, 64-bit
> > Byte Order:            Little Endian
> > CPU(s):                24
> > On-line CPU(s) list:   0-23
> > Thread(s) per core:    2
> > Core(s) per socket:    6
>
> I see.  You have two 6-core CPUs with hyper-threading, so those are the two
> sets of CPUs that make up each hyper-threading pair.  We haven't had a chance
> to do performance testing for this yet, but you might get better performance
> by just using 2,3,4,5,6,7,8,9,10,11.  It's a tradeoff: you would have to copy
> half of the packets across to the other NUMA node, but you would be using
> more of the 'real' cores and fewer of the hyper-threaded ones.
>
> >
> > Your underlying problem is probably that a firewall is enabled on your
> > hosts and the worker processes can't reach the manager.
> > I have ip6tables and iptables off.
>
> On all the machines?  "everything is working but there are no logs" almost
> always turns out to be firewall rules.  The last time it turned out that
> another admin had re-enabled the firewall.. :-)
>
> One thing to check for that is the logs written to spool/ on each
> worker.  There will be a local communication.log for each worker that may
> be complaining about something.
>
> Now that I reread your first message I see "I am not getting any log
> information in prefix/logs".  Do you mean that there are literally no log
> files in there?  Under current/ you should at least have stderr.log and
> communication.log.  If you literally have no log files, you may have some
> permission issues if you are not running bro as root.
>
> You can also run tcpdump on the manager and see if the workers are even
> trying to send it anything.
>
> > peerstatus
> >
> > [BroControl] > peerstatus
> >     manager
> > 1475875039.738664 peer=worker-2-2 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> > 1475875039.738664 peer=worker-1-3 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> > 1475875039.738664 peer=proxy-2 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> > 1475875039.738664 peer=proxiy-5 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> > 1475875039.738664 peer=worker-3-4 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> > 1475875039.738664 peer=worker-3-3 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> >
> That appears normal. I'm not sure what bytes_in and bytes_out were
> supposed to be; it doesn't look like we output those anymore.
>
> What does 'broctl netstats' show?
>
>
>
> --
> - Justin Azoff
>
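On the CPU-pinning question in the quoted exchange: the sketch below counts how many physical cores a pin list actually covers and which pinned CPUs are hyper-thread siblings. It assumes, as Justin infers from the lscpu output (24 logical CPUs, 2 threads per core), that logical CPU n and n + 12 are siblings on the same core; that numbering is an assumption, so confirm it on the real host with `lscpu -e` or /sys/devices/system/cpu/cpuN/topology/thread_siblings_list before relying on it.

```python
# Assumption: 24 logical CPUs where CPU n and CPU n + 12 are hyper-thread
# siblings on one physical core (consistent with the 7/19 ... 10/22 pairs
# discussed above, but not verified on the actual machine).
LOGICAL_CPUS = 24
SIBLING_OFFSET = LOGICAL_CPUS // 2


def physical_core(cpu):
    """Map a logical CPU number to the physical core it runs on."""
    return cpu % SIBLING_OFFSET


def summarize_pinning(pinned):
    """Return (number of physical cores used, sorted list of sibling pairs pinned)."""
    pinned = set(pinned)
    cores = {physical_core(c) for c in pinned}
    pairs = sorted(
        (c, c + SIBLING_OFFSET) for c in pinned if c + SIBLING_OFFSET in pinned
    )
    return len(cores), pairs


current = [7, 8, 9, 10, 11, 18, 19, 20, 21, 22]
suggested = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print("current:  ", summarize_pinning(current))
print("suggested:", summarize_pinning(suggested))
```

Under that assumption the current list spreads 10 workers over only 6 physical cores, with 7/19, 8/20, 9/21 and 10/22 each sharing a core, while 2 through 11 gives every worker its own physical core at the cost of crossing NUMA nodes.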

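For the firewall theory raised in the quoted reply, a quick check that does not involve Bro at all is whether a plain TCP connection from each worker to the manager succeeds. The Python sketch below does only that. The port to test depends on your BroControl configuration and is not given in this thread, so treat both the host and port arguments as values you supply.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout seconds."""
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out (often a silently dropping firewall),
        # or host unreachable all end up here.
        return False
```

Run `can_reach(manager_ip, manager_port)` on each worker using the manager's management IP. A False while the manager is running and listening points at host firewall rules or a network ACL rather than at Bro itself; comparing with tcpdump on the manager shows whether the SYNs arrive at all.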

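For the "no log files at all" case mentioned in the quoted reply, permission problems can be ruled in or out by checking whether the directories Bro writes to exist and are writable by the account it runs as. The sketch below assumes you pass in your own install's logs and spool paths (any concrete paths are hypothetical). Note that `os.access` checks the real, not effective, UID, and root generally passes it regardless of mode bits.

```python
import os


def unwritable_dirs(paths):
    """Return the subset of paths that are missing or not writable/searchable by this user."""
    bad = []
    for p in paths:
        # A directory needs both write and execute (search) permission
        # for new files to be created inside it.
        if not os.path.isdir(p) or not os.access(p, os.W_OK | os.X_OK):
            bad.append(p)
    return bad
```

For example, run `unwritable_dirs([prefix + "/logs", prefix + "/spool"])` as the Bro user, with `prefix` set to your actual install prefix; an empty list means plain filesystem permissions are not what is blocking the logs.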