[Bro] 5 node cluster
Darrain Waters
dwaters at bioteam.net
Fri Oct 7 15:27:47 PDT 2016
Sorry, yes, I am getting the communication and stderr logs on the manager. I do
have two NICs enabled on each system: one for management, with an IP address,
and the Myricom, with no IP address and in sniffer mode.
Each worker does have its spool worker directories, but they are empty.
I used to be able to run this on the manager, but now it fails:
[bromgr at bromgr etc]$ sudo tcpdump -i eth2
tcpdump: snf_ring_open_id(ring=-1) failed: Device or resource busy
[BroControl] > netstats
worker-1-1: 1475878452.092051 recvd=1 dropped=17260812 link=17260813
worker-1-10: 1475878452.292009 recvd=1 dropped=17260812 link=17260813
worker-1-2: 1475878452.493003 recvd=1 dropped=17260812 link=17260813
worker-1-3: 1475878452.693975 recvd=1 dropped=17260812 link=17260813
worker-1-4: 1475878452.895009 recvd=1 dropped=17260812 link=17260813
worker-1-5: 1475878453.095000 recvd=1 dropped=17260812 link=17260813
worker-1-6: 1475878453.296049 recvd=1 dropped=17260812 link=17260813
worker-1-7: 1475878453.497139 recvd=1 dropped=17260812 link=17260813
worker-1-8: 1475878453.697990 recvd=1 dropped=17260812 link=17260813
worker-1-9: 1475878453.897974 recvd=1 dropped=17260812 link=17260813
worker-2-1: 1475878450.084311 recvd=1 dropped=43750502 link=43750503
worker-2-10: 1475878450.285335 recvd=1 dropped=43750502 link=43750503
worker-2-2: 1475878450.485317 recvd=1 dropped=43750502 link=43750503
worker-2-3: 1475878450.686430 recvd=1 dropped=43750502 link=43750503
worker-2-4: 1475878450.887373 recvd=1 dropped=43750502 link=43750503
worker-2-5: 1475878451.088348 recvd=1 dropped=43750502 link=43750503
worker-2-6: 1475878451.288262 recvd=1 dropped=43750502 link=43750503
worker-2-7: 1475878451.489370 recvd=1 dropped=43750502 link=43750503
worker-2-8: 1475878451.689311 recvd=1 dropped=43750502 link=43750503
worker-2-9: 1475878451.890323 recvd=1 dropped=43750502 link=43750503
worker-3-1: 1475878448.077118 recvd=1 dropped=9847880 link=9847881
worker-3-10: 1475878448.278158 recvd=1 dropped=9847880 link=9847881
worker-3-2: 1475878448.479115 recvd=1 dropped=9847880 link=9847881
worker-3-3: 1475878448.679110 recvd=1 dropped=9847880 link=9847881
worker-3-4: 1475878448.880134 recvd=1 dropped=9847880 link=9847881
worker-3-5: 1475878449.081098 recvd=1 dropped=9847880 link=9847881
worker-3-6: 1475878449.281137 recvd=1 dropped=9847880 link=9847881
worker-3-7: 1475878449.482134 recvd=1 dropped=9847880 link=9847881
worker-3-8: 1475878449.683136 recvd=1 dropped=9847880 link=9847881
worker-3-9: 1475878449.884120 recvd=1 dropped=9847880 link=9847881
worker-4-1: 1475878446.070765 recvd=1 dropped=14367380 link=14367381
worker-4-10: 1475878446.271782 recvd=1 dropped=14367380 link=14367381
worker-4-2: 1475878446.472749 recvd=1 dropped=14367380 link=14367381
worker-4-3: 1475878446.672736 recvd=1 dropped=14367380 link=14367381
worker-4-4: 1475878446.873773 recvd=1 dropped=14367380 link=14367381
worker-4-5: 1475878447.074779 recvd=1 dropped=14367380 link=14367381
worker-4-6: 1475878447.274758 recvd=1 dropped=14367380 link=14367381
worker-4-7: 1475878447.475787 recvd=1 dropped=14367380 link=14367381
worker-4-8: 1475878447.676719 recvd=1 dropped=14367380 link=14367381
worker-4-9: 1475878447.876731 recvd=1 dropped=14367380 link=14367381
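
The netstats output above can be summarized per worker with a short script. This is only a sketch: the function names (`parse_netstats`, `drop_fraction`) are made up for illustration, the line format is assumed to be exactly what is pasted above, and treating dropped/link as the drop rate is an assumption based on recvd + dropped adding up to link in these lines.

```python
import re

# Assumed line format, taken from the pasted `netstats` output:
#   <node>: <timestamp> recvd=<n> dropped=<n> link=<n>
NETSTATS_RE = re.compile(
    r"^(?P<node>\S+):\s+(?P<ts>\d+\.\d+)\s+"
    r"recvd=(?P<recvd>\d+)\s+dropped=(?P<dropped>\d+)\s+link=(?P<link>\d+)$"
)


def parse_netstats(text):
    """Return one dict per recognizable netstats line; skip everything else."""
    rows = []
    for line in text.splitlines():
        m = NETSTATS_RE.match(line.strip())
        if m is None:
            continue
        rows.append({
            "node": m["node"],
            "recvd": int(m["recvd"]),
            "dropped": int(m["dropped"]),
            "link": int(m["link"]),
        })
    return rows


def drop_fraction(row):
    """Fraction of packets seen on the link that were dropped (assumed metric)."""
    return row["dropped"] / row["link"] if row["link"] else 0.0


# Two lines copied from the output above.
sample = """\
worker-1-1: 1475878452.092051 recvd=1 dropped=17260812 link=17260813
worker-2-1: 1475878450.084311 recvd=1 dropped=43750502 link=43750503
"""

for row in parse_netstats(sample):
    print(f"{row['node']}: {drop_fraction(row):.6%} of link packets dropped")
```

Run against the full listing, this makes it easy to see that every worker has received a single packet and dropped essentially everything else seen on the link.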
On Fri, Oct 7, 2016 at 4:35 PM, Azoff, Justin S <jazoff at illinois.edu> wrote:
>
> > On Oct 7, 2016, at 5:18 PM, Darrain Waters <dwaters at bioteam.net> wrote:
> >
> > Thanks for the quick reply. I put a proxy on everything because I was
> > grasping at straws. I originally had only 1 proxy, on the manager, with
> > the same results.
> >
> >
> > Why are you using 7,8,9,10,11,18,19,20,21,22 in particular? What CPUs
> > do you have? This is potentially not doing what you intend. Most likely
> > 7/19 8/20 9/21 10/22 are the same cpu.
> >
> > Those are the cores that belong to node 1, and node 1 is associated with
> > the Myricom card.
> >
> > [bromgr at bromgr 2016-10-07]$ lscpu
> > Architecture: x86_64
> > CPU op-mode(s): 32-bit, 64-bit
> > Byte Order: Little Endian
> > CPU(s): 24
> > On-line CPU(s) list: 0-23
> > Thread(s) per core: 2
> > Core(s) per socket: 6
>
> I see. You have two 6-core CPUs with hyper-threading, so those are the two
> sets of CPUs that make up each hyper-threading pair. We haven't gotten to do
> performance testing for this yet, but you might get better performance by
> just using 2,3,4,5,6,7,8,9,10,11. It's a tradeoff: half of the packets have
> to be copied across to the other NUMA node, but you use more of the
> 'real' cores and fewer of the hyper-threading ones.
>
> >
> > Your underlying problem is probably that a firewall is enabled on your
> > hosts and the worker processes can't reach the manager.
> >
> > I have ip6 & iptables off
>
> On all the machines? "everything is working but there are no logs" almost
> always turns out to be firewall rules. The last time it turned out that
> another admin had re-enabled the firewall.. :-)
>
> One thing to check for that is the logs written to spool/ on each
> worker. There will be a local communication.log for each worker that may
> be complaining about something.
>
> Now that I reread your first message I see "I am not getting any log
> information in prefix/logs". Do you mean that there are literally no log
> files in there? Under current/ you should at least have stderr.log and
> communication.log. If you literally have no log files, you may have some
> permission issues if you are not running bro as root.
>
> You can also run tcpdump on the manager and see if the workers are even
> trying to send it anything.
>
> > peerstatus
> >
> > [BroControl] > peerstatus
> > manager
> > 1475875039.738664 peer=worker-2-2 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> > 1475875039.738664 peer=worker-1-3 host=10.0.40.18 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> > 1475875039.738664 peer=proxy-2 host=10.0.40.17 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> > 1475875039.738664 peer=proxiy-5 host=10.0.40.19 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> > 1475875039.738664 peer=worker-3-4 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> > 1475875039.738664 peer=worker-3-3 host=10.0.40.16 events_in=3165 events_out=3165 ops_in=0 ops_out=3472 bytes_in=? bytes_out=?
> >
> That appears normal.. I'm not sure what bytes_in and bytes_out were
> supposed to be.. it doesn't look like we output that anymore.
>
> What does 'broctl netstats' show?
>
>
>
> --
> - Justin Azoff
>