[Bro-Dev] [Bro-Commits] [git/bro] topic/actor-system: First-pass broker-enabled Cluster scripting API + misc. (07ad06b)
Aashish Sharma
asharma at lbl.gov
Thu Nov 2 11:37:46 PDT 2017
My view:
I have repeatedly run into three types of cases while doing script/package work:
1) manager2worker: the input framework reads external data and all workers need to see it.
Example: the intel framework.
2) worker2manager: workers see something and report it to the manager; the manager keeps
aggregated counts to make decisions.
Example: scan detection.
3) worker2manager2all-workers: workers see something and send it to the manager; the manager
distributes it to all workers.
Example: tracking clicked URLs extracted from email.
Basically, Bro has two kinds of heuristic needs:
a) Cooked data analysis and correlations. Cooked data is the data which ends up
in logs, basically the entire 'protocol record', e.g. c$http or c$smtp.
These are the majority.
For simplicity, cooked data processing can also be thought of as:
tail -f blah.log | ./python-script
but inside bro.
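To make the analogy concrete, here is a minimal Python sketch of that "tail -f a log" style of cooked-data processing, done outside of Bro. It only assumes Bro's standard tab-separated log format with a '#fields' header line; the sample data and field subset are made up for illustration:

```python
import io

def parse_bro_log(stream):
    """Yield each cooked record of a tab-separated Bro log as a dict,
    using the '#fields' header line to name the columns."""
    fields = None
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # drop the '#fields' token itself
        elif line.startswith("#") or not line:
            continue  # skip other metadata lines and blanks
        elif fields:
            yield dict(zip(fields, line.split("\t")))

# Example: two cooked HTTP records, as 'tail -f http.log' would deliver them.
sample = (
    "#fields\tts\tid.orig_h\thost\n"
    "1509606970.5\t10.0.0.1\texample.com\n"
    "1509606980.5\t10.0.0.2\texample.org\n"
)
for rec in parse_bro_log(io.StringIO(sample)):
    print(rec["ts"], rec["host"])
```

The point of the sketch: a consumer of cooked data never touches packets or state synchronization; it just follows a stream of finished records.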
b) Raw or derived data, which you need to extract from traffic with a defined
policy of your own (for example, URLs extracted from email by tapping into the
mime_data_all event, or MAC addresses extracted from router
advertisement/solicitation events), or something which is not yet in an ::Info
record, or a new 'thing'. This should be rare, with few use cases over time.
So in short, give me reliable events which are simply tail -f log functionality
on a data/processing node. That will reduce the number of synchronization needs by
an order of magnitude or more.
For (b), raw or derived data, we can keep the complexities of broker stores,
syncs, etc. But I have hopes that refined raw data could easily become its own log
and then be processed as cooked data.
So a lot of the cluster's data-centrality issues can go away with a data
node which can handle much of the cooked-data work for (1), (2) and in
some cases (3).
Now, while Justin's multiple-data-nodes idea has spectacular merits, I am not much of a fan of it, the reason being that multiple data nodes result in the same set of problems: synchronization, latencies, a mess of data2worker and worker2data events, etc.
I'd love to keep things rather simple. Cooked data goes to one (or more) data nodes (data stores). Just replicate for reliability rather than pick and choose what goes where.
Just picking up some things:
> > In the case of broadcasting from a worker to all other workers, the reason why you relay via another node is only because workers are not connected to each other? Do we know that a fully-connected cluster is a bad idea? i.e. why not have a worker able to broadcast directly to all other workers if that’s what is needed?
>
> Mostly so that workers don't end up spending all their time sending out messages when they should be analyzing packets.
Yes. Also, I have seen this cause broadcast storms. That's why I have always
used the manager as a central judge of what goes out. Often the same data is seen by
all workers, so if the manager is smart, it can just send the first instance to the workers
and all the other workers can stop announcing it further.
Let me explain:
- I block a scanner on 3 connections.
- 3 workers see one connection each; they each report to the manager.
- The manager says "yep, scanner" and sends a note to all workers saying traffic from this
IP is now uninteresting, stop reporting.
- Let's say there are 50 workers.
- Total communication events = 3 + 50 = 53.
If all workers send data to all workers, a scanner hitting 65,000 hosts will be a
mess inside the cluster, especially when scanners are hitting in milliseconds, not seconds.
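The message-count arithmetic above can be checked with a small Python sketch. The numbers come straight from the example; the all-to-all figure assumes each worker broadcasts every sighting directly to every other worker, with no manager to cut it off:

```python
def manager_mediated_events(reports_before_block, num_workers):
    """Each sighting goes worker -> manager; then the manager sends one
    'stop reporting' note to every worker."""
    return reports_before_block + num_workers

def all_to_all_events(sightings, num_workers):
    """Every sighting is broadcast from the observing worker to all
    other workers, with nothing to suppress duplicates."""
    return sightings * (num_workers - 1)

# The scenario from the text: block after 3 connections, 50 workers.
print(manager_mediated_events(3, 50))   # 53 events total

# A scanner hitting 65,000 hosts, one sighting per host, no suppression:
print(all_to_all_events(65000, 50))     # 3,185,000 worker-to-worker messages
```

The gap between 53 and ~3.2 million messages is the argument for the manager (or a data node) acting as the central judge.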
Similar to this is another case. Let's say:
- I read 1 million blacklisted IPs from a file on the manager.
- The manager sends 1 million x 50 events (to 50 workers).
- Each worker needs to report if a blacklisted IP has touched the network.
- Now imagine we want to keep a count of how many unique local IPs each
of these blacklisted IPs has touched,
- and at what rate, and when the first and last contacts were.
(BTW, I have a working script for this, so whatever the new broker does, it needs
to be able to give me this functionality.)
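A per-IP aggregation structure for those counts might look roughly like this in Python. This is a sketch, not the actual working script; the field names mirror the sample log below (first_seen, last_seen, hosts, total_conns), and the sightings fed in are invented:

```python
from dataclasses import dataclass, field

@dataclass
class BlacklistStats:
    """Aggregated sightings of one blacklisted IP, kept centrally."""
    first_seen: float = 0.0
    last_seen: float = 0.0
    local_hosts: set = field(default_factory=set)  # unique local IPs touched
    total_conns: int = 0

    def record(self, ts, local_ip):
        if not self.first_seen or ts < self.first_seen:
            self.first_seen = ts
        self.last_seen = max(self.last_seen, ts)
        self.local_hosts.add(local_ip)
        self.total_conns += 1

stats = {}  # blacklisted IP -> BlacklistStats
for ts, bad_ip, local_ip in [
    (1508782518.6, "185.87.185.45", "10.1.1.1"),
    (1509462618.4, "185.87.185.45", "10.1.1.2"),
    (1509462620.0, "185.87.185.45", "10.1.1.1"),  # repeat host, new conn
]:
    stats.setdefault(bad_ip, BlacklistStats()).record(ts, local_ip)

s = stats["185.87.185.45"]
print(len(s.local_hosts), s.total_conns)  # 2 3
```

With 1 million blacklisted IPs, the interesting question is where this dictionary lives and how worker sightings reach it, which is exactly the (2)/(3) pattern above.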
Here is a sample log:
#fields ts ipaddr ls days_seen first_seen last_seen active_for last_active hosts total_conns source
1509606970.541130 185.87.185.45 Blacklist::ONGOING 3 1508782518.636892 1509462618.466469 07-20:55:00 01-16:05:52 20 24 TOR
1509606980.542115 46.166.162.53 Blacklist::ONGOING 3 1508472908.494320 1509165782.304233 08-00:27:54 05-02:33:18 7 9 TOR
1509607040.546524 77.161.34.157 Blacklist::ONGOING 3 1508750181.852639 1509481945.439893 08-11:16:04 01-10:44:55 7 9 TOR
1509607050.546742 45.79.167.181 Blacklist::ONGOING 4 1508440578.524377 1508902636.365934 05-08:20:58 08-03:40:14 66 818 TOR
1509607070.547143 192.36.27.7 Blacklist::ONGOING 6 1508545003.176139 1509498930.174750 11-00:58:47 01-06:02:20 30 33 TOR
1509607070.547143 79.137.80.94 Blacklist::ONGOING 6 1508606207.881810 1509423624.519253 09-11:03:37 02-02:57:26 15 16 TOR
Aashish
On Thu, Nov 02, 2017 at 05:58:31PM +0000, Azoff, Justin S wrote:
>
> > On Nov 2, 2017, at 1:22 PM, Siwek, Jon <jsiwek at illinois.edu> wrote:
> >
> >
> >> On Nov 1, 2017, at 6:11 PM, Azoff, Justin S <jazoff at illinois.edu> wrote:
> >>
> >> - a bif/function for efficiently broadcasting an event to all other workers (or data nodes)
> >> - If the current node is a data node, just send it to all workers
> >> - otherwise, round robin the event to a data node and have it send it to all workers minus the current node.
> >
> > In the case of broadcasting from a worker to all other workers, the reason why you relay via another node is only because workers are not connected to each other? Do we know that a fully-connected cluster is a bad idea? i.e. why not have a worker able to broadcast directly to all other workers if that’s what is needed?
>
> Mostly so that workers don't end up spending all their time sending out messages when they should be analyzing packets.
>
> >> If &synchronized is going away script writers should be able to broadcast an event to all workers by doing something like
> >>
> >> Cluster::Broadcast(Cluster::WORKERS, event Foo(42));
> >>
> >> This would replace a ton of code that currently uses things like worker2manager_events+manager2worker_events+ at if ( Cluster::local_node_type() == Cluster::MANAGER )
> >
> > The successor to &synchronized was primarily intended to be the new data store stuff, so is there a way to map what you need onto that functionality? Or can you elaborate on an example where you think this new broadcast pattern is a better way to replace &synchronized than using a data store?
> >
> > - Jon
>
> I think a shared data store would work for most of the use cases where people are messing with worker2manager_events.
>
> If all the cases of people using worker2manager_events+manager2worker_events to mimic broadcast functionality are really just
> doing so to update data then it does make sense to just replace all of that with a new data store.
>
> How would something like policy/protocols/ssl/validate-certs.bro look with intermediate_cache as a data store?
>
>
> —
> Justin Azoff