<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


</head>


<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">


<br class="">


<blockquote type="cite" class="">On Nov 3, 2017, at 3:13 PM, Jan Grashöfer &lt;<a href="mailto:jan.grashoefer@gmail.com" class="">jan.grashoefer@gmail.com</a>&gt; wrote:<br class="">


<br class="">


On 03/11/17 18:07, Azoff, Justin S wrote:<br class="">
&gt; Partitioning the intel data set is a little tricky since it supports subnets and hashing 10.10.0.0/16<br class="">


<blockquote type="cite" class="">and 10.10.10.10 won't necessarily give you the same node. &nbsp;Maybe subnets need to exist on all<br class="">


nodes but everything else can be partitioned? &nbsp;<br class="">


</blockquote>


<br class="">


Good point! Subnets are stored kind of separate to allow prefix matches anyway. However, I am a bit hesitant as it would become a quite complex setup.<br class="">


</blockquote>


<div class=""><br class="">


</div>


Indeed. &nbsp;Replication &#43; load balancing is probably a good enough first step.<br class="">


<br class="">


<blockquote type="cite" class="">


<blockquote type="cite" class="">There would also need to be a method for<br class="">


re-distributing the data if the cluster configuration changes due to nodes being added or removed.<br class="">


</blockquote>


<br class="">


Right, that's exactly what I was thinking of. I guess this applies also to other use cases which will use HRW. I am just not sure whether dynamic layout changes are out of scope at the moment...<br class="">


</blockquote>


<div class=""><br class="">


</div>


<div class="">Other use cases are still problematic, but even without replication/redistribution the situation is still greatly improved.</div>


<div class="">Take scan detection for example:</div>


<div class=""><br class="">


</div>


<div class="">With sumstats/scan-ng/simple-scan, if the current manager host or process dies, all detection comes to a halt</div>


<div class="">until it is restarted. &nbsp;Once it is restarted, all state is lost so everything starts over from 0.</div>


<div class=""><br class="">


</div>


<div class="">If there were 4 data nodes participating in scan detection and all 4 died, the result would be the same, so this is no better or</div>


<div class="">worse than the current situation.</div>


<div class="">If only one node dies, though, only 1/4 of the analysis is affected. That portion can immediately</div>


<div class="">fail over to the next node. So while it may still have to start from 0, there would only be a small hole in the analysis.</div>


<div class=""><br class="">


</div>


<div class="">For example:</div>


<div class=""><br class="">


</div>


<div class="">The scan threshold is 20 packets.<br class="">


</div>


<div class="">A scan has just started from 10.10.10.10.&nbsp;</div>


<div class="">10 packets into the scan, the data node that 10.10.10.10 hashes to crashes.</div>


<div class="">HRW now routes data for 10.10.10.10 to another node.</div>


<div class="">30 packets into the scan, the count on the new node reaches the threshold of 20 and a notice is raised.</div>
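<div class=""><br class=""></div>
<div class="">A rough Python sketch of that walkthrough, assuming HRW here means rendezvous hashing: each node is scored by hashing its name together with the key, and the highest score wins. The node names, the owner() and simulate() helpers, and the choice of SHA-256 are made up for illustration; this is not how Bro implements it.</div>
<pre class="">
```python
import hashlib

THRESHOLD = 20  # scan threshold from the example above


def owner(key, nodes):
    """Return the node with the highest hash score for this key (HRW)."""
    def score(node):
        digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


def simulate(scanner, nodes, crash_after, total_packets):
    """Count packets per node; crash the owning node part-way through."""
    alive = list(nodes)
    counters = {n: {} for n in nodes}  # per-node state, lost when a node dies
    notices = []
    for packet in range(1, total_packets + 1):
        node = owner(scanner, alive)
        count = counters[node].get(scanner, 0) + 1
        counters[node][scanner] = count
        if count == THRESHOLD:
            notices.append((packet, node))
        if packet == crash_after:
            alive.remove(node)   # the owning data node crashes
            counters[node] = {}  # and its state is gone
    return notices


nodes = ["data-1", "data-2", "data-3", "data-4"]  # hypothetical node names
first = owner("10.10.10.10", nodes)
notices = simulate("10.10.10.10", nodes, crash_after=10, total_packets=30)
print(notices)

# Only keys that were owned by the crashed node get re-routed.
survivors = [n for n in nodes if n != first]
keys = [f"10.10.{i}.{j}" for i in range(4) for j in range(50)]
moved = [k for k in keys if owner(k, nodes) != owner(k, survivors)]
print(len(moved), "of", len(keys), "keys re-routed")
```
</pre>
<div class="">The notice fires at packet 30 on whichever surviving node 10.10.10.10 now hashes to, and every key that belonged to a surviving node stays where it was.</div>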


<div class=""><br class="">


</div>


<div class="">Replication between data nodes could make this even more seamless, but it's not a huge priority, at least for me.</div>


<div class="">My priority is getting the cluster to a point where things don't grind to a halt just because one component is down.</div>


<div class=""><br class="">


</div>


<div class="">Ignoring the worker-&gt;logger connections, it would look something like the attached layout.png.</div>


<div class=""><br class="">


</div>


<div class=""><img apple-inline="yes" id="58C912E2-06E1-4AC9-BEB8-1E96C04F75CB" src="cid:4B1B7729-7A8D-483C-83A8-04E1783FE0AE@home" class=""></div>


<div class=""><br class="">


</div>


<blockquote type="cite" class="">Fully agreed! In that case it might be nice if one can define separate special purpose data nodes, e.g. &quot;intel data nodes&quot;. But, I am not sure whether this is a good idea as this might lead to complex cluster&nbsp;definitions and


 poor usability as users need to know a bit about how the underlying mechanisms work. On the other hand this would theoretically allow to completely decouple the intel data store (e.g. interface a&nbsp;&quot;real&quot; database with some pybroker-scripts).<br class="">


<br class="">


Jan<br class="">


</blockquote>


<div class=""><br class="">


</div>


<div class="">I've been thinking the same thing, but I hope it doesn't come to that. &nbsp;Ideally people will be able</div>


<div class="">to scale their clusters by just increasing the number of data nodes without having to get into</div>


<div class="">the details about which node is doing what.</div>


<div class=""><br class="">


</div>


<div class="">Partitioning the data analysis by task has been suggested, i.e., one data node for scan detection,</div>


<div class="">one data node for spam detection, one data node for sumstats. I think this would be very easy to</div>


<div class="">implement, but it doesn't do anything to help scale out those individual tasks once one process can</div>


<div class="">no longer handle the load. &nbsp;You would just end up with something like the scan detection and spam</div>


<div class="">data nodes at 20% CPU and the sumstats node at 100% CPU.</div>
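<div class=""><br class=""></div>
<div class="">As a back-of-the-envelope illustration in Python, take the 20%/20%/100% figures above as the CPU demand of each task (an assumption; a saturated sumstats node may really need more) and compare one-task-per-node against an idealized even spread of every task over the same three nodes. The node names are hypothetical.</div>
<pre class="">
```python
# Assumed CPU demand per task, as a percentage of one data node process.
demand = {"scan": 20, "spam": 20, "sumstats": 100}

# Partition by task: each node runs exactly one task.
by_task = dict(demand)

# Partition by key: every task is spread evenly over all nodes (idealized).
node_count = len(demand)
even_share = sum(demand.values()) / node_count
by_key = {f"data-{i + 1}": even_share for i in range(node_count)}

print("by task:", by_task)
print("by key: ", by_key)
```
</pre>
<div class="">Under these assumptions, adding a fourth node lowers every node's share in the by-key layout, while the by-task layout leaves the sumstats node saturated.</div>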


<div class=""><br class="">


</div>


<div class=""><br class="">


</div>


<div class="">—&nbsp;<br class="">


Justin Azoff<br class="">


<br class="">


</div>


</body>


</html>