[Bro-Dev] Broker has landed in master, please test

Azoff, Justin S jazoff at illinois.edu
Fri Jun 15 11:25:17 PDT 2018

> On Jun 1, 2018, at 11:37 AM, Azoff, Justin S <jazoff at illinois.edu> wrote:
> I could never figure out what was causing the problem, and it's possible that &synchronized not doing anything anymore is why it's better now.  I'm mostly using &synchronized for syncing input files across all the workers and one of them does have 300k entries in it.  That file is fairly constant though, only a few k changes every 5 minutes and nothing that should use 20G of ram.

FWIW, I figured out what was causing this problem.  While the file wasn't changing that much, I was using something like

    curl -o file.new $URL && mv file.new file.csv

to download the file, and apparently curl doesn't exit with a non-zero status code on HTTP server errors unless you pass -f.

This was causing a server error page to be written to the csv file every now and then.  When this happened:

* the input reader would throw a warning that the file couldn't be parsed
* Bro would then clear the set, triggering a removal of 300k items across all nodes (56 in the case of the test cluster)
* 5 minutes later the next download would work
* Bro would then fill the set back in, triggering 300k items to be synced to all 56 nodes again.

So within 5 minutes, 300,000*56*2 updates would be kicked off, which is about 33.6 million updates.  This seemed to max out the proxies for 30 minutes.
The raw size of the data is only ~4M, or 261M total, which makes it a little crazy that memory usage would blow up by dozens of gigabytes of RAM.
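
As a rough check of the update count above, the arithmetic can be reproduced in the shell. The variable names are only for illustration; the figures are the ones quoted in this message.

```shell
entries=300000   # items in the set
nodes=56         # nodes in the test cluster
passes=2         # one removal pass plus one re-add pass

updates=$((entries * nodes * passes))
echo "$updates"  # prints 33600000
```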

&synchronized not having an effect in master made this problem go away, and adding a -f to curl on our pre-broker clusters fixed those too.
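
A minimal sketch of a more defensive download step, assuming a POSIX shell. The fetch_file helper and the extra empty-file check are hypothetical additions for illustration, not the exact commands used on these clusters. The idea is that file.csv is only replaced when curl reports success, so a server error page never reaches the input reader.

```shell
# Hypothetical helper: fetch_file URL DEST
# Downloads URL to DEST.new and only replaces DEST if the download succeeded.
fetch_file() {
    url=$1
    dest=$2
    tmp="$dest.new"

    # -f (--fail): exit non-zero on HTTP errors (status >= 400) instead of
    #              saving the server's error page as if it were the file.
    # -sS:         no progress meter, but still print error messages.
    if ! curl -fsS -o "$tmp" "$url"; then
        rm -f "$tmp"      # drop any partial download
        return 1
    fi

    # Extra guard (an assumption, not part of the original setup):
    # refuse to install an empty download.
    if [ ! -s "$tmp" ]; then
        rm -f "$tmp"
        return 1
    fi

    # Only now replace the file that the input reader is watching.
    mv "$tmp" "$dest"
}

# Example use, with $URL set as in the original command:
#   fetch_file "$URL" file.csv
```

On failure the previous file.csv is left untouched, so the set is neither cleared nor re-synced.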

All the more reason to move the distribution of this data off of &synchronized.  I think I will just run the curl command on the worker nodes too,
effectively replacing &synchronized with curl.

Justin Azoff

