[Zeek] capture_loss vs. pkts_dropped vs. missed_bytes

Mark Gardner mkg at vt.edu
Thu May 2 10:01:36 PDT 2019


I am still tuning our new Zeek cluster: an Arista switch for load balancing
with 4x10 Gbps links from a Gigamon and 10 Gbps links to the sensors, five
sensors (16 physical cores with 128 GB RAM each) using af_packet, 15
workers per sensor, and a separate management node running the manager,
logger, proxy, and storage (XFS on RAID-0 with 8 7200 RPM spindles, 256 GB
RAM). Output is JSON (for feeding into an ElasticStack later).

The average capture loss was <1% early on, with spikes to 50-70%. We
increased af_packet_buffer_size from the default (128 MB) to 2 GB, and
capture_loss is now gone.
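For reference, with the zeekctl af_packet plugin this is a one-line change
per worker stanza in node.cfg. The interface and host names below are
examples, not our exact config:

```ini
[worker-1]
type=worker
host=sensor1
interface=af_packet::eth0
lb_procs=15
af_packet_fanout_id=21
# raised from the 128MB default to eliminate capture_loss spikes
af_packet_buffer_size=2GB
```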
$ zcat capture_loss.10\:00\:00-11\:00\:00.log.gz | jq .percent_lost | statgen
 Count         Min         Max         Avg      StdDev
   300      0.0000      0.0000      0.0000      0.0000
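(statgen is a local helper that summarizes one number per line. For anyone
who wants to reproduce the columns without it, a rough awk stand-in —
stddev omitted for brevity, and the sample values below are illustrative:)

```shell
# Count/min/max/avg over one number per line, roughly what statgen prints.
printf '%s\n' 0 0 5802 3710 0 \
| awk 'NR==1{min=$1;max=$1} {s+=$1; if($1<min)min=$1; if($1>max)max=$1}
       END{printf "count=%d min=%d max=%d avg=%.4f\n", NR, min, max, s/NR}'
```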

Next, I looked at missed_bytes in conn.log, which doesn't look too bad:
$ zcat conn.10\:00\:00-11\:00\:00.log.gz | jq .missed_bytes | statgen
 Count         Min         Max         Avg      StdDev
  5488      0.0000   5802.0000      1.7332     92.9547
Out of the 5488 records, only two were non-zero (5802 and 3710), and for
both of those missed_bytes == resp_bytes (service: ssl).
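To pull out just the offending records, a jq filter along these lines
works. The sample conn entries below are illustrative, and whole_resp
flags whether the entire responder payload was missed:

```shell
# Keep only conn records with missed_bytes > 0 and compare against
# resp_bytes; sample lines stand in for conn.log records.
printf '%s\n' \
  '{"uid":"C1","service":"ssl","missed_bytes":5802,"resp_bytes":5802}' \
  '{"uid":"C2","service":"dns","missed_bytes":0,"resp_bytes":120}' \
| jq -c 'select(.missed_bytes > 0)
         | {uid, service, missed_bytes,
            whole_resp: (.missed_bytes == .resp_bytes)}'
```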

But even with the above, the pkts_dropped in stats.log is extremely high:
$ zcat stats.10\:00\:00-11\:00\:00.log.gz | jq .pkts_dropped | grep -v null | statgen
 Count         Min         Max         Avg      StdDev
   900     3564854    18216752  5762446.99  1591145.34
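One thing I could try is breaking pkts_dropped down per worker (via the
peer field) to see whether the drops are concentrated on a few workers or
spread evenly. A sketch, with illustrative sample records standing in for
stats.log lines:

```shell
# Sum pkts_dropped per worker; sample lines stand in for stats.log records.
printf '%s\n' \
  '{"peer":"worker-1-1","pkts_dropped":100}' \
  '{"peer":"worker-1-2","pkts_dropped":5}' \
  '{"peer":"worker-1-1","pkts_dropped":200}' \
| jq -s -c 'group_by(.peer)
            | map({peer: .[0].peer,
                   dropped: (map(.pkts_dropped) | add)})'
```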

So even though there was no capture_loss and almost no missed_bytes,
pkts_dropped is huge. Is this something to be concerned about? If so, I am
not sure how to go about diagnosing the problem. What should I do next?