[Bro] High-CPU on just a single worker in the cluster

Dave Crawford bro at pingtrip.com
Wed Apr 13 16:03:19 PDT 2016


I'm trying to debug an odd high-CPU issue and am looking for guidance.

The deployment is as follows:
  - Cluster has two nodes, each with 10 workers, and the workers are pinned to specific CPU cores.
  - x520 with PF_RING
  - Traffic to each node is load balanced equally

The issue is that one worker on one of the nodes is always at 100% CPU while all other workers are around 50%. If I restart Bro, a different worker will pin to 100%, but always on the same node.

I ran 'strace' on both a "bad" and "good" worker and one anomaly I spotted was that the "bad" worker never called 'nanosleep', whereas the "good" worker had about 84,000 'nanosleep' calls in the same amount of time.
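For anyone who wants to repeat the comparison, here is a minimal sketch of how two strace captures could be tallied per syscall. It assumes each capture was saved as plain text (for example with 'strace -p <pid> -o <file>') and that lines look like 'name(args) = result', optionally prefixed by a pid or an HH:MM:SS timestamp. The function names count_syscalls and missing_in_bad are hypothetical helpers written for this illustration, not part of strace or Bro.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a strace line, allowing an
# optional "[pid N]" or bare pid prefix and an optional HH:MM:SS[.usec] stamp.
SYSCALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?"
    r"(?:\d+\s+)?"
    r"(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?"
    r"([a-z_][a-z0-9_]*)\("
)


def count_syscalls(lines):
    """Return a Counter mapping syscall name -> number of calls seen."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def missing_in_bad(good, bad):
    """Syscalls the 'good' worker made that the 'bad' worker never made."""
    return {name: n for name, n in good.items() if bad.get(name, 0) == 0}


if __name__ == "__main__":
    # Tiny made-up captures, only to show the shape of the input.
    good_lines = [
        "nanosleep({0, 20000}, NULL) = 0",
        "select(12, [3], NULL, NULL, {0, 0}) = 0",
        "nanosleep({0, 20000}, NULL) = 0",
    ]
    bad_lines = [
        "poll([{fd=5, events=POLLIN}], 1, 0) = 1",
        "read(5, \"\", 4096) = 0",
    ]
    good = count_syscalls(good_lines)
    bad = count_syscalls(bad_lines)
    print(missing_in_bad(good, bad))
```

In practice the two lists would come from reading the saved capture files line by line. 'strace -c' prints a similar per-syscall summary directly, so this script is mainly useful when the raw captures already exist.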

I'm wondering if it's possible for a queue to go bad on the x520, which might explain why it's a random worker on the same node after restarting.

Is there a way to determine which x520 queue a specific worker is reading from? 
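As a partial check in the meantime, the per-queue interrupt counters in /proc/interrupts can at least show whether one NIC queue is far busier than the others. The sketch below assumes the ixgbe driver's usual '<iface>-TxRx-<n>' interrupt naming; the interface name 'eth2' and the function queue_interrupts are placeholders for illustration. It does not map a queue to a worker, it only compares queue load.

```python
import re
from itertools import takewhile

# Assumed naming of per-queue interrupt lines, e.g. "eth2-TxRx-3".
QUEUE_RE = re.compile(r"^(?P<iface>\S+)-TxRx-(?P<queue>\d+)$")


def queue_interrupts(text, iface):
    """Sum the per-CPU interrupt counts for each TxRx queue of `iface`.

    `text` is the content of /proc/interrupts; the result maps
    queue number -> total interrupts across all CPUs.
    """
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        # Interrupt rows start with "<irq>:"; the CPU header row does not.
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue
        match = QUEUE_RE.match(fields[-1])
        if not match or match.group("iface") != iface:
            continue
        # Per-CPU counters are the run of integers right after the IRQ number.
        per_cpu = takewhile(str.isdigit, fields[1:])
        totals[int(match.group("queue"))] = sum(int(n) for n in per_cpu)
    return totals


if __name__ == "__main__":
    # Made-up sample; on a real node, read /proc/interrupts instead.
    sample = """\
           CPU0       CPU1
 98:       1000        200   PCI-MSI-edge      eth2-TxRx-0
 99:         50         50   PCI-MSI-edge      eth2-TxRx-1
100:          5          0   PCI-MSI-edge      eth3-TxRx-0
"""
    for queue, total in sorted(queue_interrupts(sample, "eth2").items()):
        print(queue, total)
```

Taking two snapshots a few seconds apart and subtracting them gives a rate per queue, which is easier to compare than the counters accumulated since boot.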

Thanks,
-Dave



