[Bro] High-CPU on just a single worker in the cluster
bro at pingtrip.com
Wed Apr 13 16:03:19 PDT 2016
I'm in the process of trying to debug an odd high-CPU issue and am looking for guidance.
The deployment is as follows:
- The cluster has two nodes, each with 10 workers, and the workers are pinned to specific CPU cores.
- x520 with PF_RING
- Traffic to each node is load balanced equally
The issue is that one worker on one of the nodes is always at 100% CPU while all other workers are around 50%. If I restart Bro, a different worker will pin to 100%, but always on the same node.
I ran 'strace' on both a "bad" and "good" worker and one anomaly I spotted was that the "bad" worker never called 'nanosleep', whereas the "good" worker had about 84,000 'nanosleep' calls in the same amount of time.
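A comparison like this can be made repeatable by tallying syscalls per trace. The following is a minimal Python sketch, assuming each worker's strace output was saved to a file with 'strace -o'; the file names and the helper names (count_syscalls, compare) are hypothetical and not part of Bro or strace. It only recognizes lines that start with a syscall name, optionally preceded by a PID, so traces taken with timestamp options would need the pattern adjusted.

```python
import re
import sys
from collections import Counter

# Matches the syscall name at the start of an strace line, allowing an
# optional "[pid N]" or bare PID prefix (as written by strace -f / -o).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def count_syscalls(lines):
    """Return a Counter mapping syscall name -> number of calls seen."""
    counts = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line)
        if m:  # signal lines, "resumed" lines, etc. are skipped
            counts[m.group(1)] += 1
    return counts


def compare(good, bad, top=10):
    """Return syscall names ordered by largest difference in call count."""
    names = set(good) | set(bad)
    return sorted(names, key=lambda n: abs(good[n] - bad[n]), reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical usage: python compare_strace.py good.trace bad.trace
    with open(sys.argv[1]) as f:
        good = count_syscalls(f)
    with open(sys.argv[2]) as f:
        bad = count_syscalls(f)
    for name in compare(good, bad):
        print(f"{name:20s} good={good[name]:8d} bad={bad[name]:8d}")
```

For a quick check without a script, 'strace -c' prints a per-syscall count summary when the trace ends, which should surface the same 'nanosleep' discrepancy.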
I'm wondering if it's possible for a queue to go bad on the x520, which might explain why it's a random worker on the same node after restarting.
Is there a way to determine which x520 queue a specific worker is reading from?