[Bro] Capture Loss

Tue Mar 14 19:49:17 PDT 2017

Glad to hear you're now on the right track!  You're very welcome.  FWIW, I
think the person on the other similar thread (the one I copied my reply
from) might not have known about installing PF_RING with DKMS, so I wanted
to cover possible kernel module issues etc.  I was going to guess your issue
was upstream based on what you described in your first email, but I didn't
want to speculate too much, heh.  Taps are definitely the way to go if you
have the option to use them instead of SPAN ports.

Best regards,

-Drew

On Tue, Mar 14, 2017 at 5:47 PM, Arash Fallah <af7 at umbc.edu> wrote:

> Hey Drew,
>
> I've been on the list for over a year. I searched for similar issues but
> didn't find any. We are capturing from a SPAN port, and we have 3 edge
> routers and tons of asymmetrical routing. We are experiencing packet loss
> at such a high rate that we believe the problem might be upstream (thanks
> to Seth)! We are going to try passive taps instead of capturing from SPAN
> ports.
>
> PF_RING is installed with DKMS. All offloading has been disabled and I
> have been checking reporter.log for invalid checksums (none so far). CPU
> pinning is enabled. Though I did not know about ring slots for PF_RING,
> from my research I do not think our network at 3 Gbps requires increasing
> the slot count.
>
> Thanks so much, you were on point with your questions.
>
> On Thu, Mar 9, 2017 at 4:27 PM, Drew Dixon <dwdixon at umich.edu> wrote:
>
>> Did you search the email list already or did you just join the list?  Are
>> you capturing the traffic from a SPAN port or a tap?  Is your network full
>> of asymmetrical traffic/routing?  Answers to these last two questions are
>> pretty important IMO.  I responded to a very similar question around 6 days
>> ago on the list...here's what I said again:
>>
>> _____________________________
>>
>> First, I think the recommended number of workers is something like the
>> number of *real* cores (not counting hyperthreading) minus 2, so for 8
>> *real* cores you would use 6 workers; if you have 16 *real* cores you
>> probably want closer to 14 workers if this is a dedicated Bro box.  Maybe
>> try bumping up your number of workers and enabling CPU pinning if you
>> haven't done so.
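
A minimal sketch of the rule of thumb above (real cores minus 2), assuming you already know the physical core count. The function name, the `reserved` parameter, and the floor of one worker are made up for illustration and are not Bro settings:

```python
def recommended_workers(physical_cores: int, reserved: int = 2) -> int:
    """Rule of thumb from the message above: physical cores minus 2.

    `physical_cores` must exclude hyperthreads. `reserved` and the floor
    of one worker are illustrative assumptions, not Bro settings.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return max(1, physical_cores - reserved)


if __name__ == "__main__":
    for cores in (8, 16):
        print(cores, "physical cores ->", recommended_workers(cores), "workers")
```

The core count is passed in by hand because `os.cpu_count()` reports logical CPUs, which would include hyperthreads.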
>>
>> Have you reviewed everything located here?
>>
>> https://www.bro.org/documentation/faq.html#how-can-i-reduce-the-amount-of-captureloss-or-dropped-packets-notices
>>
>> Specifically, a few things come to mind...I know you mentioned NIC
>> settings, but are you sure you disabled all the NIC offloading features
>> using ethtool?  More detail on that at this link:
>>
>> http://securityonion.blogspot.com/2011/10/when-is-full-packet-capture-not-full.html
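
For reference, a rough sketch of what "disable all the NIC offloading features using ethtool" usually amounts to. It only builds the command strings and does not run them. The feature list is my assumption of the commonly disabled `ethtool -K` short names, and `eth0` is a placeholder interface; check `ethtool -k <iface>` to see what your driver actually supports:

```python
import shlex

# Offload features commonly switched off on a sniffing interface. These are
# standard `ethtool -K` short names, but the exact set is an assumption;
# not every driver supports all of them.
OFFLOAD_FEATURES = ("rx", "tx", "sg", "tso", "gso", "gro", "lro")


def offload_disable_commands(iface: str) -> list:
    """Return one `ethtool -K <iface> <feature> off` command per feature."""
    return [
        " ".join(["ethtool", "-K", shlex.quote(iface), feature, "off"])
        for feature in OFFLOAD_FEATURES
    ]


if __name__ == "__main__":
    # "eth0" is a placeholder interface name.
    for cmd in offload_disable_commands("eth0"):
        print(cmd)
```

Issuing one command per feature means a driver that rejects a single feature does not block the rest.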
>>
>> Also, it wouldn't hurt to double check that the pf_ring kernel module is
>> loaded and staying loaded.  If you patch the server and the kernel gets
>> updated, then unless you have something automated to reload/reinstall the
>> pf_ring module, you will probably need to reload the pf_ring module for
>> the new kernel...
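
One way to automate the "is the module still loaded" check after a kernel update, sketched under the assumption of a Linux host where `/proc/modules` lists one loaded module per line with the module name as the first field. The helper name is hypothetical:

```python
def module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if `name` is the first field of any /proc/modules line."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    # /proc/modules is Linux-only; this read fails on other systems.
    with open("/proc/modules") as fh:
        print("pf_ring loaded:", module_loaded("pf_ring", fh.read()))
```

Matching on the exact first field avoids false positives from other modules whose names merely start with `pf_ring`.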
>>
>> Also, did you configure the number of ring slots for PF_RING?
>>
>> Check to be sure that /etc/modprobe.d/pf_ring.conf exists for your
>> PF_RING installation...this is where you will configure the number of ring
>> slots for PF_RING, the default is 4096 I believe but on busy networks this
>> needs to be increased as appropriate (in increments of 4096)...the max
>> value is 65534.  I would try that if you've tried everything else at the
>> first link above to no avail...
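
A small sketch of the slot arithmetic described above: step up in increments of 4096 and cap at the quoted maximum of 65534. The helper names are hypothetical, and the `min_num_slots` parameter name is my assumption; confirm it with `modinfo pf_ring` before editing /etc/modprobe.d/pf_ring.conf:

```python
SLOT_STEP = 4096   # default and increment quoted in the message above
SLOT_MAX = 65534   # maximum quoted in the message above


def ring_slots(steps: int) -> int:
    """Return `steps` increments of 4096, capped at the quoted maximum."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return min(steps * SLOT_STEP, SLOT_MAX)


def modprobe_options_line(slots: int) -> str:
    # Assumption: the module parameter is named min_num_slots; confirm
    # with `modinfo pf_ring` on your own installation.
    return f"options pf_ring min_num_slots={slots}"


if __name__ == "__main__":
    print(modprobe_options_line(ring_slots(4)))
```

The module has to be reloaded for a new value in the conf file to take effect.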
>>
>> This is also a great resource re: PF_RING and number of ring slots:
>>
>> https://groups.google.com/forum/#!topic/security-onion/zu7U7U9pBT8
>>
>> Hope this helps,
>>
>> -Drew
>> ____________________________
>>
>> On Tue, Mar 7, 2017 at 10:34 AM, Arash Fallah <af7 at umbc.edu> wrote:
>>
>>> I'm running Bro in a clustered configuration using PF_RING to have 8
>>> separate workers on one box. Additionally, I have commented out almost
>>> everything in the default local.bro to run Bro as efficiently as
>>> possible. Together, these 8 workers are using less than 20% of total CPU
>>> capacity.
>>>
>>> However, we are experiencing capture loss consistently in the 50% range,
>>> even though CPUs are idle 80% of the time on average.
>>>
>>> Does anyone have any experience with this? I would greatly appreciate
>>> the help.
>>>
>>> _______________________________________________
>>> Bro mailing list
>>> bro at bro-ids.org
>>> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro
>>>
>>
>>
>