[Zeek] Zeek Myricom port aggregation

Greg Grasmehr greg.grasmehr at caltech.edu
Wed Jun 5 15:30:35 PDT 2019


The other problem is that I am running timemachine (TM) on this box. I think
it would be a little dicey to run two separate instances of Zeek and TM, and
I am not even sure that running two instances of TM would work...

On 06/05/19 15:22:13, Greg Grasmehr wrote:
> Thanks for that, Justin. However, I am merging two ports on a single card.
> The reason is that the Arista has 2 x 10G taps, and aggregating those
> links within the switch causes drops on the switch because periodic
> microbursts exceed the bandwidth of a single 10G tool port.  The other
> option, aggregating the traffic in the switch to the 40G ports as a tool
> port, isn't a viable solution.
> 
> This is really vexing because tcpdump has no trouble reading the
> aggregated ports. So while the kernel panic information points to
> the SNF software as the cause, I wonder...
> 
> Greg
> 
> On 06/05/19 14:39:43, Justin Azoff wrote:
> > Oh, I forgot to send you the recommended configuration for 2 cards in one
> > box.
> > 
> > Most likely you don't need to be merging the ports. As long as the Arista
> > or something is merging the flows for you, each port is already getting a
> > consistent subset of flows.  At that point the on-card aggregation doesn't
> > do anything for you.
> > 
> > I would use a configuration like this:
> > 
> > [node-foo-card1]
> > interface = p1p1
> > lb_method=myricom
> > lb_procs=9
> > #check hwloc for numa/pci info
> > pin_cpus=1,3,5,7,9,...
> > 
> > [node-foo-card2]
> > interface = p2p1
> > lb_method=myricom
> > lb_procs=9
> > #check hwloc for numa/pci info
> > pin_cpus=2,4,6,8,...
> > 
> > The hardest part is using the right pin_cpus settings.  It's a little
> > easier if you disable HT and then check which card is attached to
> > which CPU using hwloc.  Sometimes it doesn't matter much, but on some
> > motherboards you can match PCI slots to physical CPUs to avoid moving
> > data between the NUMA nodes.
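
As an illustration of the NUMA check described above (a sketch that is not
part of the original thread): on Linux, sysfs reports which NUMA node a PCI
NIC is attached to and which CPUs belong to that node. The helper name
nic_local_cpus and the SYSFS_ROOT override are made up for this example;
p1p1 is the interface name from the configuration above.

```shell
# Hypothetical helper: print the CPUs that are local to a NIC's NUMA node.
# SYSFS_ROOT is an assumption added so the function can be pointed at a
# test tree; on a real system it stays at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

nic_local_cpus() {
    iface="$1"
    node_file="$SYSFS_ROOT/class/net/$iface/device/numa_node"
    if [ ! -r "$node_file" ]; then
        echo "no NUMA information for $iface" >&2
        return 1
    fi
    node=$(cat "$node_file")
    # The kernel reports -1 when the device has no NUMA affinity;
    # fall back to node 0 in that case.
    [ "$node" -ge 0 ] 2>/dev/null || node=0
    cat "$SYSFS_ROOT/devices/system/node/node$node/cpulist"
}

# Example (needs a host that really has this interface):
#   nic_local_cpus p1p1
```

The printed list is only the set of candidate CPUs; which of them to put in
pin_cpus still depends on lb_procs and on whether hyperthreading is enabled.
hwloc's lstopo shows the same topology and is a useful cross-check.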
> > 
> > 
> > 
> > On Wed, Jun 5, 2019 at 2:32 PM Greg Grasmehr <greg.grasmehr at caltech.edu>
> > wrote:
> > 
> > > Just an update:
> > >
> > > I contacted Myricom support about this issue a while back and haven't
> > > heard anything from them in a while, so I suspect they are unable to
> > > reproduce it; in my experience they generally fix kernel problems very
> > > quickly.
> > >
> > > Fortunately I will be swapping drives in an array and will need to take
> > > Zeek down, so I will experiment for a bit before bringing it back up and
> > > see if I can figure out what the issue is.
> > >
> > > This kind of experimentation is very difficult when you don't have a dev
> > > system to test on.  :P
> > >
> > > Greg
> > >
> > > On 05/15/19 15:04:40, Greg Grasmehr wrote:
> > > > tcpdump works perfectly with aggregation, no issues
> > > >
> > > > On 05/15/19 17:35:56, Justin Azoff wrote:
> > > > > That looks like a bug in the Myricom driver and not Zeek.  Can you
> > > > > reproduce the same kernel issue using tcpdump?  You configure
> > > > > aggregation for that using SNF_FLAGS:
> > > > >
> > > > > SNF_FLAGS=0x2 (Port aggregation (or merging))
> > > > > Flag 0x2 says that the port number that is passed to an application is
> > > > > actually a mask of ports, not just one port.
> > > > > For example, when using tcpdump:
> > > > > export SNF_FLAGS=0x2
> > > > > env SNF_FLAGS=0x2 /path/to/tcpdump -i snf3
> > > > >
> > > > > Without SNF_FLAGS=0x2, you would actually try to open snf port 3 (which
> > > > > may not exist if you only have one adapter.)
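
To make the "mask of ports" idea concrete, here is a small sketch that is
not part of the original thread. It assumes that bit N of the mask selects
SNF port N, which is consistent with the snf3 example above (ports 0 and 1
merged) but is not spelled out in the thread. The function name
snf_port_mask is made up for this example.

```shell
# Hypothetical helper: build a port bitmask from zero-based port numbers,
# assuming bit N of the mask stands for SNF port N.
snf_port_mask() {
    mask=0
    for port in "$@"; do
        mask=$(( mask | (1 << port) ))
    done
    echo "$mask"
}

# Ports 0 and 1 merged give mask 3, i.e. the "snf3" device name used above.
echo "SNF_FLAGS=0x2 tcpdump -i snf$(snf_port_mask 0 1)"
```

With SNF_FLAGS=0x2 unset, the same snf3 name would instead be read as the
single port number 3, as noted above.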
> > > > >
> > > > >
> > > > > It's possible that you don't need to use aggregation in the first
> > > > > place.  That is generally only needed if you are connecting a fiber
> > > > > tap directly into a card.  If flows are being load balanced across
> > > > > multiple ports you can just run two different sets of workers, one for
> > > > > each port.
> > > > >
> > > > > On Wed, May 15, 2019 at 2:17 PM Greg Grasmehr <greg.grasmehr at caltech.edu> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Hoping someone has some insight into whatever I am doing wrong, as try
> > > > > > as I might, I can't seem to get the Myricom plugin working if it is
> > > > > > configured to aggregate port data.  Zeek starts and then crashes in
> > > > > > every case, regardless of configuration, i.e.
> > > > > >
> > > > > > interface=myricom::3
> > > > > > interface=myricom::*
> > > > > >
> > > > > > and snf_aggregate = T
> > > > > >
> > > > > > Here is the related dmesg output logged by kdump:
> > > > > >
> > > > > > [67471.838822] BUG: unable to handle kernel paging request at 00007f0d8459607f
> > > > > > [67471.838863] IP: [<ffffffffc0bed569>] snf_eop_ioctl+0x609/0xc60 [myri_snf]
> > > > > > [67471.838897] PGD 8000000a93bb9067 PUD 12d142c067 PMD 12d142d067 PTE 8000001d54829025
> > > > > > [67471.838927] Oops: 0001 [#1] SMP
> > > > > > [67471.838942] Modules linked in: binfmt_misc macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag myri_snf(OE) mpt2sas raid_class scsi_transport_sas mptctl mptbase ip6t_rpfilter ipt_REJECT nf_reject_ipv4 nf_log_ipv4 ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common xt_LOG xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dell_rbu sunrpc dcdbas iTCO_wdt iTCO_vendor_support sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul joydev
> > > > > > [67471.839241]  ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mxm_wmi ext4 mbcache jbd2 pcspkr ipmi_ssif mei_me lpc_ich mei sg ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel drm_panel_orientation_quirks ahci libahci dca libata tg3 megaraid_sas ptp pps_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: myri10ge]
> > > > > > [67471.839450] CPU: 24 PID: 92952 Comm: bro Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.10.1.el7.x86_64 #1
> > > > > > [67471.839483] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.9.1 12/04/2018
> > > > > > [67471.839508] task: ffff95d0e7c41040 ti: ffff95e3197c0000 task.ti: ffff95e3197c0000
> > > > > > [67471.839531] RIP: 0010:[<ffffffffc0bed569>]  [<ffffffffc0bed569>] snf_eop_ioctl+0x609/0xc60 [myri_snf]
> > > > > > [67471.839564] RSP: 0018:ffff95e3197c3d38  EFLAGS: 00010006
> > > > > > [67471.839583] RAX: 0000000000000286 RBX: 0000000000000001 RCX: 0000000000000000
> > > > > > [67471.839605] RDX: ffff95d0526253d0 RSI: 00007f0d84596000 RDI: ffffb70f589ba7f8
> > > > > > [67471.839627] RBP: ffff95e3197c3df8 R08: ffffb70f599bb000 R09: 0000000000000003
> > > > > > [67471.839648] R10: 0000000000000000 R11: 0000000000000000 R12: ffff95d052625000
> > > > > > [67471.839670] R13: 00007ffeb542d710 R14: 00007ffeb542d710 R15: 0000000000000000
> > > > > > [67471.839693] FS:  00007f180d6a7900(0000) GS:ffff95eefe900000(0000) knlGS:0000000000000000
> > > > > > [67471.839717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [67471.839735] CR2: 00007f0d8459607f CR3: 0000001ff663c000 CR4: 00000000003607e0
> > > > > > [67471.839757] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > [67471.839778] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > [67471.839800] Call Trace:
> > > > > > [67471.839818]  [<ffffffff98d67ef2>] ? down_read+0x12/0x40
> > > > > > [67471.839840]  [<ffffffffc0bdfba0>] mx_common_ioctl+0x40/0x90 [myri_snf]
> > > > > > [67471.839865]  [<ffffffffc0bd44e2>] mx_ioctl+0x72/0x290 [myri_snf]
> > > > > > [67471.839888]  [<ffffffff98856880>] do_vfs_ioctl+0x3a0/0x5a0
> > > > > > [67471.839908]  [<ffffffff98d70608>] ? __do_page_fault+0x228/0x500
> > > > > > [67471.839928]  [<ffffffff98856b21>] SyS_ioctl+0xa1/0xc0
> > > > > > [67471.839947]  [<ffffffff98d75ddb>] system_call_fastpath+0x22/0x27
> > > > > > [67471.839966] Code: d3 e6 44 85 ce 74 e1 48 83 bf b8 00 00 00 00 75 d1 4c 8b 87 c0 00 00 00 4c 63 d9 41 8b 70 04 48 c1 e6 09 4b 03 b4 dc c0 06 00 00 <0f> b6 76 7f 41 39 30 75 b4 4c 89 a7 b8 00 00 00 49 89 bc 24 60
> > > > > > [67471.840084] RIP  [<ffffffffc0bed569>] snf_eop_ioctl+0x609/0xc60 [myri_snf]
> > > > > > [67471.840112]  RSP <ffff95e3197c3d38>
> > > > > > [67471.840125] CR2: 00007f0d8459607f
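
A few values in the trace above can be cross-checked with plain shell
arithmetic. This is a sketch that is not part of the original thread, and
the interpretation is only one possible reading. The faulting bytes
"<0f> b6 76 7f" decode to a one-byte load from RSI+0x7f, and the bit
positions used below (CR4 bit 20 = SMEP, CR4 bit 21 = SMAP, EFLAGS bit 18 =
AC) are the architectural x86 definitions. The helper name bit_set is made
up for this example.

```shell
# Values copied from the register dump above.
rsi=0x00007f0d84596000
cr2=0x00007f0d8459607f
cr4=0x00000000003607e0
eflags=0x00010006

# Hypothetical helper: succeed if bit $2 of value $1 is set.
bit_set() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

# The faulting instruction reads one byte at RSI + 0x7f.
fault=$(( $rsi + 0x7f ))
printf 'RSI + 0x7f = %016x\n' "$fault"
if [ "$fault" -eq $(( $cr2 )) ]; then
    echo "matches CR2 (the faulting address)"
fi

# Addresses below 0x0000800000000000 are in the user half of the x86-64
# address space.
if [ "$fault" -lt $(( 0x0000800000000000 )) ]; then
    echo "faulting address is a user-space address"
fi

if bit_set "$cr4" 21; then echo "CR4.SMAP is set"; fi
if bit_set "$cr4" 20; then echo "CR4.SMEP is set"; fi
if ! bit_set "$eflags" 18; then echo "EFLAGS.AC is clear"; fi
```

If this reading is right, the driver's ioctl path read a user-space address
directly while SMAP was being enforced, which would be consistent with the
"Oops: 0001" error code (a kernel-mode read of a present page). The thread
itself does not confirm this, and it does not explain why tcpdump is
unaffected.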
> > > > > >
> > > > > > _______________________________________________
> > > > > > Zeek mailing list
> > > > > > zeek at zeek.org
> > > > > > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/zeek
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Justin
> > >
> > 
> > 
> > -- 
> > Justin