From oho at acm.org Sat Mar 1 00:04:13 2008
From: oho at acm.org (Orion Hodson)
Date: Sat, 1 Mar 2008 00:04:13 -0800
Subject: [Xorp-hackers] Why use the home-grown heap.cc?
In-Reply-To: <47C902DB.8010209@candelatech.com>
References: <47C902DB.8010209@candelatech.com>
Message-ID:

It was something that an earlier contributor had on the shelf that
matched requirements at the time and was AFAIR well tested in previous
scenarios.

The existing heap allows popping an arbitrary item in the heap by
design, which is useful when the heap is a container for timers that
may be unscheduled.

Nothing is set in stone so if there is a patch to STL-ize it and that
patch passes regressions and is demonstrably better then it'd likely
be taken up.

- Orion

On Feb 29, 2008, at 11:16 PM, Ben Greear wrote:

> While testing the latest xorp (plus my patches, which certainly
> could be the cause), I ran into the assert below. Before I dig into
> this further, I am curious if there is a good reason we are using a
> home-grown heap class as opposed to something from the STL?
>
> Loaded symbols for /lib/libnss_files.so.2
> Core was generated by `xorp_rtrmgr -p 20002 -b vr_conf/xorp-vr10002.conf'.
> Program terminated with signal 6, Aborted.
> #0 0xb7f74410 in __kernel_vsyscall ()
> (gdb) bt
> #0 0xb7f74410 in __kernel_vsyscall ()
> #1 0x0056f690 in raise () from /lib/libc.so.6
> #2 0x00570f91 in abort () from /lib/libc.so.6
> #3 0x0817f119 in xlog_fatal (module_name=0x820c3b9 "LIBXORP",
>    where=0xbfbfd068 "heap.cc:171 pop_obj",
>    fmt=0x820c484 "-- heap_extract, father %d out of bound 0..%d") at xlog.c:435
> #4 0x081a478c in Heap::pop_obj (this=0x84cb478, obj=0x84cf918) at heap.cc:170
> #5 0x081a0159 in TimerList::unschedule_node (this=0xbfc010cc, n=0x84cf918) at timer.cc:592
> #6 0x081a01c8 in TimerNode::unschedule (this=0x84cf918) at timer.cc:119
> #7 0x081a032e in ~TimerNode (this=0x84cf918) at timer.cc:74
> #8 0x081a2984 in ~PeriodicTimerNode2 (this=0x84cf918) at timer.cc:188
> #9 0x0819f2b0 in TimerNode::release_ref (this=0x84cf918) at timer.cc:87
> #10 0x08069322 in ~XorpTimer (this=0x84c9df0) at timer.hh:535
> #11 0x08110cd7 in ~Finder (this=0x84c9d6c) at finder.cc:360
> #12 0x080f83eb in ~FinderServer (this=0x84c9d68) at finder_server.cc:82
> #13 0x08067c5e in Rtrmgr::run (this=0xbfc01438) at main_rtrmgr.cc:350
> #14 0x08068353 in main (argc=5, argv=0xbfc01544) at main_rtrmgr.cc:500
> (gdb) frame 4
> #4 0x081a478c in Heap::pop_obj (this=0x84cb478, obj=0x84cf918) at heap.cc:170
> 170 heap.cc: No such file or directory.
>     in heap.cc
> (gdb) print father
> $1 = 3
> (gdb) print _elements
> $2 = 1
> (gdb)
>
> --
> Ben Greear
> Candela Technologies Inc http://www.candelatech.com
>
>
> _______________________________________________
> Xorp-hackers mailing list
> Xorp-hackers at icir.org
> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers

From bms at incunabulum.net Sun Mar 2 09:01:10 2008
From: bms at incunabulum.net (Bruce M Simpson)
Date: Sun, 02 Mar 2008 17:01:10 +0000
Subject: [Xorp-hackers] [Fwd: Re: RFC: Use one socket per interface for receiving packets in the FEA.]
Message-ID: <47CADD56.6020204@incunabulum.net>

Sorry for the resend noise, Thunderbird keeps assuming I wish to send
from another domain as it appears in the headers when replying.
-------------- next part --------------
An embedded message was scrubbed...
From: "Bruce M. Simpson"
Subject: Re: [Xorp-hackers] RFC: Use one socket per interface for receiving packets in the FEA.
Date: Sun, 02 Mar 2008 16:59:44 +0000
Size: 4313
Url: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20080302/cc67a101/attachment.eml

From bms at incunabulum.net Sun Mar 2 19:08:13 2008
From: bms at incunabulum.net (Bruce M Simpson)
Date: Mon, 03 Mar 2008 03:08:13 +0000
Subject: [Xorp-hackers] RFC: Use one socket per interface for receiving packets in the FEA.
In-Reply-To: <47CB00C6.70006@candelatech.com>
References: <200801160237.m0G2bt9d006369@possum.icir.org>
 <47C36818.1080507@candelatech.com>
 <200802271026.m1RAQ9Nq007826@fruitcake.ICSI.Berkeley.EDU>
 <47C6F18D.8030200@candelatech.com>
 <200802281909.m1SJ9U8Y029564@fruitcake.ICSI.Berkeley.EDU>
 <47C73D0B.7030501@candelatech.com>
 <200802282318.m1SNIibh019025@fruitcake.ICSI.Berkeley.EDU>
 <47C8506B.6020607@candelatech.com>
 <200802291920.m1TJKTok018510@fruitcake.ICSI.Berkeley.EDU>
 <47C8ABC7.9020104@candelatech.com>
 <47CADD00.6080409@icsi.berkeley.edu>
 <47CB00C6.70006@candelatech.com>
Message-ID: <47CB6B9D.8070402@incunabulum.net>

Ben Greear wrote:
>
> The new patch is attached.

First of all, thanks for the patch. I'm sure we need to implement
semantics like this.

The patch touches a few places in the xorp-olsr branch where I've
converted MFEA related code to use libcomm functions which had to be
implemented for OLSR support.

BTW, quick question: why do you need to use a XorpFd* instead of a
reference?

XorpFd's constructor guarantees that it is initialized to an invalid
value regardless of the platform it's on, which covers the case where
lazy allocation is needed (a quick read suggests you are lazy
allocating the fd).
I'm sure Pavlin will be in touch soon with more feedback.

cheers
BMS

From greearb at candelatech.com Sun Mar 2 22:28:19 2008
From: greearb at candelatech.com (Ben Greear)
Date: Sun, 02 Mar 2008 22:28:19 -0800
Subject: [Xorp-hackers] RFC: Use one socket per interface for receiving packets in the FEA.
In-Reply-To: <47CB6B9D.8070402@incunabulum.net>
References: <200801160237.m0G2bt9d006369@possum.icir.org>
 <47C36818.1080507@candelatech.com>
 <200802271026.m1RAQ9Nq007826@fruitcake.ICSI.Berkeley.EDU>
 <47C6F18D.8030200@candelatech.com>
 <200802281909.m1SJ9U8Y029564@fruitcake.ICSI.Berkeley.EDU>
 <47C73D0B.7030501@candelatech.com>
 <200802282318.m1SNIibh019025@fruitcake.ICSI.Berkeley.EDU>
 <47C8506B.6020607@candelatech.com>
 <200802291920.m1TJKTok018510@fruitcake.ICSI.Berkeley.EDU>
 <47C8ABC7.9020104@candelatech.com>
 <47CADD00.6080409@icsi.berkeley.edu>
 <47CB00C6.70006@candelatech.com>
 <47CB6B9D.8070402@incunabulum.net>
Message-ID: <47CB9A83.5030005@candelatech.com>

Bruce M Simpson wrote:
> Ben Greear wrote:
>>
>> The new patch is attached.
>
> First of all, thanks for the patch. I'm sure we need to implement
> semantics like this.
>
> The patch touches a few places in the xorp-olsr branch where I've
> converted MFEA related code to use libcomm functions which had to be
> implemented for OLSR support.
>
> BTW, quick question: why do you need to use a XorpFd* instead of a
> reference?

I like to be able to return NULL if we cannot find the socket, rather
than create one that is invalid. It's also more efficient to pass back
pointers than use an object copy (and you can't pass back a reference
to some tmp object on the stack, so you can't just pass back some dummy
object that is logically equiv to NULL.)

But, it's just a matter of taste...I'll not complain if you eventually
apply something that uses references instead...
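
To make the pointer-versus-reference trade-off concrete, here is a
minimal sketch of the two lookup styles. FdHandle, SocketTable, find()
and find_or_invalid() are hypothetical names made up for illustration;
this is not XorpFd, not the FEA code, and not what the patch does. The
only assumption carried over from the thread is that a
default-constructed handle is "invalid".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a descriptor wrapper; NOT the real XorpFd.
class FdHandle {
public:
    FdHandle() : _fd(-1) {}                 // default-constructed == invalid
    explicit FdHandle(int fd) : _fd(fd) {}
    bool is_valid() const { return _fd >= 0; }
    int  fd() const { return _fd; }
private:
    int _fd;
};

// Hypothetical per-interface socket table, keyed by interface name.
class SocketTable {
public:
    void add(const std::string& ifname, int fd) {
        _socks[ifname] = FdHandle(fd);
    }

    // Pointer style: NULL means "no socket for this interface".
    FdHandle* find(const std::string& ifname) {
        std::map<std::string, FdHandle>::iterator it = _socks.find(ifname);
        return (it == _socks.end()) ? NULL : &it->second;
    }

    // Reference style: something must always be returned, so a
    // long-lived invalid sentinel (not a stack temporary) stands in
    // for "not found".
    const FdHandle& find_or_invalid(const std::string& ifname) const {
        static const FdHandle invalid;
        std::map<std::string, FdHandle>::const_iterator it = _socks.find(ifname);
        return (it == _socks.end()) ? invalid : it->second;
    }

private:
    std::map<std::string, FdHandle> _socks;
};

// Small self-check exercising both styles.
inline bool lookup_demo() {
    SocketTable t;
    t.add("eth0", 7);
    assert(t.find("eth0") != NULL && t.find("eth0")->fd() == 7);
    assert(t.find("eth9") == NULL);                  // pointer style: not found
    assert(!t.find_or_invalid("eth9").is_valid());   // reference style: sentinel
    assert(t.find_or_invalid("eth0").is_valid());
    return true;
}
```

With the pointer style the caller tests for NULL; with the reference
style the caller has to remember to call is_valid(). Which of the two is
easier to get right is the matter of taste mentioned above.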
>
> XorpFd's constructor guarantees that it is initialized to an invalid
> value regardless of the platform it's on, which covers the case where
> lazy allocation is needed (a quick read suggests you are lazy
> allocating the fd).

Yes, I am lazy allocating it...but even if we weren't, it's possible
for code (buggy or otherwise) to request a socket for a device that
just does not currently exist..and I wouldn't want to create a dummy to
pass back as reference in that case.

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc http://www.candelatech.com

From bms at incunabulum.net Mon Mar 3 07:46:22 2008
From: bms at incunabulum.net (Bruce M Simpson)
Date: Mon, 03 Mar 2008 15:46:22 +0000
Subject: [Xorp-hackers] TTCN-3 and protocol testing
Message-ID: <47CC1D4E.3010101@incunabulum.net>

Hi,

TTCN-3 just popped up on my radar.

Now that open source tools are beginning to emerge, it might be worth
looking at this; I see INRIA have a RIPng test suite in TTCN-3 at the
last link. It has an ITU-T and OSI pedigree, I wonder if it can be
tamed for IETF use. Woof.

Whilst it's easy to sink into the trap of too much testing, not enough
testing is surely a bad thing.

http://en.wikipedia.org/wiki/Ttcn
http://www.ttcn-3.org/OpenSourceTools.htm
http://t3devkit.gforge.inria.fr/doc/userref/
http://www.itu.int/ITU-T/studygroups/com07/ttcn.html
http://gforge.inria.fr/frs/?group_id=587

cheers
BMS

From greearb at candelatech.com Sun Mar 2 11:32:22 2008
From: greearb at candelatech.com (Ben Greear)
Date: Sun, 02 Mar 2008 11:32:22 -0800
Subject: [Xorp-hackers] RFC: Use one socket per interface for receiving packets in the FEA.
In-Reply-To: <47CADD00.6080409@icsi.berkeley.edu>
References: <200801160237.m0G2bt9d006369@possum.icir.org>
 <47C36818.1080507@candelatech.com>
 <200802271026.m1RAQ9Nq007826@fruitcake.ICSI.Berkeley.EDU>
 <47C6F18D.8030200@candelatech.com>
 <200802281909.m1SJ9U8Y029564@fruitcake.ICSI.Berkeley.EDU>
 <47C73D0B.7030501@candelatech.com>
 <200802282318.m1SNIibh019025@fruitcake.ICSI.Berkeley.EDU>
 <47C8506B.6020607@candelatech.com>
 <200802291920.m1TJKTok018510@fruitcake.ICSI.Berkeley.EDU>
 <47C8ABC7.9020104@candelatech.com>
 <47CADD00.6080409@icsi.berkeley.edu>
Message-ID: <47CB00C6.70006@candelatech.com>

Bruce M. Simpson wrote:
> Ben,
>
> I've only briefly glanced at this patch, but I have a few comments
> based on your description of the patch.
>
> I'll leave it to Pavlin to comment on the MFEA specific scope of the
> patch.
>
> Ben Greear wrote:
>> Here's a first attempt at using one rx socket per device, and binding
>> to that particular device. This keeps us from receiving multicast
>> traffic not destined for us when we are running multiple instances
>> of xorp on the same system.
>>
>> This code appears to work, but it does not properly clean up sockets
>> when devices are un-configured. I'll be working on that next.
>>
>> This code will be less efficient than the old way if the OS doesn't
>> support SO_BINDTODEVICE, so I'll also add some code to mimic the old
>> behaviour in that case (ie, windows).
>>
>> There's also some other cruft in there to deal with races around
>> removing interfaces and removing the OSPF multicast groups. These
>> changes have nothing in particular to do with per-interface rx
>> sockets.
>>
>> Suggestions for improvements are welcome.
>
> Right now, using a socket per interface is actually REQUIRED by
> protocols which rely on IP layer broadcasts, i.e. OLSR (which is in
> production deployment in places), BCAST (which was only ever
> experimental) and the old RIPv1 which didn't use IP layer multicast
> (which is more widespread than you'd hope for).
>
> Pavlin and I have had some discussion about this. He quite rightly
> states that using link-scoped multicasts is "the right thing" to do,
> unfortunately the way that deployment of these protocols has played
> out operationally, they are using all-ones and network broadcasts.
>
> The xorp_olsr code, which has not yet been committed publicly, opens a
> single socket per interface via XRL in a very similar way as xorp_rip
> does to handle RIPv1.
>
> It does this as it's the only consistent way of receiving IP
> broadcasts on multiple interfaces in an OS portable way.
>
> The lack of SO_BINDTODEVICE on a host platform is actually not that
> big a deal.
>
> To be sure, it's a Linux specific hack to deal with the ambiguity we
> are presented with by the legacy BSD socket behaviour, that is,
> broadcasts are NOT delivered to sockets which are bound in the laddr
> tuple by bind().
>
> If anything, I speculate that the overall cost of each additional
> socket on the host platform is negligible, compared to the cost of
> doing dispatch/fan-out in userland for a large number of such sockets.
> I don't have hard data, but that is my gut feeling based on exposure.
>
> In a "more deeply embedded" situation, the possibility exists that you
> implement the IP layer in the FEA process anyway, so tight control
> over the resource use of the XRL socket APIs exists at the cost of
> having to code your own IP.
>
> I still feel SO_BINDTODEVICE isn't the right way forward for solving
> the problems it was introduced to solve, it's a case of immediatist
> pragmatism, as it introduces a number of layering violations.

I just want to make sure that I can bind a socket to an interface, and
only have packets *from* that interface show up on the socket.
This means that the app will not receive crap on interfaces that it is
not (logically) listening on.

The current CVS tree of Xorp receives on all interfaces, and then
filters in the fea user-space logic. When using 30 Xorp instances, this
is about 900 wasted process wakes per second for bogus packets
(assuming a conservative one multicast pkt tx per xorp per second). It
scales N^2 as well, so it gets much worse as the number of Xorp
instances increases.

One problem with binding to local IPs is that if the IP addr of the
interface changes, then you'd have to rebind the socket. With
BINDTODEVICE that shouldn't be a problem.

Anyway, I've done some more work to make this handle interfaces
leaving. I guess it could still fail to register a new socket if
nothing tries to bind a multicast addr to it. With OSPF that isn't a
problem, but maybe other routing protocols don't use multicast and
would need some explicit logic to allow fea to detect addition of new
interfaces.

The new patch is attached.

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc http://www.candelatech.com

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fea2.patch
Type: text/x-patch
Size: 45678 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20080302/17bc3709/attachment-0001.bin

From greearb at candelatech.com Tue Mar 4 11:32:32 2008
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 04 Mar 2008 11:32:32 -0800
Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup
Message-ID: <47CDA3D0.9050804@candelatech.com>

I am trying to start 30 xorp instances on a system with around 600
interfaces (vlans and such). My hardware has lots of RAM and a
quad-core CPU, but it still takes more than the 10 second keep-alive
timeout to get xorp_fea initialized (it seems to be reading large
numbers of netlink messages). This causes continual xorp restarts since
the timeout fails.
I tried throttling so that I only started one xorp per 5 seconds, and
it still times out.

I'm going to experiment with increasing the keep-alive timer, but I am
curious if there are better alternatives.

* Maybe don't start keep-alive polling until fea finalizes its
  initialization?

* Maybe have fea answer keep-alives *while* it's initializing itself?

* Optimize fea to only probe info for devices it is configured to
  care about?

Suggestions are welcome.

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc http://www.candelatech.com

From greearb at candelatech.com Tue Mar 4 12:14:56 2008
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 04 Mar 2008 12:14:56 -0800
Subject: [Xorp-hackers] Root cause of this timeout message?
Message-ID: <47CDADC0.6070907@candelatech.com>

Any idea what code actually calls this timeout? I want to see if I can
increase that timeout..but not having much luck tracking down the
source of this message...

[ 2008/03/04 11:45:52 ERROR xorp_rtrmgr:22387 FINDER finder_xrl_queue.hh:85 dispatch_cb ] Sent xrl got response 211 Reply timed out
[ 2008/03/04 11:45:52 ERROR xorp_rtrmgr:22387 FINDER finder_xrl_queue.hh:85 dispatch_cb ] Sent xrl got response 211 Reply timed out

-- 
Ben Greear
Candela Technologies Inc http://www.candelatech.com

From pavlin at ICSI.Berkeley.EDU Tue Mar 4 12:40:30 2008
From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov)
Date: Tue, 04 Mar 2008 12:40:30 -0800
Subject: [Xorp-hackers] Limitations for multiple instances of XORP
In-Reply-To: <47C593F8.3050807@incunabulum.net>
References: <9C8CA610679BA74FBAEF300B18B9F39810A46015@rrc-dte-exs03.dte.telcordia.com>
 <47C593F8.3050807@incunabulum.net>
Message-ID: <200803042040.m24KeUEl006901@fruitcake.ICSI.Berkeley.EDU>

> What I can tell you is that the sizable runtime memory footprint is
> going to have an effect -- the short answer is, try it and see. You say
> nothing about the size of this machine you're running XORP on, which
> XORP processes you are running, how you've built/linked them, so
> anything here is pure speculation without real data.
>
> But I've just had some coffee, so I'll wax lyrical.
>
> ELF lazy symbol binding will probably have a negligible effect on
> runtime performance, when executables are first loaded.
>
> Sure, page sharing will be a factor at the single executable level, but
> it's not the same as benchmarking the actual reduction in footprint when
> shared libraries are introduced across the board, something I did last
> year but haven't published.
>
> I've done work on reducing this in build engineering land, by rototiling
> for shared libraries, something which people don't seem to want to get
> involved with ("Help me Obi-Wanken Autotools, you're my only hope")
> judging by the burgeoning silence on the topic -- or, perhaps it's more
> open source tragedy of the commons, what can we get for nothing this
> week/month?

Yes, we agree that we want to start using shared libraries. One of the
motivations for the autotools refactoring/upgrade in XORP was to make
this transition easier. The main obstacle is scheduling the transition
and getting the cycles for the actual switch.

Thanks,
Pavlin

> [Cue slapstick humour]
>
> The lack of progress is understandably so, given that the quality of the
> freely available tools has only recently come to the point where doing
> it for a moderately sized software project such as XORP, has been
> feasible, i.e. Boost.Build.

From pavlin at ICSI.Berkeley.EDU Tue Mar 4 15:15:36 2008
From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov)
Date: Tue, 04 Mar 2008 15:15:36 -0800
Subject: [Xorp-hackers] Root cause of this timeout message?
In-Reply-To: <47CDADC0.6070907@candelatech.com>
References: <47CDADC0.6070907@candelatech.com>
Message-ID: <200803042315.m24NFaPY007186@fruitcake.ICSI.Berkeley.EDU>

Ben Greear wrote:

> Any idea what code actually calls this timeout? I want to
> see if I can increase that timeout..but not having much luck
> tracking down the source of this message...
>
> [ 2008/03/04 11:45:52 ERROR xorp_rtrmgr:22387 FINDER finder_xrl_queue.hh:85 dispatch_cb ] Sent xrl got response 211 Reply timed out
> [ 2008/03/04 11:45:52 ERROR xorp_rtrmgr:22387 FINDER finder_xrl_queue.hh:85 dispatch_cb ] Sent xrl got response 211 Reply timed out

Try increasing one or both of the following:

* DEFAULT_SENDER_KEEPALIVE_MS inside libxipc/xrl_pf_stcp.cc
  (current value of 10000ms, i.e., 10s)
* RESPONSE_TIMEOUT_MS inside libxipc/finder_messenger.hh
  (current value of 30000ms, i.e., 30s)

No guarantee they are the source of the error, but give them a try.

It will be interesting to see how the values of those two are affected
by the large number of XORP instances you are trying to run.

Regards,
Pavlin

From pavlin at ICSI.Berkeley.EDU Tue Mar 4 15:50:45 2008
From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov)
Date: Tue, 04 Mar 2008 15:50:45 -0800
Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup
In-Reply-To: <47CDA3D0.9050804@candelatech.com>
References: <47CDA3D0.9050804@candelatech.com>
Message-ID: <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU>

Ben Greear wrote:

> I am trying to start 30 xorp instances on a system with
> around 600 interfaces (vlans and such). My hardware has lots
> of RAM and a quad-core CPU, but it still takes more than the 10
> second keep-alive timeout to get xorp_fea initialized (it seems
> to be reading large numbers of netlink messages). This causes
> continual xorp restarts since the timeout fails.

What kind of netlink messages does the FEA see from the kernel?
Are those asynchronous notifications/upcalls about some events or
something that the FEA explicitly requested (e.g., to read the set of
interfaces and IP addresses)?
Also, if you run "ip monitor all" in parallel do you see all those
netlink messages?

> I tried throttling so that I only started one xorp per 5
> seconds, and it still times out.

As a start, try tweaking the XRL-related timeouts I suggested in
another thread:

* DEFAULT_SENDER_KEEPALIVE_MS inside libxipc/xrl_pf_stcp.cc
  (current value of 10000ms, i.e., 10s)
* RESPONSE_TIMEOUT_MS inside libxipc/finder_messenger.hh
  (current value of 30000ms, i.e., 30s)

> I'm going to experiment with increasing the keep-alive timer,
> but I am curious if there are better alternatives.
>
> * Maybe don't start keep-alive polling until fea finalizes its
>   initialization?

On startup this is what is supposed to happen. However, there are
different types of keepalives (some by the underlying XRL mechanism,
others by the rtrmgr itself). On top of that, there are things the FEA
does before it gets to initializing the XRL mechanism, things
during/after (re)configuration, etc.
All those events need to be analyzed during heavy load to identify the
bottleneck.

> * Maybe have fea answer keep-alives *while* it's initializing itself?
>
> * Optimize fea to only probe info for devices it is configured to
>   care about?

On startup it queries info about all interfaces in the system.
This info is needed for various reasons. E.g., if it is reconfigured
later, then on shutdown it is supposed to restore the original state.
Hence, it might be quite complicated to do selective probing (if
possible at all) and without further analysis currently it is not clear
whether this is the bottleneck.
Regards,
Pavlin

From greearb at candelatech.com Tue Mar 4 16:05:31 2008
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 04 Mar 2008 16:05:31 -0800
Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup
In-Reply-To: <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU>
References: <47CDA3D0.9050804@candelatech.com>
 <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU>
Message-ID: <47CDE3CB.6030309@candelatech.com>

Pavlin Radoslavov wrote:
> Ben Greear wrote:
>
>> I am trying to start 30 xorp instances on a system with
>> around 600 interfaces (vlans and such). My hardware has lots
>> of RAM and a quad-core CPU, but it still takes more than the 10
>> second keep-alive timeout to get xorp_fea initialized (it seems
>> to be reading large numbers of netlink messages). This causes
>> continual xorp restarts since the timeout fails.
>
> What kind of netlink messages does the FEA see from the kernel?

Not sure...I just did a quick strace and saw lots of netlink socket
reads. I'll try to get oprofile configured and see if that gives me any
clues about performance hotspots...

> On startup it queries info about all interfaces in the system.
> This info is needed for various reasons. E.g., if it is reconfigured
> later, then on shutdown it is supposed to restore the original
> state.

Does it really need to probe interfaces that it is not configured to
use? It seems like it could ignore them until it is asked to use
them...or, at least just probe them very minimally.

Anyway...this is mostly theoretical as you noted. I'll try to get some
solid profile data...
Ben

-- 
Ben Greear
Candela Technologies Inc http://www.candelatech.com

From greearb at candelatech.com Tue Mar 4 17:01:24 2008
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 04 Mar 2008 17:01:24 -0800
Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup
In-Reply-To: <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU>
References: <47CDA3D0.9050804@candelatech.com>
 <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU>
Message-ID: <47CDF0E4.2080205@candelatech.com>

Pavlin Radoslavov wrote:
> Ben Greear wrote:
>
>> I am trying to start 30 xorp instances on a system with
>> around 600 interfaces (vlans and such). My hardware has lots
>> of RAM and a quad-core CPU, but it still takes more than the 10
>> second keep-alive timeout to get xorp_fea initialized (it seems
>> to be reading large numbers of netlink messages). This causes
>> continual xorp restarts since the timeout fails.
>
> What kind of netlink messages does the FEA see from the kernel?
> Are those asynchronous notifications/upcalls about some events
> or something that the FEA explicitly requested (e.g., to read the
> set of interfaces and IP addresses)?
> Also, if you run "ip monitor all" in parallel do you see
> all those netlink messages?

Looks like at least one problem is that we do a linear lookup when
searching for an interface by if-index.

IfTreeVif*
IfTree::find_vif(uint32_t pif_index)
{
    IfTree::IfMap::iterator iter;

    //
    // XXX: Find the first vif that matches the physical index
    //
    for (iter = _interfaces.begin(); iter != _interfaces.end(); ++iter) {
        IfTreeVif* vifp = iter->second.find_vif(pif_index);
        if (vifp != NULL)
            return (vifp);
    }

    return (NULL);
}

So, when reading in 600 interfaces, and then 600 addresses, it adds up
quickly.

I think the first thing I'll attempt is to add a second map mapping the
if-index to a pointer to the IfTreeVif object found in the IfTree
_interfaces map.
Hopefully the hash will be significantly faster than the linear search
through the name -> IfTreeVif hash...

The complete oprofile report for the xorp_fea binary is here:

http://candelatech.com/oss/fea_oprofile.txt.gz

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc http://www.candelatech.com

From pavlin at ICSI.Berkeley.EDU Tue Mar 4 17:29:54 2008
From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov)
Date: Tue, 04 Mar 2008 17:29:54 -0800
Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup
In-Reply-To: <47CDF0E4.2080205@candelatech.com>
References: <47CDA3D0.9050804@candelatech.com>
 <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU>
 <47CDF0E4.2080205@candelatech.com>
Message-ID: <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU>

> Looks like at least one problem is that we do a linear
> lookup when searching for an interface by if-index.

Absolutely correct. I am well aware of this issue and my intention was
indeed to add an internal map to IfTree similar to what you suggest
below.
FYI, this method (and a few other related methods) were added during a
recent FEA refactoring and the initial emphasis was on correctness.
Obviously, with the much smaller number of interfaces the rest of us
have to deal with, the performance optimization wasn't an urgent
priority.

In your implementation you need to be very careful that you capture all
places inside IfTree that are related to the new pif_index to vif
mapping.

Thanks,
Pavlin

> IfTreeVif*
> IfTree::find_vif(uint32_t pif_index)
> {
>     IfTree::IfMap::iterator iter;
>
>     //
>     // XXX: Find the first vif that matches the physical index
>     //
>     for (iter = _interfaces.begin(); iter != _interfaces.end(); ++iter) {
>         IfTreeVif* vifp = iter->second.find_vif(pif_index);
>         if (vifp != NULL)
>             return (vifp);
>     }
>
>     return (NULL);
> }
>
>
> So, when reading in 600 interfaces, and then 600 addresses,
> it adds up quickly.
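
For illustration only, the internal index map discussed above could look
roughly like the sketch below. Vif, VifIndex and their methods are
hypothetical, simplified stand-ins; they are not the real
IfTree/IfTreeVif classes and not the eventual patch. The sketch keeps a
second std::map keyed by pif_index next to the name-keyed map, updates
it in add_vif() and remove_vif(), and (as an assumption of this sketch)
treats index 0 as invalid and never indexes it. Note that std::map is a
balanced tree, so the lookup is O(log n) rather than a true hash lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdint.h>
#include <string>

// Hypothetical, simplified vif record; the real IfTreeVif is richer.
struct Vif {
    Vif() : pif_index(0) {}
    std::string name;
    uint32_t    pif_index;    // 0 is treated as "invalid" in this sketch
};

// Hypothetical container: name-keyed storage plus a pif_index -> Vif* index.
class VifIndex {
public:
    // Add a vif, or update the index of an existing one.
    void add_vif(const std::string& name, uint32_t pif_index) {
        Vif& v = _vifs[name];
        unindex(v);                       // drop any old index entry first
        v.name = name;
        v.pif_index = pif_index;
        if (pif_index != 0)
            _by_index[pif_index] = &v;    // std::map element addresses are stable
    }

    // Remove a vif and its index entry, so no stale pointer remains.
    void remove_vif(const std::string& name) {
        std::map<std::string, Vif>::iterator it = _vifs.find(name);
        if (it == _vifs.end())
            return;
        unindex(it->second);
        _vifs.erase(it);
    }

    // O(log n) lookup instead of a linear scan over all interfaces.
    Vif* find_vif(uint32_t pif_index) {
        std::map<uint32_t, Vif*>::iterator it = _by_index.find(pif_index);
        return (it == _by_index.end()) ? NULL : it->second;
    }

private:
    // Erase the index entry only if it really points at this vif.
    void unindex(Vif& v) {
        std::map<uint32_t, Vif*>::iterator it = _by_index.find(v.pif_index);
        if (it != _by_index.end() && it->second == &v)
            _by_index.erase(it);
    }

    std::map<std::string, Vif> _vifs;
    std::map<uint32_t, Vif*>   _by_index;
};

// Small self-check of add, remove and re-index behaviour.
inline bool index_demo() {
    VifIndex idx;
    idx.add_vif("eth0", 2);
    idx.add_vif("eth0.100", 5);
    idx.add_vif("dummy0", 0);             // invalid index: stored, not indexed
    assert(idx.find_vif(5) != NULL && idx.find_vif(5)->name == "eth0.100");
    assert(idx.find_vif(0) == NULL);
    idx.remove_vif("eth0.100");
    assert(idx.find_vif(5) == NULL);      // entry removed together with the vif
    idx.add_vif("eth0", 9);               // index change re-keys the entry
    assert(idx.find_vif(2) == NULL && idx.find_vif(9) != NULL);
    return true;
}
```

The point of routing every change through add_vif() and remove_vif() is
that the index cannot drift from the name-keyed map; any code path that
modifies the vif container directly would bypass it.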
>
> I think the first thing I'll attempt is to add a second map
> mapping the if-index to a pointer to the IfTreeVif object
> found in the IfTree _interfaces map. Hopefully the hash will be
> significantly faster than the linear search through the
> name -> IfTreeVif hash...
>
> The complete oprofile report for the xorp_fea binary is here:
>
> http://candelatech.com/oss/fea_oprofile.txt.gz
>
> Thanks,
> Ben
>
> --
> Ben Greear
> Candela Technologies Inc http://www.candelatech.com
>
> _______________________________________________
> Xorp-hackers mailing list
> Xorp-hackers at icir.org
> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers

From greearb at candelatech.com Tue Mar 4 17:58:47 2008
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 04 Mar 2008 17:58:47 -0800
Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup
In-Reply-To: <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU>
References: <47CDA3D0.9050804@candelatech.com>
 <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU>
 <47CDF0E4.2080205@candelatech.com>
 <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU>
Message-ID: <47CDFE57.6070301@candelatech.com>

Pavlin Radoslavov wrote:
>> Looks like at least one problem is that we do a linear
>> lookup when searching for an interface by if-index.
>>
>
> Absolutely correct. I am well aware of this issue
> and my intention was indeed to add an internal map to IfTree
> similar to what you suggest below.
> FYI, this method (and a few other related methods) were added during a
> recent FEA refactoring and the initial emphasis was on correctness.
> Obviously, with the much smaller number of interfaces the rest of us
> have to deal with, the performance optimization wasn't an urgent
> priority.
>
> In your implementation you need to be very careful that you capture
> all places inside IfTree that are related to the new pif_index to
> vif mapping.
>

I was thinking on the way home: Maybe just map if-index to if-name.
If the mapping lookup fails, do a long slow linear lookup and if the
object is found, add it to the if-index -> map.
If it succeeds, then look up the vif by way of the existing hash,
double-check the if-index is correct (if not, do a slow lookup).

I'll also update the if-index -> if-name hash when a vif is added so at
least most of the time we shouldn't have to do a slow lookup.

This should keep us from having to worry about stale memory references
to vifs that might have been deleted, while giving good scaling
lookups.

Sound legit?

Ben

-- 
Ben Greear
Candela Technologies Inc http://www.candelatech.com

From pavlin at ICSI.Berkeley.EDU Tue Mar 4 18:20:52 2008
From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov)
Date: Tue, 04 Mar 2008 18:20:52 -0800
Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup
In-Reply-To: <47CDFE57.6070301@candelatech.com>
References: <47CDA3D0.9050804@candelatech.com>
 <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU>
 <47CDF0E4.2080205@candelatech.com>
 <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU>
 <47CDFE57.6070301@candelatech.com>
Message-ID: <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU>

> > In your implementation you need to be very careful that you capture
> > all places inside IfTree that are related to the new pif_index to
> > vif mapping.
> >
> I was thinking on the way home: Maybe just map if-index to if-name.
> If the mapping lookup fails, do a long slow linear lookup and if the
> object is found, add it to the if-index -> map.
> If it succeeds, then lookup the vif by way of the existing hash,
> double-check the if-index is correct (if not, do a slow lookup).

I think this adds lots of complexity.
Just a simple if-index to vif-name-pointer should be sufficient.
Off the top of my head, you need to consider the following places
inside IfTree that will affect the mapping:

* Any change to the _vifs (also vifs()) container of any of the
  interfaces. This will capture add/deletion of a vif.
* Any change of pif_index of IfTreeVif. I think this change is
  captured by IfTreeVif::set_pif_index() but you might want to
  double-check and look carefully for any direct assignment to
  IfTreeVif::_pif_index.
  Note that a pif_index of 0 is invalid, so any vif with such an index
  shouldn't be in the map.

Thanks,
Pavlin

> I'll also update the if-index -> if-name hash when a vif is added so
> at least most of the time we shouldn't have to do a slow lookup.
>
> This should keep us from having to worry about stale memory references to
> vifs that might have been deleted, while giving good scaling lookups.
>
> Sound legit?
>
> Ben
>
> --
> Ben Greear
> Candela Technologies Inc http://www.candelatech.com
>
>
> _______________________________________________
> Xorp-hackers mailing list
> Xorp-hackers at icir.org
> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers

From greearb at candelatech.com Tue Mar 4 18:34:33 2008
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 04 Mar 2008 18:34:33 -0800
Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup
In-Reply-To: <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU>
References: <47CDA3D0.9050804@candelatech.com>
 <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU>
 <47CDF0E4.2080205@candelatech.com>
 <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU>
 <47CDFE57.6070301@candelatech.com>
 <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU>
Message-ID: <47CE06B9.4070702@candelatech.com>

Pavlin Radoslavov wrote:
>>> In your implementation you need to be very careful that you capture
>>> all places inside IfTree that are related to the new pif_index to
>>> vif mapping.
>>>
>>>
>> I was thinking on the way home: Maybe just map if-index to if-name.
>> If the mapping lookup fails, do a long slow linear lookup and if the
>> object is found, add it to the if-index -> map.
>> If it succeeds, then lookup the vif by way of the existing hash, >> double-check the if-index is >> correct (if not, do a slow lookup). >> > > I think this adds lots of complexity. > Just a simple if-index to vif-name-pointer should be sufficient. > Off the top of my head, you need to consider the following places > inside IfTree that will affect the mapping: > Maybe I'm too paranoid..but any code can get an if, and then muck with it's vifs. The mapping is external to the if and vif objects, so it would be hard to make sure no one can ever screw up a listing. Also, an ifname can change while the ifindex remains the same, or a new interface with the same name but a different ifindex can be created. Anyway, I'll make a stab at it and then let you review what I come up with. Also, any comments on the socket-per-interface patch? I'd like to work towards getting that accepted, as my cvs diff is getting quite large... Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Tue Mar 4 18:55:15 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Tue, 04 Mar 2008 18:55:15 -0800 Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup In-Reply-To: <47CE06B9.4070702@candelatech.com> References: <47CDA3D0.9050804@candelatech.com> <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU> <47CDF0E4.2080205@candelatech.com> <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU> <47CDFE57.6070301@candelatech.com> <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU> <47CE06B9.4070702@candelatech.com> Message-ID: <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > Pavlin Radoslavov wrote: > >>> In your implementation you need to be very careful that you capture > >>> all places inside IfTree that are related to the new pif_index to > >>> vif mapping. > >>> > >>> > >> I was thinking on the way home: Maybe just map if-index to if-name. 
If > >> the mapping > >> lookup fails, do a long slow linear lookup and if the object is found, > >> add it to the if-index -> map. > >> If it succeeds, then lookup the vif by way of the existing hash, > >> double-check the if-index is > >> correct (if not, do a slow lookup). > >> > > > > I think this adds lots of complexity. > > Just a simple if-index to vif-name-pointer should be sufficient. > > Off the top of my head, you need to consider the following places > > inside IfTree that will affect the mapping: > > > > Maybe I'm too paranoid..but any code can get an if, and then muck with it's > vifs. The mapping is external to the if and vif objects, so it would be > hard > to make sure no one can ever screw up a listing. I presume we are talking about a map that is internal to IfTree. Any addition/removal of a vif should use the add_vif() and remove_vif() methods so those methods need to take care of updating the internal map as well. Indeed, to be on the safe side, the IfTreeInterface::vifs() method that returns a non-const reference to the vifs map should be removed. This obviously requires refactoring in other parts of the FEA. For the sake of moving things forward we can ignore it for now, but a warning comment should be added to the VifMap& vifs() method. > Also, an ifname can change while the ifindex remains the same, or > a new interface with the same name but a different ifindex can be > created. The ifname and the vifname cannot change, because they are the unique ID of an interface/vif. If they change, this will be translated into a delete/add sequence on the IfTree and that should take care of the ifindex update. The ifindex is also unique per interface/vif so there shouldn't be more than one interface/vif with the same ifindex (modulo an ifindex of 0, which is an invalid index). > Anyway, I'll make a stab at it and then let you review what I come up > with. > > Also, any comments on the socket-per-interface patch?
I'd like to work > towards > getting that accepted, as my cvs diff is getting quite large... One thing at a time :) I have accumulated a number of XORP-related emails from the last few days and some of them require time to deal with. Pavlin From pavlin at ICSI.Berkeley.EDU Tue Mar 4 22:48:02 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Tue, 04 Mar 2008 22:48:02 -0800 Subject: [Xorp-hackers] Potential null pointer dereference. In-Reply-To: <200802271100.m1RAxwLr013924@fruitcake.ICSI.Berkeley.EDU> References: <47C46BCA.90509@candelatech.com> <47C4717A.4060701@candelatech.com> <200802271100.m1RAxwLr013924@fruitcake.ICSI.Berkeley.EDU> Message-ID: <200803050648.m256m2cO000869@fruitcake.ICSI.Berkeley.EDU> > > > While merging my old patch set with the latest xorp tree, I believe > > > I found a potential null pointer dereference. Here is my attempt > > > at fixing it: > > > > > > [greearb at file-server control_socket]$ cvs diff -u netlink_socket_utilities.cc > > > Index: netlink_socket_utilities.cc > > > =================================================================== > > > RCS file: /cvs/xorp/fea/data_plane/control_socket/netlink_socket_utilities.cc,v > > > retrieving revision 1.12 > > > diff -u -r1.12 netlink_socket_utilities.cc > > > --- netlink_socket_utilities.cc 8 Jan 2008 23:30:09 -0000 1.12 > > > +++ netlink_socket_utilities.cc 26 Feb 2008 19:40:36 -0000 > > > @@ -332,9 +332,10 @@ > > > const IfTreeVif* vifp = iftree.find_vif(if_index); > > > if (vifp == NULL) { > > > if (! 
is_deleted) { > > > - XLOG_FATAL("Could not find interface and vif for index %d", > > > + XLOG_ERROR("Could not find interface and vif for index %d", > > > if_index); > > > } > > > + return XORP_ERROR; > > > } > > > if_name = vifp->ifname(); > > > vif_name = vifp->vifname(); > > > > > > > > > Here's another one: > > [greearb at file-server ifconfig]$ cvs diff -u ifconfig_parse_netlink_socket.cc > > Index: ifconfig_parse_netlink_socket.cc > > =================================================================== > > RCS file: /cvs/xorp/fea/data_plane/ifconfig/ifconfig_parse_netlink_socket.cc,v > > retrieving revision 1.17 > > diff -u -r1.17 ifconfig_parse_netlink_socket.cc > > --- ifconfig_parse_netlink_socket.cc 21 Feb 2008 02:02:33 -0000 1.17 > > +++ ifconfig_parse_netlink_socket.cc 26 Feb 2008 20:06:18 -0000 > > @@ -603,7 +603,8 @@ > > // > > return; > > } > > - XLOG_FATAL("Could not find vif with index %u in IfTree", if_index); > > + XLOG_ERROR("Could not find vif with index %u in IfTree", if_index); > > + return; > > } > > debug_msg("Address event on interface %s vif %s with interface index %u\n", > > vifp->ifname().c_str(), vifp->vifname().c_str(), > > > > Ben, > > Did you see those FATAL/ERROR statements actually triggered when > running XORP? > The reason those XLOG statements are FATAL is to capture bugs that > might be hiding somewhere else. > If you were able to trigger those statements, could you provide > instructions how to reproduce the problem so we can investigate it. Some update on the subject: * The XLOG_FATAL() inside netlink_socket_utilities.cc might happen because of a race condition. E.g., an interface was added and then immediately deleted, but the processing of the addition of the corresponding connected route was delayed. Fixing this race condition requires serious refactoring of the bottom of the FEA. For the time being I am leaving the XLOG_FATAL() in place until there is enough evidence and the issue is understood. 
* The XLOG_FATAL() inside ifconfig_parse_netlink_socket.cc is more mysterious, because it doesn't appear it should be triggered by a race condition like the previous one. Ben, if you can provide some backtraces from the above two XLOG_FATAL() crashes this might help understanding the problems. BTW, there was an independent bug regarding dereferencing a NULL pointer in the first case, but I fixed that. I also added some additional comments to the code to explain each of the actions. Thanks, Pavlin From pavlin at ICSI.Berkeley.EDU Tue Mar 4 22:55:44 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Tue, 04 Mar 2008 22:55:44 -0800 Subject: [Xorp-hackers] Potential null pointer dereference. In-Reply-To: <47C59018.9080607@incunabulum.net> References: <47C46BCA.90509@candelatech.com> <47C4717A.4060701@candelatech.com> <200802271100.m1RAxwLr013924@fruitcake.ICSI.Berkeley.EDU> <47C59018.9080607@incunabulum.net> Message-ID: <200803050655.m256tj7p002222@fruitcake.ICSI.Berkeley.EDU> Bruce M Simpson wrote: > Pavlin Radoslavov wrote: > > The reason those XLOG statements are FATAL is to capture bugs that > > might be hiding somewhere else. > > If you were able to trigger those statements, could you provide > > instructions how to reproduce the problem so we can investigate it. > > > > +1. > > Whilst Ben's patches are well intentioned, they do not fully address the > issues, and you correctly point out they most likely mask the underlying > issue. > > There is definitely a corner case in the first situation, where vifp may > be NULL and yet be dereferenced when is_deleted is true. This applies to > all netlink socket processing. Yes, the NULL pointer dereferencing was a bug which is now fixed. > In the second situation, it looks like the case where the FEA is told of > a new interface event by Linux, for an interface which it doesn't know > about, this is treated as a fatal error by the FEA. The interface event is addition of a new address to an interface. 
Obviously, the kernel must first tell the FEA that an interface is added/exists and only then the "new address" event should be sent. Hence, it is a mystery to me when/why the XLOG_FATAL() there is triggered. > It looks like this issue is also present in the PF_ROUTE support code. > > Now this reminds me of a situation I saw when testing out the > forthcoming OLSR code, under both FreeBSD and Linux. I haven't recorded > the details as it hasn't been an immediate priority, however it IS a > looming issue. > > Hot swapping an interface seems to have problems -- that is, if I fire > up a full XORP router with an OLSR process, remove its configuration for > an interface, add a new interface to the underlying system, and then > attempt to bring up OLSR on the new interface, the FEA does not > recognise the new interface. > > Looking at the code it appears this is the case. There's a clear need to > be able to add interfaces at runtime to support hot swapping of network > interfaces for both ad-hoc and classic routing protocols. > > Could we be looking at the same underlying issue? > > I was under the impression the FEA could deal with learning about new > interfaces at runtime, this evidence seems to suggest it doesn't and > needs fixing. Yes, the most recent code in CVS is supposed to work with hot-swapped interfaces. If you are hitting some errors/crashes this is probably because of some remaining bugs. Obviously, those bugs will remain in the code if they are not reported. Thanks, Pavlin From pavlin at ICSI.Berkeley.EDU Tue Mar 4 23:05:29 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Tue, 04 Mar 2008 23:05:29 -0800 Subject: [Xorp-hackers] Potential null pointer dereference.
In-Reply-To: <20080229012016.GA34549@spritelink.se> References: <47C46BCA.90509@candelatech.com> <47C4717A.4060701@candelatech.com> <200802271100.m1RAxwLr013924@fruitcake.ICSI.Berkeley.EDU> <20080229012016.GA34549@spritelink.se> Message-ID: <200803050705.m2575TxM004212@fruitcake.ICSI.Berkeley.EDU> > > The reason those XLOG statements are FATAL is to capture bugs that > > might be hiding somewhere else. > > If you were able to trigger those statements, could you provide > > instructions how to reproduce the problem so we can investigate it. > > For a true lab scenario this would probably be > the wanted behaviour, but for software that is > supposed to be able to work in the real world it > is probably not. If we just ignore those fatal errors, you could end up with a process that has wrong internal state. So pick your poison: a coredump with state that might be useful for us to locate and fix the problem or a process with probably incorrect internals that might or might not be working properly but is practically impossible to debug. > One of the most interesting parts I've seen with > IOS XR (Ciscos new operating system for the CRS-1) > is the ability to track what all processes are > doing all the time, so if a fault occurs, debug > information is saved at all times, even though the > administrator of the system has not asked of this. What this information looks like: a backtrace-like info after a crash or real time log messages for various events? Thanks, Pavlin > This is very Service Providerisch ;), it'd be nice > to see XORP have something like it. 
> > -K > > -- > Kristian Larsson KLL-RIPE > Network Engineer & Peering Coordinator SpriteLink [AS39525] > +46 704 910401 kll at spritelink.net > > _______________________________________________ > Xorp-hackers mailing list > Xorp-hackers at icir.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers From greearb at candelatech.com Tue Mar 4 23:10:04 2008 From: greearb at candelatech.com (Ben Greear) Date: Tue, 04 Mar 2008 23:10:04 -0800 Subject: [Xorp-hackers] Potential null pointer dereference. In-Reply-To: <200803050655.m256tj7p002222@fruitcake.ICSI.Berkeley.EDU> References: <47C46BCA.90509@candelatech.com> <47C4717A.4060701@candelatech.com> <200802271100.m1RAxwLr013924@fruitcake.ICSI.Berkeley.EDU> <47C59018.9080607@incunabulum.net> <200803050655.m256tj7p002222@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47CE474C.8060205@candelatech.com> Pavlin Radoslavov wrote: > Bruce M Simpson wrote: > > >> Pavlin Radoslavov wrote: >> >>> The reason those XLOG statements are FATAL is to capture bugs that >>> might be hiding somewhere else. >>> If you were able to trigger those statements, could you provide >>> instructions how to reproduce the problem so we can investigate it. >>> >>> >> +1. >> >> Whilst Ben's patches are well intentioned, they do not fully address the >> issues, and you correctly point out they most likely mask the underlying >> issue. >> >> There is definitely a corner case in the first situation, where vifp may >> be NULL and yet be dereferenced when is_deleted is true. This applies to >> all netlink socket processing. >> > > Yes, the NULL pointer dereferencing was a bug which is now fixed. > > >> In the second situation, it looks like the case where the FEA is told of >> a new interface event by Linux, for an interface which it doesn't know >> about, this is treated as a fatal error by the FEA. >> > > The interface event is addition of a new address to an interface. 
> Obviously, the kernel must first tell the FEA that an interface is > added/exists and only then the "new address" event should be send. > Hence, it is a mystery for me when/why the XLOG_FATAL() there is > triggered. It is possible that there really was a bug somewhere..and I hit this assert before that bug was fixed. Even in my code, I keep a trace message there...I'll keep an eye out for that to see if I ever see it again. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Tue Mar 4 23:56:46 2008 From: greearb at candelatech.com (Ben Greear) Date: Tue, 04 Mar 2008 23:56:46 -0800 Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup In-Reply-To: <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> References: <47CDA3D0.9050804@candelatech.com> <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU> <47CDF0E4.2080205@candelatech.com> <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU> <47CDFE57.6070301@candelatech.com> <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU> <47CE06B9.4070702@candelatech.com> <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47CE523E.6020402@candelatech.com> Pavlin Radoslavov wrote: > Ben Greear wrote: > > >> Pavlin Radoslavov wrote: >> >>>>> In your implementation you need to be very careful that you capture >>>>> all places inside IfTree that are related to the new pif_index to >>>>> vif mapping. >>>>> >>>>> >>>>> >>>> I was thinking on the way home: Maybe just map if-index to if-name. If >>>> the mapping >>>> lookup fails, do a long slow linear lookup and if the object is found, >>>> add it to the if-index -> map. >>>> If it succeeds, then lookup the vif by way of the existing hash, >>>> double-check the if-index is >>>> correct (if not, do a slow lookup). >>>> >>>> >>> I think this adds lots of complexity. >>> Just a simple if-index to vif-name-pointer should be sufficient. 
>>> Off the top of my head, you need to consider the following places >>> inside IfTree that will affect the mapping: >>> >>> >> Maybe I'm too paranoid..but any code can get an if, and then muck with it's >> vifs. The mapping is external to the if and vif objects, so it would be >> hard >> to make sure no one can ever screw up a listing. >> > > I presume we are talking of a map that is internal to IfTree. > Any addition/removal of a vif should use the add_vif() and > remove_vif() methods so those methods need to take care of updating > the internal map as well. > Indeed, to be on the safe side, the IfTreeInterface::vifs() > method that returns a non-const reference to the vifs map should > be removed. This obviously requires refactoring in other parts > of the FEA. For the sake of moving things forward we can ignore it > for now, but a warning comment should be added to the VifMap& vifs() > method. > > >> Also, an ifname can change while the ifindex remains the same, or >> a new interface with the same name but a different ifindex can be >> created. >> > > The ifname and the vifname cannot change, because they are the > unique ID of an interface/vif. If they change, this will be > translated into delete/add sequence to the the IfTree and that > should take care of the ifindex update. > The ifindex is also unique per interface/vif so there shouldn't be > more than one interface/vif with the same ifindex (modulo ifindex of > 0 which is invalid index). > I need to re-read your email when I'm fresh....but in the meantime, here is a partial diff from my tree with the hashing changes. It compiles, but not tested yet. We can remove the fallbacks to the linear searches if/when we are sure all the boundary cases are resolved. In the meantime, I don't think it will really hurt anything, and there is enough logging to clue us in should we still need to work on the hash... Comments welcome. 
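For readers without the attachment, the scheme discussed in this thread (an if-index -> vif-name map kept in step by add/remove/set, with a double-check on every hit and a logged linear search as the fallback) might look roughly like the sketch below. This is not the contents of fea_hash.patch, which is not reproduced in the archive. Vif, VifTable and slow_lookups() are hypothetical stand-ins for illustration; only add_vif(), remove_vif(), set_pif_index(), find_vif() and vifs() echo names mentioned in the thread, and their signatures here are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for IfTreeVif: just a name and a physical index.
struct Vif {
    std::string name;
    uint32_t pif_index;  // 0 is the invalid index and is never put in the map
};

// Hypothetical stand-in for the part of IfTree that owns the vifs.
class VifTable {
public:
    void add_vif(const std::string& name, uint32_t pif_index) {
        assert(!name.empty());  // the name is the unique ID of a vif
        remove_vif(name);       // drop any old entry and its index mapping
        _vifs[name] = Vif{name, pif_index};
        if (pif_index != 0)
            _index_to_name[pif_index] = name;
    }

    void remove_vif(const std::string& name) {
        auto v = _vifs.find(name);
        if (v == _vifs.end())
            return;
        unmap(v->second.pif_index, name);
        _vifs.erase(v);
    }

    bool set_pif_index(const std::string& name, uint32_t new_index) {
        auto v = _vifs.find(name);
        if (v == _vifs.end())
            return false;
        unmap(v->second.pif_index, name);
        v->second.pif_index = new_index;
        if (new_index != 0)
            _index_to_name[new_index] = name;
        return true;
    }

    // Fast path: index -> name -> vif, double-checking the stored index.
    // Slow path: linear search that repairs the map when it finds a match.
    const Vif* find_vif(uint32_t pif_index) {
        if (pif_index == 0)
            return nullptr;
        auto hit = _index_to_name.find(pif_index);
        if (hit != _index_to_name.end()) {
            auto v = _vifs.find(hit->second);
            if (v != _vifs.end() && v->second.pif_index == pif_index)
                return &v->second;
            _index_to_name.erase(hit);  // stale entry
        }
        ++_slow_lookups;  // a real implementation would log here
        for (auto& kv : _vifs) {
            if (kv.second.pif_index == pif_index) {
                _index_to_name[pif_index] = kv.first;
                return &kv.second;
            }
        }
        return nullptr;
    }

    // WARNING: callers that modify vifs through this reference bypass the
    // index map; find_vif() then has to fall back to the linear search.
    std::map<std::string, Vif>& vifs() { return _vifs; }

    std::size_t slow_lookups() const { return _slow_lookups; }

private:
    // Remove the index mapping only if it still points at this vif.
    void unmap(uint32_t pif_index, const std::string& name) {
        auto hit = _index_to_name.find(pif_index);
        if (hit != _index_to_name.end() && hit->second == name)
            _index_to_name.erase(hit);
    }

    std::map<std::string, Vif> _vifs;
    std::map<uint32_t, std::string> _index_to_name;
    std::size_t _slow_lookups = 0;
};
```

Storing the name rather than a vif pointer in the map is what avoids stale memory references: a deleted vif simply fails the name lookup. If the slow-lookup counter stays at zero in practice, the fallback could be dropped as suggested above.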
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com -------------- next part -------------- A non-text attachment was scrubbed... Name: fea_hash.patch Type: text/x-patch Size: 9618 bytes Desc: not available Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20080304/2746e2f5/attachment-0001.bin From greearb at candelatech.com Wed Mar 5 10:15:56 2008 From: greearb at candelatech.com (Ben Greear) Date: Wed, 05 Mar 2008 10:15:56 -0800 Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup In-Reply-To: <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> References: <47CDA3D0.9050804@candelatech.com> <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU> <47CDF0E4.2080205@candelatech.com> <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU> <47CDFE57.6070301@candelatech.com> <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU> <47CE06B9.4070702@candelatech.com> <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47CEE35C.9060804@candelatech.com> Pavlin Radoslavov wrote: > The ifname and the vifname cannot change, because they are the > unique ID of an interface/vif. If they change, this will be It may never actually happen in practice, but this code makes it look like it might be possible... Either way, since I update the hash here, it shouldn't matter... // From ifconfig_parse_netlink-socket.cc, with my patch applied. // // Set the physical interface index for the interface // if (is_newlink || (if_index != ifp->pif_index())) { ifp->set_pif_index(if_index); iftree.updateIfCache(if_index, if_name); } > translated into delete/add sequence to the the IfTree and that > should take care of the ifindex update. > The ifindex is also unique per interface/vif so there shouldn't be > more than one interface/vif with the same ifindex (modulo ifindex of > 0 which is invalid index). 
A quick test of my patch this morning shows that it significantly improves performance for my scenario...but I'm regressing to linear searches for some VIF lookups. I think I can simplify my code a bit, but I need to verify some things: 1) Is there ever a case where a vif has a different pif_index than the parent device? If not, I can remove the _vifindexes hash entirely and not worry about add/delete vif (only add_interface, delete_interface), since the lookup methods use pif_index and not the vif_index as far as I can tell. 2) How is the _vif_index used? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From imipak at yahoo.com Wed Mar 5 17:12:58 2008 From: imipak at yahoo.com (Jonathan Day) Date: Wed, 5 Mar 2008 17:12:58 -0800 (PST) Subject: [Xorp-hackers] Query regarding IPv6 code Message-ID: <743969.44647.qm@web31503.mail.mud.yahoo.com> I've been trying to compile XORP on a box that doesn't have IPv6 enabled. So, naturally, I told configure to disable IPv6. Turns out that there's a number of places where there are unguarded uses of IPv6 defines and types. These crash when compiling on a system that doesn't have these defined, for obvious reasons. Now, the two questions I'd like to throw out there are: 1) Is this just my box, or is this repeatable? 2) If it is repeatable, is it better to fix this by placing guards (#ifdef) around the IPv6 code, -or- by having a set of dummy IPv6 headers? (If the IPv6 functions are all going to fall out in the wash, dummy headers would mean fewer changes to mainline code, which means less risk of introducing bugs and fewer preprocessor commands. On the other hand, guarding is closer to being what you actually want.) Jonathan
From greearb at candelatech.com Wed Mar 5 17:29:13 2008 From: greearb at candelatech.com (Ben Greear) Date: Wed, 05 Mar 2008 17:29:13 -0800 Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup In-Reply-To: <47CE523E.6020402@candelatech.com> References: <47CDA3D0.9050804@candelatech.com> <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU> <47CDF0E4.2080205@candelatech.com> <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU> <47CDFE57.6070301@candelatech.com> <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU> <47CE06B9.4070702@candelatech.com> <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> <47CE523E.6020402@candelatech.com> Message-ID: <47CF48E9.1090808@candelatech.com> After further debugging, I notice that IfConfig::pull_config() is called several times during startup. Each time this is called, the iftree object is cleared and re-read from netlink (which in my case is a bit costly, even with hashing.) I'm going to leave this alone for now, but it might be something worth trying to optimize in the future. -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Wed Mar 5 21:28:45 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Wed, 05 Mar 2008 21:28:45 -0800 Subject: [Xorp-hackers] Query regarding IPv6 code In-Reply-To: <743969.44647.qm@web31503.mail.mud.yahoo.com> References: <743969.44647.qm@web31503.mail.mud.yahoo.com> Message-ID: <200803060528.m265Sjau020597@fruitcake.ICSI.Berkeley.EDU> Jonathan Day wrote: > I've been trying to compile XORP on a box that doesn't > have IPv6 enabled. So, naturally, I told configure to > disable IPv6. When you say that the box doesn't have IPv6 enabled, do you mean that the OS has IPv6 support, but it has been explicitly disabled (e.g., by recompiling the kernel), or do you mean that it lacks even the system IPv6 header files (e.g., /usr/include/netinet6/*).
If it lacks the header files, then "configure" should be smart enough to automatically exclude IPv6. You should see something like the following when running "configure": checking whether the system has IPv6 stack... no > Turns out that there's a number of places where there > are unguarded uses of IPv6 defines and types. These > crash when compiling on a system that doesn't have > these defined, for obvious reasons. If you see compilation errors because of missing IPv6-related stuff, then this is a bug. Please create a Bugzilla entry with the output from ./configure and the compilation. > Now, the two questions I'd like to throw out there > are: > > 1) Is this just my box, or is this repeatable? What OS are you using? These days it is difficult to find an OS that doesn't have IPv6, so we don't have the chance to test that the compilation actually succeeds on IPv4-only boxes. > 2) If it is repeatable, is it better to fix this by > placing guards (#ifdef) around the IPv6 code, -or- by > having a set of dummy IPv6 headers? Depends on the situation. The majority of the protocols rely on "class IPv4", "class IPv6" and "class IPvX", but sometimes we use "#ifdef HAVE_IPV6 ... #endif" guards. The "#ifdef HAVE_IPV6" guard is typically used only in the FEA. Usage of IPv6-specific headers or structures might be guarded by "#ifdef HAVE_FOO" where "HAVE_FOO" is tested/defined by "configure". Thanks, Pavlin > (If the IPv6 functions are all going to fall out in > the wash, dummy headers would mean fewer changes to > mainline code, which means less risk of introducing > bugs and fewer preprocessor commands. On the other > hand, guarding is closer to being what you actually > want.) > > Jonathan > > > ____________________________________________________________________________________ > Looking for last minute shopping deals? > Find them fast with Yahoo! Search.
http://tools.search.yahoo.com/newsearch/category.php?category=shopping > > _______________________________________________ > Xorp-hackers mailing list > Xorp-hackers at icir.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers From pavlin at ICSI.Berkeley.EDU Wed Mar 5 22:32:11 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Wed, 05 Mar 2008 22:32:11 -0800 Subject: [Xorp-hackers] RFC: Use one socket per interface for receiving packets in the FEA. In-Reply-To: <47CB9A83.5030005@candelatech.com> References: <200801160237.m0G2bt9d006369@possum.icir.org> <47C36818.1080507@candelatech.com> <200802271026.m1RAQ9Nq007826@fruitcake.ICSI.Berkeley.EDU> <47C6F18D.8030200@candelatech.com> <200802281909.m1SJ9U8Y029564@fruitcake.ICSI.Berkeley.EDU> <47C73D0B.7030501@candelatech.com> <200802282318.m1SNIibh019025@fruitcake.ICSI.Berkeley.EDU> <47C8506B.6020607@candelatech.com> <200802291920.m1TJKTok018510@fruitcake.ICSI.Berkeley.EDU> <47C8ABC7.9020104@candelatech.com> <47CADD00.6080409@icsi.berkeley.edu> <47CB00C6.70006@candelatech.com> <47CB6B9D.8070402@incunabulum.net> <47CB9A83.5030005@candelatech.com> Message-ID: <200803060632.m266WBJa002422@fruitcake.ICSI.Berkeley.EDU> Ben, Now that you have (I presume) a working solution, can you get some numbers about the performance increase you can get with one socket per interface. I agree that once you have a large number of interfaces and large number of virtual XORP instances, the number of unnecessary packet delivery increases as O(V*I), but I still would like to see what the actual CPU savings are. An even more interesting question would be to test those numbers with and without the pif_index->vif mapping optimization. Based on your profiling that indicates that the pif_index search uses lots of CPU, with the pif_index->vif mapping in place, I wouldn't be surprised if the CPU savings from the one socket per interface solution will be reduced. 
Anyway, for the rest of the email I will assume that the savings are large enough to justify the extra modifications/complexity. It seems that your code will work only if the system supports SO_BINDTODEVICE (i.e., only Linux), which bothers me quite a bit. The alternative (OS-independent) solution would be to open a socket per IP address per interface. The argument for doing something like this is that typically the number of interfaces (both physical and virtual like tunnels) and the number of IP addresses have the same order of magnitude (though I'd be interested to hear real-world examples where this is not the case). Another issue I see is with handling the special multicast routing socket (it must have protocol type IGMP) and the handling of the regular IGMP socket for IGMP control traffic. On a system like Linux, if you open two IGMP sockets and use one of them as the special multicast routing socket and the other one for regular IGMP control traffic, certain IGMP messages won't arrive on the regular IGMP socket. This is the reason that the MFEA has extra logic for handling the situation so a single IGMP socket is used for both purposes. However, if we have multiple IGMP sockets (one of them for multicast routing purposes and the rest of them using SO_BINDTODEVICE to bind to a specific interface), then I don't know whether we will still have problems with the delivery of IGMP control traffic. This is something that requires careful testing to find the answer. It seems that some of your changes might step over some of Bruce's OLSR related changes, so from this aspect it also requires careful coordination. That said, I think it will be premature to just take your patch and commit it now, because it will create more problems than it solves. However, I don't want those changes to be lost in email. Hence, could you create a Bugzilla entry and add your patch to it so it can be easily located later.
Also, please add two versions of your patch: one vs the current tree (i.e., the patch as you sent it to the list), and another one that contains only the socket-related delta, because there are changes in your patch that are for some earlier unrelated issues. Thanks, Pavlin From pavlin at ICSI.Berkeley.EDU Wed Mar 5 22:43:39 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Wed, 05 Mar 2008 22:43:39 -0800 Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup In-Reply-To: <47CF48E9.1090808@candelatech.com> References: <47CDA3D0.9050804@candelatech.com> <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU> <47CDF0E4.2080205@candelatech.com> <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU> <47CDFE57.6070301@candelatech.com> <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU> <47CE06B9.4070702@candelatech.com> <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> <47CE523E.6020402@candelatech.com> <47CF48E9.1090808@candelatech.com> Message-ID: <200803060643.m266hdCH004608@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > After further debugging, I notice that IfConfig::pull_config() > is called several times during startup. Each time this is > called, the iftree object is cleared and re-read from > netlink (which in my case is a bit costly, even with hashing.) > > I'm going to leave this alone for now, but it might be > something worth trying to optimize in the future. I am aware of that, but please submit a Bugzilla entry so it is not forgotten. Thanks, Pavlin From greearb at candelatech.com Thu Mar 6 09:34:10 2008 From: greearb at candelatech.com (Ben Greear) Date: Thu, 06 Mar 2008 09:34:10 -0800 Subject: [Xorp-hackers] RFC: Use one socket per interface for receiving packets in the FEA. 
In-Reply-To: <200803060632.m266WBJa002422@fruitcake.ICSI.Berkeley.EDU> References: <200801160237.m0G2bt9d006369@possum.icir.org> <47C36818.1080507@candelatech.com> <200802271026.m1RAQ9Nq007826@fruitcake.ICSI.Berkeley.EDU> <47C6F18D.8030200@candelatech.com> <200802281909.m1SJ9U8Y029564@fruitcake.ICSI.Berkeley.EDU> <47C73D0B.7030501@candelatech.com> <200802282318.m1SNIibh019025@fruitcake.ICSI.Berkeley.EDU> <47C8506B.6020607@candelatech.com> <200802291920.m1TJKTok018510@fruitcake.ICSI.Berkeley.EDU> <47C8ABC7.9020104@candelatech.com> <47CADD00.6080409@icsi.berkeley.edu> <47CB00C6.70006@candelatech.com> <47CB6B9D.8070402@incunabulum.net> <47CB9A83.5030005@candelatech.com> <200803060632.m266WBJa002422@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47D02B12.8060806@candelatech.com> Pavlin Radoslavov wrote: > Ben, > > Now that you have (I presume) a working solution, can you get some > numbers about the performance increase you can get with one socket > per interface. > I agree that once you have a large number of interfaces and large > number of virtual XORP instances, the number of unnecessary packet > delivery increases as O(V*I), but I still would like to see what the > actual CPU savings are. > It's hard to directly quantify, and it's also true that the hash lookup will also directly improve the fea's packet reception logic (since it looks up the vif by index for every received packet). I have been running a 20 node (one xorp per node) scenario and a 30 recently. Before the per-interface socket and hashing fixes, the system load at 'idle' was around 4.00 on my quad-core system with the 20 node scenario. After the hashing fixes and the per-interface socket fixes, the load is about 0.10 on this same system with the larger 30 node scenario. Please note that without the hashing optimization, the 30 node scenario will not even start due to fea taking too long. 
I don't have numbers for hashing w/out the per-interface socket patch, but if you are really interested, I'll disable my per-interface patch and run some tests. > An even more interesting question would be to test those numbers > with and without the pif_index->vif mapping optimization. > Based on your profiling that indicates that the pif_index search > uses lots of CPU, with the pif_index->vif mapping in place, I > wouldn't be surprised if the CPU savings from the one socket per > interface solution will be reduced. > I am certain you are correct due to the vif lookup in the fea rx packet logic. > > Anyway, for the rest of the email I will assume that the savings are > large enough to justify the extra modifications/complexity. > > It seems that your code will work only if the system supports > SO_BINDTODEVICE (i.e., only Linux) which bothers me quite a bit. > The alternative (OS-independent) solution would be to open a socket > per IP address per interface. The argument for doing something like > this is that typically the number of interfaces (both physical and > virtual like tunnels) and the number of IP addresses have same order > of magnitude (though I'd be interested to hear real-world examples > where this is not the case). > To be honest, I don't know so much how raw IP sockets work. I do know that the SO_BINDTODEVICE works on Linux, but I am not certain if binding to IPs also works. Also, I am not sure how to detect IP changes in the fea so that I can properly re-bind on IP change. Most of the logic should be the same whether using BINDTODEVICE or a local IP binding. We could change the code to not #ifdef on BINDTODEVICE but instead an internal #if USE_PER_IF_SOCKETs and then set that #define locally for testing. A windows user could try binding to IPs and see if that works...but I don't have a testbed with other than Linux systems in it.. 
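The two binding strategies discussed above can be put side by side in a small sketch. This is not the patch under review: the helper name open_per_if_socket and its parameters are hypothetical, and it uses a UDP socket rather than the raw IP sockets the FEA opens, so that it runs without root. It assumes only the standard POSIX socket calls plus the Linux-specific SO_BINDTODEVICE option. Note also that binding to a unicast address typically filters out packets sent to multicast destinations, so the portable path may not be a drop-in replacement for protocols such as OSPF; that would need testing.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <string>

// Hypothetical helper, not XORP code. Opens a UDP socket and restricts it
// either to one interface (SO_BINDTODEVICE, Linux only) or to one local
// address (portable bind()). Returns the descriptor, or -1 on failure.
int open_per_if_socket(const std::string& ifname,
                       const std::string& local_ip,
                       bool prefer_bind_to_device)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

#ifdef SO_BINDTODEVICE
    if (prefer_bind_to_device) {
        // May fail with EPERM if the process lacks the needed privilege.
        if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, ifname.c_str(),
                       static_cast<socklen_t>(ifname.size() + 1)) == 0)
            return fd;
        close(fd);
        return -1;
    }
#else
    // Option not available on this system: fall through to the IP binding.
    (void)ifname;
    (void)prefer_bind_to_device;
#endif

    // Portable path: bind to a local address that belongs to the interface.
    sockaddr_in sin;
    std::memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = 0;  // let the kernel pick a port
    if (inet_pton(AF_INET, local_ip.c_str(), &sin.sin_addr) != 1 ||
        bind(fd, reinterpret_cast<sockaddr*>(&sin), sizeof(sin)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would pass the interface name and one of its addresses, and choose the strategy at run time. With the IP binding, the socket has to be reopened or rebound whenever the address changes, which is the re-bind problem mentioned above.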
> Another issue I see is with handling the special multicast routing > socket (it must have protocol type IGMP) and the handling of the > regular IGMP socket for IGMP control traffic. > On system like Linux, if you open two IGMP sockets and use one of > them as the special multicast routing socket and the other one for > regular IGMP control traffic, certain IGMP messages won't arrive on > the regular IGMP socket. This is the reason that the MFEA has extra > logic for handling the situation so a single IGMP socket is used for > both purposes. > However, if we have multiple IGMP sockets (one of them for multicast > routing purpose and the rest of them using SO_BINDTODEVICE to bind > to a specific interface), then I don't know whether we will still > have problems with the delivery of IGMP control traffic. > This is something that requires careful testing to find the answer. > Ok, I have no idea about this either. I haven't done any multicast routing testing, just OSPF. > It seems that some of your changes might step over some of Bruce's > OLSR related changes, so from this aspect it also requires careful > coordination. > > Said that, I think it will be premature to just take your patch and > commit it now, because it will create more problems than it solves. > > However I don't want those changes to be lost in email. > Hence, could you create a Bugzilla entry and add your patch to it so > it can be easily located later. > Also, please add two versions of your patch: one vs the current tree > (i.e., the patch as you sent it to the list), and another one that > contains only the socket-related delta, because there are changes in > your patch that are for some earlier unrelated issues. > Ok, sounds fair enough.... I'll send these patches in a day or two when I have finished my testing and have teased out a patch for just the per-socket binding. Thanks for the review. 
Ben > Thanks, > Pavlin > -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Thu Mar 6 09:56:42 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Thu, 06 Mar 2008 09:56:42 -0800 Subject: [Xorp-hackers] RFC: Use one socket per interface for receiving packets in the FEA. In-Reply-To: <47D02B12.8060806@candelatech.com> References: <200801160237.m0G2bt9d006369@possum.icir.org> <47C36818.1080507@candelatech.com> <200802271026.m1RAQ9Nq007826@fruitcake.ICSI.Berkeley.EDU> <47C6F18D.8030200@candelatech.com> <200802281909.m1SJ9U8Y029564@fruitcake.ICSI.Berkeley.EDU> <47C73D0B.7030501@candelatech.com> <200802282318.m1SNIibh019025@fruitcake.ICSI.Berkeley.EDU> <47C8506B.6020607@candelatech.com> <200802291920.m1TJKTok018510@fruitcake.ICSI.Berkeley.EDU> <47C8ABC7.9020104@candelatech.com> <47CADD00.6080409@icsi.berkeley.edu> <47CB00C6.70006@candelatech.com> <47CB6B9D.8070402@incunabulum.net> <47CB9A83.5030005@candelatech.com> <200803060632.m266WBJa002422@fruitcake.ICSI.Berkeley.EDU> <47D02B12.8060806@candelatech.com> Message-ID: <200803061756.m26Huhbf025142@fruitcake.ICSI.Berkeley.EDU> > I have been running a 20 node (one xorp per node) scenario and a 30 > recently. > Before the per-interface socket and hashing fixes, the system load at > 'idle' was around 4.00 > on my quad-core system with the 20 node scenario. > > After the hashing fixes and the per-interface socket fixes, the load is > about 0.10 on this > same system with the larger 30 node scenario. Please note that without > the hashing optimization, the 30 node > scenario will not even start due to fea taking too long. This is good enough justification that we need at least one of the two mechanisms in place :) > I don't have numbers for hashing w/out the per-interface socket patch, > but if you > are really interested, I'll disable my per-interface patch and run some > tests. Yes please. 
> Most of the logic should be the same whether using BINDTODEVICE or a local > IP binding. We could change the code to not #ifdef on BINDTODEVICE but > instead > an internal #if USE_PER_IF_SOCKETs and then set that #define locally for > testing. This was my feeling too hence the reason I suggested that you add your patch to Bugzilla. > Ok, sounds fair enough.... I'll send these patches in a day or two when > I have > finished my testing and have teased out a patch for just the per-socket > binding. Great! Thanks, Pavlin From pavlin at ICSI.Berkeley.EDU Fri Mar 7 09:38:37 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Fri, 07 Mar 2008 09:38:37 -0800 Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup In-Reply-To: <47CEE35C.9060804@candelatech.com> References: <47CDA3D0.9050804@candelatech.com> <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU> <47CDF0E4.2080205@candelatech.com> <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU> <47CDFE57.6070301@candelatech.com> <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU> <47CE06B9.4070702@candelatech.com> <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> <47CEE35C.9060804@candelatech.com> Message-ID: <200803071738.m27Hcb9F008073@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > Pavlin Radoslavov wrote: > > > The ifname and the vifname cannot change, because they are the > > unique ID of an interface/vif. If they change, this will be > > It may never actually happen in practice, but this code makes it look like it > might be possible... Either way, since I update the hash here, it shouldn't > matter... > > // From ifconfig_parse_netlink-socket.cc, with my patch applied. > > // > // Set the physical interface index for the interface > // > if (is_newlink || (if_index != ifp->pif_index())) { > ifp->set_pif_index(if_index); > iftree.updateIfCache(if_index, if_name); > } The interface cache should be internal to IfTree and we don't want to manipulate it outside IfTree. 
I started working on that, but it turned out to be more complicated than I originally anticipated. Hopefully very soon I will have it finished. > I think I can simplify my code a bit, but I need to verify some things: > > 1) Is there ever a case where a vif has a different pif_index than > the parent device? If not, I can remove the _vifindexes hash entirely > and not worry about add/delete vif (only add_interface, delete_interface), > since the lookup methods use pif_index and not the vif_index as far as > I can tell. Yes. The vlan vifs have their own pif_index. If they are attached to the parent (physical) interface, then the pif_index of the interface and the vif are different. > 2) How is the _vif_index used? It is used by the MFEA to propagate its own indexing scheme to the multicast protocols (IGMP/MLD and PIM-SM). The rest of the FEA (outside MFEA) doesn't need to manipulate it in any way. I will let you know when I am done with the patch and will ask you to test it. Thanks, Pavlin From greearb at candelatech.com Fri Mar 7 10:03:48 2008 From: greearb at candelatech.com (Ben Greear) Date: Fri, 07 Mar 2008 10:03:48 -0800 Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup In-Reply-To: <200803071738.m27Hcb9F008073@fruitcake.ICSI.Berkeley.EDU> References: <47CDA3D0.9050804@candelatech.com> <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU> <47CDF0E4.2080205@candelatech.com> <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU> <47CDFE57.6070301@candelatech.com> <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU> <47CE06B9.4070702@candelatech.com> <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> <47CEE35C.9060804@candelatech.com> <200803071738.m27Hcb9F008073@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47D18384.3050105@candelatech.com> Pavlin Radoslavov wrote: > Ben Greear wrote: > >> Pavlin Radoslavov wrote: >> >>> The ifname and the vifname cannot change, because they are the >>> unique ID of an interface/vif. 
If they change, this will be >> It may never actually happen in practice, but this code makes it look like it >> might be possible... Either way, since I update the hash here, it shouldn't >> matter... >> >> // From ifconfig_parse_netlink-socket.cc, with my patch applied. >> >> // >> // Set the physical interface index for the interface >> // >> if (is_newlink || (if_index != ifp->pif_index())) { >> ifp->set_pif_index(if_index); >> iftree.updateIfCache(if_index, if_name); >> } > > The interface cache should be internal to IfTree and we don't want > to manipulate it outside IfTree. The problem for me was that when the iface was added to the tree, it didn't have it's index set. But, with some refactoring, that could be resolved. Also, with my patch, I was not catching whatever code adds ifaces to the tree that the fea rx packet logic searches. I had the linear search to back it up and fix up the hash, but it would of course be best to figure out how those interfaces were being added and update the cache immediately. > I started working on that, but it turned out to be more complicated > than I originally anticipated. > Hopefully very soon I will have it finished. > >> I think I can simplify my code a bit, but I need to verify some things: >> >> 1) Is there ever a case where a vif has a different pif_index than >> the parent device? If not, I can remove the _vifindexes hash entirely >> and not worry about add/delete vif (only add_interface, delete_interface), >> since the lookup methods use pif_index and not the vif_index as far as >> I can tell. > > Yes. The vlan vifs have their own pif_index. If they are attached to > the parent (physical) interface, then the pif_index of the interface > and the vif are different. Ok, you'll probably need two hashes as I had in my original patch: one to search for IFs, and another to search for the VIFs (or, iface by vif-index and then hash the iface lookup of a VIF as well). 
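A minimal sketch of the idea being discussed, keeping the pif_index lookup table private to the tree so that callers cannot update the cache directly. The class MiniIfTree and all of its members are hypothetical and do not match the real IfTree API. It uses std::map rather than a hash table for brevity, and it indexes interfaces only; a second map of the same shape would be needed for vifs, since VLAN vifs carry their own pif_index.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Sentinel meaning "index not known yet" (hypothetical convention).
const uint32_t kNoIndex = 0;

// Hypothetical stand-in for an interface tree; names are illustrative.
class MiniIfTree {
public:
    // Add an interface whose pif_index has not been learned yet.
    void add_interface(const std::string& ifname) {
        _ifs.insert(std::make_pair(ifname, kNoIndex));
    }

    // Set or change the pif_index. The index map is updated here, inside
    // the tree, so no caller has to maintain the cache by hand.
    bool set_pif_index(const std::string& ifname, uint32_t pif_index) {
        std::map<std::string, uint32_t>::iterator it = _ifs.find(ifname);
        if (it == _ifs.end())
            return false;
        if (it->second != kNoIndex)
            _by_index.erase(it->second);  // drop the stale mapping
        it->second = pif_index;
        if (pif_index != kNoIndex)
            _by_index[pif_index] = ifname;
        return true;
    }

    // Remove the interface together with its index mapping.
    void delete_interface(const std::string& ifname) {
        std::map<std::string, uint32_t>::iterator it = _ifs.find(ifname);
        if (it == _ifs.end())
            return;
        if (it->second != kNoIndex)
            _by_index.erase(it->second);
        _ifs.erase(it);
    }

    // Logarithmic lookup instead of a linear scan over all interfaces.
    // Returns an empty string when the index is unknown.
    std::string find_by_pif_index(uint32_t pif_index) const {
        std::map<uint32_t, std::string>::const_iterator it =
            _by_index.find(pif_index);
        return it == _by_index.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, uint32_t> _ifs;       // ifname -> pif_index
    std::map<uint32_t, std::string> _by_index;  // pif_index -> ifname
};
```

Because the interface may be added before its index is known, the mapping is created in set_pif_index rather than in add_interface, which is the ordering problem described above.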
> I will let you know when I am done with the patch and will ask you > to test it. Sounds good. I am running into OSPF issues where not all of the routers get to 'Full' state. I'm not sure if this is related to my patches or not, and it's a slow slog to debug this...hopefully I'll make some progress today. From what I have been able to test, it seems the hashing is more important for performance than the per-interface sockets, btw. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Fri Mar 7 14:32:22 2008 From: greearb at candelatech.com (Ben Greear) Date: Fri, 07 Mar 2008 14:32:22 -0800 Subject: [Xorp-hackers] Question on fea's use of sendmsg Message-ID: <47D1C276.5000104@candelatech.com> I'm seeing a strange problem where the fea attempts to send a packet, the sendmsg returns a correct positive number, but I don't see the packet on the wire. I do see multicast pkts from this same host, and I have set up an independent udp connection between these two interfaces and traffic flows fine (routing & interfaces seem functional.) I can't find anything obvious in the man pages, so I was wondering if any of you have any ideas for what might be happening? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Fri Mar 7 15:14:39 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Fri, 07 Mar 2008 15:14:39 -0800 Subject: [Xorp-hackers] Question on fea's use of sendmsg In-Reply-To: <47D1C276.5000104@candelatech.com> References: <47D1C276.5000104@candelatech.com> Message-ID: <200803072314.m27NEdKA019428@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > I'm seeing a strange problem where the fea attempts to send > a packet, the sendmsg returns a correct positive number, > but I don't see the packet on the wire. 
> > I do see multicast pkts from this same host, and I have set > up an independent udp connection between these two interfaces > and traffic flows fine (routing & interfaces seem functional.) > > I can't find anything obvious in the man pages, so I was wondering > if any of you have any ideas for what might be happening? Are you using SO_BINDTODEVICE to bind the socket to a particular interface? Also, are those unicast or multicast packets? One thing you could do is run tcpdump on all possible interfaces (including the loopback interface), and see if the packets pops up from some unexpected place. It is better if you run tcpdump on the XORP host itself to avoid any side effects. You might also want to watch for some other clues like ARP messages (e.g., if the destination is unicast). Regards, Pavlin From greearb at candelatech.com Fri Mar 7 15:25:04 2008 From: greearb at candelatech.com (Ben Greear) Date: Fri, 07 Mar 2008 15:25:04 -0800 Subject: [Xorp-hackers] Question on fea's use of sendmsg In-Reply-To: <200803072314.m27NEdKA019428@fruitcake.ICSI.Berkeley.EDU> References: <47D1C276.5000104@candelatech.com> <200803072314.m27NEdKA019428@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47D1CED0.1050507@candelatech.com> Pavlin Radoslavov wrote: > Ben Greear wrote: > >> I'm seeing a strange problem where the fea attempts to send >> a packet, the sendmsg returns a correct positive number, >> but I don't see the packet on the wire. >> >> I do see multicast pkts from this same host, and I have set >> up an independent udp connection between these two interfaces >> and traffic flows fine (routing & interfaces seem functional.) >> >> I can't find anything obvious in the man pages, so I was wondering >> if any of you have any ideas for what might be happening? > > Are you using SO_BINDTODEVICE to bind the socket to a particular > interface? Yes. > > Also, are those unicast or multicast packets? 
multicast seems to work, and *some* unicast work, at least some of the time (to/from the same sockets/processes). But, some unicast fail, and they are typically larger packets, though less than MTU. > One thing you could do is run tcpdump on all possible interfaces > (including the loopback interface), and see if the packets pops up > from some unexpected place. Yep, I should do this...I have only been looking where I expected it to be :) Also, I noticed that the sender socket had a large amount of packets in it's rx buffer because nothing ever reads it. I added code to set it's rx buflen to only 8k, and I'm now adding logic to read & discard those packets in case they are somehow jamming up the system due to consuming too many kernel buffers. > It is better if you run tcpdump on the XORP host itself to avoid any > side effects. > You might also want to watch for some other clues like ARP messages > (e.g., if the destination is unicast). Arp looks fine (request & response) seen in tcpdump, and arp tables look fine. Thanks for the ideas. Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From routernumber1 at yahoo.co.in Fri Mar 7 20:51:49 2008 From: routernumber1 at yahoo.co.in (Router Switch) Date: Sat, 8 Mar 2008 04:51:49 +0000 (GMT) Subject: [Xorp-hackers] RIP configuration mode commands and operation mode commands Message-ID: <886154.1258.qm@web94008.mail.in2.yahoo.com> Hi All, As we know that in XORP we can start a particular protocol in isolation, provided the xorp_finder should be in running in the background. Now lets consider a situation in which processes finder,fea,rib,policy and RIP is runing in the background i.e without rtrmgr and xorpsh. Now my question is: 1)can we set all the required parameter to configure RIP through linux cli. 2)can we get the value of all these configuration parameter through linux cli. if "yes" please write in details how (which method to call and how). 
i got some configuration related methods in xrl_target_rip.hh can we call these methods ? will it work? Thanks & Regards Arjun Prasad -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20080308/394cef30/attachment.html From bms at incunabulum.net Sat Mar 8 07:16:38 2008 From: bms at incunabulum.net (Bruce M Simpson) Date: Sat, 08 Mar 2008 15:16:38 +0000 Subject: [Xorp-hackers] RIP configuration mode commands and operation mode commands In-Reply-To: <886154.1258.qm@web94008.mail.in2.yahoo.com> References: <886154.1258.qm@web94008.mail.in2.yahoo.com> Message-ID: <47D2ADD6.1060602@incunabulum.net> Router Switch wrote: > Now my question is: > 1)can we set all the required parameter to configure > RIP through linux cli. > 2)can we get the value of all these configuration parameter > through linux cli. You can perform some basic configuration tasks using call_xrl, but because some routing process configuration uses compound data types, you may not be able to fully configure a protocol in some cases. I can't speak for RIP specifically, I suggest you take a look at its XIF file.
cheers BMS From pavlin at ICSI.Berkeley.EDU Sat Mar 8 16:25:50 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Sat, 08 Mar 2008 16:25:50 -0800 Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup In-Reply-To: <47D18384.3050105@candelatech.com> References: <47CDA3D0.9050804@candelatech.com> <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU> <47CDF0E4.2080205@candelatech.com> <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU> <47CDFE57.6070301@candelatech.com> <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU> <47CE06B9.4070702@candelatech.com> <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> <47CEE35C.9060804@candelatech.com> <200803071738.m27Hcb9F008073@fruitcake.ICSI.Berkeley.EDU> <47D18384.3050105@candelatech.com> Message-ID: <200803090025.m290PoMq009468@fruitcake.ICSI.Berkeley.EDU> > From what I have been able to test, it seems the hashing > is more important for performance than the per-interface > sockets, btw. Ben, I just committed the pif_index mapping optimization to CVS. Please give it a try and let me know whether it works for you and whether the speedup is in the ballpark you have seen in your testing so far. 
Thanks, Pavlin From greearb at candelatech.com Sat Mar 8 18:15:37 2008 From: greearb at candelatech.com (Ben Greear) Date: Sat, 08 Mar 2008 18:15:37 -0800 Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup In-Reply-To: <200803090025.m290PoMq009468@fruitcake.ICSI.Berkeley.EDU> References: <47CDA3D0.9050804@candelatech.com> <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU> <47CDF0E4.2080205@candelatech.com> <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU> <47CDFE57.6070301@candelatech.com> <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU> <47CE06B9.4070702@candelatech.com> <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> <47CEE35C.9060804@candelatech.com> <200803071738.m27Hcb9F008073@fruitcake.ICSI.Berkeley.EDU> <47D18384.3050105@candelatech.com> <200803090025.m290PoMq009468@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47D34849.1080803@candelatech.com> Pavlin Radoslavov wrote: >> From what I have been able to test, it seems the hashing >> is more important for performance than the per-interface >> sockets, btw. >> > > Ben, > > I just committed the pif_index mapping optimization to CVS. > Please give it a try and let me know whether it works for you and > whether the speedup is in the ballpark you have seen in your testing > so far. > Thanks, will do. Ben > Thanks, > Pavlin > -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Sun Mar 9 01:12:38 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Sun, 09 Mar 2008 01:12:38 -0800 Subject: [Xorp-hackers] Question on fea's use of sendmsg In-Reply-To: <47D1CED0.1050507@candelatech.com> References: <47D1C276.5000104@candelatech.com> <200803072314.m27NEdKA019428@fruitcake.ICSI.Berkeley.EDU> <47D1CED0.1050507@candelatech.com> Message-ID: <200803090912.m299CciV028057@fruitcake.ICSI.Berkeley.EDU> > Also, I noticed that the sender socket had a large amount of packets > in it's rx buffer because nothing ever reads it. 
I added code to set > it's rx buflen to only 8k, and I'm now adding logic to read & discard > those packets in case they are somehow jamming up the system due to > consuming too many kernel buffers. FYI, there was a bug in the I/O code in the FEA: the size of the rx buflen was increased instead of the tx buflen for sockets used for transmission. The bug is now fixed in CVS. Regards, Pavlin From greearb at candelatech.com Mon Mar 10 15:01:38 2008 From: greearb at candelatech.com (Ben Greear) Date: Mon, 10 Mar 2008 15:01:38 -0700 Subject: [Xorp-hackers] OSPF problem with keep-alive messages. Message-ID: <47D5AFC2.4000202@candelatech.com> I notice something weird in my 30-node setup. 28 of my routers are up and functioning and in 'FULL' state. One had to be restarted because it had port conflicts with another. Now, the restarted one is not able to sync with it's peers. It seems to me that the problem is that during 'Loading' state, there is too much traffic and it crowds out the hello messages (I am running this over an emulated lower bandwidth and high latency link.) Should we consider an LS Update to be as good as a Hello message when trying to determine the 'Alive' status of a neighbor? 
Here is a snippet of a tshark dump for one of the interfaces on this router: 0.090164 10.2.3.2 -> 224.0.0.5 OSPF LS Update 0.090211 10.2.3.2 -> 224.0.0.5 OSPF LS Update 0.090261 10.2.3.2 -> 224.0.0.5 OSPF LS Update 0.100697 10.2.3.3 -> 10.2.3.2 OSPF LS Acknowledge 0.109600 10.2.3.3 -> 224.0.0.5 OSPF LS Update 0.136957 10.2.3.3 -> 224.0.0.5 OSPF LS Update 0.140972 10.2.3.3 -> 224.0.0.5 OSPF LS Update 0.143208 10.2.3.3 -> 224.0.0.5 OSPF LS Update 0.179049 10.2.3.3 -> 224.0.0.5 OSPF LS Update 0.182767 10.2.3.3 -> 224.0.0.5 OSPF LS Update 0.216704 10.2.3.3 -> 224.0.0.5 OSPF LS Update 0.229887 10.2.3.2 -> 224.0.0.5 OSPF LS Acknowledge 0.229942 10.2.3.2 -> 224.0.0.5 OSPF LS Update 0.229992 10.2.3.2 -> 224.0.0.5 OSPF LS Update 0.230042 10.2.3.2 -> 224.0.0.5 OSPF LS Update 0.230087 10.2.3.2 -> 224.0.0.5 OSPF LS Update 0.230132 10.2.3.2 -> 224.0.0.5 OSPF LS Acknowledge 0.230175 10.2.3.2 -> 10.2.3.3 OSPF LS Acknowledge 0.230223 10.2.3.2 -> 224.0.0.5 OSPF LS Update 0.230269 10.2.3.2 -> 224.0.0.5 OSPF LS Update Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Mon Mar 10 15:09:12 2008 From: greearb at candelatech.com (Ben Greear) Date: Mon, 10 Mar 2008 15:09:12 -0700 Subject: [Xorp-hackers] Question on fea's use of sendmsg In-Reply-To: <200803090912.m299CciV028057@fruitcake.ICSI.Berkeley.EDU> References: <47D1C276.5000104@candelatech.com> <200803072314.m27NEdKA019428@fruitcake.ICSI.Berkeley.EDU> <47D1CED0.1050507@candelatech.com> <200803090912.m299CciV028057@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47D5B188.30800@candelatech.com> Pavlin Radoslavov wrote: >> Also, I noticed that the sender socket had a large amount of packets >> in it's rx buffer because nothing ever reads it. I added code to set >> it's rx buflen to only 8k, and I'm now adding logic to read & discard >> those packets in case they are somehow jamming up the system due to >> consuming too many kernel buffers. 
> > FYI, there was a bug in the I/O code in the FEA: the size of the rx > buflen was increased instead of the tx buflen for sockets used for > transmission. The bug is now fixed in CVS. Sounds good. Did you also set the rx side to be very small? I have been setting it to 4000, though something smaller might work as well. Setting it to zero failed for reasons I never pursued. Also, my main problem with losing sendmsg packets is resolved. It was due to some QoS configurations that a companion program was doing. I removed that, and the 30 node scenario *mostly* works now. I will be backing out my debugging hacks & hashing code and merging with your latest CVS tree in a few minutes. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Mon Mar 10 15:15:03 2008 From: greearb at candelatech.com (Ben Greear) Date: Mon, 10 Mar 2008 15:15:03 -0700 Subject: [Xorp-hackers] OSPF assert Message-ID: <47D5B2E7.40906@candelatech.com> Any idea what the root cause of this assert is? 
(gdb) bt #0 0xb7fbc410 in __kernel_vsyscall () #1 0x0056f690 in raise () from /lib/libc.so.6 #2 0x00570f91 in abort () from /lib/libc.so.6 #3 0x08286a8d in xlog_fatal (module_name=0x82afda5 "OSPF", where=0xbfbcf5e4 "area_router.cc:2706 maxage_reached", fmt=0x82b062f "LSA not in database: %s") at xlog.c:435 #4 0x08156d82 in AreaRouter::maxage_reached (this=0x8397708, lsar=@0xbfbd158c, i=171) at area_router.cc:2706 #5 0x0812c8cf in XorpMemberCallback0B2, ref_ptr, unsigned int>::dispatch (this=0x83b71e0) at ../libxorp/callback_nodebug.hh:895 #6 0x082a31c0 in OneoffTimerNode2::expire (this=0x83cd0d8) at timer.cc:184 #7 0x082a2031 in TimerList::expire_one (this=0xbfbd5580, worst_priority=4) at timer.cc:500 #8 0x082a2194 in TimerList::run (this=0xbfbd5580) at timer.cc:447 #9 0x0828e008 in EventLoop::run (this=0xbfbd557c) at eventloop.cc:83 #10 0x0804c0a6 in main (argv=0xbfbd5b54) at xorp_ospfv2.cc:72 Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Mon Mar 10 17:12:37 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Mon, 10 Mar 2008 17:12:37 -0700 Subject: [Xorp-hackers] Question on fea's use of sendmsg In-Reply-To: <47D5B188.30800@candelatech.com> References: <47D1C276.5000104@candelatech.com> <200803072314.m27NEdKA019428@fruitcake.ICSI.Berkeley.EDU> <47D1CED0.1050507@candelatech.com> <200803090912.m299CciV028057@fruitcake.ICSI.Berkeley.EDU> <47D5B188.30800@candelatech.com> Message-ID: <200803110012.m2B0CbpH015800@fruitcake.ICSI.Berkeley.EDU> > >> Also, I noticed that the sender socket had a large amount of packets > >> in it's rx buffer because nothing ever reads it. I added code to set > >> it's rx buflen to only 8k, and I'm now adding logic to read & discard > >> those packets in case they are somehow jamming up the system due to > >> consuming too many kernel buffers. 
> >
> > FYI, there was a bug in the I/O code in the FEA: the size of the rx
> > buflen was increased instead of the tx buflen for sockets used for
> > transmission. The bug is now fixed in CVS.
>
> Sounds good. Did you also set the rx side to be very small? I
> have been setting it to 4000, though something smaller might work
> as well. Setting it to zero failed for reasons I never pursued.

That's odd. What kernel version are you using?
A quick check in linux-2.6.19 reveals that if you try to set the
rcvbuf size to a very small value the kernel will automatically
adjust it up (also confirmed by the socket(7) manual page).
E.g., the linux/net/core/sock.c kernel file has the following code:

    if ((val * 2) < SOCK_MIN_RCVBUF)
        sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
    else
        sk->sk_rcvbuf = val * 2;

where SOCK_MIN_RCVBUF is defined as 256.
There is similar code for the sndbuf size which is automatically set
to be at least 2048.

Of course those values are Linux specific, so I'd rather use size 0
for SO_SNDBUF and SO_RCVBUF and let the kernel deal with it.
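The kernel's adjustment can be observed directly by setting SO_RCVBUF and reading it back. The sketch below is a hedged illustration, not FEA code: the function name effective_rcvbuf is hypothetical, and it uses a UDP socket so that it runs unprivileged. The value it returns depends on the operating system and kernel version.

```cpp
#include <sys/socket.h>
#include <unistd.h>

// Request a receive buffer of `requested` bytes on a fresh UDP socket and
// return the size the kernel reports afterwards, or -1 on any error.
int effective_rcvbuf(int requested)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int result = -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) == 0) {
        int actual = 0;
        socklen_t len = sizeof(actual);
        // Read the value back; the kernel may have raised or doubled it.
        if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == 0)
            result = actual;
    }
    close(fd);
    return result;
}
```

Calling effective_rcvbuf(4000) and comparing the result with 4000 shows whether the running kernel applied an adjustment; a return of -1 for a given request would point at the setsockopt call itself being rejected.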
Pavlin From greearb at candelatech.com Mon Mar 10 19:14:01 2008 From: greearb at candelatech.com (Ben Greear) Date: Mon, 10 Mar 2008 19:14:01 -0700 Subject: [Xorp-hackers] heavy CPU use for xorp_fea on startup In-Reply-To: <200803090025.m290PoMq009468@fruitcake.ICSI.Berkeley.EDU> References: <47CDA3D0.9050804@candelatech.com> <200803042350.m24Nokp7013701@fruitcake.ICSI.Berkeley.EDU> <47CDF0E4.2080205@candelatech.com> <200803050129.m251TsMl002543@fruitcake.ICSI.Berkeley.EDU> <47CDFE57.6070301@candelatech.com> <200803050220.m252Kq3i011978@fruitcake.ICSI.Berkeley.EDU> <47CE06B9.4070702@candelatech.com> <200803050255.m252tFEQ017992@fruitcake.ICSI.Berkeley.EDU> <47CEE35C.9060804@candelatech.com> <200803071738.m27Hcb9F008073@fruitcake.ICSI.Berkeley.EDU> <47D18384.3050105@candelatech.com> <200803090025.m290PoMq009468@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47D5EAE9.8050506@candelatech.com> Pavlin Radoslavov wrote: >> From what I have been able to test, it seems the hashing >> is more important for performance than the per-interface >> sockets, btw. >> > > Ben, > > I just committed the pif_index mapping optimization to CVS. > Please give it a try and let me know whether it works for you and > whether the speedup is in the ballpark you have seen in your testing > so far. > This look good. I can't notice any big difference in CPU from my hash implementation, and from a brief look at the new code, it seems you managed a cleaner implementation that I had :) In the last test, all 30 of my virtual routers, with some 600 (virtual) interfaces in the machine total, all came up and negotiated OSPF to the 'Full' State. I need to do some more testing with dynamic changes, but certainly this is progress! 
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Tue Mar 11 17:53:08 2008 From: greearb at candelatech.com (Ben Greear) Date: Tue, 11 Mar 2008 17:53:08 -0700 Subject: [Xorp-hackers] Update FEA patch for one-socket per descriptor. Message-ID: <47D72974.4060304@candelatech.com> An updated patch is attached. This patch also includes logic to clean up multicast bindings on interface removal and code to set the rx buffer for the tx-only socket very small. The multicast changes fix a race in interface removal (iface is removed before OSPF notices and removes the multicast binds, and since the iface is gone, fea can no longer run the unbind logic.) This is probably less critical when running one socket per iface, since the entire socket will already be cleaned up. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com -------------- next part -------------- A non-text attachment was scrubbed... Name: fea.patch Type: text/x-patch Size: 48833 bytes Desc: not available Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20080311/f442bbcb/attachment-0001.bin From greearb at candelatech.com Wed Mar 12 10:31:43 2008 From: greearb at candelatech.com (Ben Greear) Date: Wed, 12 Mar 2008 10:31:43 -0700 Subject: [Xorp-hackers] FEA & interfaces Message-ID: <47D8137F.8030705@candelatech.com> I am continuing to try to optimize FEA to perform better when there are lots of interfaces in the system. Currently, it seems fea reads in all interfaces (and does so many times, especially when there is a commit). When one has hundreds or thousands of interfaces in the system, but only wants FEA (that xorp instance as a whole) to manage 15 of them, this becomes a very large amount of overhead. So, a question: Is there any chance fea could be modified somewhat easily to take a list of interfaces from the xorp config and only read/write those interfaces (and ignore all others)? 
I am going to start poking at this code myself, but if someone has suggestions for places to pay special attention to, please do let me know. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Thu Mar 13 09:28:36 2008 From: greearb at candelatech.com (Ben Greear) Date: Thu, 13 Mar 2008 09:28:36 -0700 Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces. Message-ID: <47D95634.8020202@candelatech.com> As previously mentioned, I notice that fea pays attention to all interfaces, not just those it's configured to use. To help improve scalability, especially in a virtualized environment, I am attempting the following: 1) Only have the _pulled_config pull information for devices stored in the _local_config tree. This means asking netlink for specific if-index values instead of the entire tree. 2) The netlink observer will ignore anything not in the _local_config, and will remove interfaces from _local_config if it observes them unregistering from the system. 3) When adding an interface (through XRL), the ifconfig object will add it to _local_config, tell the pulled_config to pull it from the system, and if found, will save it in the _original_config as well in case we want to roll back to the original system state. Once added, nothing is removed from the _original_config. 4) There is an XRL method to configure all interfaces from the system. I am hoping this isn't actually needed and can be removed, as it would require reading the entire set of interfaces. I can (re)add code to support this if needed, but maybe it isn't really useful and could be removed? I am only implementing the optimizations for the netlink related portions. The remainder of the iftree-get/set logic will use the current method of reading all interfaces regardless of local config.
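To make point 2) above concrete, here is a rough sketch of the filtering idea. It is only an illustration under stated assumptions, not the FEA code: IfEvent, LocalConfig and filter_events are hypothetical names standing in for the netlink observer's input and the _local_config tree, and the real IfTree classes look quite different.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one interface notification seen by the observer.
struct IfEvent {
    uint32_t    if_index;  // kernel interface index
    std::string if_name;   // interface name, e.g. "eth3"
    bool        removed;   // true if the device is unregistering
};

// Hypothetical stand-in for _local_config: the set of configured if-indexes.
class LocalConfig {
public:
    void add(uint32_t if_index)            { _configured.insert(if_index); }
    void remove(uint32_t if_index)         { _configured.erase(if_index); }
    bool contains(uint32_t if_index) const { return _configured.count(if_index) != 0; }
    std::size_t size() const               { return _configured.size(); }
private:
    std::set<uint32_t> _configured;
};

// Keep only the events for configured interfaces. An interface that is
// seen unregistering is reported once and then dropped from the local
// configuration, so later events for it are ignored as well.
std::vector<IfEvent> filter_events(LocalConfig& local,
                                   const std::vector<IfEvent>& events)
{
    std::vector<IfEvent> kept;
    for (const IfEvent& ev : events) {
        if (!local.contains(ev.if_index))
            continue;                    // not ours: ignore
        kept.push_back(ev);
        if (ev.removed)
            local.remove(ev.if_index);   // device went away
    }
    return kept;
}
```

With this shape the work per notification depends on the number of configured interfaces (a set lookup), not on the number of interfaces in the system, which is the scaling property being aimed for.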
I believe this will go a long way towards helping fea scale to 1000+ interfaces, but don't have performance numbers or working code quite yet. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From atanu at ICSI.Berkeley.EDU Thu Mar 13 14:45:07 2008 From: atanu at ICSI.Berkeley.EDU (Atanu Ghosh) Date: Thu, 13 Mar 2008 14:45:07 -0700 Subject: [Xorp-hackers] OSPF problem with keep-alive messages. In-Reply-To: Message from Ben Greear of "Mon, 10 Mar 2008 15:01:38 PDT." <47D5AFC2.4000202@candelatech.com> Message-ID: <18415.1205444707@tigger.icir.org> Hi, For a router to be fully adjacent with another router it must see hello packets from its neighbour. The hello packet contains a list of the neighbours that a router has seen; it is a requirement that a router sees itself when receiving a hello packet to remain fully adjacent. Using a Link State Update to maintain an adjacency would break the protocol; it may cause one router to believe that it is fully adjacent when it isn't. Assuming that all 30 nodes are on the same subnet and the link-type is "broadcast" I don't think that the hello messages should be crowded out; if they are, this is a problem. When the router (10.2.3.3) was restarted it should have identified the DR and BDR and formed full adjacencies with only those two. The number of routers shouldn't make much difference apart from the background hello messages every 10 (default value) seconds from each router. Atanu. >>>>> "Ben" == Ben Greear writes: Ben> I notice something weird in my 30-node setup. Ben> 28 of my routers are up and functioning and in 'FULL' state. Ben> One had to be restarted because it had port conflicts with Ben> another. Now, the restarted one is not able to sync with its Ben> peers. It seems to me that the problem is that during Ben> 'Loading' state, there is too much traffic and it crowds out Ben> the hello messages (I am running this over an emulated lower Ben> bandwidth and high latency link.)
Ben> Should we consider an LS Update to be as good as a Hello message Ben> when trying to determine the 'Alive' status of a neighbor? Ben> Here is a snippet of a tshark dump for one of the interfaces on Ben> this router: Ben> 0.090164 10.2.3.2 -> 224.0.0.5 OSPF LS Update Ben> 0.090211 10.2.3.2 -> 224.0.0.5 OSPF LS Update Ben> 0.090261 10.2.3.2 -> 224.0.0.5 OSPF LS Update Ben> 0.100697 10.2.3.3 -> 10.2.3.2 OSPF LS Acknowledge Ben> 0.109600 10.2.3.3 -> 224.0.0.5 OSPF LS Update Ben> 0.136957 10.2.3.3 -> 224.0.0.5 OSPF LS Update Ben> 0.140972 10.2.3.3 -> 224.0.0.5 OSPF LS Update Ben> 0.143208 10.2.3.3 -> 224.0.0.5 OSPF LS Update Ben> 0.179049 10.2.3.3 -> 224.0.0.5 OSPF LS Update Ben> 0.182767 10.2.3.3 -> 224.0.0.5 OSPF LS Update Ben> 0.216704 10.2.3.3 -> 224.0.0.5 OSPF LS Update Ben> 0.229887 10.2.3.2 -> 224.0.0.5 OSPF LS Acknowledge Ben> 0.229942 10.2.3.2 -> 224.0.0.5 OSPF LS Update Ben> 0.229992 10.2.3.2 -> 224.0.0.5 OSPF LS Update Ben> 0.230042 10.2.3.2 -> 224.0.0.5 OSPF LS Update Ben> 0.230087 10.2.3.2 -> 224.0.0.5 OSPF LS Update Ben> 0.230132 10.2.3.2 -> 224.0.0.5 OSPF LS Acknowledge Ben> 0.230175 10.2.3.2 -> 10.2.3.3 OSPF LS Acknowledge Ben> 0.230223 10.2.3.2 -> 224.0.0.5 OSPF LS Update Ben> 0.230269 10.2.3.2 -> 224.0.0.5 OSPF LS Update Ben> Thanks, Ben> Ben Ben> -- Ben> Ben Greear Ben> Candela Technologies Inc http://www.candelatech.com Ben> _______________________________________________ Ben> Xorp-hackers mailing list Ben> Xorp-hackers at icir.org Ben> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers From pavlin at ICSI.Berkeley.EDU Thu Mar 13 14:58:39 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Thu, 13 Mar 2008 14:58:39 -0700 Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces. 
In-Reply-To: <47D95634.8020202@candelatech.com> References: <47D95634.8020202@candelatech.com> Message-ID: <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > As previously mentioned, I notice that fea pays attention to all > interfaces, not just those > it's configured to use. > > To help improve scalability, especially in a virtualized environment, I > am attempting > the following: > > 1) Only have the _pulled_config pull information for devices stored in > the _local_config > tree. This means asking netlink for specific if-index values instead of > the entire tree. The problem with this solution is that it will work only for Netlink on Linux. The (majority of the) other mechanisms for obtaining the network interface information (getifaddrs(3), ioctl(2), sysctl(3), etc) don't allow the granularity for asking only the information for a specific interface. In addition, if you have lots of configured interfaces, it might actually be much more expensive to ask separately for each of them (i.e., to use a system call per interface) instead of using a single system call to obtain the information for all interfaces. > 2) The netlink observer will ignore anything not in the _local_config, > and will remove > interfaces from _local_config if it observes them unregistering from the > system. > > 3) When adding an interface (through XRL), the ifconfig object will add > it to _local_config, > tell the pulled_config to pull it from the system, and if found, will > save it in the _original_config > as well in case we want to roll back to the original system state. Once > added, nothing is removed > from the _original_config. You could keep track in _pulled_config of only those interfaces that are configured in XORP (i.e., those in _local_config). However, this could add more complexity to all the interface-related machinery (IfTree, IfConfig, etc).
Also, given the extra search/complexity you need to do to maintain that state it becomes questionable how much performance gain there will be. Last but not least, don't forget that if there is any performance gain it will show only in the case where there are lots of interfaces in the system, but the XORP instance is configured with a very small number of those interfaces. > 4) There is an XRL method to configure all interfaces from the system. > I am hoping this isn't > actually needed and can be removed, as it would require reading the > entire set of interfaces. I > can (re)add code to support this if needed, but maybe it isn't really > useful and could be removed? Currently this method is used by some of the FEA test programs. > I am only implementing the optimizations for the netlink related > portions. The remainder of the iftree-get/set logic > will use the current method of reading all interfaces regardless of > local config. > > I believe this will go a long way towards helping fea scale to 1000+ > interfaces, but don't have performance > numbers or working code quite yet. As usual, I don't want to commit to any optimizations before I see numbers that justify the extra complexity :) Independently of the above, as you have noticed already the FEA calls pull_config() several times, so the alternative solution would be to try to reduce that number. There is more than one pull_config() for both technical and historical reasons (see below). On startup the FEA calls pull_config() to get the original interface configuration (in case it needs to restore it on shutdown). This pull_config() would have to stay. However, each IfConfig::commit_transaction() probably triggers four pull_config() calls: * Once inside IfConfigSet::push_config() right after the interfaces/vifs have been created in the system (e.g., vlans). This pull_config() is needed to read information such as interface index that needs to be used right after that.
Getting rid of this might add extra complexity to the IfConfigSet plugin API which should be avoided. This pull_config() might have to stay (or at least it shouldn't be the first one to optimize for). * In the beginning of IfConfig::commit_transaction() there is pull_config() to guarantee that we start the interface reconfiguration with what is currently in the system. This might be removed for systems that have a mechanism/plugin to observe the asynchronous changes to the system interfaces (e.g., BSD and Linux), but would have to stay for systems that don't have it (e.g., Windows). * Inside method IfConfig::push_config() there is another pull_config(). This one is probably redundant when push_config() is called from IfConfig::commit_transaction() and a good candidate for elimination. Of course, before doing so the rest of the code where push_config() is called needs to be examined to see what is the safe thing to do. * Toward the end of IfConfig::commit_transaction() there is another pull_config() right after the interface configuration was pushed. The reason for that is to make sure there is no mismatch with what actually went into the kernel and to align the current configuration. Optimizing this one could be tricky, because eliminating it could result in error-prone interface configuration. I hope the above makes it clear why there is more than one pull_config() and what could be done to eliminate or get around some of them. Regards, Pavlin From pavlin at ICSI.Berkeley.EDU Thu Mar 13 15:06:10 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Thu, 13 Mar 2008 15:06:10 -0700 Subject: [Xorp-hackers] Update FEA patch for one-socket per descriptor. In-Reply-To: <47D72974.4060304@candelatech.com> References: <47D72974.4060304@candelatech.com> Message-ID: <200803132206.m2DM6AsI000094@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > An updated patch is attached.
This patch also includes > logic to clean up multicast bindings on interface removal > and code to set the rx buffer for the tx-only socket very > small. Now that we have the pif_index optimized search in place, what are the performance numbers with and without the "one socket per interface" patch? > The multicast changes fix a race in interface removal > (iface is removed before OSPF notices and removes the > multicast binds, and since the iface is gone, fea can no > longer run the unbind logic.) This is probably less > critical when running one socket per iface, since the > entire socket will already be cleaned up. Is this a race that is in the vanilla FEA? If yes, what is the sequence of events/commands we can use to reproduce it? In any case, as discussed previously, please create a Bugzilla entry and add your patch to it. There will be more refactoring in the FEA, so during that refactoring your patch can be considered as well. Thanks, Pavlin From greearb at candelatech.com Fri Mar 14 00:51:05 2008 From: greearb at candelatech.com (Ben Greear) Date: Fri, 14 Mar 2008 00:51:05 -0700 Subject: [Xorp-hackers] Update FEA patch for one-socket per descriptor. In-Reply-To: <200803132206.m2DM6AsI000094@fruitcake.ICSI.Berkeley.EDU> References: <47D72974.4060304@candelatech.com> <200803132206.m2DM6AsI000094@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47DA2E69.7000307@candelatech.com> Pavlin Radoslavov wrote: > Ben Greear wrote: > >> An updated patch is attached. This patch also includes >> logic to clean up multicast bindings on interface removal >> and code to set the rx buffer for the tx-only socket very >> small. > > Now that we have the pif_index optimized search in place, what are > the performance numbers with and without the "one socket per > interface" patch? I'll try to run these sometime soon. 
>> The multicast changes fix a race in interface removal >> (iface is removed before OSPF notices and removes the >> multicast binds, and since the iface is gone, fea can no >> longer run the unbind logic.) This is probably less >> critical when running one socket per iface, since the >> entire socket will already be cleaned up. > > Is this a race that is in the vanilla FEA? > If yes, what is the sequence of events/commands we can use to > reproduce it? Yes. I think all you have to do is remove an interface that was previously configured via xorpsh from the interface cfg, commit, then remove it from ospf config. When you remove it from ospf, it tries to unregister the multicast addrs, but fea has already deleted the iface, so it cannot figure out how to unregister. We discussed this months ago..I'll try to find those emails, as the above scenario is just from memory.... > In any case, as discussed previously, please create a Bugzilla entry > and add your patch to it. > There will be more refactoring in the FEA, so during that > refactoring your patch can be considered as well. Sounds good. Thanks, Ben > > Thanks, > Pavlin -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Fri Mar 14 01:03:38 2008 From: greearb at candelatech.com (Ben Greear) Date: Fri, 14 Mar 2008 01:03:38 -0700 Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces. In-Reply-To: <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU> References: <47D95634.8020202@candelatech.com> <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47DA315A.9050208@candelatech.com> Pavlin Radoslavov wrote: > Ben Greear wrote: > >> As previously mentioned, I notice that fea pays attention to all >> interfaces, not just those >> it's configured to use. 
>> >> To help improve scalability, especially in a virtualized environement, I >> am attempting >> the following: >> >> 1) Only have the _pulled_config pull information for devices stored in >> the _local_config >> tree. This means asking netlink for specific if-index values instead of >> the entire tree. > > The problem with this solution is that it will work only for Netlink > on Linux. The (majoriyty of the) other mechanisms for obtaining the > network interface information (getifaddrs(3), ioctl(2), syssctl(3), > etc) don't allow the granularity for asking only the information for > a specific interface. > > In addition, if you have lots of configured interfaces, it might > actually be much more expensive to ask separately for each of them > (i.e., to use a system call per interface) instead of using a single > system call to obtain the information for all interfaces. Yes, this may only be useful for my scenario where I'm using a small number of interfaces per xorp instance, with large numbers of total interfaces. Only linux can virtualize routing tables, as far as I know, so this performance gain is only really important on Linux. >> 2) The netlink observer will ignore anything not in the _local_config, >> and will remove >> interfaces from _local_config if it observes them unregistering from the >> system. >> >> 3) When adding an interface (though XRL), the ifconfig object will add >> it to _local_config, >> tell the pulled_config to pull it from the system, and if found, will >> save it in the _original_config >> as well in case we want to roll back to the original system state. Once >> added, nothing is removed >> from the _original_config. > > You could keep track in _pulled_config of only those interfaces that > are configured in XORP (i.e., those in _local_config). > However, this could add more complexity to all the interface-related > machinery (IfTree, IfConfig, etc). 
> Also, given the extra search/complexity you need to do to maintain > that state it becomes questionable how much performance gain there > will be. > > Last but not least, don't forget that if there is any performance > gain it will show only in the case where there are lots of > interfaces in the system, but the XORP instance is configured with a > very small number of that interfaces. My assumption is that anything not in local_config can be ignored in the pull_config logic, and anything that searches for a device not in local config should not expect to find it. There will be a bit of special cases for adding new devices to the local_config, but not much else I think. So far, it seems the code complexity will not be great, it's mostly lots of small boring API changes. But, it's not working yet, so it may get more complex... > >> 4) There is an XRL method to configure all interfaces from the system. >> I am hoping this isn't >> actually needed and can be removed, as it would require reading the >> entire set of interfaces. I >> can (re)add code to support this if needed, but maybe it isn't really >> useful and could be removed? > > Currently this method is used by some of the FEA test programs. Ok, I'll (re)add it...it's not difficult since that is the default behaviour of the existing pull_config logic. > >> I am only implementing the optimizations for the netlink related >> portions. The remainder of the iftree-get/set logic >> will use the current method of reading all interfaces regardless of >> local config. >> >> I believe this will go a long way towards helping fea scale to 1000+ >> interfaces, but don't have performance >> numbers or working code quite yet. > > As usual, I don't want to commit to any optimizations before I see > numbers that justify the extra complexity :) > > > Independent from the above, as you have noticed already the FEA > calls pull_config() several times so the alternative solution would > be to try to reduce that number. 
There is more than one > pull_config() for both technical and historical reasons (see below). My problem is that each xorp is only interested in ~15 interfaces, but I have 30+ xorp instances (and would like 100+). This means that decreasing pull_config calls by a small number will not be nearly as important making the method scale linearly with *configured* devices as opposed to existing devices. I did look at the code you describe, and didn't immediately get any ideas for being able to remove calls. But, I'll look at all of that again when I get the logic to pull only configured devices functioning... Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Fri Mar 14 08:31:42 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Fri, 14 Mar 2008 08:31:42 -0700 Subject: [Xorp-hackers] Update FEA patch for one-socket per descriptor. In-Reply-To: <47DA2E69.7000307@candelatech.com> References: <47D72974.4060304@candelatech.com> <200803132206.m2DM6AsI000094@fruitcake.ICSI.Berkeley.EDU> <47DA2E69.7000307@candelatech.com> Message-ID: <200803141531.m2EFVhqQ002251@fruitcake.ICSI.Berkeley.EDU> > >> The multicast changes fix a race in interface removal > >> (iface is removed before OSPF notices and removes the > >> multicast binds, and since the iface is gone, fea can no > >> longer run the unbind logic.) This is probably less > >> critical when running one socket per iface, since the > >> entire socket will already be cleaned up. > > > > Is this a race that is in the vanilla FEA? > > If yes, what is the sequence of events/commands we can use to > > reproduce it? > > Yes. I think all you have to do is remove an interface > that was previously configured via xorpsh from the interface > cfg, commit, then remove it from ospf config. When > you remove it from ospf, it tries to unregister the multicast > addrs, but fea has already deleted the iface, so it cannot > figure out how to unregister. 
> > We discussed this months ago..I'll try to find those emails, > as the above scenario is just from memory.... No need to search for those emails. From your description above I understand the issue. It is something we are aware of and is not specific to OSPF. The current work-around is that if you want to delete an interface, you need to do it first at the protocol level, commit, and then do it at the interface level. Anyway, please add the description for the above issue to the Bugzilla entry you are going to create. Thanks, Pavlin > > In any case, as discussed previously, please create a Bugzilla entry > > and add your patch to it. > > There will be more refactoring in the FEA, so during that > > refactoring your patch can be considered as well. > > Sounds good. > > Thanks, > Ben > > > > > Thanks, > > Pavlin > > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com > > _______________________________________________ > Xorp-hackers mailing list > Xorp-hackers at icir.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers From imipak at yahoo.com Fri Mar 14 13:25:15 2008 From: imipak at yahoo.com (Jonathan Day) Date: Fri, 14 Mar 2008 13:25:15 -0700 (PDT) Subject: [Xorp-hackers] Trivial fix Message-ID: <689692.69944.qm@web31513.mail.mud.yahoo.com> Hi, I'll get the Windows patches sorted out as a single unit. For now, however, there's a trivial fix needed in strptime.c. It includes strings.h automatically, whether or not configure found it. This needs to be replaced with the following: #ifdef HAVE_STRINGS_H #include #else #include #endif There seem to be some issues with POSIX vs. ISO C99 calls vs. secure alternatives to standard functions, but I'm still trying to figure out the "best" solution to this as this would touch a lot of source. 
I need to come up with a solution, as Windows complains bitterly about older calls, but it would be better if any version submitted into the main source tree was agreed on as the (nominally) best solution. Jonathan Day Jonathan ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From pavlin at ICSI.Berkeley.EDU Fri Mar 14 22:20:28 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Fri, 14 Mar 2008 22:20:28 -0700 Subject: [Xorp-hackers] Trivial fix In-Reply-To: <689692.69944.qm@web31513.mail.mud.yahoo.com> References: <689692.69944.qm@web31513.mail.mud.yahoo.com> Message-ID: <200803150520.m2F5KSPD008869@fruitcake.ICSI.Berkeley.EDU> Jonathan Day wrote: > Hi, > > I'll get the Windows patches sorted out as a single > unit. For now, however, there's a trivial fix needed > in strptime.c. It includes strings.h automatically, > whether or not configure found it. This needs to be > replaced with the following: > > #ifdef HAVE_STRINGS_H > #include > #else > #include > #endif What OS (and OS version) are you using? If it is Windows, currently XORP works on only few Windows versions, and even then you need to install more things (see file BUILD_NOTES, Section 3.7). About the above patch, unfortunately it won't work the way it is, because the story with the include files in strptime.c in particular is slightly complicated (see the comments where _XOPEN_SOURCE is defined. Also, note that "config.h" is included after and therefore the "#ifdef HAVE_STRINGS_H" statement won't matter. 
The correct solution would be to try to move #include "config.h" before and then use something like: #ifdef HAVE_STRINGS_H #include <strings.h> #endif #ifdef HAVE_STRING_H #include <string.h> #endif Unfortunately, given the fragile situation with the header file inclusion in strptime.c, doing even something like this needs to be carefully tested to make sure it doesn't break the compilation on all platforms supported currently by XORP. Thanks, Pavlin > There seem to be some issues with POSIX vs. ISO C99 > calls vs. secure alternatives to standard functions, > but I'm still trying to figure out the "best" solution > to this as this would touch a lot of source. I need to > come up with a solution, as Windows complains bitterly > about older calls, but it would be better if any > version submitted into the main source tree was agreed > on as the (nominally) best solution. > > Jonathan Day > > Jonathan From greearb at candelatech.com Mon Mar 17 12:09:16 2008 From: greearb at candelatech.com (Ben Greear) Date: Mon, 17 Mar 2008 12:09:16 -0700 Subject: [Xorp-hackers] Question on netlink_socket_utilities Message-ID: <47DEC1DC.5030607@candelatech.com> In the method FibConfigEntryGetNetlinkSocket::parse_buffer_netlink_socket(...) there is a big for loop, but most of the case statements seem to break out or return, especially the case RTM_NEWROUTE: case RTM_DELROUTE: case RTM_GETROUTE: Is this on purpose, or should the line below be changed to not return out of the method?
return (NlmUtils::nlm_get_to_fte_cfg(iftree, fte, nlh, rtmsg, rta_len)); Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Mon Mar 17 15:35:04 2008 From: greearb at candelatech.com (Ben Greear) Date: Mon, 17 Mar 2008 15:35:04 -0700 Subject: [Xorp-hackers] FEA: Difference in _live_config v/s _pulled_config Message-ID: <47DEF218.3040200@candelatech.com> It seems to me that these two have significant overlap in meaning and perhaps could be consolidated? Why do we have both of them? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Mon Mar 17 17:01:38 2008 From: greearb at candelatech.com (Ben Greear) Date: Mon, 17 Mar 2008 17:01:38 -0700 Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces. In-Reply-To: <47DA315A.9050208@candelatech.com> References: <47D95634.8020202@candelatech.com> <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU> <47DA315A.9050208@candelatech.com> Message-ID: <47DF0662.2070300@candelatech.com> I have completed the first pass at my attempt to speed up FEA's handling of interfaces. Basically, I removed live_config entirely, using pulled_config in it's place. For pulled_config, I only pull info about configured interfaces, and for the observer, I ignore anything not in the configured interfaces. Interfaces are added to the various iftrees when they are configured by the user. In doing this, I hacked on a bunch of the xrl handler classes. Mostly cosmetic, but it makes the patch quite large. I tried to remove or minimize direct access to the ifconfig's iftree objects, but some seem necessary and remain. The patch also still has a lot of debugging code in it. But, it does appear to work. I tested my 30-node scenario with ~600 interfaces (about 10-15 of them associated with each xorp instance, and others not in any xorp). 
With my patch, the load still goes to around 30 when I'm heavily modifying interfaces in the xorps, but the time to make a xorpsh commit was a max of about 6 seconds and the system was generally responsive. fea also starts up quicker since it doesn't have to read all the interfaces in on startup. Without this latest addition, the system load went to 30, and xorpsh commits were taking 90+ seconds. So, it's definitely a winner for xorp on linux in my scenario. It's likely that if netlink is not used, or if there are few interfaces, or if there are lots of interfaces and xorp is using all of them, then there will NOT be a lot of gain from my patch (maybe even slightly worse performance since I'm not batch-reading all interfaces with netlink now.) Its also likely I broke the compile on some systems, as some of the netlink code was still using live_config(), but evidently was #ifdef'd out on Linux since it compiled fine for me. Systems that don't use netlink shouldn't be affected much either way, I think, but I've no way to really test them. The patch is too big for the mailing list, but can be downloaded from here: http://www.candelatech.com/oss/fea_iftree.patch Comments welcome. If there is something I can do or change to give this more of a chance of being accepted, please let me know. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Tue Mar 18 15:00:29 2008 From: greearb at candelatech.com (Ben Greear) Date: Tue, 18 Mar 2008 15:00:29 -0700 Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces. In-Reply-To: <47DF0662.2070300@candelatech.com> References: <47D95634.8020202@candelatech.com> <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU> <47DA315A.9050208@candelatech.com> <47DF0662.2070300@candelatech.com> Message-ID: <47E03B7D.1090509@candelatech.com> Ben Greear wrote: > I have completed the first pass at my attempt to speed up > FEA's handling of interfaces. 
I found a bug in my socket-per-iface logic: I was not properly detecting the removal of interfaces. To fix, I added the ability to listen to certain iftree events, and made the io_ip_socket register as a listener. This appears to work much better when interfaces disappear and re-appear. The new consolidated fea patch is here: http://www.candelatech.com/oss/fea-08-03-18.patch Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Wed Mar 19 09:31:19 2008 From: greearb at candelatech.com (Ben Greear) Date: Wed, 19 Mar 2008 09:31:19 -0700 Subject: [Xorp-hackers] Doubt on VLAN implementation Message-ID: <47E13FD7.6030207@candelatech.com> I've been poking at the VLAN code in FEA. The part that probes the linux kernel seems a bit strange. First, if I read it correctly, it could end up with a iface:vif pair for the VLAN, and also a parent_iface:vif pairing. Second, is it possible to add virtual IPs on top of a VLAN (on top of an ethernet)? It would seem not if the VLAN device is a vif instead of an interface. Even if that works, Linux (2.6.23+) supports mac-vlans on top of ethernet and vlans on top of mac-vlans (and vice-versa), and virtual ips on top of any of these. That doesn't easily map to a single parent-child relationship. My suggestion is to make VLANs (and all other net-devices) the same as 'real' ethernet interfaces, but store parameters in the iface to allow it to rebuild the virtual devices if needed (parent-dev-name and VID for VLANs, parent-dev-name and MAC for mac-vlans, and so forth. This would require adding some new tags to the interface config logic, including device type (vlan, mac-vlan, etc), vlan-id, MAC-addr, and maybe others for new virtual devices. Please also note that with the advent of network namespaces in Linux, it may be possible to have VLANs with no visible parent device (it being in a different namespace). 
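As a rough illustration of the suggestion above (every net-device is an ordinary interface that carries enough parameters to rebuild it), something like the following could work. All names here are hypothetical and are not existing FEA or IfTree types; the rule that a VLAN without a known parent cannot be rebuilt is an assumption of this sketch, and the network-namespace case would need different handling.

```cpp
#include <cstdint>
#include <string>

// Hypothetical device types; more virtual device kinds could be added.
enum class DevType { Ethernet, Vlan, MacVlan };

// Hypothetical per-interface parameters stored alongside the interface.
struct IfaceParams {
    std::string name;         // e.g. "eth0", "eth0.10", "macvlan0"
    DevType     type;         // what kind of net-device this is
    std::string parent_name;  // parent device name; empty if none is known
    uint16_t    vlan_id;      // 802.1Q VLAN ID, meaningful for Vlan only
    std::string mac;          // MAC address, meaningful for MacVlan only
};

// Decide whether the stored parameters are sufficient to re-create the
// device. The parent is referenced by name only, so a VLAN on top of a
// mac-vlan (or the reverse) needs no special treatment.
bool can_rebuild(const IfaceParams& p)
{
    switch (p.type) {
    case DevType::Ethernet:
        return false;  // a real device; nothing to re-create
    case DevType::Vlan:
        // Valid 802.1Q VLAN IDs are 1..4094 (0 and 4095 are reserved).
        return !p.parent_name.empty() && p.vlan_id >= 1 && p.vlan_id <= 4094;
    case DevType::MacVlan:
        return !p.parent_name.empty() && !p.mac.empty();
    }
    return false;
}
```

Because each device only names its parent, stacking falls out naturally: "macvlan0.30" can list "macvlan0" as its parent, which in turn lists "eth0".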
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Wed Mar 19 11:59:10 2008 From: greearb at candelatech.com (Ben Greear) Date: Wed, 19 Mar 2008 11:59:10 -0700 Subject: [Xorp-hackers] Question of FEA strace Message-ID: <47E1627E.3060308@candelatech.com> Any idea what is causing those writev messages? It seems they are not passing much information, but doing so very often... Having a hard time grepping through the code to see the originator... 11:32:24.043438 writev(31, [{"Finder 0.2\nMsgType r\nSeqNo 5825\nMsgData 100 / \n", 47}], 1) = 47 11:32:24.043570 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 11:32:24.043644 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 11:32:24.043705 writev(32, [{"Finder 0.2\nMsgType r\nSeqNo 5828\nMsgData 100 / \n", 47}], 1) = 47 11:32:24.043843 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 11:32:24.043910 clock_gettime(CLOCK_MONOTONIC, {768671, 208128378}) = 0 11:32:24.043957 clock_gettime(CLOCK_MONOTONIC, {768671, 208173677}) = 0 11:32:24.044004 select(68, [15 16 17 26 27 28 29 30 31 32 33 34 35 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 57 58 59 60 61 62 63 64 65 66 67], [28 30], [], {1, 397070}) = 2 (out [28 30], left {1, 397070}) 11:32:24.044102 clock_gettime(CLOCK_MONOTONIC, {768671, 208318973}) = 0 11:32:24.044250 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 11:32:24.044400 writev(28, [{"Finder 0.2\nMsgType r\nSeqNo 5835\nMsgData 100 / \n", 47}], 1) = 47 11:32:24.044514 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 11:32:24.044641 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 11:32:24.044722 writev(30, [{"Finder 0.2\nMsgType r\nSeqNo 5832\nMsgData 100 / \n", 47}], 1) = 47 11:32:24.044864 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 11:32:24.044937 clock_gettime(CLOCK_MONOTONIC, {768671, 209152611}) = 0 Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Wed Mar 19 
13:45:05 2008 From: greearb at candelatech.com (Ben Greear) Date: Wed, 19 Mar 2008 13:45:05 -0700 Subject: [Xorp-hackers] Question of FEA strace In-Reply-To: <47E1627E.3060308@candelatech.com> References: <47E1627E.3060308@candelatech.com> Message-ID: <47E17B51.2050309@candelatech.com> Ben Greear wrote: > Any idea what is causing those writev messages? It seems they are not > passing much information, but doing so very often... Having a hard > time grepping through the code to see the originator... Nevermind...looks like xrl responses... Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Wed Mar 19 21:47:23 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Wed, 19 Mar 2008 21:47:23 -0700 Subject: [Xorp-hackers] Question on netlink_socket_utilities In-Reply-To: <47DEC1DC.5030607@candelatech.com> References: <47DEC1DC.5030607@candelatech.com> Message-ID: <200803200447.m2K4lO3A001014@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > In the method FibConfigEntryGetNetlinkSocket::parse_buffer_netlink_socket(...) > there is a big for loop, but most of the case statements seem to break out > or return, especially the > case RTM_NEWROUTE: > case RTM_DELROUTE: > case RTM_GETROUTE: > > Is this on purpose, or should the line below be > changed to not return out of the method? > > return (NlmUtils::nlm_get_to_fte_cfg(iftree, fte, nlh, rtmsg, > rta_len)); Yes, this is on purpose. We are parsing the reply entry for the single-entry query, so apart of the reply entry itself we don't care about the rest of the entries. 
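The single-entry behaviour described above can be sketched as follows. This is a minimal illustration, assuming Linux netlink headers are available; `first_route_entry` is a hypothetical helper written for this note, not the XORP `parse_buffer_netlink_socket()` code, and it returns the raw `rtmsg` instead of filling in a forwarding-table entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

// Hypothetical helper (not XORP code): walk a netlink reply buffer and
// stop at the first route message, ignoring anything that follows it.
// Returns a pointer to that message's rtmsg payload, or nullptr if the
// buffer holds no route message.
static const rtmsg* first_route_entry(const void* buffer, size_t buffer_bytes)
{
    // The NLMSG_* macros expect a non-const header pointer and a
    // signed, mutable "bytes remaining" counter.
    nlmsghdr* nlh = const_cast<nlmsghdr*>(static_cast<const nlmsghdr*>(buffer));
    int remaining = static_cast<int>(buffer_bytes);

    for (; NLMSG_OK(nlh, remaining); nlh = NLMSG_NEXT(nlh, remaining)) {
        switch (nlh->nlmsg_type) {
        case NLMSG_ERROR:
        case NLMSG_DONE:
            return nullptr;     // Nothing useful for a single-entry query
        case RTM_NEWROUTE:
        case RTM_DELROUTE:
        case RTM_GETROUTE:
            // Deliberate early return: this is the reply to a
            // single-entry query, so later messages are not examined.
            return static_cast<const rtmsg*>(NLMSG_DATA(nlh));
        default:
            break;              // Skip unrelated messages
        }
    }
    return nullptr;
}
```

A parser for a full table dump would instead keep looping until `NLMSG_DONE`, which is the distinction the reply above is making.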
Regards, Pavlin From pavlin at ICSI.Berkeley.EDU Wed Mar 19 21:57:29 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Wed, 19 Mar 2008 21:57:29 -0700 Subject: [Xorp-hackers] FEA: Difference in _live_config v/s _pulled_config In-Reply-To: <47DEF218.3040200@candelatech.com> References: <47DEF218.3040200@candelatech.com> Message-ID: <200803200457.m2K4vUwf002864@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > It seems to me that these two have significant overlap in > meaning and perhaps could be consolidated? > > Why do we have both of them? Conceptually they are populated using different mechanisms, but pragmatically they should contain same information. The _live_config IfTree is populated/updated asynchronously by the IfConfig Observer that tracks the kernel upcalls. The _pulled_config is populated on demand (explicitly) by the pull_config() method. In other words, _live_config is a moving target, while _pulled_config is a snapshot. Regards, Pavlin From greearb at candelatech.com Wed Mar 19 22:01:42 2008 From: greearb at candelatech.com (Ben Greear) Date: Wed, 19 Mar 2008 22:01:42 -0700 Subject: [Xorp-hackers] FEA: Difference in _live_config v/s _pulled_config In-Reply-To: <200803200457.m2K4vUwf002864@fruitcake.ICSI.Berkeley.EDU> References: <47DEF218.3040200@candelatech.com> <200803200457.m2K4vUwf002864@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47E1EFB6.9050706@candelatech.com> Pavlin Radoslavov wrote: > Ben Greear wrote: > > >> It seems to me that these two have significant overlap in >> meaning and perhaps could be consolidated? >> >> Why do we have both of them? >> > > Conceptually they are populated using different mechanisms, but > pragmatically they should contain same information. > > The _live_config IfTree is populated/updated asynchronously by the > IfConfig Observer that tracks the kernel upcalls. > The _pulled_config is populated on demand (explicitly) by the > pull_config() method. 
> In other words, _live_config is a moving target, while > _pulled_config is a snapshot. > I merged these, and it seems to work fine. Eventually, that might be a good way to get rid of so many pull_config() calls as well. I can't think of any reason why we'd ever *want* pulled_config to get stale, so letting the observer update it seems valid. Maybe I'm missing something? Thanks, Ben > Regards, > Pavlin > -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Thu Mar 20 00:28:13 2008 From: greearb at candelatech.com (Ben Greear) Date: Thu, 20 Mar 2008 00:28:13 -0700 Subject: [Xorp-hackers] Patch to get rid of two system calls per asyncio send. Message-ID: <47E2120D.1070606@candelatech.com> Asyncio was disabling and enabling SIGPIPE for each send. At least on Linux (and probably BSD), we can use MSG_NOSIGNAL in most cases. Attached is a patch that implements this. Not specifically benchmarked, but it's always good to get rid of extra system calls... Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com -------------- next part -------------- A non-text attachment was scrubbed... Name: libxorp.patch Type: text/x-patch Size: 2692 bytes Desc: not available Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20080320/8e5c196f/attachment.bin From cevhers at gmail.com Thu Mar 20 07:20:48 2008 From: cevhers at gmail.com (Selçuk Cevher) Date: Thu, 20 Mar 2008 10:20:48 -0400 Subject: [Xorp-hackers] Java processes Message-ID: <803a75c30803200720k1c16ac0chfa7b82933d617d94@mail.gmail.com> Hi All, I am just a beginner on XORP software development. I am trying to create a new XORP process which will communicate with some other existing XORP processes such as RIB. Based on what I have read so far, it seems like we are only allowed to create processes using C++.
Is it possible to create a Java process and make it communicate in an efficient way to an existing XORP process such as RIB or XorpRtrMgr ? Thanks. Selcuk Cevher -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20080320/f561ae8c/attachment.html From pavlin at ICSI.Berkeley.EDU Thu Mar 20 08:34:58 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Thu, 20 Mar 2008 08:34:58 -0700 Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces. In-Reply-To: <47DF0662.2070300@candelatech.com> References: <47D95634.8020202@candelatech.com> <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU> <47DA315A.9050208@candelatech.com> <47DF0662.2070300@candelatech.com> Message-ID: <200803201535.m2KFYxVM009082@fruitcake.ICSI.Berkeley.EDU> Ben, At the high level (ignoring the extra complexity) I like the idea of the FEA dealing only with those interfaces that it needs to (i.e., only the configured interfaces). Though, with a large patch like yours that affects the FEA in number of ways it requires very careful integration. Hence, please add it to Bugzilla like the previous patch. I will leave it to you to decide whether to reuse the previous Bugzilla entry or open a new one. Also, please add a comment to the Bugzilla entry that the patch includes other changes like the per-interface socket. On the technical side, why did you have to merge live_config with pulled_config? From performance perspective it shouldn't make difference. Thanks, Pavlin Ben Greear wrote: > I have completed the first pass at my attempt to speed up > FEA's handling of interfaces. > > Basically, I removed live_config entirely, using pulled_config > in it's place. For pulled_config, I only pull info about configured > interfaces, and for the observer, I ignore anything not in the > configured interfaces. 
Interfaces are added to the various iftrees > when they are configured by the user. > > In doing this, I hacked on a bunch of the xrl handler classes. > Mostly cosmetic, but it makes the patch quite large. I tried to > remove or minimize direct access to the ifconfig's iftree objects, > but some seem necessary and remain. > > The patch also still has a lot of debugging code in it. > > But, it does appear to work. > > I tested my 30-node scenario with ~600 interfaces (about 10-15 of them > associated with each xorp instance, and others not in any xorp). > > With my patch, the load still goes to around 30 when I'm heavily modifying > interfaces in the xorps, but the time to make a xorpsh commit was a max of > about 6 seconds and the system was generally responsive. > > fea also starts up quicker since it doesn't have to read all the interfaces > in on startup. > > Without this latest addition, the system load went to 30, and xorpsh > commits were taking 90+ seconds. > > So, it's definitely a winner for xorp on linux in my scenario. It's > likely that if netlink is not used, or if there are few interfaces, or > if there are lots of interfaces and xorp is using all of them, then there > will NOT be a lot of gain from my patch (maybe even slightly worse performance > since I'm not batch-reading all interfaces with netlink now.) > > Its also likely I broke the compile on some systems, as some of the netlink > code was still using live_config(), but evidently was #ifdef'd out on Linux > since it compiled fine for me. > > Systems that don't use netlink shouldn't be affected much either way, I think, > but I've no way to really test them. > > The patch is too big for the mailing list, but can be downloaded > from here: > > http://www.candelatech.com/oss/fea_iftree.patch > > Comments welcome. If there is something I can do or change to give this more > of a chance of being accepted, please let me know. 
> > Thanks, > Ben > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com > > _______________________________________________ > Xorp-hackers mailing list > Xorp-hackers at icir.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers From greearb at candelatech.com Thu Mar 20 09:29:27 2008 From: greearb at candelatech.com (Ben Greear) Date: Thu, 20 Mar 2008 09:29:27 -0700 Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces. In-Reply-To: <200803201535.m2KFYxVM009082@fruitcake.ICSI.Berkeley.EDU> References: <47D95634.8020202@candelatech.com> <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU> <47DA315A.9050208@candelatech.com> <47DF0662.2070300@candelatech.com> <200803201535.m2KFYxVM009082@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47E290E7.10304@candelatech.com> Pavlin Radoslavov wrote: > Ben, > > At the high level (ignoring the extra complexity) I like the idea of > the FEA dealing only with those interfaces that it needs to (i.e., > only the configured interfaces). > > Though, with a large patch like yours that affects the FEA in number > of ways it requires very careful integration. > Hence, please add it to Bugzilla like the previous patch. > I will leave it to you to decide whether to reuse the previous > Bugzilla entry or open a new one. > Also, please add a comment to the Bugzilla entry that the patch > includes other changes like the per-interface socket. > > On the technical side, why did you have to merge live_config with > pulled_config? From performance perspective it shouldn't make > difference. > Anything I can get rid of, I don't have to worry about keeping in sync, and at least theoretically, if we pull_config once, and then handle all subsequent updates from the observer, we should always be current and not need to pull-config again. 
My patch continues to get larger...I added code yesterday to filter on route-table updates (listening to only the routing-table entries that Xorp is configured to use, when it is configured to use a specific routing table). With all of this in place, fea now seems pretty efficient, or at least ospf and rtr-mgr are more visible in 'top' much of the time in my scenario. I also notice about 5 get-time-of-day (or equiv) calls per loop in fea and ospf. I'm guessing I can get rid of a bunch of those as well, which should also help improve performance more. I'll put the new patch in bugz when I get the final tweaks worked out. Thanks, Ben > Thanks, > Pavlin > > > Ben Greear wrote: > > >> I have completed the first pass at my attempt to speed up >> FEA's handling of interfaces. >> >> Basically, I removed live_config entirely, using pulled_config >> in it's place. For pulled_config, I only pull info about configured >> interfaces, and for the observer, I ignore anything not in the >> configured interfaces. Interfaces are added to the various iftrees >> when they are configured by the user. >> >> In doing this, I hacked on a bunch of the xrl handler classes. >> Mostly cosmetic, but it makes the patch quite large. I tried to >> remove or minimize direct access to the ifconfig's iftree objects, >> but some seem necessary and remain. >> >> The patch also still has a lot of debugging code in it. >> >> But, it does appear to work. >> >> I tested my 30-node scenario with ~600 interfaces (about 10-15 of them >> associated with each xorp instance, and others not in any xorp). >> >> With my patch, the load still goes to around 30 when I'm heavily modifying >> interfaces in the xorps, but the time to make a xorpsh commit was a max of >> about 6 seconds and the system was generally responsive. >> >> fea also starts up quicker since it doesn't have to read all the interfaces >> in on startup. 
>> >> Without this latest addition, the system load went to 30, and xorpsh >> commits were taking 90+ seconds. >> >> So, it's definitely a winner for xorp on linux in my scenario. It's >> likely that if netlink is not used, or if there are few interfaces, or >> if there are lots of interfaces and xorp is using all of them, then there >> will NOT be a lot of gain from my patch (maybe even slightly worse performance >> since I'm not batch-reading all interfaces with netlink now.) >> >> Its also likely I broke the compile on some systems, as some of the netlink >> code was still using live_config(), but evidently was #ifdef'd out on Linux >> since it compiled fine for me. >> >> Systems that don't use netlink shouldn't be affected much either way, I think, >> but I've no way to really test them. >> >> The patch is too big for the mailing list, but can be downloaded >> from here: >> >> http://www.candelatech.com/oss/fea_iftree.patch >> >> Comments welcome. If there is something I can do or change to give this more >> of a chance of being accepted, please let me know. >> >> Thanks, >> Ben >> >> -- >> Ben Greear >> Candela Technologies Inc http://www.candelatech.com >> >> _______________________________________________ >> Xorp-hackers mailing list >> Xorp-hackers at icir.org >> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers >> -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Thu Mar 20 09:31:43 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Thu, 20 Mar 2008 09:31:43 -0700 Subject: [Xorp-hackers] Doubt on VLAN implementation In-Reply-To: <47E13FD7.6030207@candelatech.com> References: <47E13FD7.6030207@candelatech.com> Message-ID: <200803201631.m2KGViGp017857@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > I've been poking at the VLAN code in FEA. The part that probes the > linux kernel seems > a bit strange. 
First, if I read it correctly, it could end up with a > iface:vif pair for > the VLAN, and also a parent_iface:vif pairing. That's correct and is intentional. E.g., in your configuration you could use a VLAN in both ways: interface eth0 { vif vlan10 { ... } } OR interface vlan10 { vif vlan10 { ... } } Consider the second one as a backward compatibility feature for folks that were probably using VLANs by configuring them manually before starting older versions of XORP (i.e., before the VLAN support was added). The second mechanism might eventually disappear in the future. > Second, is it possible to add virtual IPs on top of a VLAN (on top of an > ethernet)? > It would seem not if the VLAN device is a vif instead of an interface. Could you clarify what you mean by "virtual IPs". You should be able to have a configuration like that would assign two IP addresses to vlan10: interface eth0 { vif vlan10 { address 1.2.3.4 { ... } address 5.6.7.8 { ... } } } > Even if that works, Linux (2.6.23+) supports mac-vlans on top of > ethernet and vlans on top of > mac-vlans (and vice-versa), and virtual ips on top of any of these. > That doesn't easily map to a single parent-child relationship. I have to admit that when comes to VLANs I am thinking the IEEE 802.1Q Standard. How the relationships you describe above fit with 802.1Q? If you have an URL with detailed description that would be useful. > My suggestion is to make VLANs (and all other net-devices) the same as > 'real' ethernet > interfaces, but store parameters in the iface to allow it to rebuild the > virtual devices if > needed (parent-dev-name and VID for VLANs, parent-dev-name and MAC for > mac-vlans, > and so forth. This would require adding some new tags to the interface > config logic, > including device type (vlan, mac-vlan, etc), vlan-id, MAC-addr, and > maybe others for > new virtual devices. Configuration-wise how would it look like? 
My primary interest is to have configuration that is consistent with other router vendors' configuration. > Please also note that with the advent of network namespaces in Linux, it > may be possible to have > VLANs with no visible parent device (it being in a different namespace). Could you provide more information (e.g., URL) re. network namespaces in Linux. Thanks, Pavlin From pavlin at ICSI.Berkeley.EDU Thu Mar 20 09:40:08 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Thu, 20 Mar 2008 09:40:08 -0700 Subject: [Xorp-hackers] Question of FEA strace In-Reply-To: <47E1627E.3060308@candelatech.com> References: <47E1627E.3060308@candelatech.com> Message-ID: <200803201640.m2KGe8eq019552@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > Any idea what is causing those writev messages? It seems they are not > passing much information, but doing so very often... Having a hard > time grepping through the code to see the originator... Those are from the XRL mechanism. I believe there are periodic keepalives between the XRL Finder and each process that is controlled by that Finder, but they should be on the order of once every few seconds or so. 
Regards, Pavlin > 11:32:24.043438 writev(31, [{"Finder 0.2\nMsgType r\nSeqNo 5825\nMsgData 100 / \n", 47}], 1) = 47 > 11:32:24.043570 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 > 11:32:24.043644 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 > 11:32:24.043705 writev(32, [{"Finder 0.2\nMsgType r\nSeqNo 5828\nMsgData 100 / \n", 47}], 1) = 47 > 11:32:24.043843 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 > 11:32:24.043910 clock_gettime(CLOCK_MONOTONIC, {768671, 208128378}) = 0 > 11:32:24.043957 clock_gettime(CLOCK_MONOTONIC, {768671, 208173677}) = 0 > 11:32:24.044004 select(68, [15 16 17 26 27 28 29 30 31 32 33 34 35 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 57 58 59 60 61 62 63 64 65 66 67], [28 30], [], {1, 397070}) = 2 (out [28 30], left {1, 397070}) > 11:32:24.044102 clock_gettime(CLOCK_MONOTONIC, {768671, 208318973}) = 0 > 11:32:24.044250 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 > 11:32:24.044400 writev(28, [{"Finder 0.2\nMsgType r\nSeqNo 5835\nMsgData 100 / \n", 47}], 1) = 47 > 11:32:24.044514 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 > 11:32:24.044641 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 > 11:32:24.044722 writev(30, [{"Finder 0.2\nMsgType r\nSeqNo 5832\nMsgData 100 / \n", 47}], 1) = 47 > 11:32:24.044864 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 > 11:32:24.044937 clock_gettime(CLOCK_MONOTONIC, {768671, 209152611}) = 0 > > Thanks, > Ben > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com > > _______________________________________________ > Xorp-hackers mailing list > Xorp-hackers at icir.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers From pavlin at ICSI.Berkeley.EDU Thu Mar 20 09:48:22 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Thu, 20 Mar 2008 09:48:22 -0700 Subject: [Xorp-hackers] FEA: Difference in _live_config v/s _pulled_config In-Reply-To: <47E1EFB6.9050706@candelatech.com> References: <47DEF218.3040200@candelatech.com> 
<200803200457.m2K4vUwf002864@fruitcake.ICSI.Berkeley.EDU> <47E1EFB6.9050706@candelatech.com> Message-ID: <200803201648.m2KGmMej021140@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > Pavlin Radoslavov wrote: > > Ben Greear wrote: > > > > > >> It seems to me that these two have significant overlap in > >> meaning and perhaps could be consolidated? > >> > >> Why do we have both of them? > >> > > > > Conceptually they are populated using different mechanisms, but > > pragmatically they should contain same information. > > > > The _live_config IfTree is populated/updated asynchronously by the > > IfConfig Observer that tracks the kernel upcalls. > > The _pulled_config is populated on demand (explicitly) by the > > pull_config() method. > > In other words, _live_config is a moving target, while > > _pulled_config is a snapshot. > > > I merged these, and it seems to work fine. Eventually, that might be a > good way > to get rid of some many pull_config() calls as well. I can't think of > any reason why > we'd ever *want* pulled_config to get stale, so letting observer update > it seems > valid. Maybe I'm missing something? At a high level, _pulled_config is used for synchronous purposes, while _live_config is used for asynchronous purposes. However, we can't just get rid of pulled_config: * The Observer mechanism doesn't always exist (e.g., Windows). * In some cases (like when committing the interface configuration), in the middle of the commit we need to synchronously pull the interface configuration from the kernel (e.g., to fill in kernel-generated information such as the physical interface index). The Observer is asynchronous so we can't use it for the synchronous population of _pulled_config. Hope that helps, Pavlin From pavlin at ICSI.Berkeley.EDU Thu Mar 20 09:58:09 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Thu, 20 Mar 2008 09:58:09 -0700 Subject: [Xorp-hackers] Patch to get rid of two system calls per asyncio send.
In-Reply-To: <47E2120D.1070606@candelatech.com> References: <47E2120D.1070606@candelatech.com> Message-ID: <200803201658.m2KGw9il022683@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > Asyncio was disabling and enabling SIGPIPE for each send. At least on Linux > (and probably BSD), we can use MSG_NOSIGNAL in most cases. Attached is a patch > that implements this. Not specifically benchmarked, but it's always good to > get rid of > extra system calls... I agree that we should get rid of extra system calls. However, this part of the code is very critical and we want to be very careful with it (e.g., it has been changed by a number of people in the past and it might be quite fragile). Said that, please add it to Bugzilla. Thanks, Pavlin > Thanks, > Ben > > -- > Ben Greear Candela Technologies Inc > http://www.candelatech.com > > > Index: asyncio.cc > =================================================================== > RCS file: /cvs/xorp/libxorp/asyncio.cc,v > retrieving revision 1.40 > diff -u -r1.40 asyncio.cc > --- asyncio.cc 4 Jan 2008 03:16:32 -0000 1.40 > +++ asyncio.cc 20 Mar 2008 07:23:56 -0000 > @@ -205,8 +205,12 @@ > _last_error = 0; > done = ::read(_fd, head->buffer() + head->offset(), > head->buffer_bytes() - head->offset()); > - if (done < 0) > + if (done < 0) { > _last_error = errno; > + XLOG_WARNING("read error: _fd: %i offset: %i total-len: %i error: %s\n", > + (int)(_fd), head->offset(), head->buffer_bytes(), > + strerror(errno)); > + } > errno = 0; > #endif // ! HOST_OS_WINDOWS > > @@ -571,8 +575,10 @@ > XLOG_ASSERT(! dst_addr.is_zero()); > > #ifndef HOST_OS_WINDOWS > +#ifndef MSG_NOSIGNAL // save two system calls if MSG_NOSIGNAL is supported. 
> sig_t saved_sigpipe = signal(SIGPIPE, SIG_IGN); > #endif > +#endif > > switch (dst_addr.af()) { > case AF_INET: > @@ -584,7 +590,11 @@ > > done = ::sendto(_fd, XORP_CONST_BUF_CAST(_iov[0].iov_base), > _iov[0].iov_len, > +#ifdef MSG_NOSIGNAL > + MSG_NOSIGNAL, > +#else > 0, > +#endif > reinterpret_cast(&sin), > sizeof(sin)); > break; > @@ -599,7 +609,11 @@ > > done = ::sendto(_fd, XORP_CONST_BUF_CAST(_iov[0].iov_base), > _iov[0].iov_len, > +#ifdef MSG_NOSIGNAL > + MSG_NOSIGNAL, > +#else > 0, > +#endif > reinterpret_cast(&sin6), > sizeof(sin6)); > break; > @@ -620,8 +634,10 @@ > } > > #ifndef HOST_OS_WINDOWS > +#ifndef MSG_NOSIGNAL > signal(SIGPIPE, saved_sigpipe); > #endif > +#endif > > } else { > // > @@ -654,16 +670,34 @@ > _last_error = (result == FALSE) ? GetLastError() : 0; > } > #else // ! HOST_OS_WINDOWS > - sig_t saved_sigpipe = signal(SIGPIPE, SIG_IGN); > > - errno = 0; > - _last_error = 0; > - done = ::writev(_fd, _iov, (int)iov_cnt); > - if (done < 0) > - _last_error = errno; > - errno = 0; > +#ifdef MSG_NOSIGNAL > + if (iov_cnt == 1) { > + // No need for coelesce, so use send. This saves us the two > + // sigaction calls since we can pass the MSG_NOSIGNAL flag. > + errno = 0; > + _last_error = 0; > + done = ::send(_fd, XORP_CONST_BUF_CAST(_iov[0].iov_base), > + _iov[0].iov_len, MSG_NOSIGNAL); > + if (done < 0) > + _last_error = errno; > + errno = 0; > + } > + else { > +#endif > + sig_t saved_sigpipe = signal(SIGPIPE, SIG_IGN); > > - signal(SIGPIPE, saved_sigpipe); > + errno = 0; > + _last_error = 0; > + done = ::writev(_fd, _iov, (int)iov_cnt); > + if (done < 0) > + _last_error = errno; > + errno = 0; > + > + signal(SIGPIPE, saved_sigpipe); > +#ifdef HOST_OS_LINUX > + } > +#endif > #endif // ! 
HOST_OS_WINDOWS > } > > _______________________________________________ > Xorp-hackers mailing list > Xorp-hackers at icir.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers From greearb at candelatech.com Thu Mar 20 10:24:46 2008 From: greearb at candelatech.com (Ben Greear) Date: Thu, 20 Mar 2008 10:24:46 -0700 Subject: [Xorp-hackers] Patch to get rid of two system calls per asyncio send. In-Reply-To: <200803201658.m2KGw9il022683@fruitcake.ICSI.Berkeley.EDU> References: <47E2120D.1070606@candelatech.com> <200803201658.m2KGw9il022683@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47E29DDE.4000904@candelatech.com> Pavlin Radoslavov wrote: > Ben Greear wrote: > > >> Asyncio was disabling and enabling SIGPIPE for each send. At least on Linux >> (and probably BSD), we can use MSG_NOSIGNAL in most cases. Attached is a patch >> that implements this. Not specifically benchmarked, but it's always good to >> get rid of >> extra system calls... >> > > I agree that we should get rid of extra system calls. > However, this part of the code is very critical and we want to be > very careful with it (e.g., it has been changed by a number of > people in the past and it might be quite fragile). > Said that, please add it to Bugzilla. > Maybe open a slight 'unstable' period to merge the riskier patches and let us all do the testing from a common CVS? If the patches remain in bugz, very few people are actually going to be able to test the code, and if my code tree diverges too much from CVS, then it will become merge hell for me, and my testing will not be as useful for the general Xorp community. 
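For reference, the MSG_NOSIGNAL idea discussed in this thread can be sketched on its own. The flag is accepted by send(), sendto() and sendmsg() but not by writev(), which is why the patch quoted above only covers the single-buffer case. One possible way to cover multiple buffers too is sendmsg(), shown below. `sendv_nosigpipe` is a hypothetical helper for illustration, not part of the patch or of libxorp, and it assumes the descriptor is a socket (sendmsg() fails on pipes and files).

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstring>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper (not libxorp code): gather-write to a socket
// without risking SIGPIPE. Returns the number of bytes sent, or -1
// with errno set (EPIPE if the peer has gone away).
static ssize_t sendv_nosigpipe(int fd, const iovec* iov, int iov_cnt)
{
    msghdr msg;
    std::memset(&msg, 0, sizeof(msg));
    msg.msg_iov = const_cast<iovec*>(iov);
    msg.msg_iovlen = iov_cnt;

#ifdef MSG_NOSIGNAL
    // One system call; the kernel suppresses SIGPIPE for this send only.
    return ::sendmsg(fd, &msg, MSG_NOSIGNAL);
#else
    // Fallback for platforms without MSG_NOSIGNAL: ignore SIGPIPE
    // around the call, at the cost of two extra system calls.
    void (*saved)(int) = std::signal(SIGPIPE, SIG_IGN);
    ssize_t done = ::sendmsg(fd, &msg, 0);
    int saved_errno = errno;
    std::signal(SIGPIPE, saved);
    errno = saved_errno;
    return done;
#endif
}
```

Whether this fits asyncio.cc depends on whether every descriptor handled there is a socket, which this note does not try to establish.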
Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb at candelatech.com Thu Mar 20 10:31:03 2008 From: greearb at candelatech.com (Ben Greear) Date: Thu, 20 Mar 2008 10:31:03 -0700 Subject: [Xorp-hackers] FEA: Difference in _live_config v/s _pulled_config In-Reply-To: <200803201648.m2KGmMej021140@fruitcake.ICSI.Berkeley.EDU> References: <47DEF218.3040200@candelatech.com> <200803200457.m2K4vUwf002864@fruitcake.ICSI.Berkeley.EDU> <47E1EFB6.9050706@candelatech.com> <200803201648.m2KGmMej021140@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47E29F57.1030309@candelatech.com> Pavlin Radoslavov wrote: > At the high level, _pulled_config is used for synchronous purpose, > while the _live_config is used for asyncronous purpose. > > However, just can't just get rid of pulled_config: > I basically got rid of live-config, and let the observer update the pulled_config tree. > * The Observer mechanism doesn't always exist (e.g., Windows). > > * In some cases (like when committing the interface > configuration), in the middle of the commit we need to > synchronously pull the interface configuration from the kernel > (e.g., to fill-in kernel generated information such as the > physical interface index). > The Observer is asynchronous so we can't use it for the synchronous > population of _pull_config. > I think that will work fine with how I'm trying to use it. Xorp isn't multi-threaded (thank god), so no observer will be updating the tree while we are doing a sync pull_config. Any messages that come in afterwards should be either duplicate information (which should cause no harm), or be new changes since the sync pull completed. There *might* be a way to get in a weird state if the observer has queued msgs from before the sync pull, but I can't think of any offhand since it seems they should just be redundant in that case. 
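The argument above can be sketched with a toy model: one tree, written by both the synchronous pull and the observer, where each update simply overwrites the stored state and events for unconfigured interfaces are dropped. All names here (`SimpleIfTree`, `IfEvent`, `IfState`) are hypothetical and much simpler than XORP's IfTree classes; the sketch also assumes queued events are replayed in order.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical types for illustration; these are not XORP's IfTree classes.
struct IfState {
    bool     enabled;
    unsigned mtu;
    bool operator==(const IfState& o) const {
        return enabled == o.enabled && mtu == o.mtu;
    }
};

struct IfEvent {
    std::string ifname;
    IfState     state;
};

class SimpleIfTree {
public:
    explicit SimpleIfTree(const std::set<std::string>& configured)
        : _configured(configured) {}

    // Used by both the synchronous pull and the asynchronous observer.
    // Last writer wins; events for unconfigured interfaces are ignored.
    // Returns true if the event was applied.
    bool apply(const IfEvent& ev) {
        if (_configured.count(ev.ifname) == 0)
            return false;
        _state[ev.ifname] = ev.state;
        return true;
    }

    const std::map<std::string, IfState>& state() const { return _state; }

private:
    std::set<std::string>          _configured;
    std::map<std::string, IfState> _state;
};
```

In this model, replaying observer events that were queued before the pull can make the tree briefly show an older state, but once the whole queue has been applied in order it ends up matching what the pull saw.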
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Thu Mar 20 10:37:59 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Thu, 20 Mar 2008 10:37:59 -0700 Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces. In-Reply-To: <47E290E7.10304@candelatech.com> References: <47D95634.8020202@candelatech.com> <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU> <47DA315A.9050208@candelatech.com> <47DF0662.2070300@candelatech.com> <200803201535.m2KFYxVM009082@fruitcake.ICSI.Berkeley.EDU> <47E290E7.10304@candelatech.com> Message-ID: <200803201737.m2KHbxO1000171@fruitcake.ICSI.Berkeley.EDU> > My patch continues to get larger...I added code yesterday to filter on > route-table > updates (listening to only the routing-table entries that Xorp is > configured to use, when it > is configured to use a specific routing table). That's good. Please add this to the patch you are going to place in Bugzilla. > With all of this in place, fea now seems pretty efficient, or at least > ospf and > rtr-mgr are more visible in 'top' much of the time in my scenario. > > I also notice about 5 get-time-of-day (or equiv) calls per loop in fea > and ospf. > I'm guessing I can get rid of a bunch of those as well, which should > also help > improve performance more. Those gettimeofday() are in the eventloop. I should warn you that removing some of those might be tricky, and that this part of the code also has been rewritten a number of times. 
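One common way to cut down on repeated clock reads is to sample the clock once per loop iteration and serve later queries from the cached value. The sketch below shows that idea only; `CachedClock` is a hypothetical class, not XORP's EventLoop or TimerList API, and as noted above the real code has subtleties (for example, a long-running callback can leave the cached time stale).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical sketch (not XORP code): read the clock once per loop
// iteration and answer later queries from the cached value.
class CachedClock {
public:
    typedef std::chrono::steady_clock::time_point TimePoint;
    typedef std::function<TimePoint()> Source;

    // The clock source is injectable so that reads can be counted in
    // tests; by default it is the monotonic system clock.
    explicit CachedClock(Source source = []() -> TimePoint {
                             return std::chrono::steady_clock::now();
                         })
        : _source(source), _now(_source()) {}

    // Call once at the top of each event-loop iteration, and again
    // after any callback that may have run for a long time.
    void advance() { _now = _source(); }

    // Cheap: returns the cached value without touching the clock.
    TimePoint now() const { return _now; }

private:
    Source    _source;
    TimePoint _now;
};
```

The trade-off is accuracy: timers scheduled relative to `now()` inherit whatever staleness the cached value has, so where `advance()` is called matters.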
Thanks, Pavlin From bms at incunabulum.net Thu Mar 20 10:41:23 2008 From: bms at incunabulum.net (Bruce M Simpson) Date: Thu, 20 Mar 2008 17:41:23 +0000 Subject: [Xorp-hackers] Java processes In-Reply-To: <803a75c30803200720k1c16ac0chfa7b82933d617d94@mail.gmail.com> References: <803a75c30803200720k1c16ac0chfa7b82933d617d94@mail.gmail.com> Message-ID: <47E2A1C3.5000503@incunabulum.net> Selçuk Cevher wrote: > Hi All, > > I am just a beginner on XORP software development. > I am trying to create a new XORP process which will communicate with > some other existing XORP processes such as RIB. > Based on what I have read so far, it seems like we are only allowed to > create processes using C++. > Is it possible to create a Java process and make it communicate in an > efficient way to an existing XORP process such as RIB or XorpRtrMgr ? This is something we want to address, because we believe users should not be limited to using C++ for implementing routing processes. At the moment, the XRL layer, libxipc, and the XRL client stub libraries, make the assumption that you are using C++. This is because of how callbacks are implemented and dispatched, leaving aside the type conversion issues. I did some initial research in this area, using Python as the target language, and SWIG to generate the callback stubs. SWIG will get you as far as building proxy classes for your chosen target language -- I made trivial changes to the clnt-gen and tgt-gen scripts to generate SWIG .i files. However, I haven't gotten my head around typemaps or how those can be used to deal with the callback problem, and documentation for these features in SWIG seems incomplete and hazy to me. later BMS From pavlin at ICSI.Berkeley.EDU Thu Mar 20 11:04:36 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Thu, 20 Mar 2008 11:04:36 -0700 Subject: [Xorp-hackers] Patch to get rid of two system calls per asyncio send.
In-Reply-To: <47E29DDE.4000904@candelatech.com> References: <47E2120D.1070606@candelatech.com> <200803201658.m2KGw9il022683@fruitcake.ICSI.Berkeley.EDU> <47E29DDE.4000904@candelatech.com> Message-ID: <200803201804.m2KI4bbX005715@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > Pavlin Radoslavov wrote: > > Ben Greear wrote: > > > > > >> Asyncio was disabling and enabling SIGPIPE for each send. At least on Linux > >> (and probably BSD), we can use MSG_NOSIGNAL in most cases. Attached is a patch > >> that implements this. Not specifically benchmarked, but it's always good to > >> get rid of > >> extra system calls... > >> > > > > I agree that we should get rid of extra system calls. > > However, this part of the code is very critical and we want to be > > very careful with it (e.g., it has been changed by a number of > > people in the past and it might be quite fragile). > > Said that, please add it to Bugzilla. > > > Maybe open a slight 'unstable' period to merge the riskier patches and > let us all > do the testing from a common CVS? If the patches remain in bugz, very few > people are actually going to be able to test the code, and if my code > tree diverges > too much from CVS, then it will become merge hell for me, and my testing > will not be as useful for the general Xorp community. OK, given that the patch is relatively small, please try to clean it up by eliminating the extra #ifdef and try to see if you can reduce the code duplication when using different system calls. I will make it a high priority for me to double-check and commit the patch, and will leave it to the community to test it :) Thanks, Pavlin From greearb at candelatech.com Thu Mar 20 16:35:05 2008 From: greearb at candelatech.com (Ben Greear) Date: Thu, 20 Mar 2008 16:35:05 -0700 Subject: [Xorp-hackers] Patch to get rid of two system calls per asyncio send. 
In-Reply-To: <200803201804.m2KI4bbX005715@fruitcake.ICSI.Berkeley.EDU> References: <47E2120D.1070606@candelatech.com> <200803201658.m2KGw9il022683@fruitcake.ICSI.Berkeley.EDU> <47E29DDE.4000904@candelatech.com> <200803201804.m2KI4bbX005715@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47E2F4A9.8010809@candelatech.com> Pavlin Radoslavov wrote: > OK, given that the patch is relatively small, please try to clean it > up by eliminating the extra #ifdef and try to see if you can reduce > the code duplication when using different system calls. > I will make it a high priority for me to double-check and commit the > patch, and will leave it to the community to test it :) Thanks, the patch is attached. I tried to get all of the #ifdefs in one place in the code. This appears to work fine on Linux, but might have to futz with it a bit to compile on Windows, maybe by faking a #define SIGPIPE as I alluded to in the comment. Please let me know if you want any more changes. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com -------------- next part -------------- A non-text attachment was scrubbed... Name: libxorp.patch Type: text/x-patch Size: 3076 bytes Desc: not available Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20080320/3af00a68/attachment.bin From greearb at candelatech.com Fri Mar 21 13:49:45 2008 From: greearb at candelatech.com (Ben Greear) Date: Fri, 21 Mar 2008 13:49:45 -0700 Subject: [Xorp-hackers] New OSPF assert, probably related to FEA. 
Message-ID: <47E41F69.4090000@candelatech.com> This could be caused by my changes to FEA, but I'm curious if anyone knows what sorts of things could cause this message: [ 2008/03/21 13:35:35 WARNING xorp_ospfv2:27182 OSPF area_router.cc:5759 routing_router_link_transitV2 ] LSA in database MaxAge Network-LSA: LS age 3600 Options 0x2 DC: 0 EA: 0 N/P: 0 MC: 0 E: 1 LS type 0x2 Link State ID 99.1.1.21 Advertising Router 127.1.0.21 LS sequence number 0x80000004 LS checksum 0xc6d9 length 36 Network Mask 0xffffff00 Attached Router 127.1.0.21 Attached Router 127.1.0.1 Attached Router 127.1.0.11 [ 2008/03/21 13:35:41 WARNING xorp_ospfv2:27182 OSPF area_router.cc:5759 routing_router_link_transitV2 ] LSA in database MaxAge Network-LSA: LS age 3600 Options 0x2 DC: 0 EA: 0 N/P: 0 MC: 0 E: 1 LS type 0x2 Link State ID 99.1.1.21 Advertising Router 127.1.0.21 LS sequence number 0x80000004 LS checksum 0xc6d9 length 36 Network Mask 0xffffff00 Attached Router 127.1.0.21 Attached Router 127.1.0.1 Attached Router 127.1.0.11 [ 2008/03/21 13:35:48 WARNING xorp_fea XrlFeaTarget ] Handling method for raw_packet4/0.1/join_multicast_group failed: XrlCmdError 102 Command failed Cannot join group 224.0.0.6 on interface 7.16.7 vif 7.16.7 protocol 89 receiver ospfv2-7ac9ea602e05bc450f1ea8c6a1245d13 at 127.0.0.1: not registered [ 2008/03/21 13:35:48 FATAL xorp_ospfv2:27182 OSPF xrl_io.cc:640 join_multicast_group_cb ] Cannot join a multicast group on interface 7.16.7 vif 7.16.7: 102 Command failed Cannot join group 224.0.0.6 on interface 7.16.7 vif 7.16.7 protocol 89 receiver ospfv2-7ac9ea602e05bc450f1ea8c6a1245d13 at 127.0.0.1: not registered (BAD_ARGS, CMD_FAILED, INTERNAL_ERR) (I added the suggested work-around to keep ospf from asserting about the LSA, but I'm hitting this a few minutes after those LSA asserts would have happened, it seems.) 
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Fri Mar 21 14:34:47 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Fri, 21 Mar 2008 14:34:47 -0700 Subject: [Xorp-hackers] New OSPF assert, probably related to FEA. In-Reply-To: <47E41F69.4090000@candelatech.com> References: <47E41F69.4090000@candelatech.com> Message-ID: <200803212134.m2LLYmQC024154@fruitcake.ICSI.Berkeley.EDU> Before sending the join_multicast_group XRL, the OSPF process is supposed to be registered by using the register_receiver XRL. If it has unregistered for some reason (e.g., by sending unregister_receiver), or the FEA has unregistered it on its own, then you will get the "not registered" error. Hope that helps in debugging the problem. Pavlin Ben Greear wrote: > This could be caused by my changes to FEA, but I'm curious > if anyone knows what sorts of things could cause this message: > [ 2008/03/21 13:35:35 WARNING xorp_ospfv2:27182 OSPF area_router.cc:5759 routing_router_link_transitV2 ] LSA in database MaxAge > Network-LSA: > LS age 3600 Options 0x2 DC: 0 EA: 0 N/P: 0 MC: 0 E: 1 LS type 0x2 Link State ID 99.1.1.21 Advertising Router 127.1.0.21 LS sequence number 0x80000004 LS checksum 0xc6d9 length 36 > Network Mask 0xffffff00 > Attached Router 127.1.0.21 > Attached Router 127.1.0.1 > Attached Router 127.1.0.11 > [ 2008/03/21 13:35:41 WARNING xorp_ospfv2:27182 OSPF area_router.cc:5759 routing_router_link_transitV2 ] LSA in database MaxAge > Network-LSA: > LS age 3600 Options 0x2 DC: 0 EA: 0 N/P: 0 MC: 0 E: 1 LS type 0x2 Link State ID 99.1.1.21 Advertising Router 127.1.0.21 LS sequence number 0x80000004 LS checksum 0xc6d9 length 36 > Network Mask 0xffffff00 > Attached Router 127.1.0.21 > Attached Router 127.1.0.1 > Attached Router 127.1.0.11 > [ 2008/03/21 13:35:48 WARNING xorp_fea XrlFeaTarget ] Handling method for raw_packet4/0.1/join_multicast_group failed: XrlCmdError 102 Command failed Cannot join group
224.0.0.6 on interface 7.16.7 vif 7.16.7 protocol 89 receiver ospfv2-7ac9ea602e05bc450f1ea8c6a1245d13 at 127.0.0.1: not registered > [ 2008/03/21 13:35:48 FATAL xorp_ospfv2:27182 OSPF xrl_io.cc:640 join_multicast_group_cb ] Cannot join a multicast group on interface 7.16.7 vif 7.16.7: 102 Command failed Cannot join group 224.0.0.6 on interface 7.16.7 vif 7.16.7 protocol 89 receiver ospfv2-7ac9ea602e05bc450f1ea8c6a1245d13 at 127.0.0.1: not registered (BAD_ARGS, CMD_FAILED, INTERNAL_ERR) > > > (I added the suggested work-around to keep ospf from asserting about > the LSA, but I'm hitting this a few minutes after those LSA asserts > would have happened, it seems.) > > Thanks, > Ben > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com > > _______________________________________________ > Xorp-hackers mailing list > Xorp-hackers at icir.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers From greearb at candelatech.com Fri Mar 21 14:37:12 2008 From: greearb at candelatech.com (Ben Greear) Date: Fri, 21 Mar 2008 14:37:12 -0700 Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces. In-Reply-To: <47E427DE.7020908@icsi.berkeley.edu> References: <47D95634.8020202@candelatech.com> <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU> <47DA315A.9050208@candelatech.com> <47DF0662.2070300@candelatech.com> <200803201535.m2KFYxVM009082@fruitcake.ICSI.Berkeley.EDU> <47E290E7.10304@candelatech.com> <200803201737.m2KHbxO1000171@fruitcake.ICSI.Berkeley.EDU> <47E427DE.7020908@icsi.berkeley.edu> Message-ID: <47E42A88.1040606@candelatech.com> Bruce M. Simpson wrote: > In any event, XORP should now prefer the use of clock_gettime() in the > event loop where it's available, and trying to optimize the calls away > might not buy that much CPU back. > > It is very easy to hose the event loop by making the wrong changes here, > speaking from experience. 
You are correct that it is using clock_gettime, but it's still a system call and I'd prefer to get rid of as many as those as possible. select(67, [15 16 17 25 26 27 28 29 30 31 32 33 34 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 56 57 58 59 60 61 62 63 64 65 66], [54], [], {19, 872830}) = 2 (in [61], out [54], left {19, 872830}) clock_gettime(CLOCK_MONOTONIC, {952353, 316825858}) = 0 send(54, "STCP\1\1\0\4\27\330\0\3\0\0\0d\0\0\0\0\0\0\0\4\314\0\0\0", 28, MSG_NOSIGNAL) = 28 recvmsg(61, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.25.26.25")}, msg_iov(1)=[{"E\300\0\314*\354\0\0\1Y\210\366\n\31\32\31\340\0\0\5\2\4\0\270\177\1\0\31\0\0\0\0"..., 65536}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 204 clock_gettime(CLOCK_MONOTONIC, {952353, 317042741}) = 0 clock_gettime(CLOCK_MONOTONIC, {952353, 317069312}) = 0 select(67, [15 16 17 25 26 27 28 29 30 31 32 33 34 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 56 57 58 59 60 61 62 63 64 65 66], [], [], {19, 871169}) = 1 (in [66], left {19, 871169}) clock_gettime(CLOCK_MONOTONIC, {952353, 317322836}) = 0 read(66, "STCP\1\1\0\3\t\274\0\3\0\0\0d\0\0\0\0\0\0\0\4\314\0\0\0STCP"..., 193376) = 56 clock_gettime(CLOCK_MONOTONIC, {952353, 317437482}) = 0 clock_gettime(CLOCK_MONOTONIC, {952353, 317463985}) = 0 clock_gettime(CLOCK_MONOTONIC, {952353, 317489131}) = 0 clock_gettime(CLOCK_MONOTONIC, {952353, 317514145}) = 0 clock_gettime(CLOCK_MONOTONIC, {952353, 317548782}) = 0 select(67, [15 16 17 25 26 27 28 29 30 31 32 33 34 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 56 57 58 59 60 61 62 63 64 65 66], [], [], {0, 0}) = 0 (Timeout) clock_gettime(CLOCK_MONOTONIC, {952353, 317627603}) = 0 clock_gettime(CLOCK_MONOTONIC, {952353, 317654158}) = 0 clock_gettime(CLOCK_MONOTONIC, {952353, 317679387}) = 0 select(67, [15 16 17 25 26 27 28 29 30 31 32 33 34 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 56 57 58 59 60 61 62 63 64 65 
66], [], [], {19, 870559}) = 1 (in [54], left {19, 868000}) clock_gettime(CLOCK_MONOTONIC, {952353, 320980102}) = 0 But, it seems to me that we can *probably* fix up the code to only grab time once, right after select() returns and then pass the 'now' value to whatever code needs it. This means that we might be off by a few ms here and there, but probably this won't matter. Anyway, I have some other issues to resolve before I start hacking on this... Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From pavlin at ICSI.Berkeley.EDU Fri Mar 21 14:42:29 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Fri, 21 Mar 2008 14:42:29 -0700 Subject: [Xorp-hackers] Patch to get rid of two system calls per asyncio send. In-Reply-To: <47E2F4A9.8010809@candelatech.com> References: <47E2120D.1070606@candelatech.com> <200803201658.m2KGw9il022683@fruitcake.ICSI.Berkeley.EDU> <47E29DDE.4000904@candelatech.com> <200803201804.m2KI4bbX005715@fruitcake.ICSI.Berkeley.EDU> <47E2F4A9.8010809@candelatech.com> Message-ID: <200803212142.m2LLgTrV025999@fruitcake.ICSI.Berkeley.EDU> Ben Greear wrote: > Pavlin Radoslavov wrote: > > > OK, given that the patch is relatively small, please try to clean it > > up by eliminating the extra #ifdef and try to see if you can reduce > > the code duplication when using different system calls. > > I will make it a high priority for me to double-check and commit the > > patch, and will leave it to the community to test it :) > > Thanks, the patch is attached. I tried to get all of the #ifdefs > in one place in the code. This appears to work fine on > Linux, but might have to futz with it a bit to compile on > Windows, maybe by faking a #define SIGPIPE as I alluded to > in the comment. > > Please let me know if you want any more changes. Patch committed to CVS (with some modifications). 
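For readers following the thread, here is a minimal sketch of the technique under discussion. It is not the patch that was committed (the attachment is not reproduced in this archive); the helper names send_nosigpipe and errno_after_send_to_closed_peer are hypothetical and exist only for illustration. The assumption is a POSIX platform: where MSG_NOSIGNAL is available, a single send() suffices, otherwise the code falls back to ignoring SIGPIPE around the call, which is roughly the pair of rt_sigaction calls visible in the strace output elsewhere in this thread.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not XORP code): send on a socket without ever
// letting a closed peer raise SIGPIPE in this process.
ssize_t send_nosigpipe(int fd, const void* buf, size_t len)
{
#ifdef MSG_NOSIGNAL
    // One system call: the flag suppresses SIGPIPE for this send only.
    return ::send(fd, buf, len, MSG_NOSIGNAL);
#else
    // Fallback: three system calls. Ignore SIGPIPE, send, then restore
    // the previous disposition, preserving errno from send().
    struct sigaction ign = {};
    struct sigaction old = {};
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    sigaction(SIGPIPE, &ign, &old);
    ssize_t n = ::send(fd, buf, len, 0);
    int saved_errno = errno;
    sigaction(SIGPIPE, &old, nullptr);
    errno = saved_errno;
    return n;
#endif
}

// Illustration only: write to a socket whose peer has been closed and
// report the resulting errno (0 if the send unexpectedly succeeded).
int errno_after_send_to_closed_peer()
{
    int sv[2];
    int rc = ::socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    (void)rc;
    ::close(sv[1]);                      // peer goes away
    const char msg[] = "ping";
    ssize_t n = send_nosigpipe(sv[0], msg, sizeof(msg));
    int err = (n < 0) ? errno : 0;
    ::close(sv[0]);
    return err;
}
```

With either branch the caller sees an ordinary error return (EPIPE) instead of a signal, so the write path can handle the disconnect like any other failure. Windows has no SIGPIPE at all, which is presumably why the real patch needed its #ifdefs gathered in one place.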
Thanks, Pavlin From bms at incunabulum.net Sat Mar 22 05:06:08 2008 From: bms at incunabulum.net (Bruce M Simpson) Date: Sat, 22 Mar 2008 12:06:08 +0000 Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces. In-Reply-To: <47E42A88.1040606@candelatech.com> References: <47D95634.8020202@candelatech.com> <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU> <47DA315A.9050208@candelatech.com> <47DF0662.2070300@candelatech.com> <200803201535.m2KFYxVM009082@fruitcake.ICSI.Berkeley.EDU> <47E290E7.10304@candelatech.com> <200803201737.m2KHbxO1000171@fruitcake.ICSI.Berkeley.EDU> <47E427DE.7020908@icsi.berkeley.edu> <47E42A88.1040606@candelatech.com> Message-ID: <47E4F630.8020007@incunabulum.net> Ben Greear wrote: > ... > But, it seems to me that we can *probably* fix up the code to only > grab time once, right after select() returns and then pass the 'now' > value to whatever code needs it. > > This means that we might be off by a few ms here and there, but probably > this won't matter. Hmm, the SystemClock class should already be doing this. There is an update method which only retrieves the time from the kernel if we absolutely have to, and the EventLoop should already be doing this. I seem to recall Orion introduced this code around 3.5 years ago. If SystemClock is not correctly caching its time value, that needs to be looked at. Cheers BMS From bms at incunabulum.net Mon Mar 24 13:25:20 2008 From: bms at incunabulum.net (Bruce M Simpson) Date: Mon, 24 Mar 2008 20:25:20 +0000 Subject: [Xorp-hackers] Patch to get rid of two system calls per asyncio send. 
In-Reply-To: <200803201658.m2KGw9il022683@fruitcake.ICSI.Berkeley.EDU> References: <47E2120D.1070606@candelatech.com> <200803201658.m2KGw9il022683@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47E80E30.7050709@incunabulum.net> Hi, This seems like a good time and place to lay down the law about how asyncio.cc got more complicated, when I was dragged into the game to make it work inside Windows... Pavlin Radoslavov wrote: > Ben Greear wrote: > >> Asyncio was disabling and enabling SIGPIPE for each send. At least on Linux >> (and probably BSD), we can use MSG_NOSIGNAL in most cases. Attached is a patch >> that implements this. Not specifically benchmarked, but it's always good to >> get rid of >> extra system calls... >> > > I agree that we should get rid of extra system calls. > However, this part of the code is very critical and we want to be > very careful with it (e.g., it has been changed by a number of > people in the past and it might be quite fragile). > I second Pavlin. It is code which is risky to modify, without performing detailed testing across all the supported platforms. It took MONTHS of pain to get asyncio.cc working correctly under Windows, and even then, I didn't completely understand what was going on. So, I cheated. What follows is a tour down my memory lane... At one point I was proposing turning the I/O model upside down to fit what NT does; obviously I had to reconsider my approach as this would have taken too much development time, as well as being an overly intrusive change. There is some special magic going on there, which is necessary to make sure data gets in and out of Winsock's I/O thread without resorting to radical design change. To summarise: 1. In NT, all read and write operations block -- there is no such thing as non-blocking I/O for "ordinary" NT file descriptors. Winsock attempts to emulate it up to a point, however only for very specific APIs.
The MSDN documentation explicitly states, in a number of places, that I/O Completion Ports are the preferred mechanism for high volume/low latency Winsock processing. [We do more special magic to enable XORP processes, such as xorpsh, to read from an NT console or pipe in an apparently non-blocking way, see win_con_read() and win_pipe_read() in win_io.c.] 2. In Winsock, socket events dispatched using the WSAEventSelect() mechanism are edge-triggered, not level-triggered (in the sense of digital logic design). The NT synchronisation primitives used to actually signal conditions are Event objects, created via the WSACreateEvent() API. 3. The generation of IOT_READ ("this file descriptor has data pending to be read") requires that a context switch to Winsock's thread is forced in order for background I/O processing to happen. Attempting to read data without such a context switch will simply cause the process's primary thread to block forever. Furthermore, it is possible for unread data to sit in one of Winsock's buffers *without* the IOT_READ event having been generated, in which case taking the context switch is unnecessarily expensive, and slows things down until the Winsock I/O thread effects a poll on our behalf ("Oh, I forgot to tell you, there's data waiting for you...") -- this is why the call to FIONREAD is there, otherwise it plays havoc with XRL latency. See the EDGE_TRIGGERED_READ_LATENCY define for the code which implements this path. 4. The disposition of IOT_WRITE ("this file descriptor may be written to") is edge triggered in Winsock, not level triggered as POSIX select() is; writes are also handled in the Winsock I/O thread. We cannot simply write() as much as we can, block, and have our event handler invoked as is the case in POSIX environments; instead we must reenter the EventLoop, causing a call to WaitForMultipleObjects() and thus a context switch.
As such it's necessary to add a XorpTask upfront in order to service writes, as there is no way of knowing that the descriptor is ready to write to, *until* we have forced a context switch, giving Winsock a chance to tell us that it is! See the EDGE_TRIGGERED_WRITE define for the code which implements this path. 5. IOT_DISCONNECT is signalled as a separate Winsock event, see BufferedAsyncReader. The above probably sounds very clear, and straightforward, in hindsight, but it's worth bearing in mind it took several months of speculative work to pull it off. We had to make these design changes because the emulation of select() in Windows may only be used with sockets, and furthermore, it cannot deal with mixed address families, which was a dealbreaker for IPv6 support. Obviously these techniques aren't necessary if using NT I/O Completion Ports or NT threads as the dispatch mechanism, however, those are out of scope for XORP, for reasons which should be self explanatory from the above, if not, read the future thread on cross-language support. The knowledge herein should probably be more widely disseminated, for the benefit of folk porting POSIX applications to native Windows. Please don't break any of it :-) cheers BMS From bms at incunabulum.net Mon Mar 24 13:30:52 2008 From: bms at incunabulum.net (Bruce M Simpson) Date: Mon, 24 Mar 2008 20:30:52 +0000 Subject: [Xorp-hackers] Doubt on VLAN implementation In-Reply-To: <200803201631.m2KGViGp017857@fruitcake.ICSI.Berkeley.EDU> References: <47E13FD7.6030207@candelatech.com> <200803201631.m2KGViGp017857@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47E80F7C.80800@incunabulum.net> Pavlin Radoslavov wrote: > I have to admit that when comes to VLANs I am thinking the IEEE > 802.1Q Standard. > How the relationships you describe above fit with 802.1Q? > If you have an URL with detailed description that would be useful. 
> There are folk out there using multiple 802.1Q encapsulation headers, particularly for stuff like Metro Ethernet. It is sometimes called "Q-in-Q". The encapsulation thing was a requirement crunch when rethinking the FreeBSD ether_input() path this time last year -- and all I wanted to do was make 802.1p work with my VoIP ATA... I didn't see any reason to support more than 2 levels of encapsulation on the same link layer, however it's entirely possible people are doing more than 2. Having said that, I'd like to see how Q-in-Q is configured on vendor equipment before making any radical changes anywhere. For XORP, personally I prefer the interface foo0 { vif vlan10 { ... } } style syntax, but how to capture Q-in-Q in that syntax is an open question. cheers BMS From pavlin at ICSI.Berkeley.EDU Mon Mar 24 13:44:10 2008 From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov) Date: Mon, 24 Mar 2008 13:44:10 -0700 Subject: [Xorp-hackers] Patch to get rid of two system calls per asyncio send. In-Reply-To: <47E80E30.7050709@incunabulum.net> References: <47E2120D.1070606@candelatech.com> <200803201658.m2KGw9il022683@fruitcake.ICSI.Berkeley.EDU> <47E80E30.7050709@incunabulum.net> Message-ID: <200803242044.m2OKiAb2006114@fruitcake.ICSI.Berkeley.EDU> Bruce M Simpson wrote: > Hi, > > This seems like a good time and place to lay down the law about how > asyncio.cc got more complicated, when I was dragged into the game to > make it work inside Windows... > Please don't break any of it :-) The (modified) patch from Ben that was committed keeps the original Windows behavior. Please let me know if you find something odd on Windows. 
Thanks, Pavlin From bms at incunabulum.net Mon Mar 24 13:45:53 2008 From: bms at incunabulum.net (Bruce M Simpson) Date: Mon, 24 Mar 2008 20:45:53 +0000 Subject: [Xorp-hackers] Question of FEA strace In-Reply-To: <200803201640.m2KGe8eq019552@fruitcake.ICSI.Berkeley.EDU> References: <47E1627E.3060308@candelatech.com> <200803201640.m2KGe8eq019552@fruitcake.ICSI.Berkeley.EDU> Message-ID: <47E81301.1030602@incunabulum.net> Pavlin Radoslavov wrote: > Ben Greear wrote: > > >> Any idea what is causing those writev messages? It seems they are not >> passing much information, but doing so very often... Having a hard >> time grepping through the code to see the originator... >> > > Those are from the XRL mechanism. > I believe there are periodic keepalives between the XRL Finder and > each process that is controlled by that Finder, but they should be > on the order of once every few seconds or so. > > Regards, > Pavlin > > >> 11:32:24.043438 writev(31, [{"Finder 0.2\nMsgType r\nSeqNo 5825\nMsgData 100 / \n", 47}], 1) = 47 >> 11:32:24.043570 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 >> 11:32:24.043644 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 >> 11:32:24.043705 writev(32, [{"Finder 0.2\nMsgType r\nSeqNo 5828\nMsgData 100 / \n", 47}], 1) = 47 >> 11:32:24.043843 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0 >> 11:32:24.043910 clock_gettime(CLOCK_MONOTONIC, {768671, 208128378}) = 0 >> 11:32:24.043957 clock_gettime(CLOCK_MONOTONIC, {768671, 208173677}) = 0 >> Just to recap: I introduced the change to use CLOCK_MONOTONIC in order to address a long-standing problem we had with event callbacks and busy xorp_bgp processes, as well as when NTP made changes to the system clock. Occasionally the BGP process would get very busy, and start to get late in dispatching callbacks, to the point of losing track of time. Or NTP would update the clock, causing deltas based on gettimeofday() to become completely bogus by appearing to go back in time. 
CLOCK_MONOTONIC is guaranteed by POSIX to be monotonically increasing, independently of wall-clock time, and to increment in SI units. Usually it's implemented in terms of a hardware clock, whose origin 0 is the time of system boot. As it's a timer used to calculate deltas in time, rather than display the wall-clock time, there is no need for it to correspond directly to the wall-clock time, although of course the underlying time base is subject to the laws of physics. [This means it's going to be subject to general relativity, although that is *currently* not a XORP design issue until we start installing XORP on communications satellites, hopefully well within the next 5 years.] In FreeBSD, CLOCK_MONOTONIC goes more or less directly to the timecounter code in the kernel, although there is a fast path and a precise path. XORP just requests the default, which is precise. The Linux 2.6 timer code looks eerily similar, although the effect is the same -- CLOCK_MONOTONIC can be difficult to cache, which is why the SystemClock class exists. Windows uses GetSystemTimeAsFileTime(), as this is the path of least resistance. The "right thing" to do there is probably to use GetTickCount64(), however this is Vista only -- it's entirely possible, though unlikely, that a XORP process timer could be scheduled 50 days in advance, or that the delta could span 50 days. [Having to read the uptime from the registry is the absolute worst case scenario -- it will return the correct consistent monotonic result we want, at a high API call cost.] 
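To make the caching point concrete, here is a small sketch, assuming a POSIX system with clock_gettime(). CachedMonotonicClock and delta_ns are hypothetical names invented for this illustration; this is not XORP's SystemClock, only the general idea of reading the monotonic clock once per event-loop iteration and handing the cached value to everything that asks for "now".

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Hypothetical stand-in for a SystemClock-like cache (not XORP code).
// The kernel is consulted only in advance(); now() is a plain read.
class CachedMonotonicClock {
public:
    CachedMonotonicClock() { advance(); }

    // One system call. Intended to be called once per loop iteration,
    // e.g. right after select() returns.
    void advance() {
        int rc = ::clock_gettime(CLOCK_MONOTONIC, &_now);
        assert(rc == 0);
        (void)rc;
        ++_syscalls;
    }

    // Cached value; no system call.
    const timespec& now() const { return _now; }

    // How many times the kernel clock has actually been read.
    unsigned long syscalls() const { return _syscalls; }

private:
    timespec _now{};
    unsigned long _syscalls = 0;
};

// Signed difference b - a in nanoseconds. Because CLOCK_MONOTONIC never
// goes backwards, this is non-negative whenever b was read after a,
// regardless of NTP adjustments to the wall clock.
int64_t delta_ns(const timespec& a, const timespec& b) {
    return (static_cast<int64_t>(b.tv_sec) - static_cast<int64_t>(a.tv_sec))
               * 1000000000LL
           + (static_cast<int64_t>(b.tv_nsec) - static_cast<int64_t>(a.tv_nsec));
}
```

The trade-off is the one Ben describes: timers dispatched within one iteration all see the same slightly stale "now", in exchange for one clock read per iteration instead of one per caller.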
cheers BMS From greearb at candelatech.com Mon Mar 24 14:28:44 2008 From: greearb at candelatech.com (Ben Greear) Date: Mon, 24 Mar 2008 14:28:44 -0700 Subject: [Xorp-hackers] Question of FEA strace In-Reply-To: <47E81301.1030602@incunabulum.net> References: <47E1627E.3060308@candelatech.com> <200803201640.m2KGe8eq019552@fruitcake.ICSI.Berkeley.EDU> <47E81301.1030602@incunabulum.net> Message-ID: <47E81D0C.3090807@candelatech.com> Bruce M Simpson wrote: > [This means it's going to be subject to general relativity, although > that is *currently* not a XORP design issue until we start installing > XORP on communications satellites, hopefully well within the next 5 years.] You'll have way more drift based on heat of the chip than relativity I think :) > Windows uses GetSystemTimeAsFileTime(), as this is the path of least > resistance. The "right thing" to do there is probably to use > GetTickCount64(), however this is Vista only -- it's entirely possible, > though unlikely, that a XORP process timer could be scheduled 50 days in > advance, or that the delta could span 50 days. The Windows file time only updates every 10ms or so. This is probably fine for Xorp, but it plays merry hell if you want to do fine-grained timer things... I have code that uses the Windows cycle timer and periodically re-correlates this with the file time (which may be updated with ntp or equiv). If you ever need better than 10ms resolution, let me know and I'll send you my hackings. My code seems to work on Windows 2000 and XP, haven't tried it on Vista.
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From sureshkannan at gmail.com Mon Mar 24 21:35:17 2008 From: sureshkannan at gmail.com (Suresh Kannan) Date: Tue, 25 Mar 2008 10:05:17 +0530 Subject: [Xorp-hackers] Doubt on VLAN implementation In-Reply-To: <47E80F7C.80800@incunabulum.net> References: <47E13FD7.6030207@candelatech.com> <200803201631.m2KGViGp017857@fruitcake.ICSI.Berkeley.EDU> <47E80F7C.80800@incunabulum.net> Message-ID: <84f679e0803242135k7b7f37bgda993c55d50f0bd@mail.gmail.com> Hi Bruce & all, interface foo { vif vlan 10.200 { } } Does the above syntax give a clear view of Q-in-Q support? Thanks, Regards, Suresh Kannan. On Tue, Mar 25, 2008 at 2:00 AM, Bruce M Simpson wrote: > Pavlin Radoslavov wrote: > > I have to admit that when it comes to VLANs I am thinking of the IEEE > > 802.1Q Standard. > > How do the relationships you describe above fit with 802.1Q? > > If you have a URL with a detailed description that would be useful. > > > > There are folk out there using multiple 802.1Q encapsulation headers, > particularly for stuff like Metro Ethernet. It is sometimes called > "Q-in-Q". > > The encapsulation thing was a requirement crunch when rethinking the > FreeBSD ether_input() path this time last year -- and all I wanted to do > was make 802.1p work with my VoIP ATA... > > I didn't see any reason to support more than 2 levels of encapsulation > on the same link layer, however it's entirely possible people are doing > more than 2. > > Having said that, I'd like to see how Q-in-Q is configured on vendor > equipment before making any radical changes anywhere. For XORP, > personally I prefer the interface foo0 { vif vlan10 { ... } } style > syntax, but how to capture Q-in-Q in that syntax is an open question.
>
> cheers
> BMS
>

From cevhers at gmail.com Mon Mar 24 22:26:10 2008
From: cevhers at gmail.com (Selçuk Cevher)
Date: Tue, 25 Mar 2008 01:26:10 -0400
Subject: [Xorp-hackers] bootstrap
Message-ID: <803a75c30803242226kb597526rf3b0d5cbf7cb682e@mail.gmail.com>

Hi All,

I created a .xif file containing a new XRL interface in xrl/interfaces
folder. I also modified Makefile.am file in xrl/interfaces folder
properly to build the related library, .hh, and .cc client files (I am
basically talking about clnt-gen script). However, when I run the
bootstrap script at the top-level of XORP tree, neither the library for
the new interface nor the stub code for the caller (.hh and .cc) files
are created in xrl/interfaces folder.

What type of modifications should I make to get the library, .cc and
.hh files for a newly added interface?

Thanks.
Selcuk.

From bms at incunabulum.net Tue Mar 25 01:09:00 2008
From: bms at incunabulum.net (Bruce M Simpson)
Date: Tue, 25 Mar 2008 08:09:00 +0000
Subject: [Xorp-hackers] Doubt on VLAN implementation
In-Reply-To: <84f679e0803242135k7b7f37bgda993c55d50f0bd@mail.gmail.com>
References: <47E13FD7.6030207@candelatech.com>
    <200803201631.m2KGViGp017857@fruitcake.ICSI.Berkeley.EDU>
    <47E80F7C.80800@incunabulum.net>
    <84f679e0803242135k7b7f37bgda993c55d50f0bd@mail.gmail.com>
Message-ID: <47E8B31C.8050303@incunabulum.net>

Suresh Kannan wrote:
> Hi Bruce & all,
>
> interface foo {
>     vif vlan 10.200 {
>     }
> }
>
> Does the above syntax give a clear view of QinQ support?
>

That was on the tip of my tongue, but I didn't dare speak it, because in
some implementations, fooXX.YY can be used to mean "VLAN YY on interface
fooXX".

So I've been wary of using a period as a delimiter for the vlan term,
given that overloaded meanings quickly lead to problems for network
engineers during deployment, and it makes sense to make things easier
for your user base.

(Yes, I'd like to just get the damn bikeshed painted so the code can
happen...)

Even once we solve this simple problem of how to invoke a thing, we are
left with the problem of the manifestation of the thing. Currently XORP
knows how to make VLANs for Linux and FreeBSD, and to deal with that in
the FEA block's syntax.

Juniper has a very specific syntax for dual-tagging:

http://www.juniper.net/techpubs/software/junos/junos90/swconfig-network-interfaces/flexible-vlan-tagging.html#id-13039477

I can see why they've done this. It is easier IMHO to treat dual-tagging
as a special case, because it's not the default, and most open source
forwarding plane implementations out there are geared towards dealing
with a single VLAN tag.

I left Q-in-Q as an exercise for the reader in FreeBSD; my refactoring
there was just so that I could use 802.1p.
At the moment the way to accomplish Q-in-Q there is to use Netgraph.
Obviously this is purely software plane and thus isn't optimal, and I
wager newer cards are actually able to support Q-in-Q in ASIC, so it
makes sense to go about solving the VLAN problem in a way which is able
to capture these new MPLS/Metro Ethernet oriented use cases.

cheers
BMS

From bms at incunabulum.net Tue Mar 25 01:27:30 2008
From: bms at incunabulum.net (Bruce M Simpson)
Date: Tue, 25 Mar 2008 08:27:30 +0000
Subject: [Xorp-hackers] bootstrap
In-Reply-To: <803a75c30803242226kb597526rf3b0d5cbf7cb682e@mail.gmail.com>
References: <803a75c30803242226kb597526rf3b0d5cbf7cb682e@mail.gmail.com>
Message-ID: <47E8B772.3010603@incunabulum.net>

Hi,

The bootstrap script doesn't generate the XIF files, it just regenerates
the configure script and Makefile.in files from the Makefile.am and
configure.in files, so that GNU make knows how to build what you've
added to those files.

Selçuk Cevher wrote:
> Hi All,
>
> I created a .xif file containing a new XRL interface in xrl/interfaces
> folder. I also modified Makefile.am file in
> xrl/interfaces folder properly to build the related library, .hh, and
> .cc client files (I am basically talking about clnt-gen script).
> However, when I run the bootstrap script at the top-level of XORP
> tree, neither the library for the new interface nor the stub code for
> the caller (.hh and .cc) files are created in xrl/interfaces folder.
>
> What type of modifications should I make to get the library, .cc and
> .hh files for a newly added interface ?

If you've added your XRL interface to the xrl/interfaces directory, then
in order to generate the cc/hh files, you need to run "gmake" from
within that directory.

There are suffix rules in that Makefile.am which should do this
automatically, providing you have something which depends upon the cc/hh
files, usually one of the noinst_LTLIBRARIES primaries. These suffix
rules are at the very end of Makefile.am.
GNU make should see your library target, and build it unconditionally,
unless you've added other conditionals around it. If it doesn't do this,
something else is going wrong.

In the automake world of build engineering, the order of directory
search is strictly controlled by the top level Makefile.am. libxipc in
particular needs to see some generated files before the xrl directory is
actually reached in the build order, look at its Makefile.am for more
information.

The newer Boost.BuildV2 build magic will automatically rebuild all
targets depending on the XIF, including the cc/hh files, providing
something depends upon it in the global dependency graph, and does NOT
require that you run bootstrap every time you change something. That
however is not in CVS yet...

cheers
BMS

From sureshkannan at gmail.com Tue Mar 25 03:02:50 2008
From: sureshkannan at gmail.com (Suresh Kannan)
Date: Tue, 25 Mar 2008 15:32:50 +0530
Subject: [Xorp-hackers] Doubt on VLAN implementation
In-Reply-To: <47E8B31C.8050303@incunabulum.net>
References: <47E13FD7.6030207@candelatech.com>
    <200803201631.m2KGViGp017857@fruitcake.ICSI.Berkeley.EDU>
    <47E80F7C.80800@incunabulum.net>
    <84f679e0803242135k7b7f37bgda993c55d50f0bd@mail.gmail.com>
    <47E8B31C.8050303@incunabulum.net>
Message-ID: <84f679e0803250302o158d7339j4d0b7b0fa09fa7a@mail.gmail.com>

On Tue, Mar 25, 2008 at 1:39 PM, Bruce M Simpson wrote:

> Suresh Kannan wrote:
> > Hi Bruce & all,
> >
> > interface foo {
> >     vif vlan 10.200 {
> >     }
> > }
> >
> > Does the above syntax give a clear view of QinQ support?
> >
>
> That was on the tip of my tongue, but I didn't dare speak it, because in
> some implementations, fooXX.YY can be used to mean "VLAN YY on interface
> fooXX".
>
> So I've been wary of using a period as a delimiter for the vlan term,
> given that overloaded meanings quickly lead to problems for network
> engineers during deployment, and it makes sense to make things easier
> for your user base.

The delimiter can be anything; it need not be a period. Supporting some
flexibility in the delimiter (to some extent) may work well for a varied
user base.

>
>
> (Yes, I'd like to just get the damn bikeshed painted so the code can
> happen...)
>
> Even once we solve this simple problem of how to invoke a thing, we are
> left with the problem of the manifestation of the thing. Currently XORP
> knows how to make VLANs for Linux and FreeBSD, and to deal with that in
> the FEA block's syntax.

This is an interesting part that I have yet to look into. Is there any
plan to extend XORP towards switching platforms? As mentioned above,
XORP understands VLANs on Linux, and it is MAC-based. Having XORP
implement MAC learning and other L2 functions would help people try out
the switching side and explore the possibilities of the MPLS/Metro
oriented use cases that BMS mentioned.

>
> Juniper has a very specific syntax for dual-tagging:
>
> http://www.juniper.net/techpubs/software/junos/junos90/swconfig-network-interfaces/flexible-vlan-tagging.html#id-13039477
>

Having tpid.vlanid looks good, but does the user always need to specify
it when configuring?

interface foo {
    vif vlan 10.200 {
        inner-tpid 0x9100  /* only if the user wants to override the
                              default (0x8100) setting */
    }
}

Thanks,
Regards,
Suresh kannan.

>
> I can see why they've done this. It is easier IMHO to treat dual-tagging
> as a special case, because it's not the default, and most open source
> forwarding plane implementations out there are geared towards dealing
> with a single VLAN tag.
>
> I left Q-in-Q as an exercise for the reader in FreeBSD; my refactoring
> there was just so that I could use 802.1p. At the moment the way to
> accomplish Q-in-Q there is to use Netgraph. obviously this is purely
> software plane and thus isn't optimal, and I wager newer cards are
> actually able to support Q-in-Q in ASIC, so it makes sense to go about
> solving the VLAN problem in a way which is able to capture these new
> MPLS/Metro Ethernet oriented use cases.
>
> cheers
> BMS
>

From greearb at candelatech.com Tue Mar 25 09:23:19 2008
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 25 Mar 2008 09:23:19 -0700
Subject: [Xorp-hackers] Doubt on VLAN implementation
In-Reply-To: <84f679e0803250302o158d7339j4d0b7b0fa09fa7a@mail.gmail.com>
References: <47E13FD7.6030207@candelatech.com>
    <200803201631.m2KGViGp017857@fruitcake.ICSI.Berkeley.EDU>
    <47E80F7C.80800@incunabulum.net>
    <84f679e0803242135k7b7f37bgda993c55d50f0bd@mail.gmail.com>
    <47E8B31C.8050303@incunabulum.net>
    <84f679e0803250302o158d7339j4d0b7b0fa09fa7a@mail.gmail.com>
Message-ID: <47E926F7.6050601@candelatech.com>

Suresh Kannan wrote:
>
>     (Yes, I'd like to just get the damn bikeshed painted so the code
>     can happen...)
>
>     Even once we solve this simple problem of how to invoke a thing,
>     we are left with the problem of the manifestation of the thing.
>     Currently XORP knows how to make VLANs for Linux and FreeBSD, and
>     to deal with that in the FEA block's syntax.
>
>
> This is an interesting part that I have yet to look into. Is there any
> plan to extend XORP towards switching platforms? As mentioned above,
> XORP understands VLANs on Linux, and it is MAC-based. Having XORP
> implement MAC learning and other L2 functions would help people try out
> the switching side and explore the possibilities of the MPLS/Metro
> oriented use cases that BMS mentioned.

In Linux, you could just use a bridge device and add the proper VLAN
interfaces (and other non-tagged interfaces) to the bridge.

Bridge devices are another place where the current iface/vif CLI
commands do not map well, btw.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc  http://www.candelatech.com

From cevhers at gmail.com Wed Mar 26 04:29:50 2008
From: cevhers at gmail.com (Selçuk Cevher)
Date: Wed, 26 Mar 2008 07:29:50 -0400
Subject: [Xorp-hackers] problem regarding autotools
Message-ID: <803a75c30803260429p42ffb985m8b84e145d200ce65@mail.gmail.com>

Hi All,

I recently added a new XRL interface (.xif file) into xrl/interfaces
folder, and properly modified Makefile.am in that folder. After running
bootstrap and then ./configure scripts at the top-level of the XORP tree
to regenerate Makefile from the most recent Makefile.in, I ran gmake in
xrl/interfaces folder. I had to use gmake with option -k to get the .hh
and .cc files (stub code for the caller) for the newly added interface.
libtool that XORP package uses seems to be an old version since I got
the error message " libtool: unrecognized option `--tag=CXX' " when I
used gmake. I got the .cc and .hh files now but it did not build the .lo
and .o files regarding the new XRL interface.

Below are the versions of autotools currently used:

automake (GNU automake) 1.9.6
autoconf (GNU Autoconf) 2.59
ltmain.sh (GNU libtool) 1.5.22 (1.1220.2.365 2005/12/18 22:14:06)

Thanks.
Selcuk

From bms at ICSI.Berkeley.EDU Wed Mar 26 07:54:59 2008
From: bms at ICSI.Berkeley.EDU (Bruce M. Simpson)
Date: Wed, 26 Mar 2008 14:54:59 +0000
Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces.
In-Reply-To: <47DA315A.9050208@candelatech.com>
References: <47D95634.8020202@candelatech.com>
    <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU>
    <47DA315A.9050208@candelatech.com>
Message-ID: <47EA63C3.8020002@icsi.berkeley.edu>

Ben Greear wrote:
> Pavlin Radoslavov wrote:
>
>> The problem with this solution is that it will work only for Netlink
>> on Linux. The (majority of the) other mechanisms for obtaining the
>> network interface information (getifaddrs(3), ioctl(2), sysctl(3),
>> etc) don't allow the granularity for asking only the information for
>> a specific interface.
> Yes, this may only be useful for my scenario where I'm using a small
> number of interfaces per xorp instance, with large numbers of total interfaces.
> Only linux can virtualize routing tables, as far as I know, so this performance
> gain is only really important on Linux.

Unfortunately as Pavlin points out getifaddrs() retrieves information
for *all* interfaces configured in the system. It gets expensive for
large N because it doesn't build a tree, it just uses a linked list.

FreeBSD is about to see some virtualization support in this area, so a
cross platform solution needs to be carefully considered.

cheers
BMS

From pavlin at ICSI.Berkeley.EDU Wed Mar 26 08:24:25 2008
From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov)
Date: Wed, 26 Mar 2008 08:24:25 -0700
Subject: [Xorp-hackers] problem regarding autotools
In-Reply-To: <803a75c30803260429p42ffb985m8b84e145d200ce65@mail.gmail.com>
References: <803a75c30803260429p42ffb985m8b84e145d200ce65@mail.gmail.com>
Message-ID: <200803261524.m2QFOPXP016211@fruitcake.ICSI.Berkeley.EDU>

> I recently added a new XRL interface (.xif file) into xrl/interfaces folder,
> and properly modified Makefile.am in that folder.
> After running bootstrap and then ./configure
> scripts at the top-level of the XORP tree to regenerate Makefile from
> the most recent Makefile.in, I ran
> gmake in xrl/interfaces folder.
> I had to use gmake with option -k to
> get the .hh and .cc files (stub code for the caller) for the newly added
> interface. libtool that
> XORP package uses seems to be an old version since I got the error
> message " libtool: unrecognized option `--tag=CXX' " when I used
> gmake. I got the .cc and .hh files now but it did not build the .lo
> and .o files regarding the new XRL interface.
>
> Below are the versions of autotools currently used:
>
> automake (GNU automake) 1.9.6
> autoconf (GNU Autoconf) 2.59
> ltmain.sh (GNU libtool) 1.5.22 (1.1220.2.365 2005/12/18 22:14:06)

The XORP autotools setup is configured and tested only with the
following versions (listed in README):

 - autoconf version 2.61
 - automake version 1.10
 - libtool version 1.5.24

A number of things change between major versions of
autoconf/automake/libtool, so you should install versions that are as
close as possible to those listed above (e.g., the libtool-1.5.22 you
have might be OK, but you should update autoconf and automake).

Regards,
Pavlin

From greearb at candelatech.com Wed Mar 26 09:23:01 2008
From: greearb at candelatech.com (Ben Greear)
Date: Wed, 26 Mar 2008 09:23:01 -0700
Subject: [Xorp-hackers] FEA performance improvements: only 'pull' active interfaces.
In-Reply-To: <47EA63C3.8020002@icsi.berkeley.edu>
References: <47D95634.8020202@candelatech.com>
    <200803132158.m2DLwdSH028393@fruitcake.ICSI.Berkeley.EDU>
    <47DA315A.9050208@candelatech.com>
    <47EA63C3.8020002@icsi.berkeley.edu>
Message-ID: <47EA7865.1090704@candelatech.com>

Bruce M. Simpson wrote:
> Ben Greear wrote:
>> Pavlin Radoslavov wrote:
>>
>>> The problem with this solution is that it will work only for Netlink
>>> on Linux. The (majority of the) other mechanisms for obtaining the
>>> network interface information (getifaddrs(3), ioctl(2), sysctl(3),
>>> etc) don't allow the granularity for asking only the information for
>>> a specific interface.
>> Yes, this may only be useful for my scenario where I'm using a small
>> number of interfaces per xorp instance, with large numbers of total
>> interfaces.
>> Only linux can virtualize routing tables, as far as I know, so this
>> performance
>> gain is only really important on Linux.
>
> Unfortunately as Pavlin points out getifaddrs() retrieves information
> for *all* interfaces configured in the system. It gets expensive for
> large N because it doesn't build a tree, it just uses a linked list.
>
> FreeBSD is about to see some virtualization support in this area, so a
> cross platform solution needs to be carefully considered.

Well, if the base OS doesn't support the needed API, then there isn't
much you can do. Linux does seem to support ways to optimize
things...so I made the attempt. Soon, there will be a way to filter the
Netlink routing-table update events as well, so I'll be able to optimize
things even more for Linux.

I don't think any of my changes would make BSD any less efficient, at
least.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc  http://www.candelatech.com

From greearb at candelatech.com Wed Mar 26 14:25:18 2008
From: greearb at candelatech.com (Ben Greear)
Date: Wed, 26 Mar 2008 14:25:18 -0700
Subject: [Xorp-hackers] OSPF assert scenario.
Message-ID: <47EABF3E.5060500@candelatech.com>

I have another OSPF assert scenario. First two can happen in either
order, but close in time.

1 New interface is configured in fea and ospf via xorpsh
2 New interface is discovered by FEA observer and/or polling.
3 ospf learns the interface really exists and sends msg to fea to
  register an mcast addr.
4 interface is removed from OS and FEA notices
5 fea receives msg from ospf to register mcast addr
6 Cannot find iface, so return error, and OSPF aborts.

I can fix this easily enough by just not asserting when OSPF notices the
failure, but thought I'd post the scenario while it was fresh in my
head.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc  http://www.candelatech.com

From atanu at ICSI.Berkeley.EDU Wed Mar 26 14:56:38 2008
From: atanu at ICSI.Berkeley.EDU (Atanu Ghosh)
Date: Wed, 26 Mar 2008 14:56:38 -0700
Subject: [Xorp-hackers] OSPF assert scenario.
In-Reply-To: Message from Ben Greear of "Wed, 26 Mar 2008 14:25:18 PDT."
    <47EABF3E.5060500@candelatech.com>
Message-ID: <92202.1206568598@tigger.icir.org>

Hi,

The error is no longer fatal, hopefully this change won't cause any
other problems.

Atanu.

Revision  Changes    Path
1.49      +3 -3;  commitid: dd1647eac62f41a7;  xorp/ospf/xrl_io.cc

>>>>> "Ben" == Ben Greear writes:

Ben> I have another OSPF assert scenario. First two can happen in
Ben> either order, but close in time.

Ben> 1 New interface is configured in fea and ospf via xorpsh
Ben> 2 New interface is discovered by FEA observer and/or polling.
Ben> 3 ospf learns the interface really exists and sends msg to fea
Ben>   to register an mcast addr.
Ben> 4 interface is removed from OS and FEA notices
Ben> 5 fea receives msg from ospf to register mcast addr
Ben> 6 Cannot find iface, so return error, and OSPF aborts.

Ben> I can fix this easily enough by just not asserting when OSPF
Ben> notices the failure, but thought I'd post the scenario while it
Ben> was fresh in my head.

Ben> Thanks, Ben

Ben> -- Ben Greear Candela Technologies
Ben> Inc http://www.candelatech.com

From greearb at candelatech.com Thu Mar 27 11:26:06 2008
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 27 Mar 2008 11:26:06 -0700
Subject: [Xorp-hackers] ifa_index is 'int', not u16
Message-ID: <47EBE6BE.9090105@candelatech.com>

I noticed that fea is using u_short to store ifaddrmsg->ifa_index
in this method:

nlm_cond_newdeladdr_to_fea_cfg

According to Linux man pages, the ifa_index is an integer.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc  http://www.candelatech.com

From pavlin at ICSI.Berkeley.EDU Thu Mar 27 14:11:10 2008
From: pavlin at ICSI.Berkeley.EDU (Pavlin Radoslavov)
Date: Thu, 27 Mar 2008 14:11:10 -0700
Subject: [Xorp-hackers] ifa_index is 'int', not u16
In-Reply-To: <47EBE6BE.9090105@candelatech.com>
References: <47EBE6BE.9090105@candelatech.com>
Message-ID: <200803272111.m2RLBAdJ009239@fruitcake.ICSI.Berkeley.EDU>

Ben Greear wrote:

> I noticed that fea is using u_short to store ifaddrmsg->ifa_index
> in this method:
>
> nlm_cond_newdeladdr_to_fea_cfg
>
> According to Linux man pages, the ifa_index is an integer.

You are right that we shouldn't be using u_short (the type was probably
mixed up when the code was ported from FreeBSD which itself is using
u_short).

However, the Linux manual page is wrong. I just checked the header files
for a number of Linux distributions (Fedora, Debian, Ubuntu, Gentoo
2006.1 and Gentoo 2007.?). Only in the older Gentoo distribution is the
ifa_index type int; in all the others the type is u32. Also, by
definition the interface index shouldn't be a negative number.
Hence I changed the local if_index variable in all leftover places
from u_short to uint32_t:

Revision  Changes    Path
1.17      +2 -2;    commitid: 1011747ec0c5d41a7;  xorp/fea/data_plane/control_socket/routing_socket_utilities.cc
1.21      +3 -3;    commitid: 1011747ec0c5d41a7;  xorp/fea/data_plane/ifconfig/ifconfig_get_proc_linux.cc
1.16      +3 -3;    commitid: 1011747ec0c5d41a7;  xorp/fea/data_plane/ifconfig/ifconfig_parse_getifaddrs.cc
1.15      +3 -3;    commitid: 1011747ec0c5d41a7;  xorp/fea/data_plane/ifconfig/ifconfig_parse_ioctl.cc
1.18      +5 -5;    commitid: 1011747ec0c5d41a7;  xorp/fea/data_plane/ifconfig/ifconfig_parse_netlink_socket.cc
1.21      +15 -15;  commitid: 1011747ec0c5d41a7;  xorp/fea/data_plane/ifconfig/ifconfig_parse_routing_socket.cc

Thanks,
Pavlin

From bms at incunabulum.net Fri Mar 28 13:50:34 2008
From: bms at incunabulum.net (Bruce M Simpson)
Date: Fri, 28 Mar 2008 20:50:34 +0000
Subject: [Xorp-hackers] ifa_index is 'int', not u16
In-Reply-To: <200803272111.m2RLBAdJ009239@fruitcake.ICSI.Berkeley.EDU>
References: <47EBE6BE.9090105@candelatech.com>
    <200803272111.m2RLBAdJ009239@fruitcake.ICSI.Berkeley.EDU>
Message-ID: <47ED5A1A.5020502@incunabulum.net>

Pavlin Radoslavov wrote:
> Hence I changed the local if_index variable in all leftover places
> from u_short to uint32_t:
>

Thanks for the catch, I must have missed these instances.

The original motivation for the change was that Windows uses 32-bit wide
interface IDs in both the Winsock IPv4 stack and at the NDIS driver
layer. It makes sense to move to 32-bit-wide IDs for scalability anyway.

cheers
BMS
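
The narrowing problem discussed in this thread can be illustrated with a
small, self-contained C++ sketch. It is not taken from the XORP sources:
the helper names store_index_u16 and store_index_u32 are hypothetical,
and 70000 is just an example of an index above 65535. The only thing it
assumes from the thread is that the interface index is an unsigned
32-bit quantity.

```cpp
#include <cstdint>

// Hypothetical helper (not XORP code): mimics keeping the interface
// index in a 16-bit local such as the old u_short. Values above 65535
// wrap modulo 65536.
constexpr std::uint32_t store_index_u16(std::uint32_t ifa_index)
{
    return static_cast<std::uint16_t>(ifa_index);
}

// Hypothetical helper (not XORP code): mimics keeping the index in a
// uint32_t local, as in the change described above. The value is
// preserved unchanged.
constexpr std::uint32_t store_index_u32(std::uint32_t ifa_index)
{
    return ifa_index;
}

// Small indexes survive either way; large ones only survive in 32 bits.
static_assert(store_index_u16(42) == 42, "small index fits in 16 bits");
static_assert(store_index_u16(70000) == 4464, "70000 wraps to 70000 - 65536");
static_assert(store_index_u32(70000) == 70000, "32-bit storage keeps the value");
```

With a 16-bit local, two interfaces whose indexes differ by a multiple
of 65536 would be indistinguishable, which is one reason the wider type
matters on hosts with very large numbers of (virtual) interfaces.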