From marat at vyatta.com Mon Jul 3 16:33:57 2006
From: marat at vyatta.com (Marat Nepomnyashy)
Date: Mon, 3 Jul 2006 16:33:57 -0700
Subject: [Xorp-hackers] Tiny patch to detect missing operators to prevent logic error / unreachable code crash
Message-ID: <006b01c69ef9$285963b0$6502a8c0@CPQ16151965929>

Hi Pavlin,

Please consider the attached patch for inclusion.

Patch prevents an unreachable code crash in file rtrmgr/config_operators.cc
method operator_to_str(...).

Patch modifies file rtrmgr/template_base_command.cc, method
AllowOperatorsCommand::verify_variables(...), which is shallower in the call
stack than where the crash occurs, to return false when the ConfigOperator is
OP_NONE, signaling that the configuration node is not valid and preventing the
execution from reaching the point of the crash.

Thanks,
Marat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-check-operator-01
Type: application/octet-stream
Size: 464 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20060703/90f59469/attachment.obj

From pavlin at icir.org Mon Jul 3 16:47:39 2006
From: pavlin at icir.org (Pavlin Radoslavov)
Date: Mon, 03 Jul 2006 16:47:39 -0700
Subject: [Xorp-hackers] Tiny patch to detect missing operators to prevent logic error / unreachable code crash
In-Reply-To: Message from "Marat Nepomnyashy" of "Mon, 03 Jul 2006 16:33:57 PDT." <006b01c69ef9$285963b0$6502a8c0@CPQ16151965929>
Message-ID: <200607032347.k63NldRX024743@possum.icir.org>

> Please consider the attached patch for inclusion.
>
> Patch prevents an unreachable code crash in file rtrmgr/config_operators.cc
> method operator_to_str(...).
>
> Patch modifies file rtrmgr/template_base_command.cc, method
> AllowOperatorsCommand::verify_variables(...), which is shallower in the call
> stack than where the crash occurs, to return false when the ConfigOperator is
> OP_NONE, signaling that the configuration node is not valid and preventing the
> execution from reaching the point of the crash.

Marat,

Could you send the traceback stack from the coredump and instructions on how
to reproduce the crash?

It looks like there is probably another problem somewhere else, and
applying this patch might mask that problem.

Thanks,
Pavlin

From marat at vyatta.com Mon Jul 3 17:52:13 2006
From: marat at vyatta.com (Marat Nepomnyashy)
Date: Mon, 3 Jul 2006 17:52:13 -0700
Subject: [Xorp-hackers] Tiny patch to detect missing operators to prevent logic error / unreachable code crash
References: <200607032347.k63NldRX024743@possum.icir.org>
Message-ID: <00dc01c69f04$17744820$6502a8c0@CPQ16151965929>

Sure Pavlin,

Please find the traceback attached.

This problem does not become apparent in xorpsh, but only in xgdaemon -- it is
only reproducible in the Vyatta GUI. The xorpsh will work fine without the
patch.

The reason for this is that the xorpsh and xgdaemon modify the config in
slightly different ways -- while the xorpsh completely prohibits setting and
committing invalid values to config nodes, the Vyatta GUI xgdaemon only
prohibits committing invalid configs, but still allows setting. The xgdaemon
simply alerts the user when invalid values are temporarily set without a
commit, and then the user can make corrections later based on the warning
alert notifications. The GUI has been implemented slightly differently
because by its nature it allows the user to specify multiple valid and invalid
values at once, and the xgdaemon has to cache and keep track of all of them.
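The set-versus-commit difference described above can be sketched as two validation policies over one configuration store. This is a minimal, hypothetical C++ illustration, not rtrmgr code: the Node and Tree types and the functions is_valid_value, set_value_checked, set_value_deferred and commit_allowed are invented for the example, and the "digits only" rule merely stands in for whatever a real template would check. Only the idea of storing a value without verifying it corresponds to ConfigTreeNode::set_value_without_verification(...), which is mentioned later in this thread.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical node: a stored value plus whether it passed validation.
struct Node {
    std::string value;
    bool valid;
};

// Hypothetical configuration store keyed by a path string.
using Tree = std::map<std::string, Node>;

// Hypothetical validator: accept only non-empty decimal strings.
bool is_valid_value(const std::string& v) {
    if (v.empty())
        return false;
    for (char c : v) {
        if (c < '0' || c > '9')
            return false;
    }
    return true;
}

// xorpsh-style policy: an invalid value is refused at "set" time and the
// tree is left unchanged.
bool set_value_checked(Tree& tree, const std::string& path,
                       const std::string& v) {
    if (!is_valid_value(v))
        return false;
    tree[path] = Node{v, true};
    return true;
}

// GUI-style policy: the value is always stored; its validity is only
// recorded so the user can be warned and correct it later.
void set_value_deferred(Tree& tree, const std::string& path,
                        const std::string& v) {
    tree[path] = Node{v, is_valid_value(v)};
}

// Commit-time check: the configuration is committable only if every
// stored node is valid.
bool commit_allowed(const Tree& tree) {
    for (const auto& entry : tree) {
        if (!entry.second.valid)
            return false;
    }
    return true;
}
```

With set_value_checked an invalid value never reaches the tree, which matches the xorpsh behaviour; with set_value_deferred the invalid value is stored and only the commit-time check rejects it, which is the GUI behaviour that exposed the crash.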
The crash occurred when the xgdaemon started checking the whole tree for
invalid nodes to determine if the configuration was committable with a call to
ConfigTreeNode::check_config_tree(...). This patch allows the check to
correctly return false when config nodes with missing required operators are
present on the tree.

--
Marat

----- Original Message -----
From: "Pavlin Radoslavov"
To: "Marat Nepomnyashy"
Cc: "Pavlin Radoslavov" ;
Sent: Monday, July 03, 2006 4:47 PM
Subject: Re: [Xorp-hackers] Tiny patch to detect missing operators to prevent logic error / unreachable code crash

>> Please consider the attached patch for inclusion.
>>
>> Patch prevents an unreachable code crash in file rtrmgr/config_operators.cc
>> method operator_to_str(...).
>>
>> Patch modifies file rtrmgr/template_base_command.cc, method
>> AllowOperatorsCommand::verify_variables(...), which is shallower in the
>> call stack than where the crash occurs, to return false when the
>> ConfigOperator is OP_NONE, signaling that the configuration node is not
>> valid and preventing the execution from reaching the point of the crash.
>
> Marat,
>
> Could you send the traceback stack from the coredump and instructions on
> how to reproduce the crash?
>
> It looks like there is probably another problem somewhere else, and
> applying this patch might mask that problem.
>
> Thanks,
> Pavlin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: traceback-check-operator-crash
Type: application/octet-stream
Size: 2627 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20060703/502f0b32/attachment.obj

From pavlin at icir.org Wed Jul 5 12:04:52 2006
From: pavlin at icir.org (Pavlin Radoslavov)
Date: Wed, 05 Jul 2006 12:04:52 -0700
Subject: [Xorp-hackers] Tiny patch to detect missing operators to prevent logic error / unreachable code crash
In-Reply-To: Message from "Marat Nepomnyashy" of "Mon, 03 Jul 2006 17:52:13 PDT."
<00dc01c69f04$17744820$6502a8c0@CPQ16151965929>
Message-ID: <200607051904.k65J4qFx085309@possum.icir.org>

> This problem does not become apparent in xorpsh, but only in xgdaemon -- it is
> only reproducible in the Vyatta GUI. The xorpsh will work fine without the
> patch.
>
> The reason for this is that the xorpsh and xgdaemon modify the config in
> slightly different ways -- while the xorpsh completely prohibits setting and
> committing invalid values to config nodes, the Vyatta GUI xgdaemon only
> prohibits committing invalid configs, but still allows setting. The xgdaemon
> simply alerts the user when invalid values are temporarily set without a
> commit, and then the user can make corrections later based on the warning
> alert notifications. The GUI has been implemented slightly differently
> because by its nature it allows the user to specify multiple valid and invalid
> values at once, and the xgdaemon has to cache and keep track of all of them.
>
> The crash occurred when the xgdaemon started checking the whole tree for
> invalid nodes to determine if the configuration was committable with a call to
> ConfigTreeNode::check_config_tree(...). This patch allows the check to
> correctly return false when config nodes with missing required operators are
> present on the tree.

Thank you for the detailed explanation; now it makes perfect sense.
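A minimal sketch of the guard discussed in this thread, assuming heavily simplified stand-in types: TreeNode, verify_node, check_tree and add_child are hypothetical and are not XORP's ConfigTreeNode or AllowOperatorsCommand API. The per-node check reports a node whose required operator is missing (OP_NONE) as invalid, so a recursive whole-tree check in the spirit of ConfigTreeNode::check_config_tree(...) returns false instead of handing OP_NONE to code that can only convert real operators to strings.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, heavily simplified operator enum; the real XORP
// ConfigOperator has many more values.
enum ConfigOperator { OP_NONE, OP_EQ, OP_ASSIGN };

// Hypothetical tree node, not XORP's ConfigTreeNode.
struct TreeNode {
    std::string name;
    bool requires_operator = false;  // template says an operator is needed
    ConfigOperator op = OP_NONE;     // OP_NONE means "no operator set"
    std::vector<std::unique_ptr<TreeNode>> children;
};

// Per-node check in the spirit of the patched verify_variables(...):
// a missing required operator makes the node invalid.
bool verify_node(const TreeNode& node, std::string& error_msg) {
    if (node.requires_operator && node.op == OP_NONE) {
        error_msg = "missing operator on node " + node.name;
        return false;
    }
    return true;
}

// Recursive whole-tree check: committable only if every node passes.
bool check_tree(const TreeNode& node, std::string& error_msg) {
    if (!verify_node(node, error_msg))
        return false;
    for (const auto& child : node.children) {
        if (!check_tree(*child, error_msg))
            return false;
    }
    return true;
}

// Hypothetical helper for building a small test tree.
TreeNode* add_child(TreeNode& parent, const std::string& name,
                    bool requires_operator, ConfigOperator op) {
    parent.children.push_back(std::unique_ptr<TreeNode>(new TreeNode()));
    TreeNode* child = parent.children.back().get();
    child->name = name;
    child->requires_operator = requires_operator;
    child->op = op;
    return child;
}
```

Rejecting the missing operator at the shallowest point that sees the node is what keeps a later operator-to-string conversion from ever receiving OP_NONE.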
Fix committed to XORP CVS:

Revision  Changes    Path
1.22      +6 -1;     commitid: bada44ac096c7ea6;
xorp/rtrmgr/template_base_command.cc

Pavlin

From pavlin at icir.org Mon Jul 10 13:02:07 2006
From: pavlin at icir.org (Pavlin Radoslavov)
Date: Mon, 10 Jul 2006 13:02:07 -0700
Subject: [Xorp-hackers] XORP IGMPv3/MLDv2 implementation is completed
Message-ID: <200607102002.k6AK274P080686@possum.icir.org>

[A number of people have asked in the past for IGMPv3/MLDv2 support, so here it is]

This is a prerelease (unofficial) announcement that as of July 10, 2006,
the XORP CVS repository contains a dual IGMPv3/MLDv2 implementation that
will also be in the forthcoming XORP-1.3 release. The code can be accessed
from the anonymous CVS repository (http://www.xorp.org/cvs.html).

For the time being, the default IGMP and MLD versions will continue to be
2 and 1 respectively. To enable IGMPv3 support, only the "version"
statement needs to be set to "3":

protocols {
    igmp {
        interface eth0 {
            vif eth0 {
                version: 3
            }
        }
        ...
    }
}

Enabling MLDv2 support is similar:

protocols {
    mld {
        interface eth0 {
            vif eth0 {
                version: 2
            }
        }
        ...
    }
}

The code should be reasonably stable, but it could benefit from additional
(and independent) testing. Hence, please let us know if you run into any
issues.

Note that the OS kernel must have IGMPv3 or MLDv2 support to run host-side
SSM applications, but this kernel support is not required to run only the
router side of IGMPv3/MLDv2 (as XORP does).

Special thanks to Guillaume Leclanche, who did an initial IGMPv3
implementation for XORP approximately 1 year ago. A number of ideas in the
final implementation came from his implementation.

Pavlin

From marat at vyatta.com Mon Jul 10 14:28:01 2006
From: marat at vyatta.com (Marat Nepomnyashy)
Date: Mon, 10 Jul 2006 14:28:01 -0700
Subject: [Xorp-hackers] Patch to add method ConfigTreeNode::set_operator_without_verification(...)
Message-ID: <010b01c6a467$b9b2d710$6502a8c0@CPQ16151965929>

Hi Pavlin,

Recently I added operator support to xgdaemon, and so I added method
ConfigTreeNode::set_operator_without_verification(...) which is to operators
what ConfigTreeNode::set_value_without_verification(...) is to values.

Please consider this patch for inclusion into XORP.

Thanks,
Marat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-sowv-01
Type: application/octet-stream
Size: 1157 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20060710/6ec0251d/attachment.obj

From pavlin at icir.org Tue Jul 11 00:13:27 2006
From: pavlin at icir.org (Pavlin Radoslavov)
Date: Tue, 11 Jul 2006 00:13:27 -0700
Subject: [Xorp-hackers] Patch to add method ConfigTreeNode::set_operator_without_verification(...)
In-Reply-To: Message from "Marat Nepomnyashy" of "Mon, 10 Jul 2006 14:28:01 PDT." <010b01c6a467$b9b2d710$6502a8c0@CPQ16151965929>
Message-ID: <200607110713.k6B7DRoS087664@possum.icir.org>

> Recently I added operator support to xgdaemon, and so I added method
> ConfigTreeNode::set_operator_without_verification(...) which is to operators
> what ConfigTreeNode::set_value_without_verification(...) is to values.
>
> Please consider this patch for inclusion into XORP.

Patch committed to CVS:

Revision  Changes    Path
1.113     +10 -3;    commitid: 812d44b34ef07ea6;
xorp/rtrmgr/conf_tree_node.cc
1.65      +2 -1;     commitid: 812d44b34ef07ea6;
xorp/rtrmgr/conf_tree_node.hh

Thanks,
Pavlin

From atanu at ICSI.Berkeley.EDU Wed Jul 19 12:13:51 2006
From: atanu at ICSI.Berkeley.EDU (Atanu Ghosh)
Date: Wed, 19 Jul 2006 12:13:51 -0700
Subject: [Xorp-hackers] Announcing XORP Release Candidate 1.3
Message-ID: <35211.1153336431@tigger.icir.org>

On behalf of the entire XORP team, I'm delighted to announce the XORP
1.3 Release Candidate, which is now available from .

Once the release candidate has proven to be stable, the actual 1.3
release will be prepared.
This is planned to occur in the next two weeks. In the intervening
period we will be fixing minor problems and updating the documentation.

There are still a number of non-critical bugs that we know about which
will not be addressed until the 1.4 release; these are documented in
the errata section below.

In general, to test XORP, we run automated regression tests on a daily
basis with various operating systems and compilers. We also run a
number of PCs as XORP routers. We have enabled as many protocols as
feasible on those routers to test protocol interactions (for example, a
BGP IPv6 multicast feed being used by PIM-SM). In addition, automated
scripts are run to externally toggle BGP peerings. Finally, we have
automated scripts that interact directly with the xorpsh to change the
configuration settings.

We have put significant effort into testing, but obviously we have not
found all the problems. This is where you can help us make XORP
more stable, by downloading and using it!

As always we'd welcome your comments - xorp-users at xorp.org is the
right place for general discussion, and private feedback to the XORP
core team can be sent to feedback at xorp.org.

        - The XORP Team

P.S.
Release notes and errata are included below.

------------------------------------------------------------------
                    XORP RELEASE NOTES

Release 1.3-RC (2006/07/19)
=========================
ALL:
  - Numerous improvements, bug fixes and cleanup.

  - XORP now builds on Linux Fedora Core5, DragonFlyBSD-1.4,
    FreeBSD-6.1.

  - Implementation of IGMPv3 (RFC 3376) and MLDv2 (RFC 3810).
    Those are necessary to complete the Source-Specific Multicast
    support.

CONFIGURATION:
  - Addition of a new OSPF configuration statement as part of the MD5
    keys:

    * max-time-drift: u32 (defaults to 3600, i.e., 1 hour)

    It is used to set the maximum time drift (in seconds) among all
    OSPF routers. The allowed values are in the range [0--65535]. If
    the value is 65535, the time drift is unlimited.
  - The following statements for configuring static routes have been
    deprecated:
      route4, route6, interface-route4, interface-route6, mrib-route4,
      mrib-route6, mrib-interface-route4, mrib-interface-route6.

    The new replacement statements are:
      route, interface-route, mrib-route, mrib-interface-route.

    Each of the new statements can be used to configure either an
    IPv4Net or an IPv6Net route.

  - The following statements for configuring RIP and RIPng have been
    renamed:

    * route-expiry-secs -> route-timeout

    * route-deletion-secs -> deletion-delay

    * table-request-secs -> request-interval

    * interpacket-delay-msecs -> interpacket-delay

  - The following statements for configuring RIP and RIPng random
    intervals have been replaced:

    * triggered-update-min-secs and triggered-update-max-secs with
      triggered-delay and triggered-jitter

    * table-announce-min-secs and table-announce-max-secs with
      update-interval and update-jitter

    Previously, each interval was specified as [foo-min, foo-max].
    Now each interval is specified as
      [foo - foo * jitter / 100, foo + foo * jitter / 100]
    where "jitter" is specified as a percentage (an integer in the
    interval [0, 100]) of the value of "foo".

  - The "version" statement for configuring an IGMP interface/vif
    allows values in the range [1-3]. Previously, the allowed range
    was [1-2].

  - The "version" statement for configuring an MLD interface/vif allows
    values in the range [1-2]. Previously, the allowed range was [1-1].

  - The following statement for configuring PIM-SM (pimsm4 and pimsm6)
    has been renamed:

      interval-sec -> interval

  - If a "then" policy block contains an "accept" or "reject" statement,
    all statements inside the "then" block are now evaluated
    regardless of their position.

  - Addition of a new "exit" operational mode command that is
    equivalent to the "quit" operational mode command.

  - The "create" and "set" configuration commands are merged, so now
    the new "set" command can be used for setting values and for
    creating new configuration nodes.
    For backward compatibility, the obsoleted "create" command is
    preserved as an alias for the new "set" command, though it may be
    removed in the future.

LIBXORP:
  - A few bug fixes in the RefTrie implementation.

LIBXIPC:
  - Minor improvement in parsing XRL requests.

LIBFEACLIENT:
  - No significant changes.

XRL:
  - No significant changes.

RTRMGR:
  - Various bug fixes.

XORPSH:
  - Previously, the "commit" command was not available in
    configuration mode if there were no pending configuration changes.
    Now the "commit" command is always available; if there are no
    pending changes, it prints the following message:
    "No configuration changes to commit."

  - Various bug fixes.

POLICY:
  - Various bug fixes.

FEA/MFEA:
  - Bug fix in transmitting large packets on Linux when using IP raw
    sockets.

  - Linux-related netlink socket code refactoring and bug fix.

  - Bug fix in obtaining the incoming interface for raw packets
    (in case of *BSD).

  - Bug fix in parsing the ancillary data from recvmsg().

  - Accept raw packets with a zeroed source address, because
    protocols like IGMPv3 may send them.

RIB:
  - Several bug fixes and improvements.

RIP:
  - Various bug fixes in the MD5 authentication support.

  - Remove route flap when applying/deleting RIP-related import
    policies.

  - Fix an issue with INFINITY cost routes that might be bounced
    indefinitely between two XORP routers.

OSPF:
  - Various bug fixes in the MD5 authentication support.

BGP:
  - Prefix limits on a per-peer basis.

  - Various bug fixes.

STATIC_ROUTES:
  - No significant changes.

MLD/IGMP:
  - Implementation of IGMPv3 (RFC 3376) and MLDv2 (RFC 3810).

  - Unification of the IGMP and MLD execution path.

PIM-SM:
  - Bug fix related to the SPT switch (the bug is *BSD specific).

  - Use the RPF interface toward the BSR when transmitting a Cand-RP
    Advertisement message. Previously the first interface that is UP
    was chosen.

  - Use the RPF interface toward the RP when transmitting PIM Register
    messages toward the RP. Previously the interface of the directly
    connected source was chosen.
FIB2MRIB:
  - No significant changes.

CLI:
  - No significant changes.

SNMP:
  - No significant changes.

------------------------------------------------------------------
                    XORP ERRATA

ALL:
  - Parallel building (e.g., "gmake -j 4") may fail on multi-CPU
    machines. The simplest workaround is to rerun gmake or not to use
    the -j flag.

  - The following compiler is known to be buggy, and should not be used
    to compile XORP:
      gcc34 (GCC) 3.4.0 20040310 (prerelease) [FreeBSD]
    A newer compiler such as the following should be used instead:
      gcc34 (GCC) 3.4.2 20040827 (prerelease) [FreeBSD]

  - If you run BGP, RIB, FIB2MRIB, and PIM-SM at the same time,
    the propagation latency for the BGP routes to reach the kernel
    is increased. We are investigating the problem.

LIBXORP:
  - No known issues.

LIBXIPC:
  - No known issues.

LIBFEACLIENT:
  - No known issues.

XRL:
  - No known issues.

RTRMGR:
  - There are several known issues, but none of them is considered
    critical. The list of known issues is available from:

    http://www.xorp.org/bugzilla/query.cgi

  - Using the rtrmgr "-r" command-line option to restart processes
    that have failed does not work if a process fails while being
    reconfigured via xorpsh. If that happens, the rtrmgr itself may
    coredump. Therefore, using the "-r" command-line option is not
    recommended! Also, note that a process that has been killed by
    SIGTERM or SIGKILL will not be restarted (this is a feature rather
    than a bug). Ideally, we want to monitor process status using the
    finder rather than the status of the forked child processes.
    Therefore, in the future, when we have a more robust
    implementation, the "-r" switch will be removed and restarting
    failed processes will be enabled by default.

XORPSH:
  - There are several known issues, but none of them is considered
    critical.
    The list of known issues is available from:

    http://www.xorp.org/bugzilla/query.cgi

FEA/MFEA:
  - On Linux with kernel 2.6 (e.g., RedHat FC2 with kernel
    2.6.5-1.358), some of the tests may fail (with or without an error
    message), but without a coredump image. Some of those failures can
    be attributed to a kernel problem. E.g., running "dmesg" can show
    kernel "Oops" messages like:

      Unable to handle kernel NULL pointer dereference at virtual
      address 00000000 printing eip:
      ...

    This appears to be a kernel bug triggered by ioctl(SIOCGIFNAME),
    which itself is called by if_indextoname(3). Currently, there
    is no known solution, but it appears the problem may have been
    fixed in more recent Linux kernel versions:

      https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=121697

  - On Linux with kernels older than linux-2.6.15-rc7 there is a
    kernel bug that prevents the FEA from receiving netlink(7)
    notifications about added/deleted IPv6 network addresses and routes:

      http://www.uwsg.indiana.edu/hypermail/linux/kernel/0512.2/2121.html

    Typically, this could be an issue only if someone is running
    IPv6 PIM-SM on Linux, and only if the unicast routes may be modified
    while XORP is running. In that case the fix would be to replace
    "RTMGRP_IPV6_IFADDR" with "(RTMGRP_IPV6_IFADDR >> 1)" inside
    fea/ifconfig_observer_netlink.cc, and to replace "RTMGRP_IPV6_ROUTE"
    with "(RTMGRP_IPV6_ROUTE >> 1)" inside
    fticonfig_entry_observer_netlink.cc and
    fticonfig_table_observer_netlink.cc.

  - On Linux, adding and deleting multiple IPv4 addresses per interface
    may trigger an error: typically, if the primary IPv4 address
    is deleted, the kernel automatically deletes all secondary IPv4
    addresses on that interface. In Linux kernel 2.6.12 and later,
    enabling the new sysctl net.ipv4.conf.all.promote_secondaries
    (or one of the interface-specific variants) makes the kernel
    automatically promote one of the secondary addresses to become
    the new primary address.
  - The mechanism for tracking the network interface link status may
    not work on the following OSes, because their kernels do not
    provide a mechanism for asynchronous notification of userland
    programs when the link status changes: FreeBSD-5.2 and earlier,
    and MacOS X (note: if the Windows kernel supports this feature, it
    is not used yet in XORP). However, on those systems the link status
    should still be read properly on startup.

RIB:
  - In some rare cases, the RIB may fail to delete an existing route:

      http://www.xorp.org/bugzilla/show_bug.cgi?id=62

    We are aware of the issue and will attempt to fix it in the future.

RIP:
  - No known issues.

OSPF:
  - There are several known issues, but none of them is considered
    critical. The list of known issues is available from:

    http://www.xorp.org/bugzilla/query.cgi

BGP:
  - If the RIB bug above (failure to delete an existing route) is
    triggered by BGP, then the deletion failure error received by BGP
    from the RIB is treated by BGP as a fatal error. This is not a BGP
    problem, but a RIB problem that will be fixed in the future.

  - The BGP configuration mandates that an IPv4 nexthop must be
    supplied. Unfortunately it is necessary to provide an IPv4 nexthop
    even for an IPv6-only peering. Even more unfortunately, it is not
    possible to force the IPv6 nexthop.

  - It is *essential* for an IPv6 peering that an IPv6 nexthop is
    provided. Unfortunately the configuration does not enforce this
    requirement. This will be fixed in the future.

STATIC_ROUTES:
  - No known issues.

MLD/IGMP:
  - If MLD/IGMP is started on Linux with a relatively large number of
    interfaces (e.g., on the order of 10), then it may fail with the
    following error:

      [ 2004/06/14 12:58:56 ERROR test_pim:16548 MFEA +666 mfea_proto_comm.cc join_multicast_group ] Cannot join group 224.0.0.2 on vif eth8: No buffer space available

    The solution is to increase the multicast group membership limit.
    E.g., to increase the value from 20 (the default) to 200, run as
    root:

      echo 200 > /proc/sys/net/ipv4/igmp_max_memberships

PIM-SM:
  - If the kernel does not support PIM-SM, or if PIM-SM is not enabled
    in the kernel, then running PIM-SM will fail with the following
    error message:

      [ 2004/06/12 10:26:41 ERROR xorp_fea:444 MFEA +529 mfea_mrouter.cc start_mrt ] setsockopt(MRT_INIT, 1) failed: Operation not supported

  - On Linux, if the unicast Reverse Path Forwarding information is
    different from the multicast Reverse Path Forwarding information,
    Reverse Path Filtering should be disabled. E.g., as root:

      echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter

    OR

      echo 0 > /proc/sys/net/ipv4/conf/eth0/rp_filter
      echo 0 > /proc/sys/net/ipv4/conf/eth1/rp_filter
      ...

    Otherwise, the router will ignore packets if they don't arrive on
    the reverse-path interface. For more information about Reverse
    Path Filtering see:

      http://www.tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html

  - Currently, the PIM-SM implementation does not support unnumbered
    point-to-point links. Furthermore, even on numbered point-to-point
    links the next-hop information in the routing entries should use
    an IP address instead of an interface name. For example, if there
    is a GRE tunnel on Linux and we want to add a route that uses that
    tunnel, we should use a command like:

      route add -net gw

    instead of:

      route add -net

  - If PIM-SM is configured to run over a large number of interfaces
    (e.g., more than 31 VLANs), it might fail with the following error:

      [ 2006/07/04 11:56:23 ERROR xorp_fea:28353 MFEA +967 mfea_mrouter.cc add_multicast_vif ] setsockopt(MRT_ADD_VIF, vif eth0.4) failed: Too many open files in system

    The reason for that error is that by default the majority of UNIX
    kernels cannot support more than 32 interfaces enabled for
    multicast forwarding (one interface is always used as the internal
    PIM Register virtual interface).
    The solution is to increase the MAXVIFS limit in the kernel
    (typically defined in the "netinet/ip_mroute.h" (BSD) or the
    "include/linux/mroute.h" (Linux) kernel file), and recompile the
    kernel. It should also be increased in the corresponding system
    header file: or . After that, XORP should be recompiled to take
    the MAXVIFS increase into account. If modifying the system header
    files is not acceptable, then the following should be added toward
    the end of file "xorp/mrt/max_vifs.h" before recompiling XORP:

      #undef MAX_VIFS
      #define MAX_VIFS 50

FIB2MRIB:
  - No known issues.

CLI:
  - No known issues.

SNMP:
  - On some versions of Linux, there are some bugs in net-snmp
    versions 5.0.8 and 5.0.9, which prevent dynamic loading from
    working. See the following URL for links to the net-snmp patches
    that solve the problems:

      http://www.xorp.org/snmp.html

  - Version 5.1 of net-snmp requires a simple modification, otherwise
    XORP will fail to compile. See the following URL for a link to
    the net-snmp patch that solves the problem:

      http://www.xorp.org/snmp.html

From kohler at cs.ucla.edu Wed Jul 19 13:53:51 2006
From: kohler at cs.ucla.edu (Eddie Kohler)
Date: Wed, 19 Jul 2006 13:53:51 -0700
Subject: [Xorp-hackers] Announcing XORP Release Candidate 1.3
In-Reply-To: <35211.1153336431@tigger.icir.org>
References: <35211.1153336431@tigger.icir.org>
Message-ID: <44BE9BDF.10101@cs.ucla.edu>

Very cool, all!

Eddie

Atanu Ghosh wrote:
> On behalf of the entire XORP team, I'm delighted to announce the XORP
> 1.3 Release Candidate, which is now available from .
>
> Once the release candidate has proven to be stable, the actual 1.3
> release will be prepared. This is planned to occur in the next two
> weeks. In the intervening period we will be fixing minor problems and
> updating the documentation.
>
> There are still a number of non-critical bugs that we know about which
> will not be addressed until the 1.4 release; these are documented in
> the errata section below.
> > In general, to test XORP, we run automated regression tests on a daily > basis with various operating systems and compilers. We also run a > number of PCs as XORP routers. We have enabled as many protocols as > feasible on those routers to test protocol interactions (for example a > BGP IPv6 multicast feed being used by PIM-SM). In addition, automated > scripts are run to externally toggle BGP peerings. Finally, we have > automated scripts that interact directly with the xorpsh to change the > configuration settings. > > We have put significant effort into testing but obviously we have not > found all the problems. This is where you can help us to make XORP > more stable, by downloading and using it! > > As always we'd welcome your comments - xorp-users at xorp.org is the > right place for general discussion, and private feedback to the XORP > core team can be sent to feedback at xorp.org. > > - The XORP Team > > P.S. > Release notes and errata are included below. > > ------------------------------------------------------------------ > XORP RELEASE NOTES > > Release 1.3-RC (2006/07/19) > ========================= > ALL: > - Numerous improvements, bug fixes and cleanup. > > - XORP now builds on Linux Fedora Core5, DragonFlyBSD-1.4, > FreeBSD-6.1. > > - Implementation of IGMPv3 (RFC 3376) and MLDv2 (RFC 3810). > Those are necessary to complete the Source-Specific Multicast > support. > > CONFIGURATION: > - Addition of new OSPF configuration statement as part of the MD5 > keys: > > * max-time-drift: u32 (default to 3600, i.e., 1 hour) > > It is used to set the maximum time drift (in seconds) among all > OSPF routers. The allowed values are in the range [0--65535]. If > the value is 65535, the time drift is unlimited. > > - The following statements for configuring static routes have been > deprecated: > route4, route6, interface-route4, interface-route6, mrib-route4, > mrib-route6, mrib-interface-route4, mrib-interface-route6. 
> > The new replacement statements are: > route, interface-route, mrib-route, mrib-interface-route. > > Each of the new statements can be used to configure either IPv4Net > or IPv6Net route. > > - The following statements for configuring RIP and RIPng have been > renamed: > > * route-expiry-secs -> route-timeout > > * route-deletion-secs -> deletion-delay > > * table-request-secs -> request-interval > > * interpacket-delay-msecs -> interpacket-delay > > - The following statements for configuring RIP and RIPng random > intervals have been replaced: > > * triggered-update-min-secs and triggered-update-max-secs with > triggered-delay and triggered-jitter > > * table-announce-min-secs and table-announce-max-secs with > update-interval and update-jitter > > Previously, each interval was specified as [foo-min, foo-max]. > Now each interval is specified as > [foo - foo * jitter / 100, foo + foo * jitter / 100] > where "jitter" is specified as a percentage (an integer in the > interval [0, 100]) of the value of "foo". > > - The "version" statement for configuring an IGMP interface/vif > allows values in the range [1-3]. Previously, the allowed range > was [1-2]. > > - The "version" statement for configuring a MLD interface/vif allows > values in the range [1-3]. Previously, the allowed range was [1-1]. > > - The following statement for configuring PIM-SM (pimsm4 and pimsm6) > has been renamed: > > interval-sec -> interval > > - If a "then" policy block contains "accept" or "reject" statement, > now all statements inside the "then" block are evaluated > regardless of their position. > > - Addition of a new "exit" operational mode command that is > equivalent to the "quit" operational mode command. > > - The "create" and "set" configuration commands are merged, so now > the new "set" command can be used for setting values and for > creating new configuration nodes. 
For backward compatibility, > the obsoleted "create" command is preserved as an alias for the > new "set" command, though it may be removed in the future. > > LIBXORP: > - Few bug fixes in the RefTrie implementation. > > LIBXIPC: > - Minor improvement in parsing XRL requests. > > LIBFEACLIENT: > - No significant changes. > > XRL: > - No significant changes. > > RTRMGR: > - Various bug fixes. > > XORPSH: > - Previously, the "commit" command was not available in > configuration mode if there were no pending configuration changes. > Now the "commit" command is always available, but the following > message will be printed instead: > "No configuration changes to commit." > > - Various bug fixes. > > POLICY: > - Various bug fixes. > > FEA/MFEA: > - Bug fix in transmitting large packets on Linux when using IP raw > sockets. > > - Linux-related netlink socket code refactoring and bug fix. > > - Bug fix in obtaining the incoming interface for raw packets > (in case of *BSD). > > - Bug fix in parsing the ancillary data from recvmsg(). > > - Accept zeroed source addresses of raw packets, because of > protocols like IGMPv3. > > RIB: > - Several bug fixes and improvements. > > RIP: > - Various bug fixes in the MD5 authentication support. > > - Remove route flap when applying/deleting RIP-related import > policies. > > - Fix an issue with INFINITY cost routes that might be bounced > indefinitely between two XORP routers. > > OSPF: > - Various bug fixes in the MD5 authentication support. > > BGP: > - Prefix limits on a per peer basis. > > - Various bug fixes. > > STATIC_ROUTES: > - No significant changes. > > MLD/IGMP: > - Implementation of IGMPv3 (RFC 3376) and MLDv2 (RFC 3810). > > - Unification of the IGMP and MLD execution path. > > PIM-SM: > - Bug fix related to the SPT switch (the bug is *BSD specific). > > - Use the RPF interface toward the BSR when transmitting a Cand-RP > Advertisement message. Previously the first interface that is UP > was chosen. 
>
> - Use the RPF interface toward the RP when transmitting PIM Register
> messages toward the RP. Previously the interface of the directly
> connected source was chosen.
>
> FIB2MRIB:
> - No significant changes.
>
> CLI:
> - No significant changes.
>
> SNMP:
> - No significant changes.
>
> ------------------------------------------------------------------
> XORP ERRATA
>
> ALL:
> - Parallel building (e.g., "gmake -j 4") may fail on multi-CPU
> machines. The simplest work-around is to rerun gmake or not to use
> the -j flag.
>
> - The following compiler is known to be buggy, and should not be used
> to compile XORP:
> gcc34 (GCC) 3.4.0 20040310 (prerelease) [FreeBSD]
> A newer compiler such as the following should be used instead:
> gcc34 (GCC) 3.4.2 20040827 (prerelease) [FreeBSD]
>
> - If you run BGP, RIB, FIB2MRIB, and PIM-SM at the same time,
> the propagation latency for the BGP routes to reach the kernel
> is increased. We are investigating the problem.
>
> LIBXORP:
> - No known issues.
>
> LIBXIPC:
> - No known issues.
>
> LIBFEACLIENT:
> - No known issues.
>
> XRL:
> - No known issues.
>
> RTRMGR:
> - There are several known issues, but none of them is considered
> critical. The list of known issues is available from:
>
> http://www.xorp.org/bugzilla/query.cgi
>
> - Using the rtrmgr "-r" command-line option to restart processes
> that have failed does not work if a process fails while being
> reconfigured via xorpsh. If that happens, the rtrmgr itself may
> coredump. Therefore, using the "-r" command-line option is not
> recommended! Also, note that a process that has been killed by
> SIGTERM or SIGKILL will not be restarted (this is a feature rather
> than a bug). Ideally, we want to monitor the process status
> using the finder rather than the status of the forked child
> processes. Therefore, in the future, when we have a more robust
> implementation, the "-r" switch will be removed and restarting
> failed processes will be enabled by default.
>
> XORPSH:
> - There are several known issues, but none of them is considered
> critical. The list of known issues is available from:
>
> http://www.xorp.org/bugzilla/query.cgi
>
> FEA/MFEA:
> - On Linux with kernel 2.6 (e.g., RedHat FC2 with kernel
> 2.6.5-1.358), some of the tests may fail (with or without an error
> message), but without a coredump image. Some of those failures can be
> attributed to a kernel problem. E.g., running "dmesg" can show
> kernel "Oops" messages like:
>
> Unable to handle kernel NULL pointer dereference at virtual
> address 00000000 printing eip:
> ...
>
> This appears to be a kernel bug triggered by ioctl(SIOCGIFNAME)
> which itself is called by if_indextoname(3). Currently, there
> is no known solution, but it appears the problem may have been
> fixed in more recent Linux kernel versions:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=121697
>
> - On Linux with a kernel older than linux-2.6.15-rc7 there is a
> kernel bug that prevents the FEA from receiving netlink(7)
> notifications about added/deleted IPv6 network addresses and routes:
>
> http://www.uwsg.indiana.edu/hypermail/linux/kernel/0512.2/2121.html
>
> Typically, this could be an issue only if someone is running
> IPv6 PIM-SM on Linux, and only if the unicast routes may be modified
> while XORP is running. In that case the fix would be to replace
> "RTMGRP_IPV6_IFADDR" with "(RTMGRP_IPV6_IFADDR >> 1)" inside
> fea/ifconfig_observer_netlink.cc, and to replace "RTMGRP_IPV6_ROUTE"
> with "(RTMGRP_IPV6_ROUTE >> 1)" inside
> fticonfig_entry_observer_netlink.cc and
> fticonfig_table_observer_netlink.cc.
>
> - On Linux, adding and deleting multiple IPv4 addresses per interface
> may trigger an error: typically, if the primary IPv4 address
> is deleted, the kernel automatically deletes all secondary IPv4
> addresses on that interface.
> In Linux kernel 2.6.12 and later,
> enabling the new sysctl net.ipv4.conf.all.promote_secondaries
> (or one of the interface-specific variants) can be used to
> automatically promote one of the secondary addresses to become
> the new primary address.
>
> - The mechanism for tracking the network interface link status
> may not work for the following OSes because the kernel for those
> systems does not provide a mechanism for asynchronous notification
> of userland programs when the link status changes: FreeBSD-5.2 and
> earlier and MacOS X (note: if the Windows kernel supports this
> feature, it is not used yet in XORP). However, on those systems
> the link status should still be read properly at startup.
>
> RIB:
> - In some rare cases, the RIB may fail to delete an existing route:
>
> http://www.xorp.org/bugzilla/show_bug.cgi?id=62
>
> We are aware of the issue and will attempt to fix it in the
> future.
>
> RIP:
> - No known issues.
>
> OSPF:
> - There are several known issues, but none of them is considered
> critical. The list of known issues is available from:
>
> http://www.xorp.org/bugzilla/query.cgi
>
> BGP:
> - If the RIB bug above (failure to delete an existing route) is
> triggered by BGP, then the deletion failure error received by
> BGP from the RIB is treated by BGP as a fatal error. This is
> not a BGP problem, but a RIB problem that will be fixed in the
> future.
>
> - The BGP configuration mandates that an IPv4 nexthop must be
> supplied. Unfortunately it is necessary to provide an IPv4 nexthop
> even for an IPv6-only peering. Even more unfortunately it is not
> possible to force the IPv6 nexthop.
>
> - It is *essential* for an IPv6 peering that an IPv6 nexthop is
> provided. Unfortunately the configuration does not enforce this
> requirement. This will be fixed in the future.
>
> STATIC_ROUTES:
> - No known issues.
>
> MLD/IGMP:
> - If MLD/IGMP is started on Linux with a relatively large number of
> interfaces (e.g., on the order of 10), then it may fail with the
> following error:
>
> [ 2004/06/14 12:58:56 ERROR test_pim:16548 MFEA +666
> mfea_proto_comm.cc join_multicast_group ] Cannot join group
> 224.0.0.2 on vif eth8: No buffer space available
>
> The solution is to increase the multicast group membership limit.
> E.g., to increase the value from 20 (the default) to 200, run as
> root:
>
> echo 200 > /proc/sys/net/ipv4/igmp_max_memberships
>
> PIM-SM:
> - If the kernel does not support PIM-SM, or if PIM-SM is not enabled
> in the kernel, then running PIM-SM will fail with the following
> error message:
>
> [ 2004/06/12 10:26:41 ERROR xorp_fea:444 MFEA +529 mfea_mrouter.cc
> start_mrt ] setsockopt(MRT_INIT, 1) failed: Operation not supported
>
> - On Linux, if the unicast Reverse Path Forwarding information is
> different from the multicast Reverse Path Forwarding information,
> the Reverse Path Filtering should be disabled. E.g., as root:
>
> echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
>
> OR
>
> echo 0 > /proc/sys/net/ipv4/conf/eth0/rp_filter
> echo 0 > /proc/sys/net/ipv4/conf/eth1/rp_filter
> ...
>
> Otherwise, the router will ignore packets if they don't arrive on
> the reverse-path interface. For more information about Reverse
> Path Filtering, see:
>
> http://www.tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html
>
> - Currently, the PIM-SM implementation does not support unnumbered
> point-to-point links. Furthermore, even on numbered point-to-point
> links the next-hop information in the routing entries should use
> an IP address instead of an interface name.
> For example,
> if there is a GRE tunnel on Linux and if we want to add a route
> that uses that tunnel, we should use a command like:
>
> route add -net gw
>
> instead of:
>
> route add -net
>
> - If PIM-SM is configured to run over a large number of interfaces
> (e.g., more than 31 VLANs), it might fail with the following error:
>
> [ 2006/07/04 11:56:23 ERROR xorp_fea:28353 MFEA +967 mfea_mrouter.cc
> add_multicast_vif ]
> setsockopt(MRT_ADD_VIF, vif eth0.4) failed: Too many open files in system
>
> The reason for that error is that by default the majority of UNIX
> kernels cannot support more than 32 interfaces enabled for multicast
> forwarding (one interface is always used as the internal PIM Register
> virtual interface).
>
> The solution is to increase the MAXVIFS limit in the kernel
> (typically defined in the "netinet/ip_mroute.h" (BSD) or the
> "include/linux/mroute.h" (Linux) kernel file), and recompile the
> kernel. It should also be increased in the corresponding system
> header file: or .
> After that XORP should be recompiled to take into account the
> MAXVIFS increase. If modifying the system header files is not
> acceptable, then the following should be added toward the end of
> file "xorp/mrt/max_vifs.h" before recompiling XORP:
>
> #undef MAX_VIFS
> #define MAX_VIFS 50
>
> FIB2MRIB:
> - No known issues.
>
> CLI:
> - No known issues.
>
> SNMP:
> - On some versions of Linux, there are some bugs in net-snmp
> versions 5.0.8 and 5.0.9, which prevent dynamic loading from
> working. See the following URL for links to the net-snmp
> patches that solve the problems:
>
> http://www.xorp.org/snmp.html
>
> - Version 5.1 of net-snmp requires a simple modification, otherwise
> XORP will fail to compile.
> See the following URL for a link to the net-snmp patch that solves
> the problems:
>
> http://www.xorp.org/snmp.html
>
> _______________________________________________
> Xorp-hackers mailing list
> Xorp-hackers at icir.org
> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers

From mike at vyatta.com Thu Jul 20 09:32:35 2006
From: mike at vyatta.com (Michael Larson)
Date: Thu, 20 Jul 2006 09:32:35 -0700 (PDT)
Subject: [Xorp-hackers] potential race condition in run_command.cc?
Message-ID: <28154885.121153413155347.JavaMail.root@mail.vyatta.com>

Hello all,

We've been occasionally seeing a hang of the rtrmgr on startup. In
tracing the cause, it looks as if there is a race condition in how
run_command.cc executes the popen2() method.

Specifically,

libxorp/run_command.cc (snippet from RunCommandBase::execute()):

    _pid = popen2(_command, _argument_list, _stdout_stream, _stderr_stream,
                  redirect_stderr_to_stdout());
    // XLOG_TRACE(true, "RunCommandBase::execute() Executing program: 6");
    if (_stdout_stream == NULL) {
        XLOG_ERROR("Failed to execute command \"%s\"", final_command.c_str());
        cleanup();
        _exec_id.restore_saved_exec_id(error_msg);
        return (XORP_ERROR);
    }
    // Insert the new process to the map of processes
    XLOG_ASSERT(pid2command.find(_pid) == pid2command.end());
    pid2command[_pid] = this;

The pid is inserted into the std::map pid2command only after popen2()
returns. Inside of popen2() the process is forked and the _command is
executed in the child process.

When the process has completed, it is expected to signal the parent
process through the method child_handler() (in run_command.cc):

    child_handler(int signo)
    {
        XLOG_ASSERT(signo == SIGCHLD);

        //
        // XXX: Wait for any child process.
        // If we are executing any child process outside of the RunProcess
        // mechanism, then the waitpid() here may create a wait() blocking
        // problem for those processes. If this ever becomes an issue, then
        // we should try non-blocking waitpid() for each pid in the
        // pid2command map.
        //
        do {
            pid_t pid = 0;
            int wait_status = 0;
            map::iterator iter;

            pid = waitpid(-1, &wait_status, WUNTRACED | WNOHANG);
            debug_msg("pid=%d, wait status=%d\n", XORP_INT_CAST(pid), wait_status);
            if (pid <= 0)
                return;         // XXX: no more child processes

            XLOG_ASSERT(pid > 0);
            popen2_mark_as_closed(pid, wait_status);
            iter = pid2command.find(pid);
            if (iter == pid2command.end()) {

However, if the pid is not found in the pid2command map, child_handler()
just continues the loop, waiting for the child process to complete (and
signal its completion via waitpid()).

So, the problem is that for commands that return/complete immediately,
child_handler() may be called before the pid2command map has been
updated with the new pid, and the command will then never complete.

Our version of XORP has more program statements in the template files,
so it seems to be a bit easier for us to stumble on this condition.
Finally, I have a fix for this condition, but it is more of a patch
than an elegant fix. I think the fix should be something along the
lines of updating pid2command before the exec() call is made in
popen2().

Mike
Vyatta

From marat at vyatta.com Thu Jul 20 12:14:01 2006
From: marat at vyatta.com (Marat Nepomnyashy)
Date: Thu, 20 Jul 2006 12:14:01 -0700
Subject: [Xorp-hackers] Bug 644
Message-ID: <033701c6ac30$a9c22b00$6502a8c0@CPQ16151965929>

Hi Pavlin,

I just attached a possible patch to bug 644 "Bad terminal backspace
behavior for columns > 80"
(http://www.xorp.org/bugzilla/show_bug.cgi?id=644) that I opened.

Please consider the patch for inclusion.

Thanks,
Marat
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20060720/c5ed93a8/attachment.html

From pavlin at icir.org Thu Jul 20 15:50:21 2006
From: pavlin at icir.org (Pavlin Radoslavov)
Date: Thu, 20 Jul 2006 15:50:21 -0700
Subject: [Xorp-hackers] potential race condition in run_command.cc?
In-Reply-To: Message from Michael Larson of "Thu, 20 Jul 2006 09:32:35 PDT."
    <28154885.121153413155347.JavaMail.root@mail.vyatta.com>
Message-ID: <200607202250.k6KMoL5m067054@possum.icir.org>

> We've been occasionally seeing a hang of the rtrmgr on startup. In
> tracing the cause it looks as if there is a race condition in how
> the run_command.cc executes the popen2() method.
>
> Specifically,
>
> libxorp/run_command.cc (snippet from RunCommandBase::execute()):
>
>     _pid = popen2(_command, _argument_list, _stdout_stream, _stderr_stream,
>                   redirect_stderr_to_stdout());
>     // XLOG_TRACE(true, "RunCommandBase::execute() Executing program: 6");
>     if (_stdout_stream == NULL) {
>         XLOG_ERROR("Failed to execute command \"%s\"", final_command.c_str());
>         cleanup();
>         _exec_id.restore_saved_exec_id(error_msg);
>         return (XORP_ERROR);
>     }
>     // Insert the new process to the map of processes
>     XLOG_ASSERT(pid2command.find(_pid) == pid2command.end());
>     pid2command[_pid] = this;
>
> The std::map pid2command inserts the pid after executing
> popen2(). Inside of popen2() the process is forked and the
> _command is executed in the child process.
>
> When the process has completed it is expected to signal the parent
> process through the method child_handler() (in run_command.cc):
>
>     child_handler(int signo)
>     {
>         XLOG_ASSERT(signo == SIGCHLD);
>
>         //
>         // XXX: Wait for any child process.
>         // If we are executing any child process outside of the RunProcess
>         // mechanism, then the waitpid() here may create a wait() blocking
>         // problem for those processes. If this ever becomes an issue, then
>         // we should try non-blocking waitpid() for each pid in the
>         // pid2command map.
>         //
>         do {
>             pid_t pid = 0;
>             int wait_status = 0;
>             map::iterator iter;
>
>             pid = waitpid(-1, &wait_status, WUNTRACED | WNOHANG);
>             debug_msg("pid=%d, wait status=%d\n", XORP_INT_CAST(pid), wait_status);
>             if (pid <= 0)
>                 return;         // XXX: no more child processes
>
>             XLOG_ASSERT(pid > 0);
>             popen2_mark_as_closed(pid, wait_status);
>             iter = pid2command.find(pid);
>             if (iter == pid2command.end()) {
>
> However, if the pid is not found in the map pid2command the
> child_handler() continues to loop and wait for the child process
> to complete (and signal its completion via waitpid).
>
> So, the problem is that on commands that immediately
> return/complete the child_handler() before the pid2command map is
> updated, the child_handler() will be called before the pid2command
> has the new pid and the command will never complete.

What you describe in the above paragraph shouldn't happen, because
we call block_child_signals() right before popen2().

Can you provide more specific details about the problem you are
seeing? E.g., is the rtrmgr stuck forever in the "do..while" loop,
or are there child processes which have terminated, but the
waitpid() never detects them?
If this is not the case, then are you saying that child_handler() is
indeed called before popen2() completes despite the
block_child_signals()?

Note that child_handler() might be called when any child terminates,
but then the "do..while" loop will take care of waitpid() for all
child processes that have terminated. Hence, the long comment inside
child_handler() is probably not true (i.e., the code should work even
if there was a child process that wasn't executed by the RunCommand
mechanism).

Regards,
Pavlin

> Our version of xorp has more program statements in the template
> files therefore it seems as if it is a bit easier for us to
> stumble on this condition. Finally, I have a fix for this
> condition, but it is more of a patch than an elegant fix.
> I think the fix should be something along the lines where the
> pid2command needs to be updated before the exec() call is made in popen.

From mike at vyatta.com Thu Jul 20 16:24:21 2006
From: mike at vyatta.com (Michael Larson)
Date: Thu, 20 Jul 2006 16:24:21 -0700 (PDT)
Subject: [Xorp-hackers] potential race condition in run_command.cc?
Message-ID: <669571.451153437861838.JavaMail.root@mail.vyatta.com>

Pavlin,

OK, here are the details that I can provide (to get additional details
I'll need to create another image). It looks as if the call to
block_child_signals() does not block signals from the child. Below is
the sequence of events that I observed:

1) main thread calls popen2()
2) main thread spawns child process and child executes command
3) child_handler() is called and waitpid() returns the pid of the
   process created in step #2 above
4) child_handler() fails to find the pid in the pid2command map after
   calling the non-blocking waitpid() a second time
5) popen2() returns and execution continues within
   RunCommandBase::execute(), where the pid is now inserted into
   pid2command
6) at this point the rtrmgr will hang. I have not isolated the exact
   location, but I know that the done callback is never called for
   this command when this sequence of steps occurs.

That said, you've given me an idea of what is going on here. In a
version of XORP here we are linking to pthreads (we are not doing much
with threads beyond linking to the library at this point), and I'm
wondering if this is what is causing sigprocmask() to fail in
block_child_signals()?

Mike

From pavlin at icir.org Thu Jul 20 17:40:13 2006
From: pavlin at icir.org (Pavlin Radoslavov)
Date: Thu, 20 Jul 2006 17:40:13 -0700
Subject: [Xorp-hackers] potential race condition in run_command.cc?
In-Reply-To: Message from Michael Larson of "Thu, 20 Jul 2006 16:24:21 PDT."
    <669571.451153437861838.JavaMail.root@mail.vyatta.com>
Message-ID: <200607210040.k6L0eDxC071703@possum.icir.org>

A non-text attachment was scrubbed...
Name: run_command.cc.patch
Type: text/x-c++
Size: 921 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20060720/7f38debe/attachment.bin

From pavlin at icir.org Thu Jul 20 17:46:03 2006
From: pavlin at icir.org (Pavlin Radoslavov)
Date: Thu, 20 Jul 2006 17:46:03 -0700
Subject: [Xorp-hackers] potential race condition in run_command.cc?
In-Reply-To: Message from Pavlin Radoslavov of "Thu, 20 Jul 2006 17:40:13 PDT."
    <200607210040.k6L0eDxC071703@possum.icir.org>
Message-ID: <200607210046.k6L0k37N090262@possum.icir.org>

> From your description it appears that the SIGCHLD blocking hasn't
> taken effect, which seems odd.
>
> Can you apply a simple patch (included below) to print the parent's
> signal mask before and after popen2().

You might want to print the signal mask inside the child_handler() as
well. Make sure you print the pid of the process as well to make sure
you are in the parent process :)

Pavlin

From pavlin at icir.org Thu Jul 20 18:10:05 2006
From: pavlin at icir.org (Pavlin Radoslavov)
Date: Thu, 20 Jul 2006 18:10:05 -0700
Subject: [Xorp-hackers] Bug 644
In-Reply-To: Message from "Marat Nepomnyashy" of "Thu, 20 Jul 2006 12:14:01 PDT."
    <033701c6ac30$a9c22b00$6502a8c0@CPQ16151965929>
Message-ID: <200607210110.k6L1A5oF091361@possum.icir.org>

> I just attached a possible patch to bug 644 "Bad terminal
> backspace behavior for columns > 80"
> (http://www.xorp.org/bugzilla/show_bug.cgi?id=644) that I opened.
>
> Please consider the patch for inclusion.

Nice catch!
Patch committed to CVS:

1.51 +16 -10; commitid: 114344c025d17ea6; xorp/cli/cli_node_net.cc

Thanks,
Pavlin
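Pavlin's question in the run_command.cc thread above comes down to whether SIGCHLD is really blocked in the parent while popen2() forks and the new pid is added to pid2command. Below is a minimal C++ sketch, assuming a POSIX system and not taken from XORP, of that ordering: block SIGCHLD, fork(), register the pid, then restore the previous mask. It also includes the kind of signal-mask query Pavlin suggested printing. The names g_pid_registry, sigchld_blocked_in_this_thread(), spawn_registered_child() and reap_terminated_children() are hypothetical, and the map holds exit statuses rather than RunCommandBase pointers. The sketch uses pthread_sigmask() because POSIX leaves the use of sigprocmask() unspecified in a multi-threaded process. Also, a blocked signal is blocked only in the calling thread, so if another thread has SIGCHLD unblocked, the signal may be delivered there instead. Whether either point explains the hang Mike describes is not established in the thread.

```cpp
#include <cassert>
#include <map>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for run_command.cc's pid2command map. The value is
// the child's exit status (-1 until reaped), not a RunCommandBase pointer.
static std::map<pid_t, int> g_pid_registry;

// Query (without changing) the calling thread's signal mask and report
// whether SIGCHLD is blocked: the kind of check one could print before
// and after the fork.
bool sigchld_blocked_in_this_thread() {
    sigset_t current;
    sigemptyset(&current);
    pthread_sigmask(SIG_BLOCK, nullptr, &current);  // null set: query only
    return sigismember(&current, SIGCHLD) == 1;
}

// Fork a child that exits immediately with 'exit_code' (the "command that
// completes at once" case). SIGCHLD stays blocked in this thread from before
// fork() until the pid is registered. If non-null, 'blocked_at_registration'
// records whether SIGCHLD was blocked at the moment of registration.
pid_t spawn_registered_child(int exit_code, bool* blocked_at_registration) {
    sigset_t block, saved;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    // pthread_sigmask() rather than sigprocmask(): POSIX leaves the latter
    // unspecified in a multi-threaded process.
    if (pthread_sigmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0)
        _exit(exit_code);  // child: terminate immediately

    if (pid > 0) {
        if (blocked_at_registration != nullptr)
            *blocked_at_registration = sigchld_blocked_in_this_thread();
        g_pid_registry[pid] = -1;  // registered while SIGCHLD is still blocked
    }

    pthread_sigmask(SIG_SETMASK, &saved, nullptr);  // restore previous mask
    return pid;  // -1 if fork() failed
}

// Reap every terminated child without blocking, in the style of the
// do..while loop in child_handler(). Returns how many reaped pids were
// missing from the registry (the symptom reported in the thread).
int reap_terminated_children() {
    int unknown = 0;
    for (;;) {
        int status = 0;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid <= 0)
            break;  // no terminated children left (or no children at all)
        auto it = g_pid_registry.find(pid);
        if (it == g_pid_registry.end()) {
            ++unknown;
            continue;
        }
        it->second = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
    return unknown;
}
```

A real program would call reap_terminated_children() from its SIGCHLD handling path. With this ordering, if the SIGCHLD for a child that exits at once is delivered to this thread, delivery is deferred until the mask is restored, by which point the pid is already in the registry. Touching a std::map inside a signal handler is not async-signal-safe in general, which is why the sketch does not install a handler itself.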