From pavlin@icir.org Wed Mar 9 23:17:43 2005 From: pavlin@icir.org (Pavlin Radoslavov) Date: Wed, 09 Mar 2005 15:17:43 -0800 Subject: [Xorp-hackers] HEADS UP: XORP configuration syntax changes Message-ID: <200503092317.j29NHhjn057544@possum.icir.org>

All, As of today, there are a few changes in the XORP configuration syntax in XORP-current, and those changes will be in the forthcoming XORP-1.1 release as well. The changes are:

* All "enabled: true/false" XORP configuration flags are now renamed to "disable: false/true". Furthermore, all of them now have a default value of "false". In other words, if a configuration section doesn't contain the "disable" flag, it is implicitly enabled. [The reason for this renaming is better consistency with the configuration of other router vendors.] Therefore, if you have "enabled: true" in your configuration, you should either change it to "disable: false" or you can simply delete it (because it is the same as the default). However, if you have "enabled: false", then you should change it to "disable: true".

* Change the syntax for configuring IPv4/IPv6 forwarding (for consistency with the above renaming):

OLD:
fea {
    enable-unicast-forwarding4: true
    enable-unicast-forwarding6: true
}

NEW:
fea {
    unicast-forwarding4 {
        disable: false
    }
    unicast-forwarding6 {
        disable: false
    }
}

OR JUST (if you want to enable IPv4 and IPv6 forwarding):
fea {
    unicast-forwarding4 {
    }
    unicast-forwarding6 {
    }
}

The old syntax is now marked as %deprecated in the rtrmgr templates, so an appropriate error message is printed on startup if someone tries to use a configuration file with the old syntax. Please let us know if you find any problems because of the above changes. Thanks, The XORP Team

From Jim Greene Wed Mar 16 22:55:18 2005 From: Jim Greene (Jim Greene) Date: Wed, 16 Mar 2005 17:55:18 -0500 Subject: [Xorp-hackers] Running FEA on a separate host Message-ID: <275c605305031614554f4c8eb4@mail.gmail.com>

Hi. Is it possible to run the FEA on a different host than the rest of the processes? Is there a configuration file that tells the RIB how to communicate with a non-local FEA? Thanks, Jim

From atanu@ICSI.Berkeley.EDU Thu Mar 17 04:45:18 2005 From: atanu@ICSI.Berkeley.EDU (Atanu Ghosh) Date: Wed, 16 Mar 2005 20:45:18 -0800 Subject: [Xorp-hackers] Running FEA on a separate host In-Reply-To: Message from Jim Greene of "Wed, 16 Mar 2005 17:55:18 EST." <275c605305031614554f4c8eb4@mail.gmail.com> Message-ID: <75760.1111034718@tigger.icir.org>

Jim> Hi. Is it possible to run the FEA on a different host than the
Jim> rest of the processes?

Yes.

Jim> Is there a configuration file that tells the RIB how to
Jim> communicate with a non-local FEA?

No. We have just checked in some changes that should allow non-local processes: Let's call the host on which you want to run the fea "feahost" and the host on which the rest of XORP is running "controlhost". By default the XRL code listens on the loopback address. All the processes will need to be started with the environment variable XORP_FINDER_CLIENT_ADDRESS set to an appropriate interface address. Start the router manager with the -i flag giving the address of the interface facing "feahost", and the -a flag to allow the feahost to communicate with the router manager.

controlhost# XORP_FINDER_CLIENT_ADDRESS=controlhost
controlhost# export XORP_FINDER_CLIENT_ADDRESS
controlhost# ./xorp_rtrmgr -i controlhost -a feahost

When you run the router manager it must start a fea process. Replace the xorp_fea binary with a script that starts the remote fea.
The fea will need to know the host where the router manager is running; this information can be provided with the -F flag.

----------------------------------------
#!/bin/sh
#
ssh feahost XORP_FINDER_CLIENT_ADDRESS=feahost /usr/local/xorp/fea/xorp_fea -F controlhost
----------------------------------------

If you want, you can start the fea manually, but you will need to start the router manager within 10 seconds. It's important that the router manager starts a binary that it believes is the fea. If you don't use the script above, replace xorp_fea with a program that doesn't exit. I have used "main() { pause();}". In my tests I started the remote fea manually. Atanu.

From edrt@citiz.net Tue Mar 22 14:02:53 2005 From: edrt@citiz.net (edrt) Date: Tue, 22 Mar 2005 22:02:53 +0800 Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic Message-ID: <200503221354.j2MDslZo072629@wyvern.icir.org>

Hi XORP developers, I have a problem that might be related to the XORP architecture. I ported the multicast part of the XORP platform to my target board. Recently I did some stress testing of the ported system and found that when the board is fed multicast packets that overload its processing power, the XORP components' XRL communication is affected, which results in at least the following problems I have observed so far:

* Configuring XORP components on the fly through call_xrl always failed. (XrlPFSTCPSender::die with errno EWOULDBLOCK/ENOBUFS...)

* Because I configure the XORP components through call_xrl, when the XORP system is started in a high-volume multicast traffic environment, all components started after the MFEA fail (because the MFEA puts the vif into multicast promisc mode).

I have not yet tried to stress the system using unicast traffic, which might give a different view, but I think the general reason is that the control plane IPC traffic is affected by the large volume of data plane traffic. So, if I only want to run all XORP components on a single physical node, how can I avoid the above problem (besides optimizing the device driver):

* Can the network stack (FreeBSD-like) differentiate internal IPC traffic from external data traffic and implement internal QOS-like resource reservation and processing?

* Or, has anybody ever successfully tried to integrate XORP components using XrlPFInProcListener+XrlPFInProcSender?

* Any other suggestions to solve/alleviate the problem?

I'll be very grateful if anyone could throw light on this issue. Thanks Eddy

From Timothy.Griffin@cl.cam.ac.uk Tue Mar 22 14:20:18 2005 From: Timothy.Griffin@cl.cam.ac.uk (Timothy Griffin) Date: Tue, 22 Mar 2005 14:20:18 +0000 Subject: [Xorp-hackers] Configuration managment .... Message-ID:

hi, perhaps this idea has been kicked around before: how about using an SQL database (such as MySQL) to manage all xorp configuration data? if we run with this idea a bit, then we can

--- forget about implementing our own cli, just use sql
--- this might be immediately appealing for config commands like "bgp neighbor ...", since we are just populating a "database". But what about things like "show ip bgp ..."? well, i'm perverse enough to think of this as populating a database table (or tables) that can then be further queried using SQL ....
--- forget about implementing configuration "transaction management", just use the technology provided by the database system.
--- forget about implementing configuration "access control", just use the technology provided by the database system.
--- forget about nice user interfaces, just ride the database technology curve (lots of open source front ends for MySQL out there..., Web, XML, ...) in short, ride the database technology curve! MySQL 5.0, which is now under development, will have stored procedures as well as triggers. this would make it much easier to implement an XRL wrapper (config changes come in as sql, then triggers generate xrl messages to other processes...). The database could be used to store logs as well. We can even imagine a single database (perhaps with backup mirrors) holding the entire configuration of a network (wow, the config database actually being the "database of record" --- what a concept!) comments? cheers, tim http://www.cl.cam.ac.uk/~tgg22 From adam@hiddennet.net Tue Mar 22 14:29:38 2005 From: adam@hiddennet.net (Adam Greenhalgh) Date: Tue, 22 Mar 2005 14:29:38 +0000 Subject: [Xorp-hackers] Configuration managment .... In-Reply-To: References: Message-ID: <1111501778.8695.192.camel@cellini.cs.ucl.ac.uk> Neat idea. The only question I have is where do you want the database to be running ? Is a database too heavy to be run on a low end device ? However I always thought that the cli interface to xorp was replaceable with something else so it should be fairly *cough* easy to replace it. Adam On Tue, 2005-03-22 at 14:20 +0000, Timothy Griffin wrote: > hi, > > perhaps this idea has been kicked around before: > how about using an SQL database (such as MySQL) to > manage all xorp configuration data? > > if we run with this idea a bit, then we can > > --- forget about implementing our own cli, just use sql > --- this might be immediately appealing for config commands > like "bgp neighbor ...", since we are just populating > a "database". But what about things like "show ip bgp ..."? > well, i'm perverse enough to think of this as populating > a database table (or tables) that can then be further queried > using SQL .... > --- forget about implementing configuration "transaction management", > just use the technology provided by the database system. > --- forget about implementing configuration "access control", > just use the technology provided by the database system. > --- forget about nice user interfaces, just ride the database technology > curve (lots of open source front ends for MySQL out there..., Web, XML, ...) > > in short, ride the database technology curve! MySQL 5.0, which is now under > development, will have stored procedures as well as triggers. this would make it > much easier to implement an XRL wrapper (config changes come in as sql, then triggers > generate xrl messages to other processes...). > > The database could be used to store logs as well. > > We can even imagine a single database (perhaps with backup mirrors) holding the entire > configuration of a network (wow, the config database actually being > the "database of record" --- what a concept!) > > comments? > > cheers, > tim > http://www.cl.cam.ac.uk/~tgg22 > > _______________________________________________ > Xorp-hackers mailing list > Xorp-hackers@icir.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers From Timothy.Griffin@cl.cam.ac.uk Tue Mar 22 14:43:18 2005 From: Timothy.Griffin@cl.cam.ac.uk (Timothy Griffin) Date: Tue, 22 Mar 2005 14:43:18 +0000 Subject: [Xorp-hackers] Configuration managment .... In-Reply-To: Your message of "Tue, 22 Mar 2005 14:29:38 GMT." 
<1111501778.8695.192.camel@cellini.cs.ucl.ac.uk> Message-ID: Adam Greenhalgh wrote: > The only question I have is where do you want the database to be > running ? Is a database too heavy to be run on a low end device ? it would be up to the network operator to chose where to run the database process (just as is the case with most [all?] xorp processes...) mysql runs very well on low-end boxes, including laptops. From javier@cozybit.com Tue Mar 22 19:21:56 2005 From: javier@cozybit.com (Javier Cardona) Date: Tue, 22 Mar 2005 11:21:56 -0800 Subject: [Xorp-hackers] Configuration managment .... In-Reply-To: References: Message-ID: <200503221121.56259.javier@cozybit.com> Hi, On Tuesday 22 March 2005 06:20 am, Timothy Griffin wrote: > hi, > > perhaps this idea has been kicked around before: > how about using an SQL database (such as MySQL) to > manage all xorp configuration data? > (...) > comments? It's a good idea: the proof is that a similar architecture is being used elsewhere with good success: the WindManage suite, from Wind River ( http://www.windriver.com/products/device_technologies/management/ ). Similar to what you propose, that solution uses a single database to store all the configuration data (termed the backplane) which is accessible through different interfaces (CLI, Web, XML/SOAP and SNMP). The database is optimized for embedded use, linked to the different interface processes (i.e. no standalone db server process required), and, of course, can only run on the managed device. At the time I integrated Net-SNMP with XORP I considered implementing a similar solution, but I wasn't sure of what was covered by different patent applications Wind River had filed on this. I think the most relevant is http://appft1.uspto.gov/netacgi/nph-Parser?TERM1=20030115575+&Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=0&f=S&l=50 which it has been rejected recently. (Deciphering the claims on that patent is beyond my capabilities, given that Legal English is not my mother tongue.) I just thought you might be interested on that. Cheers, Javier -- Javier Cardona cozybit Inc. t 415 664 1088 f 415 664 1010 c 415 630 0627 w http://www.cozybit.com e javier@cozybit.com From bms@spc.org Tue Mar 22 22:00:49 2005 From: bms@spc.org (Bruce M Simpson) Date: Tue, 22 Mar 2005 14:00:49 -0800 Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic In-Reply-To: <200503221354.j2MDslZo072629@wyvern.icir.org> References: <200503221354.j2MDslZo072629@wyvern.icir.org> Message-ID: <20050322220049.GM747@empiric.icir.org> On Tue, Mar 22, 2005 at 10:02:53PM +0800, edrt wrote: > * Can the network stack (FreeBSD-like) differentiate internal IPC > traffic with external data traffic and implement internal QOS-like > resource reserving and procssing? Under FreeBSD you could try the following:- 1) run ALTQ on the loopback interface and prioritize control plane traffic passing through the IP stack at the expense of data plane traffic; 2) use a different underlying IPC mechanism for the control plane traffic, e.g. POSIX message queues. That's all I can think off of the top of my head right now... BMS From bms@spc.org Tue Mar 22 22:04:10 2005 From: bms@spc.org (Bruce M Simpson) Date: Tue, 22 Mar 2005 14:04:10 -0800 Subject: [Xorp-hackers] Configuration managment .... 
In-Reply-To: References: <1111501778.8695.192.camel@cellini.cs.ucl.ac.uk> Message-ID: <20050322220410.GN747@empiric.icir.org> On Tue, Mar 22, 2005 at 02:43:18PM +0000, Timothy Griffin wrote: > > The only question I have is where do you want the database to be > > running ? Is a database too heavy to be run on a low end device ? > > it would be up to the network operator to chose where to run the database > process (just as is the case with most [all?] xorp processes...) > > mysql runs very well on low-end boxes, including laptops. Why aren't people doing this more widely? I would like to see firm figures for resource consumption by MySQL on embedded targets. Has anyone tried running MySQL on a machine without an MMU, for example? Surely if one wishes to go the SQL route, SQLite would be more appropriate for embedded targets? The code footprint given for SQLite on the web page is a typical ~250KB on x86 for gcc, this doesn't say anything about the run-time memory footprint, and I would imagine with better compilers it could get smaller. BMS From javier@cozybit.com Tue Mar 22 23:11:03 2005 From: javier@cozybit.com (Javier Cardona) Date: Tue, 22 Mar 2005 15:11:03 -0800 Subject: [Xorp-hackers] Configuration managment .... In-Reply-To: <20050322194802.GA5464@pix.net> References: <200503221121.56259.javier@cozybit.com> <20050322194802.GA5464@pix.net> Message-ID: <200503221511.03351.javier@cozybit.com> On Tuesday 22 March 2005 11:48 am, Kurt J. Lidl wrote: > On Tue, Mar 22, 2005 at 11:21:56AM -0800, Javier Cardona wrote: > > On Tuesday 22 March 2005 06:20 am, Timothy Griffin wrote: > > > perhaps this idea has been kicked around before: > > > how about using an SQL database (such as MySQL) to > > > manage all xorp configuration data? > > > (...) > > > comments? > > > > It's a good idea: the proof is that a similar architecture is being used > > elsewhere with good success: the WindManage suite, from Wind River > > ( http://www.windriver.com/products/device_technologies/management/ ). > > No it isn't. > > Just because someone does something and files for a patent on it does not > mean its a good idea. Besides, its not a new idea -- varients have > been done in the past. For example, I think it was the Netgear(?) backbone > routers that basically did everything through SNMP. All the world's > a SNMP query, even the CLI. I did not claim that it was new, nor that it was good just because a patent was filed on it. Quite the contrary, my opinion is based on architectural cleanness and the fact that others seem to be taking similar routes. > (...) > > Except that you couldn't get work done with it. When the network > is mostly busted, and you're trying to fix it by manipulating the > running router configuration, the less you have to type, the more > likely it is you're going to be able to fix it. When you have to > type long, long command lines to get anything done, it slows down > the person trying to fix things. A lot. Using the proposed architecture (one single repository of configuration data + different methods to access it) it is easy to add new interfaces. There can be one just for "mostly busted networks" designed to minimize keystrokes. But if someone enjoys typing "long, long command lines to get anything done", that can be arranged as well ;) Cheers, -- Javier Cardona cozybit Inc. t 415 664 1088 f 415 664 1010 c 415 630 0627 w http://www.cozybit.com e javier@cozybit.com From lidl@pix.net Tue Mar 22 19:48:03 2005 From: lidl@pix.net (Kurt J. 
Lidl) Date: Tue, 22 Mar 2005 14:48:03 -0500 Subject: [Xorp-hackers] Configuration managment .... In-Reply-To: <200503221121.56259.javier@cozybit.com> References: <200503221121.56259.javier@cozybit.com> Message-ID: <20050322194802.GA5464@pix.net> On Tue, Mar 22, 2005 at 11:21:56AM -0800, Javier Cardona wrote: > On Tuesday 22 March 2005 06:20 am, Timothy Griffin wrote: > > perhaps this idea has been kicked around before: > > how about using an SQL database (such as MySQL) to > > manage all xorp configuration data? > > (...) > > comments? > > It's a good idea: the proof is that a similar architecture is being used > elsewhere with good success: the WindManage suite, from Wind River > ( http://www.windriver.com/products/device_technologies/management/ ). No it isn't. Just because someone does something and files for a patent on it does not mean its a good idea. Besides, its not a new idea -- varients have been done in the past. For example, I think it was the Netgear(?) backbone routers that basically did everything through SNMP. All the world's a SNMP query, even the CLI. It sucked. Sure, architectually clean, and what engineering student wouldn't have been impressed with the single data storage, and only a single access method (SNMP GET) to get data out. Just paste on some CLI glue, and you're done. Except that you couldn't get work done with it. When the network is mostly busted, and you're trying to fix it by manipulating the running router configuration, the less you have to type, the more likely it is you're going to be able to fix it. When you have to type long, long command lines to get anything done, it slows down the person trying to fix things. A lot. Embedding something like a lightweight SQL database might work, but does it really help? I can imagine that the next thing someone asks for is "how about we retrieve the router config from a remote database?". There is a serious chicken and egg problem here, no working network until it is configured, and no configuration until the network is up. Sure, then they'll suggest a "bootstrap configuration" just to be used long enough to get the "real configuration" into the router. Then you're back at two different configurations, both of which must be tested to work. (And testing the "bootstrap configuration" would necessitate taking the router offline, too!) I would claim it is insanity to separate the primary configuration storage from the physical machine that is the router. (Or the master of the routing cluster, however you want to define that.) You'd be better off making a nice, high level (smells like a job for XML to me) configuration exporter, and make it easy to dump that to a central configuration repository. Tracking configuration changes like this would actually be valuable to people with non-trivial networks. -Kurt From pavlin@icir.org Wed Mar 23 04:01:34 2005 From: pavlin@icir.org (Pavlin Radoslavov) Date: Tue, 22 Mar 2005 20:01:34 -0800 Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic In-Reply-To: Message from "edrt" of "Tue, 22 Mar 2005 22:02:53 +0800." <200503221354.j2MDslZo072629@wyvern.icir.org> Message-ID: <200503230401.j2N41Y4k034000@possum.icir.org> > I have a problem might be related to XORP architecture. > > I ported multicast part of XORP platform to my target board. 
> Recently I do some stress testing of the ported system and find > that when feeding multicast packets overloading the board's > processing power, the XORP components' XRL communication are > effected which result at least the following problems I observed > until now: > > * configuring XORP components on the fly through call_xrl always > failed. (XrlPFSTCPSender::die with errno EWOULDBLOCK/ENOBUFS...) Have you tried to use the rtrmgr for configuration purpose? It uses in-process XRL generation instead of the call_xrl binary. I don't know whether the in-process XRL generation will make any difference, but if it does this may give you some clues about how to solve the problem. > * because I'm implementing configure XORP components through > call_xrl, when start XORP system in a high volume multicast > traffic enviroment, all components starting after MFEA are > failed. (because MFEA put the vif into multicast promisc mode) By looking into the source code, it seems that the multicast interfaces are put in promisc mode during startup of the MFEA. Strictly speaking, this should happen only after a multicast routing protocol registers with the MFEA. Hence, one possible solution could be to refactor the MFEA operations so the multicast routing related operations by the MFEA are enabled only after the first multicast routing protocol registers with the MFEA. > I'm not yet tried to stress the system using unicast traffic which > might be a different view, but I think the general reason lies > in the control plane IPC traffic is effected by large volume data > plane traffic. Can you test whether the unicast traffic also triggers the problem? If yes, then the above MFEA refactoring won't help. > So, if I only want to run all XORP components in a single physical > node, how can I avoid the above problem (besides optimizing device > driver): > > * Can the network stack (FreeBSD-like) differentiate internal IPC > traffic with external data traffic and implement internal QOS-like > resource reserving and procssing? > > * Or, have anybody ever successfully tried to integrate XORP > components using XrlPFInProcListener+XrlPFInProcSender? I think that to switch on the InProc XRLs, you need to set in your environment the following variable: setenv XORP_PF i However, then you need a mechanism to initiate the in-process XRLs from within your router binary (obviously, you cannot use call_xrl for in-process communication :) Regards, Pavlin > > * Any other suggestions to solve/alleviate the problem? > > > I'll be very grateful if anyone could throw light on this issue. > > > Thanks > Eddy > > > _______________________________________________ > Xorp-hackers mailing list > Xorp-hackers@icir.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers From edrt@citiz.net Wed Mar 23 15:08:10 2005 From: edrt@citiz.net (edrt) Date: Wed, 23 Mar 2005 23:08:10 +0800 Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic Message-ID: <200503231500.j2NExrZV091019@wyvern.icir.org> >On Tue, Mar 22, 2005 at 10:02:53PM +0800, edrt wrote: >> * Can the network stack (FreeBSD-like) differentiate internal IPC >> traffic with external data traffic and implement internal QOS-like >> resource reserving and procssing? > >Under FreeBSD you could try the following:- > > 1) run ALTQ on the loopback interface and prioritize control plane traffic > passing through the IP stack at the expense of data plane traffic; > > 2) use a different underlying IPC mechanism for the control plane traffic, > e.g. 
POSIX message queues. > >That's all I can think off of the top of my head right now... > >BMS > Hi Bruce, Thanks for your helpful advices. Cause I'm new to these areas, I might should make some researches before I could drop more discussions on these solutions. Currently I find a cheap way of circumvent the problem, see my mail reply to Pavlin. Thanks Eddy From edrt@citiz.net Wed Mar 23 15:10:10 2005 From: edrt@citiz.net (edrt) Date: Wed, 23 Mar 2005 23:10:10 +0800 Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic Message-ID: <200503231501.j2NF1hkC091059@wyvern.icir.org> >> I have a problem might be related to XORP architecture. >> >> I ported multicast part of XORP platform to my target board. >> Recently I do some stress testing of the ported system and find >> that when feeding multicast packets overloading the board's >> processing power, the XORP components' XRL communication are >> effected which result at least the following problems I observed >> until now: >> >> * configuring XORP components on the fly through call_xrl always >> failed. (XrlPFSTCPSender::die with errno EWOULDBLOCK/ENOBUFS...) > >Have you tried to use the rtrmgr for configuration purpose? >It uses in-process XRL generation instead of the call_xrl binary. >I don't know whether the in-process XRL generation will make any >difference, but if it does this may give you some clues about how to >solve the problem. > Actually, what I'm using is a modified version of call_xrl function, which called by CLI as a library function to in-process generate XRL. >> * because I'm implementing configure XORP components through >> call_xrl, when start XORP system in a high volume multicast >> traffic enviroment, all components starting after MFEA are >> failed. (because MFEA put the vif into multicast promisc mode) > >By looking into the source code, it seems that the multicast >interfaces are put in promisc mode during startup of the MFEA. >Strictly speaking, this should happen only after a multicast routing >protocol registers with the MFEA. >Hence, one possible solution could be to refactor the MFEA >operations so the multicast routing related operations by the MFEA >are enabled only after the first multicast routing protocol >registers with the MFEA. > I have verified that by extract MfeaNode::add_multicast_vif from MfeaVif::start, and only add_multicast_vif after PIM registers on MFEA vif AND MFEA vif is up. After that modification, IGMP startup and configured successfully, but PIM configuration actions after pim/0.1/start_vif are failed (because the data traffic became to be pumpped into stack) >> I'm not yet tried to stress the system using unicast traffic which >> might be a different view, but I think the general reason lies >> in the control plane IPC traffic is effected by large volume data >> plane traffic. > >Can you test whether the unicast traffic also triggers the problem? >If yes, then the above MFEA refactoring won't help. > >> So, if I only want to run all XORP components in a single physical >> node, how can I avoid the above problem (besides optimizing device >> driver): >> >> * Can the network stack (FreeBSD-like) differentiate internal IPC >> traffic with external data traffic and implement internal QOS-like >> resource reserving and procssing? >> >> * Or, have anybody ever successfully tried to integrate XORP >> components using XrlPFInProcListener+XrlPFInProcSender? 
> >I think that to switch on the InProc XRLs, you need to set in your >environment the following variable: >setenv XORP_PF i > >However, then you need a mechanism to initiate the in-process XRLs >from within your router binary (obviously, you cannot use call_xrl >for in-process communication :) > Yes. Thanks for the reminder. The XrlPFInProc solution might be the last resort :) Today, I tried some tuning, and they seem to fix most of the problems 1) Increase network mbuf, this suppress most of the ENOBUFS. 2) Add EWOULDBLOCK to is_pseudo_error, then call_xrl usually successfully returns after a second/third... try of read. But call_xrl consumes considerable time. 3) Increase XORP tasks' priority above the priority of the data forwarding task (i.e. the task doing most of the IP stack processing), this makes call_xrl successfully returns almost immediately. I doubt these tuning might have high possibilities of causing other problems. But if only take into consideration of the original problem I encoutered, they make all the XORP components works normally even with overloading external multicast traffic. Could anyone comment on the possible side effects of these tuning? (Because they are cheap solution, and I might to use them) What I could think out right now: #1 seems harmless, but only decrease the available free system memory #2 I have no idea what problem it will cause, anybody get ideas? #3 this may starve data forwarding task if protocol tasks consume too much time in their processing. Besides checking each protocol tasks' implementation any other method to solve this problem? (EventLoop::run might help to detect some of the problem) Thanks Eddy BTW I encounter the following problem during stress testing, because I'm using ported XORP source code, and they are NOT based on the latest CVS HEAD. I could not ensure you can reproduce the problem, but just in case anybody have interest to track it... * In a high volume data traffic enviroment * without the tuning above * try to send XORP component XRL commands through call_xrl * call_xrl's XrlPFSTCPSender dies with EWOULDBLOCK/ENOBUFS... task calling call_xrl core dumps, and the stack looks something like ......(some of the stack information stripped)...... b982fc xorp_pimsm4_show+150: xorp_pimsm4_show_send(basic_string, __default_alloc_template > &, basic_string, __default_alloc_template > &, basic_string, __default_alloc_templa?he? 
() b97f68 xorp_pimsm4_show_send(basic_string, __defa ult_alloc_template > &, basic_string, __ default_alloc_template > &, basic_string , __default_alloc_templa?|e?+2bc: xorp_pimsm4_call_xrl(Xrl const &, XrlError *, XrlArgs *, unsigned long) () b9737c xorp_pimsm4_call_xrl(Xrl const &, XrlError *, XrlArgs *, unsigned long)+8 4 : call_xrl () d4aa68 call_xrl +e4 : call_xrl_main(EventLoop &, XrlDirRouter &, char cons t *, XrlError *, XrlArgs *) (165b288, 165b330, f7b458, 165b478, 165b450) d4a454 call_xrl_main(EventLoop &, XrlDirRouter &, char const *, XrlError *, XrlA rgs *)+25c: EventLoop::run(void) () d60e4c EventLoop::run(void)+e0 : SelectorList::select(TimeVal *) () d94b9c SelectorList::select(TimeVal *)+318: XorpMemberCallback2B0::dispatch(int, SelectorMask) () daa7c4 XorpMemberCallback2B0::dispatch (int, SelectorMask)+7c : AsyncFileReader::read(int, SelectorMask) () da7c78 AsyncFileReader::read(int, SelectorMask)+104: AsyncFileReader::complete_t ransfer(int, int) () da7f6c AsyncFileReader::complete_transfer(int, int)+2d4: XorpMemberCallback4B0::dispatch(AsyncFileOperator::Event, unsigned char const *, un signed int, unsigned int) () d462c8 XorpMemberCallback4B0::dispatch(AsyncFileOperator::Ev ent, unsigned char const *, unsigned int, unsigned int)+84 : XrlPFSTCPSender::re cv_data(AsyncFileOperator::Event, unsigned char const *, unsigned int, unsigned int) () d3bbc4 XrlPFSTCPSender::recv_data(AsyncFileOperator::Event, unsigned char const *, unsigned int, unsigned int)+128: XrlPFSTCPSender::die(char const *) () d3a9c8 XrlPFSTCPSender::die(char const *)+244: XorpMemberCallback2B2 > >::dispatch(XrlError const &, XrlArgs *) () d31718 XorpMemberCallback2B2 > >::dispa tch(XrlError const &, XrlArgs *)+ac : XrlDirRouter::send_callback(XrlError const &, XrlArgs *, XrlPFSender *, ref_ptr >) () d2bc6c XrlDirRouter::send_callback(XrlError const &, XrlArgs *, XrlPFSender *, r ef_ptr >)+84 : XorpFunctionCall back2B6::dispatch(XrlError const &, XrlArgs *) () d4e2a4 XorpFunctionCallback2B6::dispatch(XrlError const &, XrlArgs *)+4 0 : response_handler(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlError *, XrlArgs *) () d4a1cc response_handler(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlError *, XrlArgs *)+ac : d4c754 () d4c774 XorpFunctionCallback2B6::XorpFunctionCallback2B6(char const *, i nt, void (*)(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlErro r *, XrlArgs *), Xrl *, ??e0?f4c: d4b248 () d4b3b0 call_xrl +a2c: d4b094 () d4b10c call_xrl +788: d4be78 () d4bea8 XorpFunctionCallback2B6::XorpFunctionCallback2B6(char const *, i nt, void (*)(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlErro r *, XrlArgs *), Xrl *, ??e/?680: d4bf00 (165b450, 8) d4bf9c XorpFunctionCallback2B6::XorpFunctionCallback2B6(char const *, i nt, void (*)(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlErro r *, XrlArgs *), Xrl *, ?$e/?774: d4c0ec () d4c124 XorpFunctionCallback2B6::XorpFunctionCallback2B6(char const *, i nt, void (*)(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlErro r *, XrlArgs *), Xrl *, ??e/`+8fc: d4c7ec (1458db8, 8) d4c888 XorpFunctionCallback2B6::XorpFunctionCallback2B6(char const *, i nt, void (*)(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlErro r *, XrlArgs *), Xrl *, ?Xe.P+1060: XrlAtom::copy(XrlAtom const &) (1458db8, 8 ) d13b58 XrlAtom::copy(XrlAtom const &)+28 : d1b8c4 () d1b8e4 Mac::~Mac(void)+448: d1b92c () d1b95c Mac::~Mac(void)+4c0: d1cc38 () value = 0 = 0x0 -> If I comment out the 
below code in XrlPFSTCPSender::die, the problem disappears, response_handler returns nomally. // Detach all callbacks before attempting to invoke them. // Otherwise destructor may get called when we're still going through // the lists of callbacks. list > tmp; tmp.splice(tmp.begin(), _requests_waiting); tmp.splice(tmp.begin(), _requests_sent); _active_requests = 0; _active_bytes = 0; // Make local copy of uid in case "this" is deleted in callback uint32_t uid = _uid; while (tmp.empty() == false) { if (sender_list.valid_instance(uid) == false) break; ref_ptr& rp = tmp.front(); if (rp->cb().is_empty() == false) rp->cb()->dispatch(XrlError::SEND_FAILED(), 0); tmp.pop_front(); } END. From pavlin@icir.org Thu Mar 24 09:06:28 2005 From: pavlin@icir.org (Pavlin Radoslavov) Date: Thu, 24 Mar 2005 01:06:28 -0800 Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic In-Reply-To: Message from "edrt" of "Wed, 23 Mar 2005 23:10:10 +0800." <200503231501.j2NF1hkC091059@wyvern.icir.org> Message-ID: <200503240906.j2O96Scd012904@possum.icir.org> > I have verified that by extract MfeaNode::add_multicast_vif from > MfeaVif::start, and only add_multicast_vif after PIM registers on > MFEA vif AND MFEA vif is up. > > After that modification, IGMP startup and configured successfully, > but PIM configuration actions after pim/0.1/start_vif are failed > (because the data traffic became to be pumpped into stack) :( It appears that once multicast forwarding is enabled in the kernel, the burst of multicast data packets quickly utilizes the system resources and triggers the XRL failures. If you want to push this solution further, then you could introduce another XRL from PIM to the MFEA that will be called by PIM after its configuration is completed. This XRL itself will trigger the enabling of multicast forwarding in the kernel. However, there is no guarantee this will indeed fix the problem. In your case, it happens during the startup configuration, but in general there is the possibility this may happen even during normal router operation (e.g., the XRLs exchanged among the XORP modules may be affected). > Today, I tried some tuning, and they seem to fix most of the problems > > 1) Increase network mbuf, this suppress most of the ENOBUFS. > > 2) Add EWOULDBLOCK to is_pseudo_error, then call_xrl usually successfully > returns after a second/third... try of read. But call_xrl consumes > considerable time. > > 3) Increase XORP tasks' priority above the priority of the data forwarding > task (i.e. the task doing most of the IP stack processing), this makes > call_xrl successfully returns almost immediately. > > I doubt these tuning might have high possibilities of causing other problems. > But if only take into consideration of the original problem I encoutered, > they make all the XORP components works normally even with overloading > external multicast traffic. > > Could anyone comment on the possible side effects of these tuning? (Because > they are cheap solution, and I might to use them) What I could think out > right now: > > #1 seems harmless, but only decrease the available free system memory > > #2 I have no idea what problem it will cause, anybody get ideas? > > #3 this may starve data forwarding task if protocol tasks consume > too much time in their processing. Besides checking each protocol > tasks' implementation any other method to solve this problem? > (EventLoop::run might help to detect some of the problem) If you don't increase the XORP tasks' priority does it still work? 
Also, if you add ENOBUFS to is_pseudo_error without increasing the network mbuf does it still work? What I am afraid is that even if increasing the mbuf in your kernel appears to help fixing the problem in your setup, this may not be true if the amount of multicast traffic is higher. > > Thanks > Eddy > > > > BTW > I encounter the following problem during stress testing, because I'm > using ported XORP source code, and they are NOT based on the latest CVS HEAD. > I could not ensure you can reproduce the problem, but just in case anybody > have interest to track it... > > * In a high volume data traffic enviroment > * without the tuning above > * try to send XORP component XRL commands through call_xrl > * call_xrl's XrlPFSTCPSender dies with EWOULDBLOCK/ENOBUFS... > > task calling call_xrl core dumps, and the stack looks something > like Is this with the original XORP call_xrl binary called by a script (or exec()-ed), or this is in the in-process code that has been derived from call_xrl? Also, could you tell the particular reason for the coredump (e.g., invalid pointer, etc), because I cannot decode the log below. Thanks, Pavlin > > ......(some of the stack information stripped)...... > > b982fc xorp_pimsm4_show+150: xorp_pimsm4_show_send(basic_string r_traits, __default_alloc_template > &, basic_string _char_traits, __default_alloc_template > &, basic_string ring_char_traits, __default_alloc_templa?he? () > b97f68 xorp_pimsm4_show_send(basic_string, __defa > ult_alloc_template > &, basic_string, __ > default_alloc_template > &, basic_string > , __default_alloc_templa?|e?+2bc: xorp_pimsm4_call_xrl(Xrl const &, XrlError > *, XrlArgs *, unsigned long) () > b9737c xorp_pimsm4_call_xrl(Xrl const &, XrlError *, XrlArgs *, unsigned long)+8 > 4 : call_xrl () > d4aa68 call_xrl +e4 : call_xrl_main(EventLoop &, XrlDirRouter &, char cons > t *, XrlError *, XrlArgs *) (165b288, 165b330, f7b458, 165b478, 165b450) > d4a454 call_xrl_main(EventLoop &, XrlDirRouter &, char const *, XrlError *, XrlA > rgs *)+25c: EventLoop::run(void) () > d60e4c EventLoop::run(void)+e0 : SelectorList::select(TimeVal *) () > d94b9c SelectorList::select(TimeVal *)+318: XorpMemberCallback2B0 eReader, int, SelectorMask>::dispatch(int, SelectorMask) () > daa7c4 XorpMemberCallback2B0::dispatch > (int, SelectorMask)+7c : AsyncFileReader::read(int, SelectorMask) () > da7c78 AsyncFileReader::read(int, SelectorMask)+104: AsyncFileReader::complete_t > ransfer(int, int) () > da7f6c AsyncFileReader::complete_transfer(int, int)+2d4: XorpMemberCallback4B0 oid, XrlPFSTCPSender, AsyncFileOperator::Event, unsigned char const *, unsigned > int, unsigned int>::dispatch(AsyncFileOperator::Event, unsigned char const *, un > signed int, unsigned int) () > d462c8 XorpMemberCallback4B0 signed char const *, unsigned int, unsigned int>::dispatch(AsyncFileOperator::Ev > ent, unsigned char const *, unsigned int, unsigned int)+84 : XrlPFSTCPSender::re > cv_data(AsyncFileOperator::Event, unsigned char const *, unsigned int, unsigned > int) () > d3bbc4 XrlPFSTCPSender::recv_data(AsyncFileOperator::Event, unsigned char const > *, unsigned int, unsigned int)+128: XrlPFSTCPSender::die(char const *) () > d3a9c8 XrlPFSTCPSender::die(char const *)+244: XorpMemberCallback2B2 rRouter, XrlError const &, XrlArgs *, XrlPFSender *, ref_ptr XrlError const &, XrlArgs *> > >::dispatch(XrlError const &, XrlArgs *) () > d31718 XorpMemberCallback2B2 lPFSender *, ref_ptr > >::dispa > tch(XrlError const &, XrlArgs *)+ac : XrlDirRouter::send_callback(XrlError const > &, 
XrlArgs *, XrlPFSender *, ref_ptr rgs *> >) () > d2bc6c XrlDirRouter::send_callback(XrlError const &, XrlArgs *, XrlPFSender *, r > ef_ptr >)+84 : XorpFunctionCall > back2B6 or *, XrlArgs *>::dispatch(XrlError const &, XrlArgs *) () > d4e2a4 XorpFunctionCallback2B6 bool *, bool *, XrlError *, XrlArgs *>::dispatch(XrlError const &, XrlArgs *)+4 > 0 : response_handler(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, > XrlError *, XrlArgs *) () > d4a1cc response_handler(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool > *, XrlError *, XrlArgs *)+ac : d4c754 () > d4c774 XorpFunctionCallback2B6 bool *, bool *, XrlError *, XrlArgs *>::XorpFunctionCallback2B6(char const *, i > nt, void (*)(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlErro > r *, XrlArgs *), Xrl *, ??e0?f4c: d4b248 () > d4b3b0 call_xrl +a2c: d4b094 () > d4b10c call_xrl +788: d4be78 () > d4bea8 XorpFunctionCallback2B6 bool *, bool *, XrlError *, XrlArgs *>::XorpFunctionCallback2B6(char const *, i > nt, void (*)(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlErro > r *, XrlArgs *), Xrl *, ??e/?680: d4bf00 (165b450, 8) > d4bf9c XorpFunctionCallback2B6 bool *, bool *, XrlError *, XrlArgs *>::XorpFunctionCallback2B6(char const *, i > nt, void (*)(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlErro > r *, XrlArgs *), Xrl *, ?$e/?774: d4c0ec () > d4c124 XorpFunctionCallback2B6 bool *, bool *, XrlError *, XrlArgs *>::XorpFunctionCallback2B6(char const *, i > nt, void (*)(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlErro > r *, XrlArgs *), Xrl *, ??e/`+8fc: d4c7ec (1458db8, 8) > d4c888 XorpFunctionCallback2B6 bool *, bool *, XrlError *, XrlArgs *>::XorpFunctionCallback2B6(char const *, i > nt, void (*)(XrlError const &, XrlArgs *, Xrl *, bool *, bool *, bool *, XrlErro > r *, XrlArgs *), Xrl *, ?Xe.P+1060: XrlAtom::copy(XrlAtom const &) (1458db8, 8 > ) > d13b58 XrlAtom::copy(XrlAtom const &)+28 : d1b8c4 () > d1b8e4 Mac::~Mac(void)+448: d1b92c () > d1b95c Mac::~Mac(void)+4c0: d1cc38 () > value = 0 = 0x0 > -> > > If I comment out the below code in XrlPFSTCPSender::die, the problem disappears, > response_handler returns nomally. > > // Detach all callbacks before attempting to invoke them. > // Otherwise destructor may get called when we're still going through > // the lists of callbacks. > list > tmp; > tmp.splice(tmp.begin(), _requests_waiting); > tmp.splice(tmp.begin(), _requests_sent); > > _active_requests = 0; > _active_bytes = 0; > > // Make local copy of uid in case "this" is deleted in callback > uint32_t uid = _uid; > > while (tmp.empty() == false) { > if (sender_list.valid_instance(uid) == false) > break; > ref_ptr& rp = tmp.front(); > if (rp->cb().is_empty() == false) > rp->cb()->dispatch(XrlError::SEND_FAILED(), 0); > tmp.pop_front(); > } > > END. > > > > _______________________________________________ > Xorp-hackers mailing list > Xorp-hackers@icir.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers From edrt@citiz.net Thu Mar 24 14:08:45 2005 From: edrt@citiz.net (edrt) Date: Thu, 24 Mar 2005 22:08:45 +0800 Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic Message-ID: <200503241401.j2OE18Zp007256@wyvern.icir.org> > >:( It appears that once multicast forwarding is enabled in the >kernel, the burst of multicast data packets quickly utilizes the >system resources and triggers the XRL failures. 
>If you want to push this solution further, then you could introduce >another XRL from PIM to the MFEA that will be called by PIM after >its configuration is completed. This XRL itself will trigger the >enabling of multicast forwarding in the kernel. > >However, there is no guarantee this will indeed fix the problem. >In your case, it happens during the startup configuration, but in >general there is the possibility this may happen even during normal >router operation (e.g., the XRLs exchanged among the XORP modules >may be affected). > Yes. >> Today, I tried some tuning, and they seem to fix most of the problems >> >> 1) Increase network mbuf, this suppress most of the ENOBUFS. >> >> 2) Add EWOULDBLOCK to is_pseudo_error, then call_xrl usually successfully >> returns after a second/third... try of read. But call_xrl consumes >> considerable time. >> >> 3) Increase XORP tasks' priority above the priority of the data forwarding >> task (i.e. the task doing most of the IP stack processing), this makes >> call_xrl successfully returns almost immediately. >> >> I doubt these tuning might have high possibilities of causing other problems. >> But if only take into consideration of the original problem I encoutered, >> they make all the XORP components works normally even with overloading >> external multicast traffic. >> >> Could anyone comment on the possible side effects of these tuning? (Because >> they are cheap solution, and I might to use them) What I could think out >> right now: >> >> #1 seems harmless, but only decrease the available free system memory >> >> #2 I have no idea what problem it will cause, anybody get ideas? >> >> #3 this may starve data forwarding task if protocol tasks consume >> too much time in their processing. Besides checking each protocol >> tasks' implementation any other method to solve this problem? >> (EventLoop::run might help to detect some of the problem) > >If you don't increase the XORP tasks' priority does it still work? >Also, if you add ENOBUFS to is_pseudo_error without increasing the >network mbuf does it still work? * With low priority XORP tasks, call_xrl will be very slow, read failed with multiple EWOULDBLOCK until eventually call_xrl timeout. * With high priority XORP tasks, low mbuf, ignore ENOBUFS in is_pseudo_error, all XORP components complains ENOBUFS when they communicate. >What I am afraid is that even if increasing the mbuf in your kernel >appears to help fixing the problem in your setup, this may not be >true if the amount of multicast traffic is higher. > Emm, I'll research this more. But at the first sight, there might be a stable packet receive rate device driver can afford when device is overloaded, which is not always proportional to external packet injection rate. If so, we can tune mbuf based on this assumption. >> >> BTW >> I encounter the following problem during stress testing, because I'm >> using ported XORP source code, and they are NOT based on the latest CVS HEAD. >> I could not ensure you can reproduce the problem, but just in case anybody >> have interest to track it... >> >> * In a high volume data traffic enviroment >> * without the tuning above >> * try to send XORP component XRL commands through call_xrl >> * call_xrl's XrlPFSTCPSender dies with EWOULDBLOCK/ENOBUFS... >> >> task calling call_xrl core dumps, and the stack looks something >> like > >Is this with the original XORP call_xrl binary called by a script >(or exec()-ed), or this is in the in-process code that has been >derived from call_xrl? 
in-process code derived from call_xrl

>Also, could you tell the particular reason for the coredump (e.g., >invalid pointer, etc), because I cannot decode the log below. >

I haven't identified the reason (that's why I pasted the stack information), but it is triggered by XrlPFSTCPSender::die, and from the stack information it seems that response_handler doesn't return properly. Again, I'm not sure we can reproduce it on the XORP CVS HEAD version. Anyway, can we move this to bugzilla and track it there? Thanks Eddy

From bms@spc.org Thu Mar 24 20:16:45 2005 From: bms@spc.org (Bruce M Simpson) Date: Thu, 24 Mar 2005 12:16:45 -0800 Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic In-Reply-To: <200503241401.j2OE18Zp007256@wyvern.icir.org> References: <200503241401.j2OE18Zp007256@wyvern.icir.org> Message-ID: <20050324201645.GC4191@empiric.icir.org>

On Thu, Mar 24, 2005 at 10:08:45PM +0800, edrt wrote:
> >If you don't increase the XORP tasks' priority does it still work?
> >Also, if you add ENOBUFS to is_pseudo_error without increasing the
> >network mbuf does it still work?
>
> * With low-priority XORP tasks, call_xrl is very slow: reads fail
> with multiple EWOULDBLOCK errors until call_xrl eventually times out.
> * With high-priority XORP tasks, low mbuf, and ENOBUFS ignored in is_pseudo_error,
> all XORP components complain about ENOBUFS when they communicate.

Which version of FreeBSD are you running? Can you grab the output of netstat -m when this happens (are you running out of packet headers, or clusters)? Is sysctl net.inet.tcp.do_tcpdrain enabled? What is the tunable kern.ipc.nmbclusters set to? Can you provide any metrics about how much traffic you are dealing with (and how much of it is multicast) when this happens? BMS

From edrt@citiz.net Fri Mar 25 10:00:23 2005 From: edrt@citiz.net (edrt) Date: Fri, 25 Mar 2005 18:00:23 +0800 Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic Message-ID: <200503251002.j2PA2L46022411@wyvern.icir.org>

> >Which version of FreeBSD are you running? >

VxWorks; its networking stack might be FreeBSD-like. I can't figure out what version it is derived from.

>Can you grab the output of netstat -m when this happens (are you running >out of packet headers, or clusters) ? >

See the discussion below.

>Is sysctl net.inet.tcp.do_tcpdrain enabled? >

No. But I think it might cause the stack to blindly drain the XRL TCP connection.

>What is the tunable kern.ipc.nmbclusters set to? >
>Can you provide any metrics about how much traffic you are dealing >with (and how much of it is multicast) when this happens? >

I chose a low-end target board to stress. With 8kpps of 64-byte packet input, mbuf usage remains at about 1200/6600, and 64-byte mbuf cluster usage remains at about 800/1000. In the test there is no other traffic flow, only the test multicast traffic. Thanks Eddy

From edrt@citiz.net Fri Mar 25 10:05:35 2005 From: edrt@citiz.net (edrt) Date: Fri, 25 Mar 2005 18:05:35 +0800 Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic Message-ID: <200503251007.j2PA7FqT022511@wyvern.icir.org>

> >I chose a low-end target board to stress. With 8kpps of 64-byte packet >input, mbuf usage remains at about 1200/6600, and 64-byte mbuf cluster usage >remains at about 800/1000. >

I forgot to mention that those are mbuf statistics after the mbuf tuning. Before tuning, mbufs were 100% used. After tuning the bottleneck seems to drop to the device driver level.
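As an aside on the is_pseudo_error tuning discussed in this thread, the idea is to treat EWOULDBLOCK/ENOBUFS as transient conditions rather than fatal transport errors. The following is a minimal sketch of that policy, assuming a plain non-blocking socket read; the helper names and the bounded-retry loop are invented for illustration and are not XORP's actual libxipc code.

----------------------------------------
// Hypothetical sketch (not XORP's libxipc): treat EWOULDBLOCK/ENOBUFS as
// transient "pseudo" errors on a non-blocking socket read instead of
// immediately declaring the XRL transport dead.
#include <cerrno>
#include <sys/types.h>
#include <unistd.h>

// Is this errno value worth retrying rather than fatal?
static bool is_transient_error(int err)
{
    return err == EINTR || err == EAGAIN || err == EWOULDBLOCK || err == ENOBUFS;
}

// Read up to 'len' bytes, retrying a bounded number of times on transient
// errors.  Returns bytes read, 0 on EOF, or -1 on a genuine failure.
static ssize_t read_with_retry(int fd, void* buf, size_t len, int max_retries)
{
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
        ssize_t n = ::read(fd, buf, len);
        if (n >= 0)
            return n;                       // data or EOF
        if (!is_transient_error(errno))
            return -1;                      // genuine error: caller may tear down
        // Transient error: loop and try again.  A real event loop would
        // instead re-arm the selector and wait until the fd is readable.
    }
    return -1;                              // still failing after the retries
}
----------------------------------------

The design point in the thread is exactly this decision boundary: whether an errno value causes XrlPFSTCPSender::die() to tear down every pending callback, or is absorbed as a retryable condition.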
From pavlin@icir.org Mon Mar 28 05:28:02 2005 From: pavlin@icir.org (Pavlin Radoslavov) Date: Sun, 27 Mar 2005 21:28:02 -0800 Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic In-Reply-To: Message from "edrt" of "Thu, 24 Mar 2005 22:08:45 +0800." <200503241401.j2OE18Zp007256@wyvern.icir.org> Message-ID: <200503280528.j2S5S2gB050900@possum.icir.org> > >> BTW > >> I encounter the following problem during stress testing, because I'm > >> using ported XORP source code, and they are NOT based on the latest CVS HEAD. > >> I could not ensure you can reproduce the problem, but just in case anybody > >> have interest to track it... > >> > >> * In a high volume data traffic enviroment > >> * without the tuning above > >> * try to send XORP component XRL commands through call_xrl > >> * call_xrl's XrlPFSTCPSender dies with EWOULDBLOCK/ENOBUFS... > >> > >> task calling call_xrl core dumps, and the stack looks something > >> like > > > >Is this with the original XORP call_xrl binary called by a script > >(or exec()-ed), or this is in the in-process code that has been > >derived from call_xrl? > > in-process code derived from call_xrl > > > >Also, could you tell the particular reason for the coredump (e.g., > >invalid pointer, etc), because I cannot decode the log below. > > > > I don't catch the reason (that's why I paste the stack information) > But it is triggered by XrlPFSTCPSender::die, and from the stack > information it seems that response_handler doesn't return properly. > Again, I'm not sure we can reproduce it on XORP CVS HEAD version. > Anyway, can we move this to bugzilla and track it there? Given that the coredump happens in some non-trivially modified code, we cannot add it to bugzilla without any source code that demonstrates the problem. I was looking into the particular segment of the XrlPFSTCPSender::die() method (that you have commented), and that particular piece of code tries to do something clever to invoke the pending callbacks in case of error. If that code is commented-out, some callbacks may not be invoked as appropriate. Thanks, Pavlin From Timothy.Griffin@cl.cam.ac.uk Mon Mar 28 12:24:11 2005 From: Timothy.Griffin@cl.cam.ac.uk (Timothy Griffin) Date: Mon, 28 Mar 2005 13:24:11 +0100 Subject: [Xorp-hackers] Configuration managment .... In-Reply-To: Your message of "Tue, 22 Mar 2005 14:04:10 PST." <20050322220410.GN747@empiric.icir.org> Message-ID: Bruce M Simpson wrote: > Why aren't people doing this more widely? I would like to see firm figures > for resource consumption by MySQL on embedded targets. > > Has anyone tried running MySQL on a machine without an MMU, for example? > > Surely if one wishes to go the SQL route, SQLite would be more appropriate > for embedded targets? > > The code footprint given for SQLite on the web page is a typical ~250KB > on x86 for gcc, this doesn't say anything about the run-time memory > footprint, and I would imagine with better compilers it could get smaller. these are interesting points. but i think they raise issues with the entire design philosophy of xorp, not just with using mysql for config storage. do xorp design goals really include the ability to run on boxes without virtual memory? to have a small code footprint? i've always thought of xorp's principle goals as being flexibility/extensibility/modularity. so it seems that using an off-the-shelf, and widely used, open-source data storage system fits very well with those goals... 
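As a concrete illustration of the SQLite direction mentioned earlier in this thread, the sketch below stores configuration statements in an embedded database. The "config" table layout, the database file name, and the path-style keys are all assumptions made for the example; they are not part of XORP or the rtrmgr. In the scheme proposed above, a trigger on such a table would be the hook that generates XRLs towards the individual processes.

----------------------------------------
// Illustrative only: a tiny embedded configuration store using the SQLite
// C API.  The "config" table layout and path-style keys are invented for
// this example and are not part of XORP or the rtrmgr.
#include <cstdio>
#include <sqlite3.h>

int main()
{
    sqlite3* db = nullptr;
    if (sqlite3_open("xorp-config.db", &db) != SQLITE_OK)
        return 1;

    // One row per configuration node, keyed by its path in the config tree.
    sqlite3_exec(db,
                 "CREATE TABLE IF NOT EXISTS config (path TEXT PRIMARY KEY,"
                 " value TEXT);"
                 "INSERT OR REPLACE INTO config VALUES"
                 " ('fea/unicast-forwarding4/disable', 'false');",
                 nullptr, nullptr, nullptr);

    // Read a value back; a trigger on this table would be the natural place
    // to generate an XRL towards the affected process.
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db, "SELECT value FROM config WHERE path = ?;",
                       -1, &stmt, nullptr);
    sqlite3_bind_text(stmt, 1, "fea/unicast-forwarding4/disable", -1,
                      SQLITE_STATIC);
    if (sqlite3_step(stmt) == SQLITE_ROW)
        std::printf("disable = %s\n",
                    reinterpret_cast<const char*>(sqlite3_column_text(stmt, 0)));
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}
----------------------------------------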
From atanu@ICSI.Berkeley.EDU Mon Mar 28 20:41:41 2005 From: atanu@ICSI.Berkeley.EDU (Atanu Ghosh) Date: Mon, 28 Mar 2005 12:41:41 -0800 Subject: [Xorp-hackers] Announcing XORP Release Candidate 1.1 Message-ID: <91277.1112042501@tigger.icir.org> On behalf of the entire XORP team, I'm delighted to announce the XORP 1.1 Release Candidate, which is now available from . Once the release candidate has proven to be stable, the actual 1.1 release will be prepared. This is planned to occur in the next two weeks. In the intervening period we will be fixing minor problems and updating the documentation. There are still a number of non-critical bugs that we know about which will not be addressed until the 1.2 release; these are documented in the errata section below. In general, to test XORP, we run automated regression tests on a daily basis with various operating systems and compilers. We also run a number of PCs as XORP routers. We have enabled as many protocols as feasible on those routers to test protocol interactions (for example a BGP IPv6 multicast feed being used by PIM-SM). In addition, automated scripts are run to externally toggle BGP peerings. Finally, we have automated scripts that interact directly with the xorpsh to change the configuration settings. We have put significant effort into testing but obviously we have not found all the problems. This is where you can help us to make XORP more stable, by downloading and using it! As always we'd welcome your comments - xorp-users@xorp.org is the right place for general discussion, and private feedback to the XORP core team can be sent to feedback@xorp.org. - The XORP Team P.S. Release notes and errata are included below. ------------------------------------------------------------------ XORP RELEASE NOTES This file contains XORP release notes (most recent releases first). Release 1.1-RC (2005/03/24) ========================= ALL: - Numerous improvements, bug fixes and cleanup. - XORP now builds on amd64+OpenBSD-3.6-current. - The --enable-advanced-mcast-api flag to "./configure" has been replaced with the --disable-advanced-multicast-api flag. - Addition of support for code execution profiling. - Currently "gmake" does not build the regression tests. The command "gmake check" should be used to build and run the regression tests. - Addition of two new documents: * "An Introduction to Writing a XORP Process" * "XORP User Manual" CONFIGURATION: - All "enabled: true/false" XORP configuration flags are now renamed to "disable: false/true". - The syntax for configuring the IPv4/IPv6 forwarding has changed: OLD: fea { enable-unicast-forwarding4: true enable-unicast-forwarding6: true } NEW: fea { unicast-forwarding4 { disable: false } unicast-forwarding6 { disable: false } } - The syntax for configuring the AFI/SAFI combinations in BGP has changed: OLD: bgp { peer { enable-ipv4-multicast enable-ipv6-unicast enable-ipv6-multicast } } NEW: bgp { peer { ipv4-unicast: true ipv4-multicast: true ipv6-unicast: true ipv6-multicast: true } } The new syntax allows IPv4 unicast to be disabled which was not previously possible. LIBXORP: - Bug fix in ordering events scheduled at exactly the same time and expiring at exactly the same time. - Various improvements to the eventloop implementation. - Addition of a mechanism for buffered asynchronous reads and writes. LIBXIPC: - Addition of XRL pipelining support. - The Finder client address can be defined by the following variable in the environment: XORP_FINDER_CLIENT_ADDRESS. 
   This re-enables communicating with remote XORP processes.
 - Various other improvements (including performance) and bug fixes.

LIBFEACLIENT:
 - A few bug fixes.

XRL:
 - No significant changes.

RTRMGR:
 - Addition of a new rtrmgr template keyword: %deprecated: "Reason".
   This keyword can be used to deprecate old configuration statements.
 - Addition of a new rtrmgr keyword: %update. It is similar to
   %activate, and is called whenever the configuration in the subtree
   has changed.
 - Modification to the rtrmgr template semantics: the XRLs per
   template node are sent in the order those nodes are declared in the
   template files. Previously, the order was alphabetical (by the name
   of the template nodes).
 - Various other improvements and bug fixes.

XORPSH:
 - Addition of a mechanism to track the status of the modules, and to
   provide operational commands for only those modules that are
   running.
 - Various other improvements and bug fixes.

POLICY:
 - Initial implementation of a policy manager. It is still being
   tested, and should not be used.

FEA/MFEA:
 - Implementation of Click FEA support.
 - Addition of support for discard interfaces and discard routes.
 - Addition of support for ACLs, though currently there is no
   mechanism to configure them through the XORP configuration file.
 - Initial support for raw sockets.
 - Various bug fixes, improvements and cleanup.

RIB:
 - Bug fix in adding point-to-point network interfaces.
 - Removal of the old mechanism (ExportTable) for propagating the
   routes to the FEA and all other interested parties.
 - Removal of the hard-wired "static" table.
 - Various other improvements and bug fixes.

RIP:
 - MD5 authentication now works properly. Previously, it was
   generating the wrong signature.
 - Cisco compatibility bug fix.

BGP:
 - Addition of support for creating IPv6 TCP connections.
 - A few bug fixes in the multi-protocol support.
 - Major improvements to the flow control mechanism.
 - Various improvements and bug fixes.

STATIC_ROUTES:
 - Addition of configuration support for interface-specific static
   routes.
 - Improvements in handling stored routes if they are affected by
   network interface information updates.
 - Addition of support for tracking the state of the relevant
   processes, and for graceful registering/deregistering with them.
 - Addition of support for better checking of the XRL error codes.
 - A few other improvements and bug fixes.

MLD/IGMP:
 - Bug fix in updating the primary address of an interface.
 - Addition of support for tracking the state of the relevant
   processes, and for graceful registering/deregistering with them.
 - Addition of support for better checking of the XRL error codes.
 - A few other improvements and bug fixes.

PIM-SM:
 - Bug fixes in handling the MRIB entries and MRIB-related state.
 - Bug fix in scheduling the internal PimMre tasks.
 - Bug fix in updating the primary address of an interface.
 - Bug fix in the computation of the checksum of PIM Register packets
   received from a Cisco router that itself is not spec-compliant in
   the checksum computation.
 - Addition of support for tracking the state of the relevant
   processes, and for graceful registering/deregistering with them.
 - Addition of support for better checking of the XRL error codes.
 - Various other bug fixes, improvements and cleanup.

FIB2MRIB:
 - Bug fix in deleting the Fib2Mrib entries.
 - Improvements in handling stored routes if they are affected by
   network interface information updates.
 - Addition of support for tracking the state of the relevant
   processes, and for graceful registering/deregistering with them.
 - Addition of support for better checking of the XRL error codes.
 - A few other bug fixes and improvements.

CLI:
 - Bug fix in auto-completion for sub-commands.
 - A few other bug fixes and improvements.

SNMP:
 - No significant changes.

------------------------------------------------------------------
XORP ERRATA

ALL:
 - Parallel building (e.g., "gmake -j 4") may fail on multi-CPU
   machines. The simplest work-around is to rerun gmake or not to use
   the -j flag.
 - The following compiler is known to be buggy, and should not be used
   to compile XORP:
       gcc34 (GCC) 3.4.0 20040310 (prerelease) [FreeBSD]
   A newer compiler such as the following should be used instead:
       gcc34 (GCC) 3.4.2 20040827 (prerelease) [FreeBSD]
 - If you run BGP, RIB, FIB2MRIB, and PIM-SM at the same time, the
   propagation latency for the BGP routes to reach the kernel is
   increased. We are investigating the problem.

LIBXORP:
 - No known issues.

LIBXIPC:
 - No known issues.

LIBFEACLIENT:
 - No known issues.

XRL:
 - No known issues.

RTRMGR:
 - There are several known issues, but none of them is considered
   critical. The list of known issues is available from
   http://www.xorp.org/bugzilla/query.cgi
 - Using the rtrmgr "-r" command-line option to restart processes that
   have failed does not work if a process fails while being
   reconfigured via xorpsh. If that happens, the rtrmgr itself may
   coredump. Therefore, using the "-r" command-line option is not
   recommended! Also, note that a process that has been killed by
   SIGTERM or SIGKILL will not be restarted (this is a feature rather
   than a bug). Ideally, we want to monitor the process status using
   the finder rather than the forked child process status; therefore,
   in the future, when we have a more robust implementation, the "-r"
   switch will be removed and its behavior will be enabled by default.

XORPSH:
 - There are several known issues, but none of them is considered
   critical. The list of known issues is available from
   http://www.xorp.org/bugzilla/query.cgi

FEA/MFEA:
 - On Linux with kernel 2.6 (e.g., RedHat FC2 with kernel 2.6.5-1.358),
   some of the tests may fail (with or without an error message), but
   without a coredump image. Some of those failures can be attributed
   to a kernel problem. E.g., running "dmesg" can show kernel "Oops"
   messages like:

   Unable to handle kernel NULL pointer dereference at virtual address 00000000
   printing eip: 02235532
   *pde = 00000000
   Oops: 0000 [#15]
   CPU: 0
   EIP: 0060:[<02235532>] Not tainted
   EFLAGS: 00010202 (2.6.5-1.358)
   EIP is at __dev_get_by_index+0x14/0x2b
   eax: 022db854 ebx: 1ae7aef8 ecx: 00000001 edx: 00000000
   esi: 00000000 edi: 00008910 ebp: fee43e9c esp: 1ae7aef0
   ds: 007b es: 007b ss: 0068
   Process test_finder_eve (pid: 2026, threadinfo=1ae7a000 task=1406d7b0)
   Stack: 022365c7 00000000 009caffc 009cc780 0969ef28 fee43edc 00000001 009cc780
          0969ef28 fee43ed8 00008910 00000000 00008910 fee43e9c 02236e50 fee43e9c
          07aa4e00 3530355b 5d303637 00000000 0227a55b 021536b6 022cfa00 00000001
   Call Trace:
    [<022365c7>] dev_ifname+0x30/0x66
    [<02236e50>] dev_ioctl+0x83/0x283
    [<0227a55b>] unix_create1+0xef/0xf7
    [<021536b6>] alloc_inode+0xf9/0x175
    [<0227c090>] unix_ioctl+0x72/0x7b
    [<022301a5>] sock_ioctl+0x268/0x280
    [<0223054f>] sys_socket+0x2a/0x3d
    [<0214ea0e>] sys_ioctl+0x1f2/0x224
   Code: 0f 18 02 90 2d 34 01 00 00 39 48 34 74 08 85 d2 89 d0 75 ea

   This appears to be a kernel bug triggered by ioctl(SIOCGIFNAME),
   which itself is called by if_indextoname(3).
   Currently, there is no known solution to the problem except to use
   a kernel that does not have the problem (at this stage it is not
   known whether all 2.6 Linux kernels are affected or only specific
   versions). It seems that a very similar problem has been reported
   to the Linux kernel developers, but the problem is still unsolved:
   https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=121697

RIB:
 - In some rare cases, the RIB may fail to delete an existing route
   (See http://www.xorp.org/bugzilla/show_bug.cgi?id=62). We are aware
   of the issue and will attempt to fix it in the future.

RIP:
 - No known issues.

BGP:
 - If the RIB bug above (failure to delete an existing route) is
   triggered by BGP, then the deletion failure error received by BGP
   from the RIB is considered by BGP as a fatal error. This is not a
   BGP problem, but a RIB problem that will be fixed in the future.
 - The BGP configuration mandates that an IPv4 nexthop must be
   supplied. Unfortunately, it is necessary to provide an IPv4 nexthop
   even for an IPv6-only peering. Even more unfortunately, it is not
   possible to force the IPv6 nexthop.
 - It is *essential* for an IPv6 peering that an IPv6 nexthop is
   provided. Unfortunately, the configuration does not enforce this
   requirement. This will be fixed in the future.

STATIC_ROUTES:
 - No known issues.

MLD/IGMP:
 - If MLD/IGMP is started with a relatively large number of interfaces
   (e.g., on the order of 20), then it may fail with the following
   error:

   [ 2004/06/14 12:58:56 ERROR test_pim:16548 MFEA +666 mfea_proto_comm.cc join_multicast_group ]
   Cannot join group 224.0.0.2 on vif eth8: No buffer space available

   The solution is to increase the multicast group membership limit.
   E.g., to increase the value from 20 (the default) to 200, run as
   root:

   echo 200 > /proc/sys/net/ipv4/igmp_max_memberships

PIM-SM:
 - If the kernel does not support PIM-SM, or if PIM-SM is not enabled
   in the kernel, then running PIM-SM will fail with the following
   error message:

   [ 2004/06/12 10:26:41 ERROR xorp_fea:444 MFEA +529 mfea_mrouter.cc start_mrt ]
   setsockopt(MRT_INIT, 1) failed: Operation not supported

 - On Linux, if the unicast Reverse Path Forwarding information is
   different from the multicast Reverse Path Forwarding information,
   the Reverse Path Filtering should be disabled. E.g., as root:

   echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter

   OR

   echo 0 > /proc/sys/net/ipv4/conf/eth0/rp_filter
   echo 0 > /proc/sys/net/ipv4/conf/eth1/rp_filter
   ...

   Otherwise, the router will ignore packets if they don't arrive on
   the reverse-path interface. For more information about Reverse Path
   Filtering see
   http://www.tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html

FIB2MRIB:
 - No known issues.

CLI:
 - No known issues.

SNMP:
 - On some versions of Linux, there are some bugs in net-snmp versions
   5.0.8 and 5.0.9, which prevent dynamic loading from working. See
   http://www.xorp.org/snmp.html for links to the net-snmp patches
   that solve the problems.
 - Version 5.1 of net-snmp requires a simple modification; otherwise,
   XORP will fail to compile. See http://www.xorp.org/snmp.html for a
   link to the net-snmp patch that solves the problem.
From edrt@citiz.net Tue Mar 29 12:47:45 2005
From: edrt@citiz.net (edrt)
Date: Tue, 29 Mar 2005 20:47:45 +0800
Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic
Message-ID: <200503291246.j2TCkAdC085604@wyvern.icir.org>

> > I didn't catch the reason (that's why I pasted the stack
> > information), but it is triggered by XrlPFSTCPSender::die, and from
> > the stack information it seems that response_handler doesn't return
> > properly. Again, I'm not sure we can reproduce it on the XORP CVS
> > HEAD version. Anyway, can we move this to bugzilla and track it
> > there?
>
> Given that the coredump happens in some non-trivially modified code,
> we cannot add it to bugzilla without source code that demonstrates
> the problem.
>
> I was looking into the particular segment of the
> XrlPFSTCPSender::die() method (the one you have commented out), and
> that particular piece of code tries to do something clever to invoke
> the pending callbacks in case of error. If that code is commented
> out, some callbacks may not be invoked as appropriate.

OK. I'll research it more and report back to the mailing list if it is
actually a XORP bug.

BTW, here is a bug found today while stressing the system that you
might be interested in. When the system is short of mbuf clusters and
setsockopt(SO_RCVBUF) always fails, x_comm_sock_set_rcvbuf will loop
forever (delta will be decremented until it becomes negative).

Regards
Eddy

From edrt@citiz.net Tue Mar 29 13:01:13 2005
From: edrt@citiz.net (edrt)
Date: Tue, 29 Mar 2005 21:01:13 +0800
Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic
Message-ID: <200503291259.j2TCx92T085724@wyvern.icir.org>

> BTW, here is a bug found today while stressing the system that you
> might be interested in. When the system is short of mbuf clusters and
> setsockopt(SO_RCVBUF) always fails, x_comm_sock_set_rcvbuf will loop
> forever (delta will be decremented until it becomes negative).

Sorry, desired_bufsize will be decremented until it becomes negative
(delta remains 1).

-Eddy

From pavlin@icir.org Tue Mar 29 23:10:25 2005
From: pavlin@icir.org (Pavlin Radoslavov)
Date: Tue, 29 Mar 2005 15:10:25 -0800
Subject: [Xorp-hackers] XORP IPC mechanism is stressed by data plane traffic
In-Reply-To: Message from "edrt" of "Tue, 29 Mar 2005 21:01:13 +0800." <200503291259.j2TCx92T085724@wyvern.icir.org>
Message-ID: <200503292310.j2TNAPK1017414@possum.icir.org>

> > BTW, here is a bug found today while stressing the system that you
> > might be interested in. When the system is short of mbuf clusters
> > and setsockopt(SO_RCVBUF) always fails, x_comm_sock_set_rcvbuf will
> > loop forever (delta will be decremented until it becomes negative).
>
> Sorry, desired_bufsize will be decremented until it becomes negative
> (delta remains 1).

Bug fixed in the CVS repository.

Thanks for the catch!
Pavlin
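For anyone chasing the same problem, the following is a rough sketch in
plain C of the kind of buffer-shrinking retry loop being discussed and
of the lower bound that keeps it from spinning forever. It is not the
actual libcomm source: the function name, the RCVBUF_MIN constant and
the return convention are invented here for illustration.

----------------------------------------
/*
 * Illustrative sketch only -- not the actual XORP libcomm code.
 * Ask the kernel for desired_bufsize; if setsockopt() refuses, retry
 * with a smaller request.  Without a lower bound, a setsockopt() that
 * keeps failing (e.g., when mbuf clusters are exhausted) never lets
 * the loop terminate, and the requested size can even go negative.
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

#define RCVBUF_MIN (4 * 1024)   /* illustrative lower bound */

/* Returns the buffer size that was accepted, or -1 on failure. */
int
sock_set_rcvbuf(int sock, int desired_bufsize)
{
    int delta = desired_bufsize / 2;

    for (;;) {
        if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                       &desired_bufsize, sizeof(desired_bufsize)) == 0) {
            return desired_bufsize;         /* the kernel accepted it */
        }
        /*
         * The fix: stop once the request cannot shrink any further,
         * instead of decrementing forever.
         */
        if (desired_bufsize <= RCVBUF_MIN) {
            perror("setsockopt(SO_RCVBUF)");
            return -1;
        }
        desired_bufsize -= delta;
        if (desired_bufsize < RCVBUF_MIN)
            desired_bufsize = RCVBUF_MIN;
        if (delta > 1)
            delta /= 2;
    }
}
----------------------------------------

The essential point is that under memory pressure every setsockopt()
call can fail, so the loop needs an exit condition that does not depend
on the call eventually succeeding.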