[Xorp-users] questions about xorp

MANJON@terra.es MANJON@terra.es
Thu, 30 Mar 2006 14:37:45 +0200 (MEST)


First of all, thanks for your help.

These messages appear when the XORP processes have died.
In my system the firewall and XORP switchover works fine: when the
firewall fails over, XORP is started on the active node and stopped on
the passive one.
But at irregular intervals one, two, or all of the XORP processes stop
or die. I don't know why, and I can't understand the logs.
At the moment the active node has 213 unicast routes and 785
multicast routes. Are there any tested limits in XORP?
I have two clusters in different places with the same problem.
This is the output of the "top" command on the active node right now.
Everything is working at the moment, but it will certainly fail again.

 14:32:56  up 29 days,  8:07,  1 user,  load average: 1.06, 1.11, 1.09
47 processes: 45 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    2.2%    0.0%    0.6%   0.0%     1.0%    0.0%   96.2%
Mem:   899112k av,  389572k used,  509540k free,       0k shrd,     468k buff
       233332k active,              57796k inactive
Swap:       0k av,       0k used,       0k free                  154628k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
32617 root      15   0 10384  10M  2932 S     0.4  1.1   7:34   0 xorp_pimsm4
32483 root      15   0 10332  10M  3664 S     0.2  1.1   3:33   0 xorp_fea

I don't know what information you need to help me.

Thanks, Pavlin.

Jose Maria Martin


----Original message----
From: pavlin@icir.org
Received: 30/03/2006 5:11
To: <MANJON@terra.es>
CC: <xorp-users@xorp.org>
Subject: Re: [Xorp-users] questions about xorp

> I have a firewall cluster using xorp for multicast. I have a script that
> brings up the xorp processes on the active node. I only ever run xorp on
> the active node. When there is a failover, my script starts xorp on the
> new active node and kills xorp on the passive one. These are the xorp
> processes that I start:
> 
>  1308 ?        S      0:01 /opt/bladefusion/xorp/bin/xorp_rtrmgr
>  1310 ?        S      0:49 /opt/bladefusion/xorp/fea/xorp_fea
>  1381 ?        S      0:00 /opt/bladefusion/xorp/rib/xorp_rib
>  1399 ?        S      0:00 /opt/bladefusion/xorp/fib2mrib/xorp_fib2mrib
>  1417 ?        S      0:28 /opt/bladefusion/xorp/mld6igmp/xorp_igmp
>  1435 ?        S      0:05 /opt/bladefusion/xorp/pim/xorp_pimsm4
> 
> My question is: is it necessary to use xorp_fib2mrib in my system? I don't
> have to sync xorp state between nodes because I have only one node
> running xorp.

For all practical purposes, the answer is "yes".
Process xorp_fib2mrib is used to obtain the unicast forwarding state
from the kernel (via the FEA) and push it into the Multicast RIB
(which is needed by PIM-SM for the reverse-path forwarding check).
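
To make the reverse-path forwarding (RPF) check concrete, here is a minimal
Python sketch. It is not XORP code: the `mrib` dictionary, the function names
and the interface names are assumptions made up for illustration. It only
shows the general idea, which is to look up the multicast source in a table
of unicast-derived routes (longest prefix wins) and accept the packet only if
it arrived on the interface that leads back toward that source.

```python
import ipaddress


def rpf_interface(mrib, source):
    """Return the interface toward `source` by longest-prefix match.

    `mrib` is a hypothetical {prefix: interface_name} mapping that stands in
    for the Multicast RIB; returns None when no route covers the source.
    """
    addr = ipaddress.ip_address(source)
    best = None  # (network, interface) of the most specific match so far
    for prefix, ifname in mrib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ifname)
    return best[1] if best else None


def rpf_check(mrib, source, arrival_interface):
    """True only if the packet arrived on the interface that routes to its source."""
    expected = rpf_interface(mrib, source)
    return expected is not None and expected == arrival_interface
```

With an empty table every check fails, which is why the Multicast RIB has to
be populated from somewhere (here, by xorp_fib2mrib) before PIM-SM can do
anything useful.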

> My other question is about an error in my /var/log/messages, when my xorp
> died
> 
> Mar 28 02:01:41 fw1bjscpd BF-PIM: [ 2006/03/28 02:01:41 ERROR xorp_rtrmgr:22712 XRL +629 xrl_pf_stcp.cc die ] XrlPFSTCPSender died: Keepalive timeout

This error (like all the other errors) indicates a real problem.
The message shows XRL communication problems: the keepalive to
some of the other XORP processes has timed out.
All other errors are probably a direct or indirect result of those
XRL timeouts.

Do you see those errors during the switchover, or well after the
switchover was completed?

If they happen during the switchover then something has gone wrong.
E.g., if you are starting a new instance of XORP, first you must
make sure that all processes from the old instance have been killed.
Otherwise, they may interfere with the new XORP instance.
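
As an illustration of that check, a failover script could refuse to start a
new instance while any old xorp_* process is still listed. The Python sketch
below is only an assumption about how such a script might look. The function
names are invented, and it expects `ps ax` style output with the columns PID,
TTY, STAT, TIME and COMMAND, as in the listing quoted above.

```python
import os
import subprocess


def find_xorp_processes(ps_output):
    """Return [(pid, name)] for every xorp_* command found in `ps ax` output."""
    found = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID TTY STAT TIME COMMAND
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header line or something unexpected
        name = os.path.basename(fields[4].split()[0])
        if name.startswith("xorp_"):
            found.append((int(fields[0]), name))
    return found


def safe_to_start(ps_output):
    """True when no process from an old XORP instance is still running."""
    return not find_xorp_processes(ps_output)


def current_ps_output():
    """Run `ps ax` and return its text (assumes a Linux procps-style ps)."""
    return subprocess.run(
        ["ps", "ax"], capture_output=True, text=True, check=True
    ).stdout
```

A script would call `safe_to_start(current_ps_output())` in a loop after
killing the old instance, and only launch xorp_rtrmgr once it returns True.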

If you start seeing the errors well after the switchover has
completed, then the following log message might be a suspect:

> Mar 28 02:02:13 fw1bjscpd ntpdate[24855]: adjust time server 55.1.1.8 offset -0.054280 sec

In general, adjusting the time backwards doesn't play well with
XORP, so the above adjustment _may_ have something to do with XRL
keepalive timeout. The easiest way to test this is to turn off NTP
temporarily and see whether you still get keepalive timeouts.
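
Before turning NTP off, it may also help to see how often the two kinds of
log lines occur close together. The following Python sketch scans syslog text
for ntpdate "time server" lines and "Keepalive timeout" lines and pairs those
that fall within a time window. The function names and the 60-second default
are assumptions, each log entry must be on a single unwrapped line, and the
year has to be supplied because syslog timestamps omit it. A match shows only
that the events were close in time, not that one caused the other.

```python
import re
from datetime import datetime, timedelta

# "Mar 28 02:01:41 hostname message..."
SYSLOG_LINE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+(.*)$")


def parse_events(log_text, year):
    """Return [(timestamp, kind)] with kind either "ntp" or "keepalive"."""
    events = []
    for line in log_text.splitlines():
        match = SYSLOG_LINE.match(line)
        if not match:
            continue
        stamp, message = match.groups()
        stamp = " ".join(stamp.split())  # collapse the padding in "Mar  8"
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if "ntpdate" in message and "time server" in message:
            events.append((when, "ntp"))
        elif "Keepalive timeout" in message:
            events.append((when, "keepalive"))
    return events


def timeouts_near_adjustments(events, window_seconds=60):
    """Return [(timeout_time, ntp_time)] pairs no further apart than the window."""
    window = timedelta(seconds=window_seconds)
    ntp_times = [when for when, kind in events if kind == "ntp"]
    pairs = []
    for when, kind in events:
        if kind != "keepalive":
            continue
        for ntp_time in ntp_times:
            if abs(when - ntp_time) <= window:
                pairs.append((when, ntp_time))
    return pairs
```

Feeding it the contents of /var/log/messages from both clusters would show
whether the timeouts cluster around the ntpdate runs or are spread evenly.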

Pavlin

_______________________________________________
Xorp-users mailing list
Xorp-users@xorp.org
http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-users