[Xorp-users] PIM problem: not forwarding from internal to external interface

Pavlin Radoslavov pavlin@icir.org
Thu, 23 Jun 2005 12:22:38 -0700


First, thanks for the very detailed info, because it eliminates a
number of extra email exchanges :)

> We are trying to route multicast through our firewall primarily for Accessgrid 
> access. The fw has a 2.6.9 Linux kernel, and two active interfaces, namely 
> eth0 (internal)  and eth2 (external). eth1 is NOT configured. A campus router
> provides the static RP, though the RP is not a next hop from the firewall. 
> Is that a problem?  

This should be fine.

>                                             Linux            local
> Internet-- Router  -- ... --Campus -- (eth2)Firewall/(eth0)--switch-- beacon 
>            (is RP)          Router1         Router                    node
>          a.b.r.1            a.b.c.75  a.b.c.74      a.b.d.1           a.b.d.14

[Detailed info deleted]

> We are using a fresh CVS copy of XORP, and a pretty standard configuration
> file. The firewall has static routes on it for a bunch of networks behind it, 
> and no dynamic routing protocols are used. So my assumption is that fib2mrib
> should be used.

Yes, you should use fib2mrib. Even if you have dynamic routing
protocols, you should still use fib2mrib for the time being.

> Xorp> show pim mfc 
> 233.4.200.19    random_beacon_IP   a.b.r.1  
>     Incoming interface :      eth2
>     Outgoing interfaces:      O..
> 
> .... tons of beacon nodes just like the one above
> 
> 233.4.200.19    a.b.d.14    a.b.r.1
>     Incoming interface :      eth0
>     Outgoing interfaces:      ..O

The above entry is the most important clue that PIM-SM has (or
should have) installed the correct multicast forwarding entry in the
kernel. You could double-check that the entry is in the kernel as
well (cat /proc/net/ip_mr_cache), but in this case I don't think it
is necessary because the problem seems to be elsewhere (see below).
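If you do want to script that kernel check, here is a minimal sketch
(not part of XORP). It assumes the column layout printed by the Linux
ipmr code (Group, Origin, Iif, Pkts, Bytes, Wrong, then "vif:ttl"
pairs for the outgoing interfaces), and assumes that group and origin
are printed as raw 32-bit hex values in the host's byte order. The
function names and the 192.0.2.14 sample source are hypothetical.

```python
import ipaddress
import sys


def hex_to_ipv4(hex_str, byteorder=sys.byteorder):
    """Decode an address printed by the kernel as a raw 32-bit hex value."""
    # Assumption: the kernel prints the address as it sits in memory, so
    # the digits are re-packed using the byte order of the host that wrote
    # the file (little-endian on x86).
    return str(ipaddress.IPv4Address(int(hex_str, 16).to_bytes(4, byteorder)))


def parse_ip_mr_cache(text, byteorder=sys.byteorder):
    """Parse /proc/net/ip_mr_cache text into a list of dicts (hypothetical helper)."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        group, origin, iif, pkts, nbytes, wrong = fields[:6]
        # Remaining fields are "vif:ttl" pairs for the outgoing interfaces.
        oifs = {}
        for pair in fields[6:]:
            vif, ttl = pair.split(":")
            oifs[int(vif)] = int(ttl)
        entries.append({
            "group": hex_to_ipv4(group, byteorder),
            "origin": hex_to_ipv4(origin, byteorder),
            "iif": int(iif),
            "pkts": int(pkts),
            "bytes": int(nbytes),
            "wrong_if": int(wrong),
            "oifs": oifs,
        })
    return entries
```

On the firewall you would pass it the contents of
/proc/net/ip_mr_cache and look for the 233.4.200.19 entry whose origin
is the beacon; a non-empty "oifs" map means the kernel has at least one
outgoing interface for that (S,G).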

<DEL>

> Before I send the complete log, I would like to send the warning/error
> messages and see whether they make sense to you and whether we need to be concerned:
> 
>   ERROR xorp_fea:1703 MFEA +1781 mfea_proto_comm.cc proto_socket_write ] sendmsg(proto 103 from a.b.d.1 to a.b.r.1 on vif register_vif) failed: Message too long
>   ERROR xorp_pimsm4:1729 PIM +2617 xrl_pim_node.cc mfea_client_send_protocol_message_cb ] Cannot send a protocol message: 102 Command failed Cannot send PIMSM_4 protocol message from a.b.d.1 to a.b.r.1 on vif register_vif
>  WARNING xorp_fea MFEA ] proto_socket_read() failed: RX packet from a.b.r.1 to 224.0.0.2: no vif found
>  WARNING xorp_fea XrlMfeaTarget ] Handling method for mfea/0.1/send_protocol_message4 failed: XrlCmdError 102 Command failed Cannot send PIMSM_4 protocol message from a.b.d.1 to a.b.r.1 on vif register_vif

Yes, I think this is probably the problem. After PIM-SM receives a
data packet from the beacon, it encapsulates it by adding the PIM
Register header and then tries to unicast it to the RP. It looks
like the packet becomes too large after the encapsulation, so the
kernel refuses to accept it for transmission ("Message too long").
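A rough size check shows how encapsulation could push a packet over
the limit. This sketch assumes a 20-byte outer IPv4 header without
options, the 8-byte PIM Register header from RFC 4601 (the 4-byte PIM
header plus the 4-byte B/N flags word), and a 1500-byte Ethernet MTU
toward the RP. The helper names are hypothetical.

```python
# Assumed header sizes: no IP options, RFC 4601 Register header.
OUTER_IPV4_HEADER = 20
PIM_REGISTER_HEADER = 8  # 4-byte PIM header + 4-byte B/N flags word


def register_packet_size(inner_len):
    """Size of the Register packet that carries a data packet of inner_len bytes."""
    return OUTER_IPV4_HEADER + PIM_REGISTER_HEADER + inner_len


def largest_inner_packet(mtu=1500):
    """Largest data packet whose Register packet still fits in mtu bytes."""
    return mtu - OUTER_IPV4_HEADER - PIM_REGISTER_HEADER


def fits(inner_len, mtu=1500):
    """True if the encapsulated packet would not exceed the (assumed) MTU."""
    return register_packet_size(inner_len) <= mtu


print(register_packet_size(1500))  # 1528
print(largest_inner_packet())      # 1472
```

Under these assumptions any beacon packet larger than 1472 bytes would
yield a Register packet larger than 1500 bytes. That is consistent with
the "Message too long" error but does not prove it; the packet sizes
captured with tcpdump would tell.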

It is not clear to me why the kernel didn't like the packet, so to
start chasing the problem can you do the following:

 * Run tcpdump on the interface between your XORP router and
   the multicast beacon and capture the original size of the multicast
   data packets.

 * Check out the latest version of fea/mfea_proto_comm.cc (rev 1.32)
   and run XORP again. The newer version prints the size of the failed
   data packet, which may provide an additional clue about why the
   kernel doesn't like the packet.

> After XORP initialization, we continuously see the following printed from XORP:
> 
> [ 2005/06/23 12:49:28 TRACE xorp_pimsm4 PIM ] TX PIM_REGISTER from a.b.d.1 to a.b.r.1 on vif register_vif
> [ 2005/06/23 12:49:28 TRACE xorp_pimsm4 PIM ] RX WHOLEPKT signal from MFEA_4: vif_index = 2 src = a.b.d.14 dst = 233.4.200.19
> [ 2005/06/23 12:49:28 TRACE xorp_pimsm4 PIM ] TX PIM_REGISTER from a.b.d.1 to a.b.r.1 on vif register_vif

The above TRACE messages are normal, and they indicate that PIM-SM
properly receives the whole data packets from the kernel and then
encapsulates them and initiates the transmission to the RP.
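To make the encapsulation step more concrete, here is a minimal
Python sketch of how a Register message body could be assembled
following the RFC 4601 layout. It uses PIM version 2 and type 1
(Register), the B (Border) and N (Null-Register) flags, a checksum
computed over the 8-byte Register header only, and then the original
data packet. This is not XORP's code; the function names are
hypothetical, and the outer IPv4 header (protocol 103) is not built
here.

```python
import struct

PIM_VERSION = 2
PIM_TYPE_REGISTER = 1


def internet_checksum(data):
    """Standard 16-bit one's complement checksum (RFC 1071 style)."""
    if len(data) % 2:
        data = data + b"\x00"  # pad to an even length
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_pim_register(inner_packet, border=False, null_register=False):
    """Prepend an 8-byte PIM Register header to inner_packet (hypothetical helper)."""
    flags = (int(border) << 31) | (int(null_register) << 30)
    # Ver/Type byte, reserved byte, checksum placeholder, B/N flags word.
    header = struct.pack("!BBHI", (PIM_VERSION << 4) | PIM_TYPE_REGISTER, 0, 0, flags)
    # RFC 4601: the Register checksum excludes the encapsulated data packet.
    csum = internet_checksum(header)
    header = header[:2] + struct.pack("!H", csum) + header[4:]
    return header + inner_packet
```

Every data packet that arrives as a WHOLEPKT signal would grow by these
8 bytes, plus the outer IP header added when the message is sent to the RP.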

Pavlin