[Xorp-users] XORP Crash

Bruce Simpson bms at incunabulum.net
Wed Jul 29 09:42:27 PDT 2009


Tafi Makamure wrote:
> Hi All,
>
> I recently had a crash on a live system and am trying to find out 
> what caused it.  Restarting XORP resolved the issue, but I would 
> still like to get to the bottom of the cause.
>
> I got the following events in the xorp logs.
>
> [ 2009/07/29 02:14:23  FATAL xorp_bgp:4665 BGP +200 socket.cc connect 
> ] Assertion (!get_sock().is_valid()) failed
>

BGP is the only XORP routing process which directly manages its own 
sockets. Something has caused one of the live sockets in the BGP process 
to die. The "+200 socket.cc connect" field in the error message pins the 
assertion failure down to line 200 of socket.cc, in 
SocketClient::connect().

XORP was initially developed on FreeBSD, which is a UNIX-like system, so 
it is quite possible that some of the code in BGP which directly uses 
sockets assumes that the file descriptor(s) can be recycled.

Checking svn annotate, this change is one of mine, from the Win32 merge 
3 years ago. We've formally dropped Windows support now -- the recycling 
of sockets in BGP was an issue with the Winsock socket library, and the  
paranoid assertion on line 200 to catch this is being triggered somehow. 
It would be good to get a root cause analysis on the condition.
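
To illustrate the kind of check that is firing, here is a minimal sketch 
in C++. It is NOT the XORP source: SketchSocketClient, connect_strict() 
and connect_unchecked() are hypothetical names, and the file descriptor 
is just an integer passed in by the caller. It only assumes what the log 
line shows, namely that connect() asserts the socket is not already 
valid.

```cpp
// Hypothetical stand-in for a socket client; NOT taken from bgp/socket.cc.
#include <cassert>

class SketchSocketClient {
public:
    // A descriptor of -1 means "no socket is currently open".
    bool is_valid() const { return fd_ >= 0; }

    // Same shape as the failing check: connect() expects that no socket
    // is open on this client. If one is, assert() aborts the process.
    void connect_strict(int new_fd) {
        assert(!is_valid());
        fd_ = new_fd;
    }

    // What remains once the assertion is commented out: the new
    // descriptor simply replaces whatever was there before.
    void connect_unchecked(int new_fd) {
        fd_ = new_fd;
    }

    // A real client would also close() the descriptor here.
    void disconnect() { fd_ = -1; }

    int fd() const { return fd_; }

private:
    int fd_ = -1;
};
```

In this sketch, calling connect_strict() twice without a disconnect() in 
between aborts the process, which is the behaviour behind the FATAL log 
line. connect_unchecked() does not abort, but in real code overwriting a 
still-open descriptor without closing it would leak that descriptor.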

Points to consider:
 * Do you have any other logs from the BGP process?
 * A core dump or GDB backtrace would be really helpful.
 * Was BGP trying to re-establish a dropped connection with a remote BGP 
peer at the time?
 * Can you please try commenting out line 200 of bgp/socket.cc with the 
C++ '//' one-line comment? That would be the best workaround for now.
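
On the core dump point: a core file is only written if the process's 
core size limit allows it. The usual way to arrange that is to run 
"ulimit -c unlimited" in the shell that starts the process. The same 
thing can be sketched in C++ with the POSIX getrlimit()/setrlimit() 
calls. The function name enable_core_dumps() is hypothetical and is not 
part of XORP, and whether a core file actually appears also depends on 
system settings outside the process.

```cpp
// Sketch only: raise the soft core-file size limit to the hard limit.
#include <sys/resource.h>

// Returns true if the soft RLIMIT_CORE limit was raised to the hard limit.
bool enable_core_dumps() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return false;               // could not read the current limits
    lim.rlim_cur = lim.rlim_max;    // soft limit may go up to the hard limit
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```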

> [ 2009/07/29 02:14:23  ERROR xorp_rib:4662 LIBXORP +226 
> buffered_asyncio.cc io_event ] read error 104
>

The BufferedAsyncReader class isn't actually used anywhere outside of 
XRL. The following messages are fallout from the RIB, with which BGP 
interacts directly, noticing that BGP has disappeared.
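
For reference, on Linux errno 104 is ECONNRESET ("connection reset by 
peer"), which fits a reader whose peer process has just exited. The 
numeric value is platform-specific (FreeBSD uses 54 for ECONNRESET), so 
this reading assumes the affected host runs Linux, which the thread does 
not state. A small sketch for decoding such codes portably follows; the 
helper names are hypothetical.

```cpp
// Sketch: interpret an errno value without hard-coding platform numbers.
#include <cerrno>
#include <cstring>
#include <string>

// True if the error code means the peer reset the connection.
bool is_connection_reset(int err) {
    return err == ECONNRESET;
}

// Human-readable text for an errno value, as reported by the C library.
std::string describe_error(int err) {
    return std::string(std::strerror(err));
}
```

Comparing against the ECONNRESET macro rather than the literal 104 keeps 
the check correct on systems where the number differs.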

> [ 2009/07/29 02:14:23  ERROR xorp_rib:4662 XRL +169 xrl_pf_stcp.cc 
> read_event ] Read failed (error = 104)
>

The remaining messages are just fallout from the RIB picking up that BGP 
has died, followed by the Router Manager.

thanks
BMS


