[Xorp-hackers] TCP sockets and packet splitting

Bruce Simpson bms at incunabulum.net
Tue Jun 9 05:46:23 PDT 2009


Victor Faion wrote:
> ...
> Thanks for the reply, I was using TCP sockets with IPv4 packets, 
> that's why I thought XORP would handle the merging,

Hang on, need to disambiguate: are you sending raw IPv4 datagrams 
(this does not use the socket4.xif interface), or are you sending TCP 
stream traffic (this does use the socket4.xif interface)?

I assume here you mean: sending TCP stream traffic via the socket4.xif 
XRL interface, with the default XRL library settings for IPC 
encapsulation, between your application and the XORP FEA (i.e. TCP over 
the loopback interface).

> but I saw the comment in IoTcpUdpSocket::send():
> 	// We don't coalesce for TCP as well, but this could be changed in the
> 	// future if it improves performance.
>
> I think the socket library should indicate to the programmer when it's 
> going to split the packet being sent.

Actually, the socket4.xif API makes no guarantee about preserving 
message boundaries in TCP, as TCP is a stream protocol. How TCP writes 
are split across segments is determined by the host's TCP 
implementation, as well as by the underlying network conditions.
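Because no layer promises to keep write boundaries, an application that 
needs discrete messages over a TCP stream usually has to delimit them 
itself. Below is a minimal sketch in Python, assuming a simple 4-byte 
length prefix; the helper names (frame, recv_exact, recv_msg) are 
hypothetical and not part of XORP or socket4.xif:

```python
import socket
import struct

# Hypothetical framing: 4-byte big-endian length, then the payload.
HEADER = struct.Struct("!I")


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length so the receiver can find its end."""
    return HEADER.pack(len(payload)) + payload


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, however the stream happens to be segmented."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the stream mid-message")
        buf += chunk
    return bytes(buf)


def recv_msg(sock: socket.socket) -> bytes:
    """Read one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def demo() -> list:
    # TCP over the loopback interface.
    with socket.create_server(("127.0.0.1", 0)) as server:
        client = socket.create_connection(server.getsockname())
        conn, _ = server.accept()
        with client, conn:
            messages = [b"hello", b"", b"xorp" * 100]
            wire = b"".join(frame(m) for m in messages)
            # Write the stream in arbitrary 7-byte pieces that ignore
            # message boundaries; the framing recovers them anyway.
            for i in range(0, len(wire), 7):
                client.sendall(wire[i:i + 7])
            return [recv_msg(conn) for _ in messages]


received = demo()
print([len(m) for m in received])  # [5, 0, 400]
```

The receiver relies only on the length prefix, never on how many recv() 
calls the data happened to arrive in.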

It sounds like you want something similar in principle to Linux's 
TCP_CORK socket option: "If set, don't send out partial frames. All 
queued partial frames are sent when the option is cleared again." See 
here for constructive bashing of TCP_CORK: http://www.baus.net/on-tcp_cork
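For illustration, this is roughly how TCP_CORK is toggled from Python on 
Linux. socket.TCP_CORK is only defined on platforms that support it, so 
the sketch falls back to plain writes elsewhere; send_corked and 
recv_until_eof are hypothetical helpers. Note that corking only 
influences how the sender packs data into segments; the receiver still 
sees an undivided byte stream.

```python
import socket


def send_corked(sock: socket.socket, *parts: bytes) -> None:
    """Hypothetical helper: queue several writes under TCP_CORK, then flush.

    TCP_CORK is Linux-specific; where it is unavailable the parts are
    simply written one after another.
    """
    cork = getattr(socket, "TCP_CORK", None)
    if cork is not None:
        sock.setsockopt(socket.IPPROTO_TCP, cork, 1)
    try:
        for part in parts:
            sock.sendall(part)
    finally:
        if cork is not None:
            # Clearing the option sends any queued partial frame.
            sock.setsockopt(socket.IPPROTO_TCP, cork, 0)


def recv_until_eof(sock: socket.socket) -> bytes:
    """Read until the peer stops sending (recv returns b'')."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


with socket.create_server(("127.0.0.1", 0)) as server:
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()
    with client, conn:
        send_corked(client, b"header:", b"body")
        client.shutdown(socket.SHUT_WR)
        data = recv_until_eof(conn)

print(data)  # b'header:body'
```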

Normally, most TCPs attempt to send the payload you provide right away, 
in a single segment if it fits within the MSS, and set the PUSH flag on 
that segment. However, TCP does not guarantee to preserve message 
boundaries this way: it is, after all, a stream protocol, not a 
sequenced message protocol, and is subject to retransmissions, sliding 
window effects, head-of-line blocking, the Nagle algorithm, etc.
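Of the effects listed above, the Nagle algorithm is the one an 
application most commonly turns off, via the standard TCP_NODELAY socket 
option. A small sketch follows (disable_nagle is a hypothetical name). 
Disabling Nagle changes when small writes are sent, but it still does 
not make write boundaries visible to the receiver.

```python
import socket


def disable_nagle(sock: socket.socket) -> bool:
    """Hypothetical helper: set TCP_NODELAY and report whether it took effect."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Treat any nonzero value as "enabled".
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


with socket.create_server(("127.0.0.1", 0)) as server:
    with socket.create_connection(server.getsockname()) as client:
        conn, _ = server.accept()
        with conn:
            nodelay = disable_nagle(client)
            # Two small writes with Nagle disabled...
            client.sendall(b"ab")
            client.sendall(b"cd")
            client.shutdown(socket.SHUT_WR)
            # ...still arrive as one undifferentiated byte stream.
            received = b""
            while chunk := conn.recv(4096):
                received += chunk

print(nodelay, received)  # True b'abcd'
```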

The use of Selective Acknowledgements (SACK) also means that multiple 
segments may be in flight, and be acknowledged, out of order. Protocols 
that need message boundaries therefore have to provide their own. 
HTTP/1.0, for example, uses connection shutdown as a message boundary: 
when a response carries no Content-Length header, the server closes the 
connection to mark the end of the response body.
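Using end-of-stream as a message boundary can be sketched in Python with 
shutdown(socket.SHUT_WR): the writer closes only its sending direction, 
the peer's recv() then returns b'', and the other direction stays open 
for a reply. This is a minimal loopback illustration with made-up 
payloads, not an HTTP implementation; read_to_eof is a hypothetical 
helper.

```python
import socket


def read_to_eof(sock: socket.socket) -> bytes:
    """Read until the peer shuts down its sending side (recv returns b'')."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


with socket.create_server(("127.0.0.1", 0)) as server:
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()
    with client, conn:
        # Client: send the whole message, then half-close to mark its end.
        client.sendall(b"request part 1, ")
        client.sendall(b"request part 2")
        client.shutdown(socket.SHUT_WR)

        # Server: end-of-stream is the message boundary.
        request = read_to_eof(conn)

        # The server-to-client direction is still open for a reply.
        conn.sendall(b"reply")
        conn.shutdown(socket.SHUT_WR)
        reply = read_to_eof(client)

print(request)  # b'request part 1, request part 2'
print(reply)    # b'reply'
```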

In summary: It would be difficult for socket4.xif to provide any API 
guarantee about splitting TCP writes, as it depends on the host's 
network stack.

thanks,
BMS


