[Xorp-hackers] TCP sockets and packet splitting
Bruce Simpson
bms at incunabulum.net
Tue Jun 9 05:46:23 PDT 2009
Victor Faion wrote:
> ...
> Thanks for the reply, I was using TCP sockets with IPv4 packets,
> that's why I thought XORP would handle the merging,
Hang on, I need to disambiguate: are you sending raw IPv4 datagrams
(this does not use the socket4.xif interface), or are you sending TCP
stream traffic (this does use the socket4.xif interface)?
I assume here you mean: sending TCP stream traffic via the socket4.xif
XRL interface, with the default XRL library settings for IPC
encapsulation, between your application and the XORP FEA (i.e. TCP over
the loopback interface).
> but I saw the comment in IoTcpUdpSocket::send():
> // We don't coalesce for TCP as well, but this could be changed in the
> // future if it improves performance.
>
> I think the socket library should indicate to the programmer when it's
> going to split the packet being sent.
Actually, the socket4.xif API makes no guarantee about preserving
message boundaries in TCP, because TCP is a stream protocol. How TCP
writes are split across segments is determined by the host's TCP
implementation, as well as by the underlying network conditions.
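If an application needs message boundaries over TCP, it has to add its own framing on top of the byte stream. Below is a minimal sketch of one common approach, a 4-byte big-endian length prefix, written against plain POSIX socket calls rather than socket4.xif. The helper names (write_all, read_all, send_message, recv_message) and the 1 MiB size cap are hypothetical choices for illustration, not anything XORP provides.

```cpp
#include <arpa/inet.h>   // htonl, ntohl
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical upper bound on a single message, to reject bogus headers.
static const uint32_t kMaxMessage = 1u << 20;

// Write all len bytes, looping because write() may accept fewer bytes
// than requested.
static bool write_all(int fd, const void* buf, size_t len) {
    const char* p = static_cast<const char*>(buf);
    while (len > 0) {
        ssize_t n = ::write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        p += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Read exactly len bytes, however the stream happened to be segmented.
// Returns false on error or if the peer closes mid-message.
static bool read_all(int fd, void* buf, size_t len) {
    char* p = static_cast<char*>(buf);
    while (len > 0) {
        ssize_t n = ::read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        if (n == 0) return false;  // EOF before the full message arrived
        p += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Frame layout (assumed): 4-byte big-endian length, then the payload.
bool send_message(int fd, const std::string& msg) {
    if (msg.size() > kMaxMessage) return false;
    uint32_t hdr = htonl(static_cast<uint32_t>(msg.size()));
    return write_all(fd, &hdr, sizeof hdr) &&
           write_all(fd, msg.data(), msg.size());
}

// Receive one framed message into out.
bool recv_message(int fd, std::string& out) {
    uint32_t hdr;
    if (!read_all(fd, &hdr, sizeof hdr)) return false;
    uint32_t len = ntohl(hdr);
    if (len > kMaxMessage) return false;
    out.resize(len);
    return len == 0 || read_all(fd, &out[0], len);
}
```

The receiver recovers each message whole no matter how the sender's TCP chose to split or coalesce the bytes, which is exactly the property the stream itself does not give you.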
It sounds like you want something similar in principle to Linux's
TCP_CORK socket option: "If set, don't send out partial frames. All
queued partial frames are sent when the option is cleared again." See
here for constructive bashing of TCP_CORK: http://www.baus.net/on-tcp_cork
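For reference, a minimal sketch of toggling TCP_CORK on a socket with setsockopt(). This is Linux-specific (the option comes from <netinet/tcp.h>), and the helper names set_cork and cork_enabled are hypothetical. Note that, per the Linux tcp(7) man page, output is corked for at most about 200 ms, and corking only affects how the sender builds segments; it still does not give the receiver any message boundaries.

```cpp
#include <cassert>
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_CORK (Linux-specific)
#include <sys/socket.h>
#include <unistd.h>

// Set (on == true) or clear TCP_CORK. Clearing it flushes any queued
// partial frame. Returns false if the call fails or the option is absent.
bool set_cork(int fd, bool on) {
#ifdef TCP_CORK
    int v = on ? 1 : 0;
    return ::setsockopt(fd, IPPROTO_TCP, TCP_CORK, &v, sizeof v) == 0;
#else
    (void)fd;
    (void)on;
    return false;  // not available on this platform
#endif
}

// Query whether TCP_CORK is currently set on fd.
bool cork_enabled(int fd) {
#ifdef TCP_CORK
    int v = 0;
    socklen_t len = sizeof v;
    if (::getsockopt(fd, IPPROTO_TCP, TCP_CORK, &v, &len) != 0) return false;
    return v != 0;
#else
    (void)fd;
    return false;
#endif
}
```

The usual pattern is to cork, issue several small writes (for example a header and then a body), and then uncork so the kernel can send them as full-sized segments.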
Normally, most TCP implementations attempt to send the payload you
provide right away in one segment, and set the PUSH flag on that
segment. However, TCP does not guarantee to preserve message boundaries
in this way: it is, after all, a stream protocol, not a sequenced
message protocol, and it is subject to retransmissions, sliding window
effects, head-of-line blocking, the Nagle algorithm, etc.
HTTP/1.0 is an example of a protocol that works around this: it shuts
down one half of the full-duplex connection after the request is sent,
and this serves as its message boundary. Also, with Selective
Acknowledgements (SACK), multiple segments may be in flight and be
received out of order.
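A minimal sketch of the half-close idea described above, using the POSIX shutdown(fd, SHUT_WR) call: the sender writes its whole request and then closes only its write side, so the peer reads until EOF and treats that as the end of the message, while the reverse direction stays open for the reply. The function names are hypothetical, and this illustrates the technique in general rather than HTTP/1.0's exact behaviour.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Sender: write the whole message, then half-close the write side.
// The peer will see EOF once it has read everything we sent.
bool send_and_half_close(int fd, const std::string& msg) {
    const char* p = msg.data();
    size_t left = msg.size();
    while (left > 0) {
        ssize_t n = ::write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        p += n;
        left -= static_cast<size_t>(n);
    }
    return ::shutdown(fd, SHUT_WR) == 0;  // we can still read the reply
}

// Receiver: accumulate bytes until read() returns 0 (EOF). The EOF is the
// message boundary, regardless of how the data was segmented in transit.
bool read_until_eof(int fd, std::string& out) {
    out.clear();
    char buf[512];
    for (;;) {
        ssize_t n = ::read(fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        if (n == 0) return true;
        out.append(buf, static_cast<size_t>(n));
    }
}
```

The obvious cost of this design is that the connection can carry only one message in that direction, which is one reason later protocols use explicit lengths or delimiters instead.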
In summary: It would be difficult for socket4.xif to provide any API
guarantee about splitting TCP writes, as it depends on the host's
network stack.
thanks,
BMS