Wow, I found it and I am a complete idiot. My RTO estimator sometimes produced a timeout whose microsecond field was greater than 1 million, which causes select to fail. Thanks for the help.<br><br>-Amit<br><br><div class="gmail_quote">
On Dec 9, 2007 4:53 PM, &lt;<a href="mailto:vern@cs.berkeley.edu">vern@cs.berkeley.edu</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="Ih2E3d">> I am using select to wake me up on a timeout or if there is data ready to be<br>> received. Towards the end of the transfer, select returns before a timeout<br>> and FD_ISSET returns true implying that there is data to be read, so I call
<br>> recvfrom which ends up blocking which would imply that there is nothing<br>> there.<br><br></div>In my experience, this is pretty much always some sort of bug in the use<br>of select, though they can sometimes be very hard to find. You've already
<br>taken care of the #1 suspect, which is failing to FD_ZERO or failing to<br>FD_SET correctly. Another possibility is that your code is structured to<br>(somewhere internal) read from the fd that you're then later trying to
<br>read due to select(), so that now it no longer has anything to return;<br>or you're reading with recvfrom and what you've specified doesn't match<br>the packet that came in.<br><br>If you send me your select() loop, I'll try to take a look at it. However,
<br>I'm not online much today, so I'm not sure if I'll be able to reply before<br>tomorrow.<br><div class="Ih2E3d"><br>> I read online that select does not guarantee that recvfrom won't<br>> block because the packet may have been corrupted.
<br><br></div>That doesn't sound right to me - the kernel should make the integrity checks<br>prior to analyzing the rest of the header, and it has to do that analysis<br>in order to figure out which file descriptor to flag as being available
<br>for reading. (More generally, servers all over the world rely on select()<br>not causing them to occasionally block waiting to read - so this is code<br>that has *really* been hammered on.)<br><div class="Ih2E3d"><br>
> So I changed my socket to<br>> work in non-blocking mode.<br><br></div>Do try to avoid that. Like select(), it comes with its own subtle usage<br>errors, and the combination can be quite confusing.<br><br>It's worth stepping through the code executed by MNL for the recv to
<br>see whether in some cases it reads twice or something like that.<br><br> Vern<br></blockquote></div><br>
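For anyone who runs into the same thing, here is a minimal sketch of the fix described at the top of this message, not the actual code from my project. It folds any excess microseconds into the seconds field before calling select, and it rebuilds the fd_set on every call, which covers the FD_ZERO/FD_SET suspect mentioned in the quoted reply. The names normalize_timeout and wait_readable are hypothetical, and the sketch assumes the estimator never produces negative values. As far as I know, some systems reject an out-of-range microsecond field with EINVAL while others quietly normalize it, so behavior may differ between platforms.<br><br>

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: fold excess microseconds into seconds so that
 * 0 <= tv_usec < 1000000. Assumes sec and usec are non-negative. */
struct timeval normalize_timeout(long sec, long usec)
{
    struct timeval tv;
    tv.tv_sec  = sec + usec / 1000000L;
    tv.tv_usec = usec % 1000000L;
    return tv;
}

/* Hypothetical wrapper: wait until fd is readable or the timeout expires.
 * Returns select()'s result: 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long sec, long usec)
{
    fd_set readfds;
    struct timeval tv = normalize_timeout(sec, usec);

    /* Rebuild the set on every call; select() overwrites it on return. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    return select(fd + 1, &readfds, NULL, NULL, &tv);
}
```

With this in place, a raw estimator output such as 0 seconds and 1500000 microseconds is passed to select as 1 second and 500000 microseconds. A return value of -1 should still be checked against errno rather than treated as a timeout.<br>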