[Xorp-hackers] Socket polling

Fri Feb 13 11:56:16 PST 2009

On Fri, Feb 13, 2009 at 16:28, Victor Faion <vfaion at gmail.com> wrote:
> On Thu, Feb 12, 2009 at 18:55, Pavlin Radoslavov
> <pavlin at icsi.berkeley.edu> wrote:
>> Victor Faion <vfaion at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I was trying to set up a process that tries to connect to its neighbours
>>> over TCP and basically I wanted it to keep trying to connect to its
>>> neighbours until it can, but I was having some trouble as the process
>>> basically stops trying to connect when it can't connect the first
>>> time.
>>>
>>> I iterate over all the neighbour objects calling their connect
>>> function which calls send_tcp_open_bind_connect. The callback given to
>>> send_tcp_open_bind_connect just checks if there was an error and if
>>> there was it calls connectRetry() which pretty much does the same
>>> thing as connect (calls send_tcp_open_bind_connect and passes it the
>>> same callback as connect). The problem is the first time when it calls
>>> connect and fails, it just calls the socket4_user_0_1_error_event
>>> function (saying ``Transport endpoint is not connected'' which is
>>> expected) but then it doesn't go back into connectRetry(), and no
>>> connection is made even though its neighbours are listening for this
>>> connection. Is there a better/easier way of doing this polling or am I
>>> just doing the recursing with the callback the wrong way?
>>
>> Is connectRetry() a method in your protocol?
>>
>
>
> Yeah, connect() takes in the parameters needed to call
> send_tcp_open_bind_connect() and saves them into the Neighbour object.
> Then connectRetry() uses the cached values to call
> send_tcp_open_bind_connect() if it fails the first time.
>
>
>> In your event handler for socket4_user_0_1_error_event you need to
>> handle the error conditions (e.g., schedule a call to
>> connectRetry()).
>>
>
>
> I tried to avoid this as this means iterating over all the neighbours
> again, checking each sockid and matching against the sockid received
> in socket4_user_0_1_error_event to figure out which neighbour's
> connect function to call again. Anyway I tried doing it like this but
> it still doesn't repeatedly try to connect to a neighbour. It goes in
> this order:
>
> 1. Try to connect normally using the neighbour's connect() (shouldn't be able to)
>
> 2. Callback for send_tcp_open_bind_connect gets called (and the
> XrlError object received is XrlError::OKAY() for some reason)
>
> 3. socketx_user_0_1_error_event gets called and says ``Transport
> endpoint is not connected fatal''
>
> 4. Then socketx_user_0_1_error_event iterates over the neighbours,
> when it matches the one which has the sockid that
> socketx_user_0_1_error_event received it calls connect() again.
>
> 5. Then I get a warning that says ``Handling method for
> socket4_user/0.1/error_event failed: XrlCmdError 102 Command failed
> socket error''
>
> 6. Then the same thing as step 2 happens.
>
> The cycle ends there: connect() only gets called twice because
> socketx_user_0_1_error_event only gets called once. I'm not sure why;
> it seems to have something to do with that warning. What causes the
> warning, though?
>
>
>> Also, are you saying that the first time you call
>> send_tcp_open_bind_connect() and it fails, the callback for that XRL
>> is not called at all? I would guess the callback might be called
>> after socket4_user_0_1_error_event is received, but I wouldn't bet
>> on the ordering.
>>
>> Pavlin
>>
>
>
> Well the callback gets called but the problem is that I'm not sure
> which of the callback and the error event handler get called last in
> order to reschedule the connecting.
>
> Victor
>

Sorry, step 5 above happened because my
socket4_user_0_1_error_event handler was returning
XrlCmdError::COMMAND_FAILED("socket error"). After I changed it to
return XrlCmdError::OKAY(), it goes through steps 1-4 above, except
that sometimes they run in the order 1, 3, 4, 2 instead. When that
happens it stops at step 2 and no connection is made. The cause is
that the callback is what sets the neighbour's sockid for a connection
attempt, and the error handler uses that sockid to find which
neighbour to reconnect. If the error event arrives before the callback
has set the new sockid, the error handler can't find the neighbour.
I'm not sure how to get the new sockid to the event handler when the
callback hasn't set it yet.
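
One possible way to make the retry independent of that ordering is to
remember error events whose sockid is not known yet and replay them
when the callback finally reports that sockid. What follows is only a
minimal sketch in plain C++ with no XORP calls. Neighbour is a
stand-in for the neighbour class described above; ConnectTracker,
on_connect_callback(), on_error_event() and retries_scheduled are
hypothetical names invented for illustration. It assumes the sockid
delivered to the callback is the same string that later appears in the
error event, and a real program would arm a timer that calls
connectRetry() where the sketch only increments a counter.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the neighbour class described in the thread.
struct Neighbour {
    explicit Neighbour(std::string n) : name(std::move(n)), retries_scheduled(0) {}
    std::string name;
    std::string sockid;      // sockid of the current attempt, empty if none
    int retries_scheduled;   // placeholder for "a retry timer was armed"
};

// Hypothetical helper: pairs each sockid with its neighbour and tolerates
// the error event arriving before the connect callback.
class ConnectTracker {
public:
    // Call from the connect callback once the sockid is known.
    // Returns true if an error for this sockid had already arrived,
    // in which case a retry is scheduled immediately.
    bool on_connect_callback(Neighbour& n, const std::string& sockid) {
        n.sockid = sockid;
        by_sockid_[sockid] = &n;
        std::set<std::string>::iterator early = early_errors_.find(sockid);
        if (early == early_errors_.end())
            return false;
        early_errors_.erase(early);
        fail(sockid);
        return true;
    }

    // Call from the error event handler.
    // Returns true if the sockid matched a neighbour and a retry was
    // scheduled; otherwise the error is remembered for the callback.
    bool on_error_event(const std::string& sockid) {
        if (by_sockid_.find(sockid) == by_sockid_.end()) {
            early_errors_.insert(sockid);
            return false;
        }
        fail(sockid);
        return true;
    }

private:
    // Forget the failed sockid and mark the neighbour for another attempt.
    void fail(const std::string& sockid) {
        std::map<std::string, Neighbour*>::iterator it = by_sockid_.find(sockid);
        Neighbour* n = it->second;
        by_sockid_.erase(it);
        n->sockid.clear();
        ++n->retries_scheduled;  // real code: arm a timer that calls connectRetry()
    }

    std::map<std::string, Neighbour*> by_sockid_;  // lookup without iterating
    std::set<std::string> early_errors_;           // errors seen before the callback
};
```

With this arrangement the order 1, 2, 3, 4 is handled in
on_error_event() and the order 1, 3, 4, 2 is handled in
on_connect_callback(), so either way exactly one retry is scheduled
per failed attempt. The map also avoids iterating over all neighbours
to match a sockid.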