[Xorp-hackers] Socket polling

Victor Faion vfaion at gmail.com
Tue Feb 17 04:35:30 PST 2009


On Mon, Feb 16, 2009 at 19:12, Pavlin Radoslavov
<pavlin at icsi.berkeley.edu> wrote:
> Victor Faion <vfaion at gmail.com> wrote:
>
>> On Fri, Feb 13, 2009 at 19:56, Victor Faion <vfaion at gmail.com> wrote:
>> > On Fri, Feb 13, 2009 at 16:28, Victor Faion <vfaion at gmail.com> wrote:
>> >> On Thu, Feb 12, 2009 at 18:55, Pavlin Radoslavov
>> >> <pavlin at icsi.berkeley.edu> wrote:
>> >>> Victor Faion <vfaion at gmail.com> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>> I was trying to set up a process that tries to connect to its neighbours
>> >>>> over TCP and basically I wanted it to keep trying to connect to its
>> >>>> neighbours until it can, but I was having some trouble as the process
>> >>>> basically stops trying to connect when it can't connect the first
>> >>>> time.
>> >>>>
>> >>>> I iterate over all the neighbour objects calling their connect
>> >>>> function which calls send_tcp_open_bind_connect. The callback given to
>> >>>> send_tcp_open_bind_connect just checks if there was an error and if
>> >>>> there was it calls connectRetry() which pretty much does the same
>> >>>> thing as connect (calls send_tcp_open_bind_connect and passes it the
>> >>>> same callback as connect). The problem is the first time when it calls
>> >>>> connect and fails, it just calls the socket4_user_0_1_error_event
>> >>>> function (saying ``Transport endpoint is not connected'' which is
>> >>>> expected) but then it doesn't go back into connectRetry() and no
>> >>>> connection is made when its neighbours are actually listening for this
>> >>>> connection. Is there a better/easier way of doing this polling or am I
>> >>>> just doing the recursion through the callback the wrong way?
>> >>>
>> >>> Is connectRetry() a method in your protocol?
>> >>>
>> >>
>> >>
>> >> Yeah, connect() takes in the parameters needed to call
>> >> send_tcp_open_bind_connect() and saves them into the Neighbour object.
>> >> Then connectRetry() uses the cached values to call
>> >> send_tcp_open_bind_connect() if it fails the first time.
>> >>
>> >>
>> >>> In your event handler for socket4_user_0_1_error_event you need to
>> >>> handle the error conditions (e.g., schedule a call to
>> >>> connectRetry()).
>> >>>
>> >>
>> >>
>> >> I tried to avoid this as this means iterating over all the neighbours
>> >> again, checking each sockid and matching against the sockid received
>> >> in socket4_user_0_1_error_event to figure out which neighbour's
>> >> connect function to call again. Anyway I tried doing it like this but
>> >> it still doesn't repeatedly try to connect to a neighbour. It goes in
>> >> this order:
>> >>
>> >> 1. Try to connect normally using the neighbour's connect() (shouldn't be able to)
>> >>
>> >> 2. Callback for send_tcp_open_bind_connect gets called (and the
>> >> XrlError object received is XrlError::OKAY() for some reason)
>> >>
>> >> 3. socketx_user_0_1_error_event gets called and says ``Transport
>> >> endpoint is not connected fatal''
>> >>
>> >> 4. Then socketx_user_0_1_error_event iterates over the neighbours,
>> >> when it matches the one which has the sockid that
>> >> socketx_user_0_1_error_event received it calls connect() again.
>> >>
>> >> 5. Then I get a warning that says ``Handling method for
>> >> socket4_user/0.1/error_event failed: XrlCmdError 102 Command failed
>> >> socket error''
>> >>
>> >> 6. Then the same thing as step 2 happens.
>> >>
>> >> The cycle ends there, connect() only gets called twice because
>> >> socketx_user_0_1_error_event only gets called once. Not sure why this
>> >> happens, something to do with that warning. Why does that happen
>> >> though?
>> >>
>> >>
>> >>> Also, are you saying that the first time you call
>> >>> send_tcp_open_bind_connect() and it fails, the callback for that XRL
>> >>> is not called at all? I would guess the callback might be called
>> >>> after socket4_user_0_1_error_event is received, but I wouldn't bet
>> >>> on the ordering.
>> >>>
>> >>> Pavlin
>> >>>
>> >>
>> >>
>> >> Well the callback gets called but the problem is that I'm not sure
>> >> which of the callback and the error event handler get called last in
>> >> order to reschedule the connecting.
>> >>
>> >> Victor
>> >>
>> >
>> >
>> > Sorry, the reason for step 5 above was that my
>> > socket4_user_0_1_error_event was returning
>> > XrlCmdError::COMMAND_FAILED("socket error"). When I changed it to
>> > return XrlCmdError::OKAY(), it goes through steps 1-4 as above,
>> > except that sometimes the steps happen in the order 1, 3, 4, 2
>> > instead. When this happens it ends at step 2 and no connection is
>> > made. This is because the callback sets the sockid of the neighbour
>> > when a connection attempt is made, and the error handler uses this
>> > sockid to know which neighbour to reconnect to. So when the new
>> > sockid hasn't been set yet, the error handler doesn't find the
>> > neighbour. I'm not sure how to get the new sockid into the event
>> > handler when the callback hasn't set it yet.
>> >
>>
>>
>> Hello,
>>
>> Sorry to restart this thread; I'm not sure how to handle the case when
>> a router cannot connect to another router. I don't understand why when
>> I call send_tcp_open_bind_connect to another router (which isn't even
>> online) the callback to send_tcp_open_bind_connect receives
>> XrlError::OKAY(). I wanted to handle this error in the callback as I
>> don't have enough information to handle it in the
>> socket4_user_0_1_error_event function. I couldn't find any code in
>> XORP that does this sort of thing. Where do the errors that get passed
>> into the callback for send_tcp_open_bind_connect get set?
>
> For a reason that is unclear to me without further investigation,
> the order of the send_tcp_open_bind_connect callback and the
> socket4_user_0_1_error_event upcall are reversed
> (always/occasionally?). I had a quick look in the FEA, and the
> callback should be received first, but obviously from your
> description this doesn't seem to be the case.
> The correct solution should be to investigate the issue and fix it.
> This might require understanding of the FEA I/O internals and some
> XRL-related knowledge. Unfortunately, I can't give you an estimate of
> how soon I/we can allocate the resources to fix that, so your best
> bet would be to submit a Bugzilla entry.
>

Yeah, they are reversed. I think it only happens when I reschedule
send_tcp_open_bind_connect from socket4_user_0_1_error_event; in that
case, socket4_user_0_1_error_event is called before the callback.
I can submit a bug report. What sort of information should I include?

Here's the relevant output from when I start xorp_rtrmgr. At first it
attempts to connect normally, then goes to the callback, then to
socket4_user_0_1_error_event, which matches the sockid (because the
callback has already set it) and attempts to reconnect to that IP.
After this, however, socket4_user_0_1_error_event is called again
before the callback, so the new sockid has not been set, there is no
match, and no reconnection attempt is made.

[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] Connecting to 146.169.3.10
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] CB for 92a9030b-02ba706b-0006c2a9-3c670000
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] socketx_user_0_1_error_event 92a9030b-02ba706b-0006c2a9-3c670000 Transport endpoint is not connected fatal
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] sockid match: 146.169.3.10
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] connect retry
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] socketx_user_0_1_error_event 92a9030b-02ba706b-0006d31d-3c670000 Transport endpoint is not connected fatal
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] CB for 92a9030b-02ba706b-0006d31d-3c670000
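The race in this log can be modelled in a few lines of standalone C++. This is a hypothetical sketch, not XORP code: `Neighbour`, `NaiveLookup`, `on_callback` and `on_error_event` are illustrative names, and the two events are delivered by hand rather than by the FEA. Because the sockid is only stored when the send_tcp_open_bind_connect callback runs, an error upcall that arrives first finds no matching neighbour and the retry is dropped.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the lookup described above: the sockid is stored on
// the neighbour only when the send_tcp_open_bind_connect callback fires, and
// the error handler finds the neighbour by scanning for that sockid.
struct Neighbour {
    std::string addr;
    std::string sockid;  // empty until the callback has run
};

struct NaiveLookup {
    std::vector<Neighbour> neighbours;

    // Stand-in for the XRL callback: remember the new sockid.
    void on_callback(const std::string& addr, const std::string& sockid) {
        for (auto& n : neighbours)
            if (n.addr == addr) n.sockid = sockid;
    }

    // Stand-in for socket4_user_0_1_error_event. Returns true if a
    // neighbour matched, i.e. a retry would be scheduled.
    bool on_error_event(const std::string& sockid) const {
        for (const auto& n : neighbours)
            if (n.sockid == sockid) return true;
        return false;  // no match: the retry is silently lost
    }
};
```

Feeding it the sequence from the log (callback then error for the first sockid, error then callback for the second) gives a match for the first upcall and none for the second.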


> For your own purpose you need to move forward by using some
> workaround. One possible solution that comes to mind is to have a
> map of states per sockid that can be populated/updated regardless of
> the order of the callbacks and the upcalls. E.g., if an upcall is
> received before the sockid is known, a new entry is created for that
> sockid and the state is set according to the upcall error. Then,
> after the send_tcp_open_bind_connect callback is invoked, at that
> time the sockid entry can be used for its intended purpose (and the
> error condition is already filled-in).

Sounds good, I will try to implement this :-)
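A minimal sketch of the per-sockid state map suggested above, assuming C++17 for `std::optional`. `SockState`, `SockTable`, `on_callback` and `on_error` are hypothetical names; in a real XORP process they would be called from the send_tcp_open_bind_connect callback and from socket4_user_0_1_error_event respectively, and the returned neighbour would be handed to something like connectRetry(). The entry is created by whichever event arrives first, so the outcome is the same in either order.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical per-sockid record. Whichever of the two events arrives
// second completes the entry.
struct SockState {
    std::optional<std::string> neighbour;  // filled in by the XRL callback
    std::optional<std::string> error;      // filled in by the error_event upcall
};

class SockTable {
public:
    // Call from the send_tcp_open_bind_connect callback. Returns the
    // neighbour to retry if an error for this sockid is already recorded.
    std::optional<std::string> on_callback(const std::string& sockid,
                                           const std::string& neighbour) {
        table_[sockid].neighbour = neighbour;
        return take_if_failed(sockid);
    }

    // Call from socket4_user_0_1_error_event. May run before the callback,
    // in which case the entry is created here with the neighbour unknown.
    std::optional<std::string> on_error(const std::string& sockid,
                                        const std::string& error) {
        table_[sockid].error = error;
        return take_if_failed(sockid);
    }

    std::size_t size() const { return table_.size(); }

private:
    // If both halves are known and an error occurred, drop the entry
    // (a retry gets a fresh sockid) and report which neighbour to retry.
    std::optional<std::string> take_if_failed(const std::string& sockid) {
        auto it = table_.find(sockid);
        if (it == table_.end() || !it->second.neighbour || !it->second.error)
            return std::nullopt;
        std::string neighbour = *it->second.neighbour;
        table_.erase(it);
        return neighbour;
    }

    std::map<std::string, SockState> table_;
};
```

An entry that only ever receives the callback (no error) stays in the table and doubles as a sockid-to-neighbour index, so the error handler no longer has to scan every neighbour.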

> On the other hand, if the upcall is inbound_connect_event or
> outbound_connect_event (instead of error_event), then you take the
> appropriate actions only after the send_tcp_open_bind_connect
> callback has been called.
>

Doesn't this also assume that
inbound_connect_event/outbound_connect_event get called after the
callback? (Not sure if this is what actually happens).
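One way to avoid relying on that ordering is to treat the connect upcall like the error upcall: record it per sockid and act only once both it and the callback have been seen. A hypothetical sketch follows; `ConnectJoin`, `on_callback` and `on_connect_event` are illustrative names, not XORP APIs, and `on_connect_event` stands in for whichever of inbound_connect_event/outbound_connect_event applies.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical helper: waits until both the send_tcp_open_bind_connect
// callback and a connect upcall have been seen for the same sockid, in
// either order, and only then runs the "connected" action.
class ConnectJoin {
public:
    using Action = std::function<void(const std::string& sockid,
                                      const std::string& neighbour)>;
    explicit ConnectJoin(Action on_ready) : on_ready_(std::move(on_ready)) {}

    // From the XRL callback: now we know which neighbour owns this sockid.
    void on_callback(const std::string& sockid, const std::string& neighbour) {
        owner_[sockid] = neighbour;
        try_fire(sockid);
    }

    // From the connect upcall for this sockid.
    void on_connect_event(const std::string& sockid) {
        connected_.insert(sockid);
        try_fire(sockid);
    }

private:
    void try_fire(const std::string& sockid) {
        auto it = owner_.find(sockid);
        if (it == owner_.end() || connected_.count(sockid) == 0)
            return;  // still waiting for the other half
        std::string neighbour = it->second;
        owner_.erase(it);           // the caller records the live connection
        connected_.erase(sockid);
        on_ready_(sockid, neighbour);
    }

    Action on_ready_;
    std::map<std::string, std::string> owner_;  // sockid -> neighbour
    std::set<std::string> connected_;           // sockids with a connect upcall
};
```

Because `try_fire` runs from both entry points, the action fires exactly once per sockid, whichever event comes second.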

> Hope that helps,
> Pavlin
>

Thanks for the help, I'll try to use your suggestion and see how that goes.

Victor


