[Xorp-hackers] Socket polling

Pavlin Radoslavov pavlin at ICSI.Berkeley.EDU
Mon Feb 16 11:12:51 PST 2009


Victor Faion <vfaion at gmail.com> wrote:

> On Fri, Feb 13, 2009 at 19:56, Victor Faion <vfaion at gmail.com> wrote:
> > On Fri, Feb 13, 2009 at 16:28, Victor Faion <vfaion at gmail.com> wrote:
> >> On Thu, Feb 12, 2009 at 18:55, Pavlin Radoslavov
> >> <pavlin at icsi.berkeley.edu> wrote:
> >>> Victor Faion <vfaion at gmail.com> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I was trying to set up a process that tries to connect to its neighbours
> >>>> over TCP and basically I wanted it to keep trying to connect to its
> >>>> neighbours until it can, but I was having some trouble as the process
> >>>> basically stops trying to connect when it can't connect the first
> >>>> time.
> >>>>
> >>>> I iterate over all the neighbour objects calling their connect
> >>>> function which calls send_tcp_open_bind_connect. The callback given to
> >>>> send_tcp_open_bind_connect just checks if there was an error and if
> >>>> there was it calls connectRetry() which pretty much does the same
> >>>> thing as connect (calls send_tcp_open_bind_connect and passes it the
> >>>> same callback as connect). The problem is the first time when it calls
> >>>> connect and fails, it just calls the socket4_user_0_1_error_event
> >>>> function (saying ``Transport endpoint is not connected'' which is
> >>>> expected) but then it doesn't go back into connectRetry() and no
> >>>> connection is made when its neighbours are actually listening for this
> >>>> connection. Is there a better/easier way of doing this polling or am I
> >>>> just doing the recursing with the callback the wrong way?
> >>>
> >>> Is connectRetry() a method in your protocol?
> >>>
> >>
> >>
> >> Yeah, connect() takes in the parameters needed to call
> >> send_tcp_open_bind_connect() and saves them into the Neighbour object.
> >> Then connectRetry() uses the cached values to call
> >> send_tcp_open_bind_connect() if it fails the first time.
> >>
> >>
> >>> In your event handler for socket4_user_0_1_error_event you need to
> >>> handle the error conditions (e.g., schedule a call to
> >>> connectRetry()).
> >>>
> >>
> >>
> >> I tried to avoid this as this means iterating over all the neighbours
> >> again, checking each sockid and matching against the sockid received
> >> in socket4_user_0_1_error_event to figure out which neighbour's
> >> connect function to call again. Anyway I tried doing it like this but
> >> it still doesn't repeatedly try to connect to a neighbour. It goes in
> >> this order:
> >>
> >> 1. Try to connect normally using the neighbour's connect() (shouldn't be able to)
> >>
> >> 2. Callback for send_tcp_open_bind_connect gets called (and the
> >> XrlError object received is XrlError::OKAY() for some reason)
> >>
> >> 3. socketx_user_0_1_error_event gets called and says ``Transport
> >> endpoint is not connected fatal''
> >>
> >> 4. Then socketx_user_0_1_error_event iterates over the neighbours,
> >> when it matches the one which has the sockid that
> >> socketx_user_0_1_error_event received it calls connect() again.
> >>
> >> 5. Then I get a warning that says ``Handling method for
> >> socket4_user/0.1/error_event failed: XrlCmdError 102 Command failed
> >> socket error''
> >>
> >> 6. Then the same thing as step 2 happens.
> >>
> >> The cycle ends there, connect() only gets called twice because
> >> socketx_user_0_1_error_event only gets called once. Not sure why this
> >> happens, something to do with that warning. Why does that happen
> >> though?
> >>
> >>
> >>> Also, are you saying that the first time you call
> >>> send_tcp_open_bind_connect() and it fails, the callback for that XRL
> >>> is not called at all? I would guess the callback might be called
> >>> after socket4_user_0_1_error_event is received, but I wouldn't bet
> >>> on the ordering.
> >>>
> >>> Pavlin
> >>>
> >>
> >>
> >> Well the callback gets called but the problem is that I'm not sure
> >> which of the callback and the error event handler get called last in
> >> order to reschedule the connecting.
> >>
> >> Victor
> >>
> >
> >
> > Sorry the reason for step 5 above was because my
> > socket4_user_0_1_error_event was returning
> > XrlCmdError::COMMAND_FAILED("socket error"). However when I changed it
> > to return XrlCmdError::OKAY() basically it goes through steps 1-4 from
> > above except sometimes, it doesn't happen in the order above but in
> > the order 1, 3, 4, 2. When this happens it ends in step 2 and a
> > connection is not made. This happens because the callback sets the
> > sockid of the neighbour when a connection attempt is made, and the
> > error handler uses this sockid to know which neighbour to connect to.
> > So when the new sockid doesn't get set, the error handler doesn't find
> > the neighbour. Not sure how to get the new sockid into the event
> > handler when it doesn't get set into the callback.
> >
> 
> 
> Hello,
> 
> Sorry to restart this thread, I'm not sure how to handle the case when
> a router cannot connect to another router. I don't understand why when
> I call send_tcp_open_bind_connect to another router (which isn't even
> online) the callback to send_tcp_open_bind_connect receives
> XrlError::OKAY(). I wanted to handle this error in the callback as I
> don't have enough information to handle it in the
> socket4_user_0_1_error_event function. I couldn't find any code in
> XORP that does this sort of thing. Where do the errors that get passed
> into the callback for send_tcp_open_bind_connect get set?

For a reason that is unclear to me without further investigation,
the order of the send_tcp_open_bind_connect callback and the
socket4_user_0_1_error_event upcall is reversed
(always/occasionally?). I had a quick look in the FEA, and the
callback should be received first, but from your description this
doesn't seem to be the case.
The correct solution would be to investigate the issue and fix it.
This might require an understanding of the FEA I/O internals and
some XRL-related knowledge. Unfortunately, I can't give you an
estimate of how soon I/we can allocate the resources to fix that, so
your best bet would be to submit a Bugzilla entry.

For your own purposes you need to move forward with a workaround.
One possible solution that comes to mind is to keep a map of
per-sockid state that can be populated/updated regardless of the
order of the callbacks and the upcalls. E.g., if an upcall is
received before the sockid is known, a new entry is created for that
sockid and the state is set according to the upcall error. Then,
when the send_tcp_open_bind_connect callback is invoked, the sockid
entry can be used for its intended purpose (and the error condition
is already filled in).
On the other hand, if the upcall is inbound_connect_event or
outbound_connect_event (instead of error_event), you take the
appropriate actions only after the send_tcp_open_bind_connect
callback has been called.
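
A minimal sketch of such a per-sockid map, in plain C++ with no XORP
dependencies. The names SockTracker, SockState, Decision and the on_*
methods are hypothetical (they are not XORP APIs), the sockid is
treated as a string, and dropping the entry once a retry is due
assumes that the next connection attempt gets a fresh sockid. The
real callback and upcall handlers would each call the matching on_*
method and act on the returned Decision.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// All names below are hypothetical; none of them are XORP APIs.

// What the caller should do after an event has been recorded.
enum class Action { Wait, Retry, Established };

struct Decision {
    Action action;
    std::string neighbour;  // empty until the callback has named it
};

// Everything learned so far about one sockid, in whatever order it arrived.
struct SockState {
    bool callback_seen = false;  // send_tcp_open_bind_connect callback ran
    bool error_seen = false;     // error_event upcall arrived
    bool connect_seen = false;   // outbound_connect_event upcall arrived
    std::string error;           // text carried by the error upcall
    std::string neighbour;       // set by the callback
};

class SockTracker {
public:
    // Call from the send_tcp_open_bind_connect callback.
    Decision on_callback(const std::string& sockid,
                         const std::string& neighbour) {
        SockState& s = _states[sockid];  // creates the entry if missing
        s.callback_seen = true;
        s.neighbour = neighbour;
        return resolve(sockid);
    }

    // Call from the error_event upcall handler.
    Decision on_error_event(const std::string& sockid,
                            const std::string& error) {
        SockState& s = _states[sockid];  // may run before the callback
        s.error_seen = true;
        s.error = error;
        return resolve(sockid);
    }

    // Call from the outbound_connect_event upcall handler.
    Decision on_connect_event(const std::string& sockid) {
        _states[sockid].connect_seen = true;
        return resolve(sockid);
    }

    std::size_t pending() const { return _states.size(); }

private:
    // Nothing is decided until the callback has been seen, because only
    // the callback ties the sockid to a neighbour.
    Decision resolve(const std::string& sockid) {
        std::map<std::string, SockState>::iterator it = _states.find(sockid);
        assert(it != _states.end());  // every caller inserts the entry first
        const SockState& s = it->second;
        if (!s.callback_seen)
            return Decision{Action::Wait, ""};
        Decision d{Action::Wait, s.neighbour};
        if (s.error_seen) {
            d.action = Action::Retry;
            _states.erase(it);  // assumption: the retry gets a new sockid
        } else if (s.connect_seen) {
            d.action = Action::Established;
        }
        return d;
    }

    std::map<std::string, SockState> _states;
};
```

With this in place the 1, 3, 4, 2 ordering is harmless: the error
upcall only records the failure and returns Wait, and the later
callback returns Retry together with the neighbour to reconnect to.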

Hope that helps,
Pavlin


