From bms at incunabulum.net  Thu Oct  1 03:06:36 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Thu, 01 Oct 2009 11:06:36 +0100
Subject: [Xorp-hackers] OLSR assert
In-Reply-To: <4AC3CEA6.5040207@candelatech.com>
References: <4AC3B065.3070300@candelatech.com>
	<4AC3C015.5070703@incunabulum.net> <4AC3C3B7.40204@candelatech.com>
	<4AC3C60A.7060708@incunabulum.net>
	<4AC3CEA6.5040207@candelatech.com>
Message-ID: <4AC47F2C.8010903@incunabulum.net>

Ben Greear wrote:
>
> Here's an attached patch that seems to fix things.  I believe the main 
> error
> was checking for (!is_mpr()) in consider_remaining_cand_mprs
>
> I can't see why that check helps anything, and it was excluding from 
> consideration the mpr
> that was needed to find the 2-hop neighbor in my setup.

I'm not 100% sure about this. It's been a long time since that code was 
written, so I'm hazy on details. [..reads code..]

Good catch. I'd conservatively check it in, given that the real hard
work of MPR set computation in OLSR is in fact in minimizing the set.
As you can see, OLSR is tricky to do in an event-driven way, and it's
easy to introduce bugs.

The bug is (in English): just because a node was selected to cover a
poorly covered N2, that should not exclude it from consideration for
other N2.

The is_mpr flag is cleared on every new MPR recount. It should only be
set by the MPR recount code. The check for !is_mpr() was probably there
as an optimization, to avoid repeating work already done by
consider_poorly_covered_twohops() and consider_persistent_cand_mprs().

Yes, this could cause otherwise valid MPRs to be skipped in 
Neighborhood::consider_remaining_cand_mprs(), given that all MPRs for a 
subset of N2 have to be considered anyway; the notion of 'persistent' 
only really applies to N (- WILL_ALWAYS.

When considering all other candidate MPRs, the CandMprOrderPred will
return the first (highest) match anyway, and redundant candidates get
filtered out when the MPR set is later minimized. It might be better
just to get rid of consider_persistent_cand_mprs() in this case.
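
To make the failure mode concrete, here is a minimal sketch of a
"remaining candidates" pass. It is not the XORP code: Neighbor and
covered_after_remaining_pass() are hypothetical names, and the model
assumes the earlier passes have already marked some neighbors as MPRs.
With the skip enabled, an N2 that is only reachable through an already
selected neighbor stays uncovered.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a one-hop neighbor (N).
struct Neighbor {
    std::string name;
    std::set<std::string> twohops;  // N2 nodes reachable through this neighbor
    bool is_mpr;                    // set only by the MPR recount
};

// Run the "remaining candidates" pass and return how many N2 nodes are
// covered afterwards. 'covered' holds the N2 nodes already covered by
// earlier passes. When skip_selected is true, neighbors already flagged
// as MPRs are ignored, which mimics the !is_mpr() check.
std::size_t covered_after_remaining_pass(std::vector<Neighbor> neighbors,
                                         std::set<std::string> covered,
                                         bool skip_selected)
{
    for (Neighbor& n : neighbors) {
        if (skip_selected && n.is_mpr)
            continue;               // the suspect optimization
        for (const std::string& t : n.twohops) {
            if (covered.insert(t).second)
                n.is_mpr = true;    // n newly covers t, so it becomes an MPR
        }
    }
    return covered.size();
}
```

In this model, if neighbor A reaches {x, y} and was already selected
for x, skipping A leaves y uncovered even though A could cover it.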

later,
BMS


From bms at incunabulum.net  Thu Oct  1 03:32:45 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Thu, 01 Oct 2009 11:32:45 +0100
Subject: [Xorp-hackers] OLSR assert
In-Reply-To: <4AC3C015.5070703@incunabulum.net>
References: <4AC3B065.3070300@candelatech.com>
	<4AC3C015.5070703@incunabulum.net>
Message-ID: <4AC4854D.7050908@incunabulum.net>

Bruce Simpson wrote:
> Ben Greear wrote:
>   
>> The reset_twohop_mpr_state counts neighbors that are strict and reachable.
>> But, the consider_poorly_covered method checks for reachability == 1.
>> In the log below, neighbor 10.7.7.7 is not counted in poorly_covered.
>> Should we maybe check for reachability() > 0 instead of == 1?
>>   
>>     
>
> Off the top of my head, for classical OLSR, as specified in the RFC, it 
> needs to be covered by a minimum of 1 neighbour, in terms of links.
>
> I don't have the code in front of me, obviously a test of reachability 
> == 1 would be naive. If the fix is that simple, that's great.
>   

The existing reachability == 1 check is logically correct: a poorly
covered N2 is one which has reachability of exactly 1. When computing
the MPR set, N which are the only means of reaching those N2 need to be
considered first.

It's the is_essential_mpr() predicate (within minimize_mpr_set()) which 
is responsible for making sure that those critical links aren't thrown 
out, when pruning the MPR set to reduce flooding.

Most of the work involved in computing MPRs upfront is done to limit 
(minimize?) the work minimize_mpr_set() has to do.
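
As a rough illustration of the two ideas above, the sketch below
computes reachability per N2, selects the N that are the sole route to
a poorly covered N2, and tests whether an MPR is essential. The
Coverage type and all function names are assumptions made for this
example; they are not the Neighborhood API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model: each one-hop neighbor (N) maps to the set of
// strict two-hop neighbors (N2) it can reach.
typedef std::map<std::string, std::set<std::string> > Coverage;

// Reachability of an N2 node: how many N can reach it.
std::map<std::string, int> reachability(const Coverage& cov)
{
    std::map<std::string, int> r;
    for (const auto& n : cov)
        for (const auto& t : n.second)
            ++r[t];
    return r;
}

// First pass: every N that is the only route to some N2
// (reachability == 1) must be selected.
std::set<std::string> poorly_covered_mprs(const Coverage& cov)
{
    const std::map<std::string, int> r = reachability(cov);
    std::set<std::string> mprs;
    for (const auto& n : cov)
        for (const auto& t : n.second)
            if (r.at(t) == 1)
                mprs.insert(n.first);
    return mprs;
}

// An MPR is essential if removing it would leave some N2 with no other
// selected N covering it; such MPRs must survive minimization.
bool is_essential(const Coverage& cov, const std::set<std::string>& mprs,
                  const std::string& candidate)
{
    for (const auto& t : cov.at(candidate)) {
        bool covered_elsewhere = false;
        for (const auto& m : mprs)
            if (m != candidate && cov.at(m).count(t) != 0)
                covered_elsewhere = true;
        if (!covered_elsewhere)
            return true;
    }
    return false;
}
```

With A reaching {x, y}, B reaching {y} and C reaching {y, z}, the first
pass picks A and C, and B can be pruned because y is covered elsewhere.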


From bms at incunabulum.net  Thu Oct  1 04:11:21 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Thu, 01 Oct 2009 12:11:21 +0100
Subject: [Xorp-hackers] valgrind:  selector.cc:  Reading free'd memory
In-Reply-To: <4AC38E63.6030308@candelatech.com>
References: <4AC2AA1F.1080308@candelatech.com>
	<4AC2BFEC.6010802@candelatech.com>
	<4AC324FE.7010700@incunabulum.net>
	<4AC37746.2080004@candelatech.com>
	<4AC37E69.4040407@incunabulum.net>
	<4AC38E63.6030308@candelatech.com>
Message-ID: <4AC48E59.9080405@incunabulum.net>

Ben Greear wrote:
>
> The problem is that a method called by an object can cause that object
> to be deleted, and when that method continues, it is accessing deleted
> memory.

SelectorList::Node::run_hooks(), right? That one *is* nasty... (re comment)

WinDispatcher doesn't have this problem; there, the callbacks are held 
in separate maps, and  the ref_ptr for the callback protects the 
callback itself; where multiple dispatches are taking place within a 
for-block, the iterators involved are protected also.

SelectorList::Node does not have such protection -- it's entirely 
possible that the callback will go off and try to remove an event, but 
as soon as it does, it can invalidate the SelectorList::Node. The 
protection in run_hooks() seems insufficient...

Are there specific places where this is triggered? The comment would 
seem to indicate it's only an issue if more than one callback runs on 
the same FD, which is certainly possible even if they're *not* for the 
same IoEventType.
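
For comparison, the kind of protection described for WinDispatcher can
be sketched as follows. This uses std::shared_ptr as a stand-in for
ref_ptr, and the Dispatcher class is hypothetical; it only shows why
holding a counted reference across the call keeps the callback alive
when it removes itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical dispatcher: one callback per file descriptor.
class Dispatcher {
public:
    typedef std::function<void(int)> Callback;

    void add(int fd, Callback cb) {
        _entries[fd] = std::make_shared<Callback>(std::move(cb));
    }

    void remove(int fd) { _entries.erase(fd); }

    // Returns true if a callback ran for fd.
    bool dispatch(int fd) {
        std::map<int, std::shared_ptr<Callback> >::iterator it =
            _entries.find(fd);
        if (it == _entries.end())
            return false;
        // Take a counted reference before calling out. If the callback
        // removes its own entry, the map slot goes away but the callable
        // stays alive until 'keep_alive' leaves scope.
        std::shared_ptr<Callback> keep_alive = it->second;
        (*keep_alive)(fd);
        return true;
    }

    std::size_t size() const { return _entries.size(); }

private:
    std::map<int, std::shared_ptr<Callback> > _entries;
};
```

Note that 'it' is not touched after the call, since the callback may
have invalidated it.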


From lizhaous2000 at yahoo.com  Thu Oct  1 06:27:19 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Thu, 1 Oct 2009 06:27:19 -0700 (PDT)
Subject: [Xorp-hackers] rtrmgr restart
Message-ID: <321304.32172.qm@web58703.mail.re1.yahoo.com>

Correct me if I am wrong. When the router is dying, rtrmgr does not
terminate the other processes gracefully. After the router comes back to
life, the latest running config cannot be picked up. The only way I can
think of to save the running config is through xorpsh (some scripts), but
I cannot get such a script to run successfully when rtrmgr is dying.

Thanks.

Li


From greearb at candelatech.com  Thu Oct  1 09:03:46 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 01 Oct 2009 09:03:46 -0700
Subject: [Xorp-hackers] valgrind:  selector.cc:  Reading free'd memory
In-Reply-To: <4AC48E59.9080405@incunabulum.net>
References: <4AC2AA1F.1080308@candelatech.com>
	<4AC2BFEC.6010802@candelatech.com>
	<4AC324FE.7010700@incunabulum.net>
	<4AC37746.2080004@candelatech.com>
	<4AC37E69.4040407@incunabulum.net>
	<4AC38E63.6030308@candelatech.com>
	<4AC48E59.9080405@incunabulum.net>
Message-ID: <4AC4D2E2.5090302@candelatech.com>

On 10/01/2009 04:11 AM, Bruce Simpson wrote:
> Ben Greear wrote:
>>
>> The problem is that a method called by an object can cause that object
>> to be deleted, and when that method continues, it is accessing deleted
>> memory.
>
> SelectorList::Node::run_hooks(), right? That one *is* nasty... (re comment)
>
> WinDispatcher doesn't have this problem; there, the callbacks are held
> in separate maps, and the ref_ptr for the callback protects the callback
> itself; where multiple dispatches are taking place within a for-block,
> the iterators involved are protected also.
>
> SelectorList::Node does not have such protection -- it's entirely
> possible that the callback will go off and try to remove an event, but
> as soon as it does, it can invalidate the SelectorList::Node. The
> protection in run_hooks() seems insufficient...
>
> Are there specific places where this is triggered? The comment would
> seem to indicate it's only an issue if more than one callback runs on
> the same FD, which is certainly possible even if they're *not* for the
> same IoEventType.

As soon as the memory is free'd due to resize, all bets are off and the
loop might think it should continue even when it shouldn't have because
something else has acquired and written to the memory before the loop
completes.

We have to ensure that the Node memory can never be deleted while that method
is running.  My work-around solves this for any sane amount of file-descriptors
(up to 1024).

My patch is better than what previously existed, but some day we could revisit
the whole logic in that area perhaps.
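
The idea behind the preallocation work-around can be sketched like
this, assuming the entries live in a std::vector. Node and
append_keeps_storage() are illustrative names only. Once capacity has
been reserved, appending up to that capacity does not reallocate, so
references to existing elements stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-descriptor entry.
struct Node {
    int fd;
};

// Append 'count' entries and report whether the underlying storage
// stayed at the same address (i.e. no reallocation took place).
bool append_keeps_storage(std::vector<Node>& nodes, std::size_t count)
{
    const Node* before = nodes.data();
    for (std::size_t i = 0; i < count; ++i)
        nodes.push_back(Node{static_cast<int>(i)});
    return nodes.data() == before;
}
```

This only holds while the number of entries stays within the reserved
capacity; a descriptor beyond that limit would trigger a resize again.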

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Thu Oct  1 15:49:56 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 01 Oct 2009 15:49:56 -0700
Subject: [Xorp-hackers] PATCH: XrlRouter timeout needs to be allowed higher.
Message-ID: <4AC53214.3020600@candelatech.com>

Please note that the old 'max timeout' that you could set as an environment
variable was only 6 seconds.  This is less than the default of 30 seconds,
which makes no sense at all.

The attached patch fixes this:

Give user a better clue as to why xrl router timed out.
Allow user to set up to 2 minute timeout..helps with running
lots of instances under valgrind and other strange things.
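
A minimal sketch of the clamping logic, for readers without the
attachment. The function and constant names are hypothetical, and the
patch itself may differ in detail; only the 30 second default and the
2 minute ceiling come from the description above.

```cpp
#include <cassert>
#include <cstdlib>

// Values taken from the description; the names are made up.
const long kDefaultTimeoutSec = 30;
const long kMaxTimeoutSec = 120;

// Turn an optional override string (for example the result of
// std::getenv(), which is a null pointer when the variable is unset)
// into the timeout to use, in seconds.
long effective_timeout_sec(const char* override_value)
{
    if (override_value == nullptr || *override_value == '\0')
        return kDefaultTimeoutSec;
    char* end = nullptr;
    const long v = std::strtol(override_value, &end, 10);
    if (*end != '\0' || v <= 0)
        return kDefaultTimeoutSec;   // not a positive integer
    return v > kMaxTimeoutSec ? kMaxTimeoutSec : v;
}
```

A caller would pass std::getenv("SOME_TIMEOUT_VARIABLE"), where the
variable name here is a placeholder rather than the real one.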

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: xrl_router_timeout.patch
Url: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091001/88394222/attachment.ksh 

From greearb at candelatech.com  Thu Oct  1 17:13:36 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 01 Oct 2009 17:13:36 -0700
Subject: [Xorp-hackers] PATCH:  Fix uninitialized memory, found by valgrind
Message-ID: <4AC545B0.8080403@candelatech.com>

This patch fixes some errors relating to not initializing memory
properly.  I found these by using valgrind.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: xorp_uninit_memory.patch
Url: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091001/fda8ef09/attachment.ksh 

From bms at incunabulum.net  Fri Oct  2 04:34:54 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Fri, 02 Oct 2009 12:34:54 +0100
Subject: [Xorp-hackers] PATCH: XrlRouter timeout needs to be allowed
	higher.
In-Reply-To: <4AC53214.3020600@candelatech.com>
References: <4AC53214.3020600@candelatech.com>
Message-ID: <4AC5E55E.1020803@incunabulum.net>

I've checked in the logic part of this patch. Thanks!


From bms at incunabulum.net  Fri Oct  2 04:42:24 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Fri, 02 Oct 2009 12:42:24 +0100
Subject: [Xorp-hackers] Stub licensing
In-Reply-To: <4AC3EFFA.8040705@incunabulum.net>
References: <4AC2AA1F.1080308@candelatech.com>	<4AC2BFEC.6010802@candelatech.com>	<4AC324FE.7010700@incunabulum.net>	<4AC37746.2080004@candelatech.com>	<4AC37E69.4040407@incunabulum.net>	<4AC38E63.6030308@candelatech.com>	<4AC3C4C5.7040307@incunabulum.net>	<4AC3C8EC.7090001@candelatech.com>
	<4AC3EFFA.8040705@incunabulum.net>
Message-ID: <4AC5E720.3060407@incunabulum.net>

Bruce Simpson wrote:
> ...
> The scope of the GPL was purely limited to individual routing processes, 
> not the core libraries, which are LGPL. The XRL RPC stubs don't actually 
> have an explicit license, and should probably be updated to reflect 
> either LGPL or public domain status.
>   

Correction: The generated RPC stubs contain a reference to the LGPL, but 
don't embed the license text itself.


From greearb at candelatech.com  Fri Oct  2 11:34:24 2009
From: greearb at candelatech.com (Ben Greear)
Date: Fri, 02 Oct 2009 11:34:24 -0700
Subject: [Xorp-hackers] Question on startup errors/warnings.
Message-ID: <4AC647B0.4060007@candelatech.com>

I'm crawling through xorp logs trying to clean or explain xorp errors.

Any idea what this indicates?

[ 2009/10/02 11:21:53  WARNING xorp_rtrmgr:28398 XrlFinderTarget obj/x86_64-linux-public17/xrl/targets/finder_base.cc:715 
handle_finder_event_notifier_0_1_register_class_event_interest ] Handling method for finder_event_notifier/0.1/register_class_event_interest failed: XrlCmdError 
102 Command failed failed to add watch
[ 2009/10/02 11:21:53  ERROR xorp_rtrmgr:28398 RTRMGR rtrmgr/xrl_rtrmgr_interface.cc:334 finder_register_done ] Failed to register with finder about XRL 
xorpsh-28454-i7-dqc-1 (err: Command failed)
[ 2009/10/02 11:21:53  INFO xorp_rtrmgr:28398 RTRMGR rtrmgr/module_manager.cc:101 execute ] Executing module: igmp (mld6igmp/xorp_igmp)
[ 2009/10/02 11:21:53  WARNING xorp_rtrmgr:28398 XrlFinderTarget obj/x86_64-linux-public17/xrl/targets/finder_base.cc:453 handle_finder_0_2_resolve_xrl ] 
Handling method for finder/0.2/resolve_xrl failed: XrlCmdError 102 Command failed Target "IGMP" does not exist or is not enabled.
[ 2009/10/02 11:21:53  WARNING xorp_rtrmgr:28398 RTRMGR rtrmgr/task.cc:212 xrl_done ] Failed to receive reply, code: 201 Resolve failed  retries: 0  max_retries: 30


Is this a real error, or just complaints because everything hasn't properly started yet?

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From bms at incunabulum.net  Sat Oct  3 03:30:16 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Sat, 03 Oct 2009 11:30:16 +0100
Subject: [Xorp-hackers] valgrind:  selector.cc:  Reading free'd memory
In-Reply-To: <4AC4D2E2.5090302@candelatech.com>
References: <4AC2AA1F.1080308@candelatech.com>
	<4AC2BFEC.6010802@candelatech.com>
	<4AC324FE.7010700@incunabulum.net>
	<4AC37746.2080004@candelatech.com>
	<4AC37E69.4040407@incunabulum.net>
	<4AC38E63.6030308@candelatech.com>
	<4AC48E59.9080405@incunabulum.net>
	<4AC4D2E2.5090302@candelatech.com>
Message-ID: <4AC727B8.7030705@incunabulum.net>

Ben Greear wrote:
>
> We have to ensure that the Node memory can never be deleted while that 
> method
> is running.  My work-around solves this for any sane amount of 
> file-descriptors
> (up to 1024).

I've committed the part of the change which preallocates 
_selector_entries, but limited it to 256 file descriptors to keep the 
memory wastage down. thanks!


From bms at incunabulum.net  Sat Oct  3 03:33:10 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Sat, 03 Oct 2009 11:33:10 +0100
Subject: [Xorp-hackers] Question on startup errors/warnings.
In-Reply-To: <4AC647B0.4060007@candelatech.com>
References: <4AC647B0.4060007@candelatech.com>
Message-ID: <4AC72866.2020206@incunabulum.net>

Ben Greear wrote:
> I'm crawling through xorp logs trying to clean or explain xorp errors.
>
> Any idea what this indicates?
>
> [ 2009/10/02 11:21:53  WARNING xorp_rtrmgr:28398 XrlFinderTarget obj/x86_64-linux-public17/xrl/targets/finder_base.cc:715 
> handle_finder_event_notifier_0_1_register_class_event_interest ] Handling method for finder_event_notifier/0.1/register_class_event_interest failed: XrlCmdError 
> 102 Command failed failed to add watch
> [ 2009/10/02 11:21:53  ERROR xorp_rtrmgr:28398 RTRMGR rtrmgr/xrl_rtrmgr_interface.cc:334 finder_register_done ] Failed to register with finder about XRL 
> xorpsh-28454-i7-dqc-1 (err: Command failed)
>   

This could just be xorpsh startup racing with the Router Manager 
finishing its initial configuration tree pass.

> [ 2009/10/02 11:21:53  INFO xorp_rtrmgr:28398 RTRMGR rtrmgr/module_manager.cc:101 execute ] Executing module: igmp (mld6igmp/xorp_igmp)
> [ 2009/10/02 11:21:53  WARNING xorp_rtrmgr:28398 XrlFinderTarget obj/x86_64-linux-public17/xrl/targets/finder_base.cc:453 handle_finder_0_2_resolve_xrl ] 
> Handling method for finder/0.2/resolve_xrl failed: XrlCmdError 102 Command failed Target "IGMP" does not exist or is not enabled.
> [ 2009/10/02 11:21:53  WARNING xorp_rtrmgr:28398 RTRMGR rtrmgr/task.cc:212 xrl_done ] Failed to receive reply, code: 201 Resolve failed  retries: 0  max_retries: 30
>
>
> Is this a real error, or just complaints because everything hasn't properly started yet?
>   

This could be the same situation with the igmp child process. I've seen 
similar log verbiage when there are debug hooks active in the system.


From bms at incunabulum.net  Sat Oct  3 03:41:49 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Sat, 03 Oct 2009 11:41:49 +0100
Subject: [Xorp-hackers] Patch to update build notes slightly
In-Reply-To: <4AC1126D.4030907@candelatech.com>
References: <4AC1126D.4030907@candelatech.com>
Message-ID: <4AC72A6D.8020108@incunabulum.net>

An appropriate update has been committed for now.


From bms at incunabulum.net  Sat Oct  3 04:10:33 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Sat, 03 Oct 2009 12:10:33 +0100
Subject: [Xorp-hackers] PATCH:  Fix uninitialized memory,
	found by valgrind
In-Reply-To: <4AC545B0.8080403@candelatech.com>
References: <4AC545B0.8080403@candelatech.com>
Message-ID: <4AC73129.4060905@incunabulum.net>

Ben Greear wrote:
> This patch fixes some errors relating to not initializing memory
> properly.  I found these by using valgrind.

A few questions/points:

* Why is the initializer for TransactionManager::_next_tid required? 
This integer key is never exposed outside of TransactionManager, and the 
std::map it indexes doesn't make any assumptions about the key space. 
Can you provide the valgrind hit?

* Why is the initializer for IfConfigTransactionManager::_tid_exec 
required? This member is only referenced in two places: when it's set on 
the pre_commit, and when the operation result callback fires, it gets 
passed by value. There are other places in the FEA using the 
TransactionManager. Are they also affected/is there coverage?

* Can you provide the valgrind hits which are fixed by the memset() 
calls in io_ip_socket.cc?

The CMSG macros should notice if a buffer, passed to a socket call, 
didn't return any data. If they aren't, that could be a bug elsewhere.

We really need to understand the problems these fixes address before 
taking them. It is normally good practice to clear buffers, when needed, 
but it's OK to omit that step for performance if and only if it doesn't 
cause stale state to get picked up.

cheers,
BMS


From greearb at candelatech.com  Sat Oct  3 08:43:26 2009
From: greearb at candelatech.com (Ben Greear)
Date: Sat, 03 Oct 2009 08:43:26 -0700
Subject: [Xorp-hackers] valgrind:  selector.cc:  Reading free'd memory
In-Reply-To: <4AC727B8.7030705@incunabulum.net>
References: <4AC2AA1F.1080308@candelatech.com>
	<4AC2BFEC.6010802@candelatech.com>
	<4AC324FE.7010700@incunabulum.net>
	<4AC37746.2080004@candelatech.com>
	<4AC37E69.4040407@incunabulum.net>
	<4AC38E63.6030308@candelatech.com>
	<4AC48E59.9080405@incunabulum.net>
	<4AC4D2E2.5090302@candelatech.com>
	<4AC727B8.7030705@incunabulum.net>
Message-ID: <4AC7711E.3080702@candelatech.com>

Bruce Simpson wrote:
> Ben Greear wrote:
>>
>> We have to ensure that the Node memory can never be deleted while 
>> that method
>> is running.  My work-around solves this for any sane amount of 
>> file-descriptors
>> (up to 1024).
>
> I've committed the part of the change which preallocates 
> _selector_entries, but limited it to 256 file descriptors to keep the 
> memory wastage down. thanks!
Hopefully no one will ever get a descriptor bigger than 256!  It should 
certainly be better than before, but I'm
going to leave my tree at 1024 since I open a descriptor per interface 
and sometimes run lots of protocols.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Sat Oct  3 08:55:51 2009
From: greearb at candelatech.com (Ben Greear)
Date: Sat, 03 Oct 2009 08:55:51 -0700
Subject: [Xorp-hackers] PATCH:  Fix uninitialized memory,
	found by valgrind
In-Reply-To: <4AC73129.4060905@incunabulum.net>
References: <4AC545B0.8080403@candelatech.com>
	<4AC73129.4060905@incunabulum.net>
Message-ID: <4AC77407.2050205@candelatech.com>

Bruce Simpson wrote:
> Ben Greear wrote:
>> This patch fixes some errors relating to not initializing memory
>> properly.  I found these by using valgrind.
>
> A few questions/points:
>
> * Why is the initializer for TransactionManager::_next_tid required? 
> This integer key is never exposed outside of TransactionManager, and 
> the std::map it indexes doesn't make any assumptions about the key 
> space. Can you provide the valgrind hit?
>
> * Why is the initializer for IfConfigTransactionManager::_tid_exec 
> required? This member is only referenced in two places: when it's set 
> on the pre_commit, and when the operation result callback fires, it 
> gets passed by value. There are other places in the FEA using the 
> TransactionManager. Are they also affected/is there coverage?
>
> * Can you provide the valgrind hits which are fixed by the memset() 
> calls in io_ip_socket.cc?
>
> The CMSG macros should notice if a buffer, passed to a socket call, 
> didn't return any data. If they aren't, that could be a bug elsewhere.
>
> We really need to understand the problems these fixes address before 
> taking them. It is normally good practice to clear buffers, when 
> needed, but it's OK to omit that step for performance if and only if 
> it doesn't cause stale state to get picked up.
Run rtrmgr under valgrind with OSPF (though it's not OSPF related), and
you should see these errors.

I don't think any of them are critical, but they make valgrind noisy, so
you can't see other errors that might be real.  At any rate, it isn't
clean code to leave member variables uninitialized.  It's just asking
for weird problems some day when someone starts using the variables
differently.

The changes are not in any hot path, so they are not going to hurt any 
performance.
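
The kind of change under discussion looks roughly like the sketch
below. The class names are placeholders, not the real
TransactionManager or io_ip_socket.cc code; the point is only that
members get a defined value in the constructor and buffers are cleared
once.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Placeholder for a class that hands out transaction ids.
class ExampleTransactionManager {
public:
    // Give every member a defined value, even if it is always
    // overwritten before first use today.
    ExampleTransactionManager() : _next_tid(0), _tid_exec(0) {}

    uint32_t next_tid() { return _next_tid++; }
    uint32_t tid_exec() const { return _tid_exec; }

private:
    uint32_t _next_tid;
    uint32_t _tid_exec;
};

// Placeholder for a receive buffer that is cleared once on creation,
// so tools like valgrind do not flag reads of uninitialized bytes.
struct ExampleIoBuffer {
    unsigned char data[64];
    ExampleIoBuffer() { std::memset(data, 0, sizeof(data)); }
};
```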

Here's my valgrind start command:

valgrind --trace-children=yes 
--log-file=valgrind_xorp_$XORP_FINDER_SERVER_PORT.%p.txt --leak-check=full
 --track-origins=yes --track-fds=yes xorp_rtrmgr -p 
$XORP_FINDER_SERVER_PORT -b $CFG_FILE -P $PIDFILE.rtrmgr

Thanks,
Ben

>
> cheers,
> BMS


-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Mon Oct  5 11:31:35 2009
From: greearb at candelatech.com (Ben Greear)
Date: Mon, 05 Oct 2009 11:31:35 -0700
Subject: [Xorp-hackers] PATCH:  Remove some dead code,
	unlink pid-file on exit.
Message-ID: <4ACA3B87.70309@candelatech.com>

This is mostly just a cleanup patch.  It removes some dead code and
changes around the pidfile logic a bit.  It also allows unlinking the
pid-file on exit using the atexit call.  Tested on Linux.
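
For reference, the atexit approach can be sketched as below. This is
not the attached patch: write_pidfile(), remove_pidfile() and
g_pidfile are names invented for the example, and std::remove() is
used in place of unlink() to keep the sketch portable.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>

namespace {

std::string g_pidfile;  // path remembered for the exit handler

// Exit handler: delete the pid-file if one was written.
void remove_pidfile()
{
    if (!g_pidfile.empty())
        std::remove(g_pidfile.c_str());
}

}  // namespace

// Write 'pid' to 'path' and arrange for the file to be removed on a
// normal process exit. Returns false if the file could not be written.
bool write_pidfile(const std::string& path, long pid)
{
    std::ofstream out(path.c_str());
    if (!out)
        return false;
    out << pid << '\n';
    out.close();
    if (!out)
        return false;
    g_pidfile = path;
    static bool registered = false;
    if (!registered)
        registered = (std::atexit(remove_pidfile) == 0);
    return true;
}
```

Handlers registered with atexit run on a normal exit only, so a
process killed by a signal would still leave the file behind.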

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: unlink_pidfile.patch
Url: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091005/119c5b08/attachment.ksh 

From greearb at candelatech.com  Mon Oct  5 14:54:35 2009
From: greearb at candelatech.com (Ben Greear)
Date: Mon, 05 Oct 2009 14:54:35 -0700
Subject: [Xorp-hackers] rtrmgr and TaskManager
Message-ID: <4ACA6B1B.8020604@candelatech.com>

I'm trying to figure out why xorpsh commits take so long, and in doing so,
I'm trying to understand the TaskManager.

There is one part that is particularly confusing:

There is a _completion_cb that is assigned when a task is queued up, but
the next task to run isn't necessarily that task if there are other higher
priority tasks running.

Seems like that could be a problem to me?

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Mon Oct  5 15:34:51 2009
From: greearb at candelatech.com (Ben Greear)
Date: Mon, 05 Oct 2009 15:34:51 -0700
Subject: [Xorp-hackers] rtrmgr and TaskManager
In-Reply-To: <4ACA6B1B.8020604@candelatech.com>
References: <4ACA6B1B.8020604@candelatech.com>
Message-ID: <4ACA748B.6070308@candelatech.com>

On 10/05/2009 02:54 PM, Ben Greear wrote:
> I'm trying to figure out why xorpsh commits take so long, and in doing so,
> I'm trying to understand the TaskManager.
>
> There is one part that is particularly confusing:
>
> There is a _completion_cb that is assigned when a task is queued up, but
> the next task to run isn't necessarily that task if there are other higher
> priority tasks running.
>
> Seems like that could be a problem to me?

Well, here's the slow-down I'm seeing..but WTF would someone add a 1-second
sleep here???

task.cc:  XrlStatusValidation::validate
    } else {
	//
	// When we're running with do_exec == false, we want to
	// exercise most of the same machinery, but we want to ensure
	// that the xrl_done response gets the right arguments even
	// though we're not going to call the XRL.
	//
	_retry_timer = eventloop().new_oneoff_after_ms(1000,
			callback(this, &XrlStatusValidation::dummy_response));
     }
}


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Mon Oct  5 15:58:33 2009
From: greearb at candelatech.com (Ben Greear)
Date: Mon, 05 Oct 2009 15:58:33 -0700
Subject: [Xorp-hackers] PATCH:  Logging improvements,
	fix artificial deal for xorpsh commit.
Message-ID: <4ACA7A19.30909@candelatech.com>

The attached patch has these improvements:

1)  Fix logging & tracing to show micro-seconds, greatly aids debugging performance issues.

2)  Change some pass-by-value string arguments to const string& in router-mgr.  This will improve
     performance and a small bit of memory usage.

3)  Remove 1 second timeout in 'commit' path.  At best, the timeout might have worked around
     a race condition, but I can see no reason to leave it in.  I tested with it set to zero
     timeout and things work fine.  This makes commits last around 200ms instead of 1.2s, which is
     a big improvement when scripting xorp.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: logging_commit_timeout.patch
Url: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091005/b39b9a69/attachment.ksh 

From greearb at candelatech.com  Mon Oct  5 16:42:36 2009
From: greearb at candelatech.com (Ben Greear)
Date: Mon, 05 Oct 2009 16:42:36 -0700
Subject: [Xorp-hackers] PATCH:  Fix commit failure on device removal race,
	related to IGMP.
Message-ID: <4ACA846C.7040908@candelatech.com>

If an interface is removed from the system, then you can no longer remove
it from xorp igmp configuration because the commit will fail (due to
lack of vif).  This is a race of some sort or another, and was fairly difficult
to reproduce even on our setup.

Here's the fix:

*  Don't fail vif_stop in Mld6igmpNode::stop_vif if the interface is already removed.
    Log the inconsistency, but return XORP_OK so the commit can continue.
    This is similar to code I've had in 'mfea_node' for several years.
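
In outline, the behaviour is something like the following sketch.
ExampleNode and the EXAMPLE_* constants are stand-ins for
Mld6igmpNode and XORP_OK / XORP_ERROR, and the real patch logs through
XORP's own logging rather than std::cerr.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <string>

const int EXAMPLE_OK = 0;       // stand-in for XORP_OK
const int EXAMPLE_ERROR = -1;   // stand-in for XORP_ERROR

class ExampleNode {
public:
    void add_vif(const std::string& name) { _vif_up[name] = true; }
    void delete_vif(const std::string& name) { _vif_up.erase(name); }

    // Stop a vif. A vif that no longer exists is treated as already
    // stopped: the inconsistency is logged, but the caller (and hence
    // the commit) is allowed to continue.
    int stop_vif(const std::string& name, std::string& error_msg) {
        std::map<std::string, bool>::iterator it = _vif_up.find(name);
        if (it == _vif_up.end()) {
            error_msg = "vif " + name + " not found; treating stop as a no-op";
            std::cerr << "WARNING: " << error_msg << '\n';
            return EXAMPLE_OK;
        }
        it->second = false;
        return EXAMPLE_OK;
    }

    bool is_up(const std::string& name) const {
        std::map<std::string, bool>::const_iterator it = _vif_up.find(name);
        return it != _vif_up.end() && it->second;
    }

private:
    std::map<std::string, bool> _vif_up;
};
```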

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: igmp_commit_fail.patch
Url: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091005/0c356418/attachment.ksh 

From greearb at candelatech.com  Mon Oct  5 21:26:41 2009
From: greearb at candelatech.com (Ben Greear)
Date: Mon, 05 Oct 2009 21:26:41 -0700
Subject: [Xorp-hackers] PATCH: Don't fail commit on multicast address
	removal failure.
Message-ID: <4ACAC701.1070300@candelatech.com>

This patch fixes a bug where a commit can fail if the multicast
addresses being removed are already gone (probably because an entire
network device disappeared shortly before).  If an address is already
gone, log a warning, but don't fail the commit.

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


-------------- next part --------------
A non-text attachment was scrubbed...
Name: multicast_fea_rm.patch
Type: text/x-patch
Size: 1821 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091005/149b009a/attachment.bin 

From greearb at candelatech.com  Mon Oct  5 21:31:02 2009
From: greearb at candelatech.com (Ben Greear)
Date: Mon, 05 Oct 2009 21:31:02 -0700
Subject: [Xorp-hackers] PATCH:  Add startup methods for faster startup.
Message-ID: <4ACAC806.1000000@candelatech.com>

If there is no status and no startup method in a xorp target, the 
router-mgr uses a 2-second
sleep for 'verification'.  This slows down startup of Xorp quite a bit 
when you have lots
of protocols running.

This patch adds startup methods to many of the common targets.  There 
are still more to
go, however.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


-------------- next part --------------
A non-text attachment was scrubbed...
Name: startup_methods.patch
Type: text/x-patch
Size: 6324 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091005/9098b29a/attachment-0001.bin 

From bms at incunabulum.net  Tue Oct  6 04:50:57 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Tue, 06 Oct 2009 12:50:57 +0100
Subject: [Xorp-hackers] PATCH:  Add startup methods for faster startup.
In-Reply-To: <4ACAC806.1000000@candelatech.com>
References: <4ACAC806.1000000@candelatech.com>
Message-ID: <4ACB2F21.7080603@incunabulum.net>

Ben Greear wrote:
> If there is no status and no startup method in a xorp target, the 
> router-mgr uses a 2-second
> sleep for 'verification'.  This slows down startup of Xorp quite a bit 
> when you have lots
> of protocols running.
>
> This patch adds startup methods to many of the common targets.  There 
> are still more to
> go, however.

Thanks for tracking this down; yes, I've noticed that process startup is 
slower than it could be, but have only had free time / mindspace to look 
at the XRL specifics.

    Could this be made a more general change? If the XIF method for 
startup you are adding is not specific to a particular protocol, it 
might be an idea to make it part of the common.xif -- which is where 
most of the process control knobs are.
    I'd rather not get too far into the machinery here, because I'm 
about to take a badly needed break. I guess the firewall and ifmgr 
modules are a special case, because they're separate service bundles 
located in the FEA process.

On a more general note:
    One of the things Pavlin raised in an old BugZilla ticket, is the 
fact that the Router Manager is fairly complex because it implements 
transactions on the config tree itself.
    If this is pushed into the protocols themselves (they'd have to keep 
their own config snapshot, and adopt a commit-rollback transaction model 
in the XIF RPC interfaces), then the Router Manager gets a bit simpler 
overall.
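
    As a very rough sketch of what a per-protocol commit-rollback model
could look like (purely illustrative; ExampleProtocolConfig and its
methods are invented here and do not correspond to any existing XIF
interface):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-protocol configuration store with its own snapshot.
class ExampleProtocolConfig {
public:
    ExampleProtocolConfig() : _in_txn(false) {}

    // Start a transaction by copying the active config.
    void begin() { _pending = _active; _in_txn = true; }

    // Stage a change; only allowed inside a transaction.
    bool set(const std::string& key, const std::string& value) {
        if (!_in_txn)
            return false;
        _pending[key] = value;
        return true;
    }

    // Make the staged config active.
    void commit() {
        if (_in_txn) {
            _active = _pending;
            _in_txn = false;
        }
    }

    // Discard the staged config; the active one is untouched.
    void rollback() { _pending.clear(); _in_txn = false; }

    // Read from the active config; empty string if the key is unset.
    std::string get(const std::string& key) const {
        std::map<std::string, std::string>::const_iterator it =
            _active.find(key);
        return it == _active.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> _active;
    std::map<std::string, std::string> _pending;
    bool _in_txn;
};
```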

cheers,
BMS


From bms at incunabulum.net  Tue Oct  6 05:44:02 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Tue, 06 Oct 2009 13:44:02 +0100
Subject: [Xorp-hackers] PATCH:  Logging improvements,
 fix artificial deal for xorpsh commit.
In-Reply-To: <4ACA7A19.30909@candelatech.com>
References: <4ACA7A19.30909@candelatech.com>
Message-ID: <4ACB3B92.3050505@incunabulum.net>

Ben Greear wrote:
> The attached patch has these improvements:
>
> 1)  Fix logging & tracing to show micro-seconds, greatly aids 
> debugging performance issues.
> 2)  Change some pass-by-value string arguments to const string& in 
> router-mgr.  This will improve
>     performance and a small bit of memory usage.
> 3)  Remove 1 second timeout in 'commit' path.  At best, the timeout 
> might have worked around
>     a race condition, but I can see no reason to leave it in.  I 
> tested with it set to zero
>     timeout and things work fine.  This makes commits last around 
> 200ms instead of 1.2s, which is
>     a big improvement when scripting xorp.

Comments:
   * It should be possible to turn off the millisecond logging if 
desired. Whilst it's certainly a useful feature to have when debugging 
time contingent code, it does add clutter to the output.
    * Perhaps putting it under the other debug knobs in SConstruct would 
be a good idea?
   * %llu is not a portable format specifier, and 'unsigned long long' 
is not a portable type, please don't use them in portable code.
  * Perhaps the code which prints the timeval is a candidate for a 
function like xlog_localtime2string_short() ?

  * xlog_localtime2string_short() is still defined in xlog.c; so why comment out its prototype, are you getting warnings from the compiler?
  * A XorpTimer of 0 is a possible candidate for a XorpTask. I can't really delve further into that change at the moment, though.
  * Yes, it may be useful to constify the string arguments in those callback functions, but this change is considered low priority.
  * Please avoid introducing unnecessary whitespace changes in diffs.
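
On the %llu point in the list above, one commonly used alternative is the PRIu64 macro from the C99 <inttypes.h> header together with uint64_t. The sketch below is only an illustration under that assumption; the function name format_timestamp is hypothetical and is not part of xlog. Older C++ toolchains may need __STDC_FORMAT_MACROS defined before the header is included.

```cpp
#include <cassert>
#include <cinttypes>   // PRIu64
#include <cstdint>
#include <cstdio>
#include <string>

// Format a seconds/microseconds pair as "sec.usec" without assuming
// which underlying integer type uint64_t maps to on this platform.
std::string format_timestamp(uint64_t sec, uint64_t usec) {
    assert(usec < 1000000);
    char buf[64];
    // PRIu64 expands to the right conversion for uint64_t ("lu", "llu", ...).
    std::snprintf(buf, sizeof(buf), "%" PRIu64 ".%06" PRIu64, sec, usec);
    return std::string(buf);
}
```

The string literal concatenation keeps the format readable while leaving the width-specific part to the platform headers.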
  

Can you please raise a Trac item for these suggested improvements?
I probably won't have time to look at the Router Manager in detail for at least 4 weeks.

Sorry for the bureaucracy... I appreciate you're doing what you can in the here and now to improve the code; however, raising tickets makes reviewing and applying patches that much easier, and we do need to keep the code alignment and type clean, etc.

thanks,
BMS


From bms at incunabulum.net  Tue Oct  6 05:45:33 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Tue, 06 Oct 2009 13:45:33 +0100
Subject: [Xorp-hackers] PATCH: Fix commit failure on device removal race,
 related to IGMP.
In-Reply-To: <4ACA846C.7040908@candelatech.com>
References: <4ACA846C.7040908@candelatech.com>
Message-ID: <4ACB3BED.2090100@incunabulum.net>

Ben,

Can you please raise a Trac ticket about this issue, and attach your patch?

Ben Greear wrote:
> If an interface is removed from the system, then you can no longer remove
> it from xorp igmp configuration because the commit will fail (due to
> lack of vif).  This is a race of some sort or another, and was fairly 
> difficult
> to reproduce even on our setup.
>
> Here's the fix:
>
> *  Don't fail vif_stop in Mld6igmpNode::stop_vif if the interface is 
> already removed.
>    Log the inconsistency, but return XORP_OK so the commit can continue.
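
The behaviour described in the quoted fix could be sketched roughly as follows. This is not the actual Mld6igmpNode code: SketchNode, its members, and the SKETCH_OK / SKETCH_ERROR constants are stand-ins invented so the example compiles on its own.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <string>

// Stand-ins for the real return codes, defined here so the sketch is standalone.
static const int SKETCH_OK = 0;
static const int SKETCH_ERROR = -1;

// Hypothetical node tracking which vifs still exist and which are running.
struct SketchNode {
    std::set<std::string> present_vifs;
    std::set<std::string> running_vifs;

    int stop_vif(const std::string& name) {
        if (present_vifs.count(name) == 0) {
            // The device vanished underneath us: log it, drop any stale
            // state, and report success so the commit can continue.
            std::fprintf(stderr, "WARNING: vif %s already removed\n",
                         name.c_str());
            running_vifs.erase(name);
            return SKETCH_OK;
        }
        if (running_vifs.erase(name) == 0)
            return SKETCH_ERROR;  // present but not running: a real error
        return SKETCH_OK;
    }
};
```

The point of the split is that only a genuinely inconsistent request fails; a request made moot by a vanished device does not.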

Thank you
BMS


From bms at incunabulum.net  Tue Oct  6 05:47:09 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Tue, 06 Oct 2009 13:47:09 +0100
Subject: [Xorp-hackers] PATCH: Don't fail commit on multicast address
 removal failure.
In-Reply-To: <4ACAC701.1070300@candelatech.com>
References: <4ACAC701.1070300@candelatech.com>
Message-ID: <4ACB3C4D.3020304@incunabulum.net>

Hi Ben,

Thanks for your patch.

Ben Greear wrote:
> This patch fixes a bug where a commit can fail if the multicast 
> addresses trying to be
> removed are already gone (probably because an entire network device 
> disappeared
> shortly ago).  If it's already gone, log a warning, but don't fail the 
> commit.

Can you please attach this to a Trac ticket for the interface removal 
condition?

It would be easier to tackle this on the basis of 'this is a specific 
problem which needs to be solved', rather than doing piecemeal commits.

thanks,
BMS


From bms at incunabulum.net  Tue Oct  6 06:07:43 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Tue, 06 Oct 2009 14:07:43 +0100
Subject: [Xorp-hackers] PATCH:  Remove some dead code,
 unlink pid-file on exit.
In-Reply-To: <4ACA3B87.70309@candelatech.com>
References: <4ACA3B87.70309@candelatech.com>
Message-ID: <4ACB411F.9070309@incunabulum.net>

Ben,

A few comments about this patch:
 * Do you have a specific distribution which relies on the behaviour of 
removing the pid file after the daemon terminates?
    For the BSD distributions at least, when running XORP from an 
rc.subr init script, this shouldn't be an issue; the file is just 
ignored, and overwritten on the next run.
    It's good filesystem hygiene to remove it, though.

Ben Greear wrote:
> This is mostly just a cleanup patch.  It removes some dead code and
> changes around the pidfile logic a bit.  It also allows unlinking the
> pid-file on exit using the atexit call.  Tested on Linux.

 * atexit() is better specified now, although a check for a C99 
compliant implementation would be useful (for folk trying to link 
against it on embedded platforms):
    http://www.opengroup.org/onlinepubs/009695399/functions/atexit.html

 * handle_atexit() should be renamed to reflect what it's doing through 
the atexit() mechanism. We are indeed daemonizing the whole process, not 
Rtrmgr, so it certainly belongs at the C top level scope.

 * Please don't use cout. C stdio is used elsewhere in this file, so no 
point in pulling in iostreams; C stdio should still be accessible. In 
practice this isn't an issue if libc is shared, however, it does pull in 
parts of the C++ runtime we don't immediately need.

 * The reason the pid gets written out from the parent is because on 
most distributions, we are writing to an absolute path under /var 
(usually /var/run/<procname>.pid). If the child is chrooted it may not 
have access to this absolute path, which breaks POLA.
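
As a rough illustration of the points above (C stdio rather than iostreams, a top-level handler, and keeping the write separate from the cleanup), a pid file helper might look like the sketch below. The names write_pidfile, arm_pidfile_cleanup and remove_pidfile_at_exit are hypothetical and are not the ones used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

// Path remembered for the exit handler; atexit() handlers take no arguments.
static char g_pidfile_path[1024] = "";

// Exit handler: remove the pid file if one was recorded.
extern "C" void remove_pidfile_at_exit() {
    if (g_pidfile_path[0] != '\0')
        unlink(g_pidfile_path);
}

// Write `pid` to `path` with C stdio and remember the path for cleanup.
// Taking the pid as a parameter lets a parent write its child's pid.
bool write_pidfile(const char* path, long pid) {
    assert(path != NULL);
    FILE* fp = std::fopen(path, "w");
    if (fp == NULL)
        return false;
    std::fprintf(fp, "%ld\n", pid);
    if (std::fclose(fp) != 0)
        return false;
    int n = std::snprintf(g_pidfile_path, sizeof(g_pidfile_path), "%s", path);
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof(g_pidfile_path)) {
        g_pidfile_path[0] = '\0';  // too long to remember; skip cleanup
        return false;
    }
    return true;
}

// Register the cleanup handler; call this in the process that stays alive.
bool arm_pidfile_cleanup() {
    return std::atexit(remove_pidfile_at_exit) == 0;
}
```

Handlers registered with atexit() are inherited across fork(), so a parent that exits straight after forking would want _exit() to avoid unlinking the file early.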

JT indicated that he wasn't 100% happy with some POLA elements of how 
XORP daemonizes. For example, it won't chdir() to /. This potentially 
breaks chroot()-ed operation, or at least means that the rtrmgr still 
holds a vnode lock on the place it was started from.

So hopefully he will chime in on this.

cheers,
BMS


From bms at incunabulum.net  Tue Oct  6 06:15:06 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Tue, 06 Oct 2009 14:15:06 +0100
Subject: [Xorp-hackers] PATCH:  Remove some dead code,
 unlink pid-file on exit.
In-Reply-To: <4ACB411F.9070309@incunabulum.net>
References: <4ACA3B87.70309@candelatech.com> <4ACB411F.9070309@incunabulum.net>
Message-ID: <4ACB42DA.2070604@incunabulum.net>

Bruce Simpson wrote:
>  * Do you have a specific distribution which relies on the behaviour of 
> removing the pid file after the daemon terminates?
>     For the BSD distributions at least, when running XORP from an 
> rc.subr init script, this shouldn't be an issue; the file is just 
> ignored, and overwritten on the next run.
>     It's good filesystem hygiene to remove it, though.
>   

P.S. can you please raise a Trac enhancement request for this issue? Thanks.


From bms at incunabulum.net  Tue Oct  6 06:21:09 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Tue, 06 Oct 2009 14:21:09 +0100
Subject: [Xorp-hackers] PATCH:  Fix uninitialized memory,
	found by valgrind
In-Reply-To: <4AC77407.2050205@candelatech.com>
References: <4AC545B0.8080403@candelatech.com>
	<4AC73129.4060905@incunabulum.net>
	<4AC77407.2050205@candelatech.com>
Message-ID: <4ACB4445.5050307@incunabulum.net>

Ben,

Thanks for raising the uninitialized buffer issue. Unfortunately I won't 
have free time to perform valgrind runs on the code before I leave for 
my trip.

Ben Greear wrote:
> ...
>>
>> * Can you provide the valgrind hits which are fixed by the memset() 
>> calls in io_ip_socket.cc?
>>
> Run rtrmgr under valgrind with OSPF (though it's not OSPF related), 
> and you should see these errors.

It would be really useful if you could attach the valgrind logs (or at 
least the relevant excerpt) to Trac ticket(s) so that the issue can be 
investigated further. I really need to stay focused on the XRL code when 
I get back from my trip, however.

Keeping the relevant information with the issue, in the Trac database, 
is really helpful, as it helps others to pick up and investigate in an 
ongoing way; they may not have all the context/state from what you are 
actually trying to do at that moment.

I do appreciate the work you're doing in tracking down possible issues 
with valgrind, and hope you understand it is easy to get overwhelmed by 
individual issue reports.

thanks,
BMS


From bms at incunabulum.net  Tue Oct  6 06:31:35 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Tue, 06 Oct 2009 14:31:35 +0100
Subject: [Xorp-hackers] Away 7th Oct - 21st Oct
Message-ID: <4ACB46B7.90707@incunabulum.net>

Hi all,

    I'll be away on a trip in Scotland from 7th Oct - 21st Oct, getting 
a well needed rest, so will only have sporadic access to email during 
that time, and will be unavailable for support requests. I'll endeavour 
to respond to email in detail when I get back.
    A 1.7-RC for the community code beginning late November is a 
possibility, depending on how much of the XRL replacement work can be 
finished by then.

thanks,
BMS


From bms at incunabulum.net  Tue Oct  6 06:55:49 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Tue, 06 Oct 2009 14:55:49 +0100
Subject: [Xorp-hackers] rtrmgr and TaskManager
In-Reply-To: <4ACA748B.6070308@candelatech.com>
References: <4ACA6B1B.8020604@candelatech.com>
	<4ACA748B.6070308@candelatech.com>
Message-ID: <4ACB4C65.1080303@incunabulum.net>

Ben Greear wrote:
> Well, here's the slow-down I'm seeing..but WTF would someone add a 1-second
> sleep here???
>   

I don't really have time to delve into the Router Manager specifics 
right now, but I'll share some of what's evident from the XRL 
replacement work.

    My guess here is that the Router Manager code is letting the 
EventLoop run so other task(s) can get serviced. At this point in 
execution, the processes are not being started; instead, the XRLs being 
fired off to configure the process during startup, are shimmed.

    One reason why this is required, is because XRL is trying to be 
completely asynchronous. There's a fair amount of complexity in XRL, and 
the Router Manager, to deal with the fact that XRL method resolution 
happens on a per-method basis, and is completely asynchronous. The 
EventLoop needs to be run() in order for things to happen, mostly 
because C++ doesn't have continuations.

    The lack of a synchronous model for coding to XRL, as an RPC 
mechanism, means that we have some complexity in the 'show_*' tools. 
These are also written in C++, because XRL is tied to C++ as an 
implementation language.

    Any XRLs invoked by the Router Manager come from the template files; 
the *.xrls files are used to validate XRL invocation against the 
targets. These are always XRLs of the form 'finder://' which forces the 
resolution to go via the Finder (an indirect method call using the 
textual Finder protocol).

...

I'd argue that the Router Manager really needs to be revisited entirely. 
It has been in the commercial product, although not to the extent needed 
to make the routing processes useful outside of the framework they're 
embedded in right now, as the code is realized in the community branch.

    It is purely configuration space stuff, and involves a text parser 
for a configuration language, a configuration tree containing the 
current router config state, and the marshaling/pushing of that state to 
and from the routing processes.

    One might argue that it doesn't even need to be written in C++, and 
an object scripting language (e.g. Python, Ruby) would be sufficiently 
mature (and fast) to do what a Router Manager needs to do. All of these 
things could be realized in an OO scripting language.

    Of course, we don't really have free time on the board right now to 
deal with this right off the bat. The timer you mention here as an issue 
probably could be speeded up, however a time gap there is probably still 
necessary to let other callbacks run.

    I'm wary of wading into it too much before a 1.7-RC is cut, although 
if you find that cutting corners in these areas helps, and doesn't 
disturb functionality, it is something we can consider at that time.

cheers,
BMS


From greearb at candelatech.com  Tue Oct  6 09:26:34 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 06 Oct 2009 09:26:34 -0700
Subject: [Xorp-hackers] rtrmgr and TaskManager
In-Reply-To: <4ACB4C65.1080303@incunabulum.net>
References: <4ACA6B1B.8020604@candelatech.com>
	<4ACA748B.6070308@candelatech.com>
	<4ACB4C65.1080303@incunabulum.net>
Message-ID: <4ACB6FBA.5060109@candelatech.com>

On 10/06/2009 06:55 AM, Bruce Simpson wrote:
> Ben Greear wrote:
>> Well, here's the slow-down I'm seeing..but WTF would someone add a
>> 1-second
>> sleep here???
>
> I don't really have time to delve into the Router Manager specifics
> right now, but I'll share some of what's evident from the XRL
> replacement work.
>
> My guess here is that the Router Manager code is letting the EventLoop
> run so other task(s) can get serviced. At this point in execution, the
> processes are not being started; instead, the XRLs being fired off to
> configure the process during startup, are shimmed.
>
> One reason why this is required, is because XRL is trying to be
> completely asynchronous. There's a fair amount of complexity in XRL, and
> the Router Manager, to deal with the fact that XRL method resolution
> happens on a per-method basis, and is completely asynchronous. The
> EventLoop needs to be run() in order for things to happen, mostly
> because C++ doesn't have continuations.
>
> The lack of a synchronous model for coding to XRL, as an RPC mechanism,
> means that we have some complexity in the 'show_*' tools. These are also
> written in C++, because XRL is tied to C++ as an implementation language.
>
> Any XRLs invoked by the Router Manager come from the template files; the
> *.xrls files are used to validate XRL invocation against the targets.
> These are always XRLs of the form 'finder://' which forces the
> resolution to go via the Finder (an indirect method call using the
> textual Finder protocol).
>
> ...
>
> I'd argue that the Router Manager really needs to be revisited entirely,
> and it has been in the commercial product, although not to the extent
> I'd argue needs doing to make the routing processes useful outside of
> the framework they're embedded in right now, as the code is realized in
> the community branch.
>
> It is purely configuration space stuff, and involves a text parser for a
> configuration language, a configuration tree containing the current
> router config state, and the marshaling/pushing of that state to and
> from the routing processes.
>
> One might argue that it doesn't even need to be written in C++, and an
> object scripting language (e.g. Python, Ruby) would be sufficiently
> mature (and fast) to do what a Router Manager needs to do. All of these
> things could be realized in an OO scripting language.
>
> Of course, we don't really have free time on the board right now to deal
> with this right off the bat. The timer you mention here as an issue
> probably could be speeded up, however a time gap there is probably still
> necessary to let other callbacks run.
>
> I'm wary of wading into it too much before a 1.7-RC is cut, although if
> you find that cutting corners in these areas helps, and doesn't disturb
> functionality, it is something we can consider at that time.

Anything that depends on waiting for other tasks to run by just sleeping
for a while is a broken algorithm, so I'd prefer to see the problems sooner
than later.  From my poking at the code, I can't see any reason it should
need to sleep though...other tasks can run just fine after that one
completes.  If there are others that *must* run first, hopefully they
are properly chained with callbacks (the commit seems to be done thus).
I'm going to run with zero timer there and see if any problems
shake out.  After several hours yesterday, I had seen no problems, but saw significant
speed-up in 'commit' xorpsh commands which is very useful for me.

With regard to re-architecting rtr-mgr:  Networking is asynchronous by design
and considering that external events (interfaces coming & going, link state
bouncing, etc) can happen at any time, the code just needs to deal properly
with async events.  The one thing I'd work towards is more of a 'desired'
v/s 'actual' config.  Users could always configure any logical configuration
and the system will try to make this happen, but it will also deal properly
with 'phantom' things like interfaces that don't exist currently.  A different
programming language isn't going to help any of that I think..and I'd very
much like to keep with c/c++.

Thanks,
Ben

>
> cheers,
> BMS


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Tue Oct  6 09:39:47 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 06 Oct 2009 09:39:47 -0700
Subject: [Xorp-hackers] [Xorp-users] Away 7th Oct - 21st Oct
In-Reply-To: <4ACB46B7.90707@incunabulum.net>
References: <4ACB46B7.90707@incunabulum.net>
Message-ID: <4ACB72D3.40300@candelatech.com>

On 10/06/2009 06:31 AM, Bruce Simpson wrote:
> Hi all,
>
>      I'll be away on a trip in Scotland from 7th Oct - 21st Oct, getting
> a well needed rest, so will only have sporadic access to email during
> that time, and will be unavailable for support requests. I'll endeavour
> to respond to email in detail when I get back.
>      A 1.7-RC for the community code beginning late November is a
> possibility, this is depending on how much of the XRL replacement work
> can be finished by then.

Enjoy, and please ignore all my emails during that time :)

Ben

>
> thanks,
> BMS


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Tue Oct  6 09:42:19 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 06 Oct 2009 09:42:19 -0700
Subject: [Xorp-hackers] PATCH:  Remove some dead code,
 unlink pid-file on exit.
In-Reply-To: <4ACB411F.9070309@incunabulum.net>
References: <4ACA3B87.70309@candelatech.com> <4ACB411F.9070309@incunabulum.net>
Message-ID: <4ACB736B.5090304@candelatech.com>

On 10/06/2009 06:07 AM, Bruce Simpson wrote:
> Ben,
>
> A few comments about this patch:
> * Do you have a specific distribution which relies on the behaviour of
> removing the pid file after the daemon terminates?
> For the BSD distributions at least, when running XORP from an rc.subr
> init script, this shouldn't be an issue; the file is just ignored, and
> overwritten on the next run.
> It's good filesystem hygiene to remove it, though.

I have my own xorp startup logic, and having valid pid files makes things
slightly more efficient.  More of a hygiene thing though.  You can never
absolutely depend on atexit working, so you still need to check pid file
contents against /proc/[pid]/ to be certain.

Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Tue Oct  6 09:51:25 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 06 Oct 2009 09:51:25 -0700
Subject: [Xorp-hackers] PATCH:  Add startup methods for faster startup.
In-Reply-To: <4ACB2F21.7080603@incunabulum.net>
References: <4ACAC806.1000000@candelatech.com>
	<4ACB2F21.7080603@incunabulum.net>
Message-ID: <4ACB758D.1060302@candelatech.com>

On 10/06/2009 04:50 AM, Bruce Simpson wrote:
> Ben Greear wrote:
>> If there is no status and no startup method in a xorp target, the
>> router-mgr uses a 2-second
>> sleep for 'verification'. This slows down startup of Xorp quite a bit
>> when you have lots
>> of protocols running.
>>
>> This patch adds startup methods to many of the common targets. There
>> are still more to
>> go, however.
>
> Thanks for tracking this down; yes, I've noticed that process startup is
> slower than it could be, but have only had free time / mindspace to look
> at the XRL specifics.
>
> Could this be made a more general change? If the XIF method for startup
> you are adding is not specific to a particular protocol, it might be an
> idea to make it part of the common.xif -- which is where most of the
> process control knobs are.
> I'd rather not get too far into the machinery here, because I'm about to
> take a badly needed break. I guess the firewall and ifmgr modules are a
> special case, because they're separate service bundles located in the
> FEA process.

It could probably be put in common code.  I'm just learning this code
myself...likely can get a cleaner patch later when I understand things
better.

>
> On a more general note:
> One of the things Pavlin raised in an old BugZilla ticket, is the fact
> that the Router Manager is fairly complex because it implements
> transactions on the config tree itself.
> If this is pushed into the protocols themselves (they'd have to keep
> their own config snapshot, and adopt a commit-rollback transaction model
> in the XIF RPC interfaces), then the Router Manager gets a bit simpler
> overall.

I dislike that because then it would become virtually impossible to restart
failed protocol processes.  I think the rtr-mgr should hold all config state.

As mentioned earlier, I think the commit/rollback thing has been somewhat over-thought
as well.  There are way too many ways to fail a commit...I think it should only fail
if there are logical issues (and in that case, the rtr-mgr shouldn't even try
to 'commit': it should not even accept the change in the first place.)  If it tries
to push something to a module and that module reports error, then we flag that piece
of configuration as pending, or invalid, or similar and these flags would show up
when the user did a 'show run' or similar.  Then they know they need to fix it,
perhaps by re-configuring, or perhaps by fixing broken hardware, or some other
external thing.
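
A rough sketch of that idea, with the rtrmgr holding the desired state and flagging what could not be applied yet, might look like this. All names here (DesiredConfig, ItemState, PushFn) are made up for illustration, and the push callback stands in for whatever XRL call would really deliver the setting to a module.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

enum class ItemState { Applied, Pending };

struct ConfigItem {
    std::string value;
    ItemState state;
};

// Hypothetical rtrmgr-side store of the desired configuration.
class DesiredConfig {
public:
    // Returns true if the module accepted the setting.
    typedef std::function<bool(const std::string&, const std::string&)> PushFn;

    // Refuse only logically bad input; otherwise record it and try to apply.
    bool set(const std::string& key, const std::string& value,
             const PushFn& push) {
        assert(push != nullptr);
        if (key.empty() || value.empty())
            return false;                  // logical error: never accepted
        ConfigItem& item = _items[key];
        item.value = value;
        item.state = push(key, value) ? ItemState::Applied
                                      : ItemState::Pending;
        return true;                       // the commit itself does not fail
    }
    // Retry whatever is still pending, e.g. after a device shows up.
    void retry(const PushFn& push) {
        for (auto& kv : _items)
            if (kv.second.state == ItemState::Pending &&
                push(kv.first, kv.second.value))
                kv.second.state = ItemState::Applied;
    }
    // State to display next to the item in a 'show run' style listing.
    ItemState state(const std::string& key) const {
        return _items.at(key).state;
    }
private:
    std::map<std::string, ConfigItem> _items;
};
```

Because the desired state lives in one place, a restarted protocol process can simply be handed its items again through the same retry path.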

My various patches to not fail commits when we don't have to are my ongoing effort
towards this type of behaviour....

Ben

>
> cheers,
> BMS


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From bms at incunabulum.net  Tue Oct  6 09:54:12 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Tue, 06 Oct 2009 17:54:12 +0100
Subject: [Xorp-hackers] rtrmgr and TaskManager
In-Reply-To: <4ACB6FBA.5060109@candelatech.com>
References: <4ACA6B1B.8020604@candelatech.com>
	<4ACA748B.6070308@candelatech.com>
	<4ACB4C65.1080303@incunabulum.net>
	<4ACB6FBA.5060109@candelatech.com>
Message-ID: <4ACB7634.8040701@incunabulum.net>

Ben Greear wrote:
>
> Anything that depends on waiting for other tasks to run by just sleeping
> for a while is a broken algorithm, so I'd prefer to see the problems 
> sooner
> than later.

That's not entirely true. Let me clarify.

In a threaded environment, this comment is valid; threads can starve 
each other of resources, or cause deadlock/livelock/experience race 
conditions, if synchronization is incorrect. So yes, your point about 
tasks sleeping to achieve synchronization being a flawed mechanism, is 
valid, in a threaded environment.

In a coroutine based environment, which is what XORP uses, this isn't a 
valid comment. Explicit yield points are necessary to allow other tasks 
to run, and 'synchronization' is achieved using state variables of some 
kind. This is the case in C++ as with any other language which 
implements coroutines -- there is only a single thread of execution, so 
in effect, nothing is ever sleeping.

The 'synchronization' point, if you like, is when select() finally gets 
called. This is pretty much what the io_service idiom in Boost C++ is doing.
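
To illustrate the yield-point idea, here is a deliberately tiny single-threaded loop. It is only a sketch: MiniLoop and demo_trace are invented names, and a real loop such as XORP's EventLoop also handles timers and file descriptors and would block in select() where this one simply returns.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature event loop with a plain run queue.
class MiniLoop {
public:
    // Queue a callback to run on a later pass through the loop.
    void post(std::function<void()> cb) {
        assert(cb != nullptr);
        _queue.push_back(std::move(cb));
    }
    // Run one queued callback; returns false when nothing is pending.
    bool run_one() {
        if (_queue.empty())
            return false;
        std::function<void()> cb = std::move(_queue.front());
        _queue.pop_front();
        cb();
        return true;
    }
    void run_until_idle() { while (run_one()) {} }
private:
    std::deque<std::function<void()>> _queue;
};

// A two-step job: step 1 yields by posting step 2 instead of sleeping,
// so a task queued in the meantime gets to run in between.
std::vector<std::string> demo_trace() {
    MiniLoop loop;
    std::vector<std::string> trace;
    loop.post([&] {
        trace.push_back("commit: step 1");
        loop.post([&] { trace.push_back("commit: step 2"); });
    });
    loop.post([&] { trace.push_back("other task"); });
    loop.run_until_idle();
    return trace;
}
```

Nothing here ever sleeps; the ordering between the steps comes purely from when each callback is queued, which is the sense in which a zero-length timer and a deferred task are close cousins.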

Continuations offer language support for the coroutine construct, which 
is something C++ doesn't have; see here and further on in this reply:
    http://en.wikipedia.org/wiki/Coroutine

In the case of the Router Manager, I wouldn't be entirely surprised if 
there were callbacks stacked up waiting for dispatch in the background, 
however given how serial it is in nature (in terms of process bringup 
and trying to avoid thundering herd problems for OS resources), I'm not 
surprised it errs on the side of conservatism, hence the large timeout 
thresholds.

>   From my poking at the code, I can't see any reason it should
> need to sleep though...other tasks can run just fine after that one
> completes.  If there are others that *must* run first, hopefully they
> are properly chained with callbacks (the commit seems to be done thus).
> I'm going to run with zero timer there and see if any problems
> shake out.  After several hours yesterday, I had seen no problems, but 
> saw significant
> speed-up in 'commit' xorpsh commands which is very useful for me.

I buy the argument, but I'm sure you can understand my hands-off / 
kid-gloves position with regards to the Router Manager and taking 
changes for it -- it is a large C++ subsystem which I'm not entirely 
familiar with, and when I've made changes to it in the past, mostly when 
porting to Win32, it's been a case of get in, get out, stay focused, get 
it over with, and survive it.

If you experiment with turning those timeouts down, and it works for 
you, that's great, but I really need to have a clear picture of what's 
going on, if I'm to be expected to support it on an ongoing basis.

>
> With regard to re-architecting rtr-mgr:  Networking is asynchronous by 
> design
> and considering that external events (interfaces coming & going, link 
> state
> bouncing, etc) can happen at any time, the code just needs to deal 
> properly
> with async events.

In XORP's case, more engineering time seems to have been burnt up on 
getting the XRL layer written than on these external events you mention. 
The FEA in theory handles all of these events, it is something of a 
kitchen sink. What could do with better realization is how these events 
are propagated to the rest of the system -- which is why I've been 
focused on looking at XRL.

>   The one thing I'd work towards is more of a 'desired'
> v/s 'actual' config.  Users could always configure any logical 
> configuration
> and the system will try to make this happen, but it will also deal 
> properly
> with 'phantom' things like interfaces that don't exist currently.  A 
> different
> programming language isn't going to help any of that I think..and I'd 
> very
> much like to keep with c/c++.

As you've probably already seen, the Router Manager code is non-trivial, 
and there's a lot of complexity in there to deal with the asynchrony of 
the XRL RPC calls.

I agree that the configuration model needs serious looking at for things 
like dynamic interfaces (VPN, wireless, hot-swappable cards etc) and 
it's something which I raised several times as an agenda point during my 
time at ICSI. Unfortunately, the development focus has been in other 
areas, and I haven't been in a position to call the shots on where the 
effort went. I certainly got the impression that this put some folk off 
from trying XORP in the here and now.

Regarding the use of C/C++ for development: XORP is strongly tied to the 
concept of continuations, even if it doesn't have language support.

Twisted Python at least has the benefit of strong language support for 
continuations, in the form of how it overloads the 'yield' operator. 
This allows a call stack frame to be easily tucked away and restored at 
a later point in time, and in an exception safe way.

There have been efforts over the years to try to do this in C++, e.g. 
uC++, Concurrent C++ and others, but none of them have matured 
sufficiently for production use.

What we have in XORP is a compromise, and it's largely tied to the 
semantics of how I/O happens in a UNIX-like system.

cheers
BMS


From greearb at candelatech.com  Tue Oct  6 10:05:43 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 06 Oct 2009 10:05:43 -0700
Subject: [Xorp-hackers] PATCH: Fix commit failure on device removal race,
 related to IGMP.
In-Reply-To: <4ACB3BED.2090100@incunabulum.net>
References: <4ACA846C.7040908@candelatech.com>
	<4ACB3BED.2090100@incunabulum.net>
Message-ID: <4ACB78E7.6010402@candelatech.com>

On 10/06/2009 05:45 AM, Bruce Simpson wrote:
> Ben,
>
> Can you please raise a Trac ticket about this issue, and attach your patch?

These bugs are all over the place...I think it will be a waste of effort to open
bugs and attach patches for each instance.

(Notice bug-trac:  10599, open for more than a year, with the simplest
possible patch attached).

I think it's best that I post patches, get feedback, fix them as much as possible,
and keep them in my tree for continuous testing.  When you have time to
review & commit this sort of stuff, we can deal with it in larger chunks.  By then
I should have more of the issues found and fixed.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Tue Oct  6 10:34:12 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 06 Oct 2009 10:34:12 -0700
Subject: [Xorp-hackers] rtrmgr and TaskManager
In-Reply-To: <4ACB7634.8040701@incunabulum.net>
References: <4ACA6B1B.8020604@candelatech.com>
	<4ACA748B.6070308@candelatech.com>
	<4ACB4C65.1080303@incunabulum.net>
	<4ACB6FBA.5060109@candelatech.com>
	<4ACB7634.8040701@incunabulum.net>
Message-ID: <4ACB7F94.10103@candelatech.com>

On 10/06/2009 09:54 AM, Bruce Simpson wrote:
> Ben Greear wrote:

> I buy the argument, but I'm sure you can understand my hands-off /
> kid-gloves position with regards to the Router Manager and taking
> changes for it -- it is a large C++ subsystem which I'm not entirely
> familiar with, and when I've made changes to it in the past, mostly when
> porting to Win32, it's been a case of get in, get out, stay focused, get
> it over with, and survive it.
>
> If you experiment with turning those timeouts down, and it works for
> you, that's great, but I really need to have a clear picture of what's
> going on, if I'm to be expected to support it on an ongoing basis.

You are welcome to expect me to support it, but then you'll need
to accept my patches, and I'm liable to get medieval on it :)
As you can tell, I don't mind changing things I don't well understand :)
And, I'll probably work towards my ideas about desired v/s actual and
definitely not towards more fine-grained threading (which is what
that 'continuation' stuff you talk about sounds like to me.)  I do
like select loops with event handling though, and I am continuously grateful
that there are no pthreads in xorp!

> In XORP's case, more engineering time seems to have been burnt up on
> getting the XRL layer written than on these external events you mention.
> The FEA in theory handles all of these events, it is something of a
> kitchen sink. What could do with better realization is how these events
> are propagated to the rest of the system -- which is why I've been
> focused on looking at XRL.

I think joining fea and rtr-mgr into a single process makes a lot of
sense.  Let the protocols remain separate.

>> The one thing I'd work towards is more of a 'desired'
>> v/s 'actual' config. Users could always configure any logical
>> configuration
>> and the system will try to make this happen, but it will also deal
>> properly
>> with 'phantom' things like interfaces that don't exist currently. A
>> different
>> programming language isn't going to help any of that I think..and I'd
>> very
>> much like to keep with c/c++.
>
> As you've probably already seen, the Router Manager code is non-trivial,
> and there's a lot of complexity in there to deal with the asynchrony of
> the XRL RPC calls.

The XRL RPC basically just works, as far as I can tell.  The logic needed
to deal with these dynamic events should be entirely outside of the RPC
mechanism..it's just a transport.  The bugs I find in this area are in
protocols and FEA, mostly because they always expect they know everything
and return errors and/or assert whenever something unexpected happens.

I'm (slowly) fixing this because I need it for my own efforts.

Someday I'll add a XORP_WARNING return instead of just OK and ERROR so
that we can return warning messages w/out failing commands.
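
One way such a three-valued result might be shaped is sketched below. It is purely hypothetical: Severity, StepResult, CommitOutcome and summarize are illustrative names, not existing XORP types, and the real return codes are plain integers rather than an enum.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Severity { Ok, Warning, Error };

// Result of one step of a command, with an optional message.
struct StepResult {
    Severity severity;
    std::string message;
};

// What the caller of a commit would see once all steps have reported.
struct CommitOutcome {
    bool success;
    std::vector<std::string> warnings;
};

// A commit fails only if some step reported Error; warnings are collected
// and passed back to the user without failing the command.
CommitOutcome summarize(const std::vector<StepResult>& steps) {
    CommitOutcome out;
    out.success = true;
    for (const StepResult& step : steps) {
        if (step.severity == Severity::Error)
            out.success = false;
        else if (step.severity == Severity::Warning)
            out.warnings.push_back(step.message);
    }
    return out;
}
```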

> I agree that the configuration model needs serious looking at for things
> like dynamic interfaces (VPN, wireless, hot-swappable cards etc) and
> it's something which I raised several times as an agenda point during my
> time at ICSI. Unfortunately, the development focus has been in other
> areas, and I haven't been in a position to call the shots on where the
> effort went. I certainly got the impression that this put some folk off
> from trying XORP in the here and now.

Well, I'm in a position to fix this in my tree, and I'm doing so.
I've no idea who has position to do that to the public tree if you don't!

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Tue Oct  6 10:56:02 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 06 Oct 2009 10:56:02 -0700
Subject: [Xorp-hackers] PATCH:  Logging improvements,
 fix artificial deal for xorpsh commit.
In-Reply-To: <4ACB3B92.3050505@incunabulum.net>
References: <4ACA7A19.30909@candelatech.com> <4ACB3B92.3050505@incunabulum.net>
Message-ID: <4ACB84B2.2010909@candelatech.com>

On 10/06/2009 05:44 AM, Bruce Simpson wrote:
> Ben Greear wrote:

> Comments:
> * It should be possible to turn off the millisecond logging if desired.
> Whilst it's certainly a useful feature to have when debugging time
> contingent code, it does add clutter to the output.
> * Perhaps putting it under the other debug knobs in SConstruct would be
> a good idea?

Will add an #ifdef that could be twiddled in scons.

> * %llu is not a portable format specifier, and 'unsigned long long' is
> not a portable type, please don't use them in portable code.

Ok, I can use uint64_t, but what do you use instead of %llu to print it?

> * Perhaps the code which prints the timeval is a candidate for a
> function like xlog_localtime2string_short() ?
>
> * xlog_localtime2string_short() is still defined in xlog.c; so why
> comment out its prototype, are you getting warnings from the compiler?

It was all commented out...I removed entirely now.

> * A XorpTimer of 0 is a possible candidate for a XorpTask. I can't
> really delve further into that change at the moment, though.
> * Yes, it may be useful to constify the string arguments in those
> callback functions, but this change considered low priority.
> * Please avoid introducing unnecessary whitespace changes in diffs.
>
>
> Can you please raise a Trac item for these suggested improvements?
> I probably won't have time to look at the Router Manager in detail for
> at least 4 weeks.
>
> Sorry for the bureaucracy... I appreciate you're doing what you can in
> the here and now to improve the code, however, it makes reviewing
> patches and applying them that much easier, and we do need to keep the
> code alignment and type clean, etc.

I'll let these changes perk in my tree..plz let me know when you're ready
to work on it (no hurry) and I'll make diffs against upstream and we can quickly
resolve any issues and commit the code.

In the meantime, these reviews are appreciated and will help make the
eventual merge easier I think.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Tue Oct  6 10:58:36 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 06 Oct 2009 10:58:36 -0700
Subject: [Xorp-hackers] PATCH:  Add startup methods for faster startup.
In-Reply-To: <4ACB2F21.7080603@incunabulum.net>
References: <4ACAC806.1000000@candelatech.com>
	<4ACB2F21.7080603@incunabulum.net>
Message-ID: <4ACB854C.3070008@candelatech.com>

On 10/06/2009 04:50 AM, Bruce Simpson wrote:
> Ben Greear wrote:
>> If there is no status and no startup method in a xorp target, the
>> router-mgr uses a 2-second
>> sleep for 'verification'. This slows down startup of Xorp quite a bit
>> when you have lots
>> of protocols running.
>>
>> This patch adds startup methods to many of the common targets. There
>> are still more to
>> go, however.
>
> Thanks for tracking this down; yes, I've noticed that process startup is
> slower than it could be, but have only had free time / mindspace to look
> at the XRL specifics.

By the way, I did a quick oprofile run the other day.  Xorpsh was the top
offender by far...and I don't remember seeing xrl anywhere near the top
of the list.  Not all configurations will show the same performance graphs,
of course...but at least it isn't a large problem in all cases.

I'll be testing with oprofile some more..will post results next time
I get something interesting.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From bms at incunabulum.net  Wed Oct  7 01:36:16 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Wed, 07 Oct 2009 09:36:16 +0100
Subject: [Xorp-hackers] PATCH:  Logging improvements,
 fix artificial deal for xorpsh commit.
In-Reply-To: <4ACB84B2.2010909@candelatech.com>
References: <4ACA7A19.30909@candelatech.com> <4ACB3B92.3050505@incunabulum.net>
	<4ACB84B2.2010909@candelatech.com>
Message-ID: <4ACC5300.3010703@incunabulum.net>

Ben Greear wrote:
>
> Will add an #ifdef that could be twiddled in scons.

Excellent...

>
>> * %llu is not a portable format specifier, and 'unsigned long long' is
>> not a portable type, please don't use them in portable code.
>
> Ok, I can use uint64_t, but what do you use instead of %llu to print it?

%j (e.g. %jd or %ju) with intmax_t/uintmax_t is ISO C99 portable. It 
sucks because it means casting to the widest integer type on the 
platform, but it's a known quantity. 'long long' has been a problem 
since well before Sun brought out SPARCV9.

cheers,
BMS


From bms at incunabulum.net  Wed Oct  7 01:49:51 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Wed, 07 Oct 2009 09:49:51 +0100
Subject: [Xorp-hackers] rtrmgr and TaskManager
In-Reply-To: <4ACB7F94.10103@candelatech.com>
References: <4ACA6B1B.8020604@candelatech.com>
	<4ACA748B.6070308@candelatech.com>
	<4ACB4C65.1080303@incunabulum.net>
	<4ACB6FBA.5060109@candelatech.com>
	<4ACB7634.8040701@incunabulum.net> <4ACB7F94.10103@candelatech.com>
Message-ID: <4ACC562F.9010807@incunabulum.net>

Ben Greear wrote:
>
> You are welcome to expect me to support it, but then you'll need
> to accept my patches, and I'm liable to get medieval on it :)
> As you can tell, I don't mind changing things I don't well understand :)

Yeah, that's what gets things done in the end.

> And, I'll probably work towards my ideas about desired v/s actual and
> definitely not towards more fine-grained threading (which is what
> that 'continuation' stuff you talk about sounds like to me.)

It is a real problem. Facebook are shipping a C++ concurrency library in 
Thrift which is largely modeled on Java constructs. Fortunately, the 
splice for XORP doesn't need to use this -- you can easily end up with 
several such models which don't overlap.

There have been efforts, around and under the umbrella of the Boost 
project, to come up with better frameworks for implementing state 
machines e.g. Boost.Statechart.

It's kind of sad that in CS education Java has been pushed over other 
languages, forcing students into a situation where they don't learn 
about computer architecture (I was a bit of a rebel at uni, and spent my 
time learning THIS stuff instead of the syllabus, by that time this 
'dumbing down' element was already happening in Scottish higher 
education -- I could go on and on about how I was reading Jay Sussman's 
'wizard book' at 16, and never touched it at uni, but I'd just sound 
like a bitter geek). Sometimes there is no substitute for a finite state 
machine (FSM), if you want tight code; threads have their own penalties.

Erlang is an interesting exception because they plain did away with both 
notions of coroutine and thread. Tasks are extremely lightweight in 
Erlang, although the scheduler is purely best-effort (at least in the 
openly-available Erlang Open Telecoms Platform (OTP), open sourced by 
Ericsson), and tasks can't even share variables; they communicate by 
message passing.

So they are somewhere between those two extremes -- scheduling is not 
necessarily cooperative.

Erlang also has language framework support for FSMs, and there are nice 
abstractions for tying protocol decode (at a bitfield level) to Erlang 
variables. This just eliminates a whole layer of complexity in the code 
developers have to write for communication apps.

Education is great; more important is the will and intent to just DO 
THINGS, and sometimes that means side-stepping what is known already -- 
or applying it appropriately on a jagged path, a bit like forked lightning.

>   I do
> like select loops with event handling though, and I am continuously 
> grateful
> that there are no pthreads in xorp!

Yes, appropriately threaded code is harder to debug, and inappropriate 
locking can really rain on your parade.

>
> I think joining fea and rtr-mgr into a single process makes a lot of
> sense.  Let the protocols remain separate.

There's a lot of state in there which makes that non-trivial. I've 
played with the idea of making certain components 'in-process servers' 
COM style, i.e. loadable .so's.

Thrift should speed up RPC performance, so I'm not

P.S. Robert Watson is being funded by Google to finish SOCK_SEQPACKET 
for AF_UNIX on FreeBSD which helps. Chrome is using it under the hood, 
it turns out.

There's a little bit of additional complexity in async RPC (both Thrift 
and XRL) to deal with out-of-order delivery. Not fully implemented, though.

And not all kernels will dispatch async in flight. This is where you see 
the design schism between the UNIX-like ones (Linux, BSD) and Windows 
(NT), which is fully async under the hood, and reordering of any local 
IPC can be a real issue (I/O completion ports).


> ...
> Someday I'll add a XORP_WARNING return instead of just OK and ERROR so
> that we can return warning messages w/out failing commands.

More appropriate use of exceptions might be better. Orion has argued 
that removing exceptions keeps the footprint down, I wonder if it's 
worth the churn.

>
> Well, I'm in a position to fix this in my tree, and I'm doing so.
> I've no idea who has position to do that to the public tree if you don't!

My personal agenda is that we have a whole load of stuff in XORP which 
makes it easy for people to do things in the routing space, we just need 
to work on the realization of the goal of folk actually using it.

Got a train to catch...
BMS


From greearb at candelatech.com  Wed Oct  7 09:36:07 2009
From: greearb at candelatech.com (Ben Greear)
Date: Wed, 07 Oct 2009 09:36:07 -0700
Subject: [Xorp-hackers] rtrmgr and TaskManager
In-Reply-To: <4ACC562F.9010807@incunabulum.net>
References: <4ACA6B1B.8020604@candelatech.com>
	<4ACA748B.6070308@candelatech.com>
	<4ACB4C65.1080303@incunabulum.net>
	<4ACB6FBA.5060109@candelatech.com>
	<4ACB7634.8040701@incunabulum.net> <4ACB7F94.10103@candelatech.com>
	<4ACC562F.9010807@incunabulum.net>
Message-ID: <4ACCC377.6040207@candelatech.com>

On 10/07/2009 01:49 AM, Bruce Simpson wrote:

> There's a little bit of additional complexity in async RPC (both Thrift
> and XRL) to deal with out-of-order delivery. Not fully implemented, though.

I think as long as each process's events are (or can be) ordered, it isn't so
big of a deal if there is reordering in time among different processes.

A single process, say xorpsh, could then just wait for completion of
the previous request before making a new one to ensure serialization.

I *think* that the 'commit' is serialized like this already, but if not,
I'll need to make it so.

For other applications, it would be good to turn on serialization by
default (for instance, the router daemons often make xrl calls and appear
to expect the previous one to complete before the second one is attempted).

Based on my brief look at rtr-mgr, if the client process doesn't wait for
completion, then it's *possible* for requests from the same process to
be reordered.
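
(Illustrative sketch of the "wait for completion before sending the next
request" idea above. SerialRequestQueue is a hypothetical helper, not an
existing XORP or XRL class, and it assumes C++11 for std::function and
lambdas.)

```cpp
#include <deque>
#include <functional>
#include <utility>

// Hypothetical helper, not part of XORP: issues asynchronous requests one
// at a time, starting the next only after the previous one has completed.
class SerialRequestQueue {
public:
    using Done    = std::function<void()>;
    using Request = std::function<void(Done)>;  // must call Done on completion

    void enqueue(Request req)
    {
        _pending.push_back(std::move(req));
        if (!_in_flight)
            start_next();
    }

    bool idle() const { return !_in_flight && _pending.empty(); }

private:
    void start_next()
    {
        if (_pending.empty())
            return;
        Request req = std::move(_pending.front());
        _pending.pop_front();
        _in_flight = true;
        req([this]() {            // completion callback
            _in_flight = false;
            start_next();         // only now may the next request go out
        });
    }

    std::deque<Request> _pending;
    bool _in_flight = false;
};
```

Because a request is only dequeued from the completion callback of its
predecessor, requests from one process cannot overtake each other, whatever
the transport does underneath.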

>> Someday I'll add a XORP_WARNING return instead of just OK and ERROR so
>> that we can return warning messages w/out failing commands.
>
> More appropriate use of exceptions might be better. Orion has argued
> that removing exceptions keeps the footprint down, I wonder if it's
> worth the churn.

You couldn't throw an exception across an RPC, but you could return
proper error codes and text strings to describe the error/warning/info/etc.

Either way, I don't like C++ exceptions and prefer using return values
and/or passing an error-reporting construct by reference to be filled out
by lower calls (like passing in a string& err_msg, and using the return
value to know if an error actually happened.)  We could use a more formal
construct, maybe consisting of something like:

class foo {
   int severity; // enum, how bad was it?  info,warning,error,fatal ??
   int error_code; // enum, like errno perhaps? no-such-route,no-such-vif,invalid-request, ...
   string message;
};
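
(A slightly fuller sketch of the same idea. All names here, Severity,
ErrorCode, CmdStatus and start_vif, are hypothetical and not existing XORP
types or functions; it assumes C++11 for the scoped enums.)

```cpp
#include <string>

// Hypothetical severity levels, ordered from least to most severe.
enum class Severity { Info, Warning, Error, Fatal };

// Hypothetical error codes, in the spirit of errno.
enum class ErrorCode { None, NoSuchRoute, NoSuchVif, InvalidRequest };

struct CmdStatus {
    Severity    severity = Severity::Info;
    ErrorCode   code     = ErrorCode::None;
    std::string message;

    // A command only fails on Error or Fatal; warnings still succeed.
    bool ok() const { return severity < Severity::Error; }
};

// Example callee: the interface is missing, but that is reported as a
// warning, so the caller's commit does not have to fail.
bool start_vif(const std::string& vif_name, bool vif_exists, CmdStatus& status)
{
    if (!vif_exists) {
        status.severity = Severity::Warning;
        status.code     = ErrorCode::NoSuchVif;
        status.message  = "vif " + vif_name + " not present yet; will start later";
    }
    return status.ok();
}
```

The return value still answers "did it fail?", while the status object
carries the warning text back to whoever issued the command.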

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Wed Oct  7 09:51:03 2009
From: greearb at candelatech.com (Ben Greear)
Date: Wed, 07 Oct 2009 09:51:03 -0700
Subject: [Xorp-hackers] PATCH:  Logging improvements,
 fix artificial deal for xorpsh commit.
In-Reply-To: <4ACC5300.3010703@incunabulum.net>
References: <4ACA7A19.30909@candelatech.com> <4ACB3B92.3050505@incunabulum.net>
	<4ACB84B2.2010909@candelatech.com>
	<4ACC5300.3010703@incunabulum.net>
Message-ID: <4ACCC6F7.2020404@candelatech.com>

On 10/07/2009 01:36 AM, Bruce Simpson wrote:
> Ben Greear wrote:
>>
>> Will add an #ifdef that could be twiddled in scons.
>
> Excellent...
>
>>
>>> * %llu is not a portable format specifier, and 'unsigned long long' is
>>> not a portable type, please don't use them in portable code.
>>
>> Ok, I can use uint64_t, but what do you use instead of %llu to print it?
>
> %j and intmax_t is ISO C99 portable. It sucks because it means casting
> to the widest integer type on the platform, but it's a known quantity.
> 'long long' has been a problem since well before Sun brought out SPARCV9.

 From MS's page, they may not support %j (or %ll for that matter).  Maybe
they just don't document it:

http://msdn.microsoft.com/en-us/library/hf4y5e3w%28VS.71%29.aspx

Anyway, I think I'll leave it %llu for now.  It's not the end of the world if
some obscure platform uses something other than a 64-bit value for this,
and if it breaks compile due to snprintf limitations on some platform,
can fix it then with #ifdef or some other kludge.

(Using uint64_t and %llu is a compile warning for F11, 64-bit, btw,
but unsigned long long and %llu works fine on 32 and 64 bit.)

Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Wed Oct  7 21:31:18 2009
From: greearb at candelatech.com (Ben Greear)
Date: Wed, 07 Oct 2009 21:31:18 -0700
Subject: [Xorp-hackers] PATCH:  Allow delayed start of PIM vif
Message-ID: <4ACD6B16.6080500@candelatech.com>

Here's an example of not failing a commit because the network interface 
isn't ready.  Needs
a bit more testing, but this is the behaviour I'm trying to move 
toward.  Many of the other protocols
need similar work, but I'm just posting a single patch for comment now.
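
(Illustrative sketch only, not the attached patch: one way to keep a
'desired' configuration separate from the 'actual' state, so that
configuring a missing interface succeeds and takes effect once the
interface shows up. VifManager and its methods are hypothetical names.)

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical sketch (not XORP code): keep the desired per-vif config even
// when the interface is absent, and apply it once the interface appears.
class VifManager {
public:
    // Always succeeds: a missing interface is not a configuration error.
    void configure(const std::string& vif, bool enabled)
    {
        _desired[vif] = enabled;
        reconcile(vif);
    }

    // Called on interface add/remove events from the system.
    void interface_appeared(const std::string& vif)
    {
        _present.insert(vif);
        reconcile(vif);
    }
    void interface_vanished(const std::string& vif)
    {
        _present.erase(vif);
        _running.erase(vif);   // actual state follows reality
    }

    bool is_running(const std::string& vif) const
    {
        return _running.count(vif) != 0;
    }

private:
    // Bring the actual state in line with the desired state where possible.
    void reconcile(const std::string& vif)
    {
        std::map<std::string, bool>::const_iterator it = _desired.find(vif);
        bool want = (it != _desired.end()) && it->second;
        if (want && _present.count(vif))
            _running.insert(vif);
        else
            _running.erase(vif);
    }

    std::map<std::string, bool> _desired;  // what the user asked for
    std::set<std::string>       _present;  // what the system currently has
    std::set<std::string>       _running;  // where the protocol is active
};
```

The desired entry survives the interface vanishing, so a hot-plugged or
re-created interface picks its configuration back up without a new commit.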

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch0.patch
Type: text/x-patch
Size: 2802 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091007/e838c051/attachment.bin 

From lizhaous2000 at yahoo.com  Thu Oct  8 08:22:04 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Thu, 8 Oct 2009 08:22:04 -0700 (PDT)
Subject: [Xorp-hackers] static xrl interface calls
Message-ID: <569389.38414.qm@web58706.mail.re1.yahoo.com>

As the documentation says, XrlStaticRoutesV0p1Client::send_add_route4 is called from rtrmgr, but I do not see that symbol in rtrmgr; in fact, I do not see any process calling this method. On the other hand, the target method XrlStaticRoutsNode::static_routes_0_1_add_route4 was called in xorp_static_routes, and I do not know how this was triggered. Can anybody explain this to me? Thanks.

Li


From jtc at acorntoolworks.com  Fri Oct  9 07:40:50 2009
From: jtc at acorntoolworks.com (J.T. Conklin)
Date: Fri, 09 Oct 2009 07:40:50 -0700
Subject: [Xorp-hackers] PATCH:  Logging improvements,
 fix artificial deal for xorpsh commit.
In-Reply-To: <4ACB3B92.3050505@incunabulum.net> (Bruce Simpson's message of
	"Tue, 06 Oct 2009 13:44:02 +0100")
References: <4ACA7A19.30909@candelatech.com> <4ACB3B92.3050505@incunabulum.net>
Message-ID: <8763aobpfx.fsf@orac.acorntoolworks.com>

Bruce Simpson <bms at incunabulum.net> writes:
> Comments:
>    * It should be possible to turn off the millisecond logging if 
> desired. Whilst it's certainly a useful feature to have when debugging 
> time contingent code, it does add clutter to the output.

I think that now systems are so fast that sub-second timestamps should
almost always be used for log/event timestamps.  This is especially
true in distributed systems where log messages from separate programs
are/need to be merged into one stream. Without sub-second timestamps,
everything appears to happen at one time.

For what it's worth, the new RFC5424 says the "originator SHOULD
include TIME-SECFRAC if its clock accuracy and performance permit".
While we don't currently format log messages according to the RFC, we
should probably open a Trac ticket along those lines.  Emitting RFC-compliant
log messages will make it easier for automated log analysis and database
systems to handle XORP output.

>     * Perhaps putting it under the other debug knobs in SConstruct would 
> be a good idea?

As for configure knobs to control this (and other) log behavior...  I
think it's worth considering totally rototilling XORP's log subsystem
and using log4cxx (or log4j/log4cxx inspired code of our own).  If we
took full advantage of the framework, we could have much finer-grained
control of log messages by defining logger hierarchies (e.g., we
could enable messages just from the xorp.bgp.foo.bar logger).  We could 
also define/select format specifications with a config file, avoiding 
compiling in behavior like whether sub-second timestamps would be used.

It's a big project, but I think has the potential of similarly big
rewards.

In the short term, I think we should change the log output to be RFC
5424 compliant, including sub-second timestamps.
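
(A minimal sketch of what such a timestamp could look like, assuming a
POSIX system with gmtime_r(). rfc5424_timestamp is a hypothetical helper,
not an existing xlog function; it emits UTC in the form
YYYY-MM-DDThh:mm:ss.uuuuuuZ, i.e. with a six-digit TIME-SECFRAC.)

```cpp
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <string>

// Hypothetical helper: format a UTC time as an RFC 5424 style TIMESTAMP
// with microsecond resolution.
std::string rfc5424_timestamp(time_t sec, uint32_t usec)
{
    struct tm tm_utc;
    gmtime_r(&sec, &tm_utc);              // POSIX; assumed available

    char date[32];
    strftime(date, sizeof(date), "%Y-%m-%dT%H:%M:%S", &tm_utc);

    char out[48];
    snprintf(out, sizeof(out), "%s.%06uZ", date,
             static_cast<unsigned>(usec));
    return std::string(out);
}
```

With gettimeofday() as the clock source, tv_sec and tv_usec map directly
onto the two arguments.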

>    * %llu is not a portable format specifier, and 'unsigned long long' 
> is not a portable type, please don't use them in portable code.

As Ben discovered, your suggestion to cast to intmax_t and use
the %j format specifier doesn't work on the older systems.

I think fixed-size integral types like int64_t, uint64_t, etc. and
the corresponding macros like PRId64, PRIu64, etc.  were introduced in
C99, and should be the most portable.  And, if we run into any systems
that don't have them, it should be easy enough to define the types and
macros in xorp_config.h.
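
(A minimal sketch of the fixed-width approach, assuming the platform
provides <inttypes.h>, or <cinttypes> in C++11. format_timestamp is a
hypothetical helper name, not something in the XORP tree.)

```cpp
#include <cinttypes>   // PRIu64, PRIu32 (C99 <inttypes.h>)
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format seconds and microseconds without depending
// on %llu or 'unsigned long long'.
std::string format_timestamp(uint64_t sec, uint32_t usec)
{
    char buf[64];
    // PRIu64/PRIu32 expand to the correct conversion for this platform.
    std::snprintf(buf, sizeof(buf), "%" PRIu64 ".%06" PRIu32, sec, usec);
    return std::string(buf);
}
```

The macros are plain string literals, so they concatenate with the rest of
the format string at compile time and need no cast at the call site.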

    --jtc

-- 
J.T. Conklin


From jtc at acorntoolworks.com  Fri Oct  9 07:58:48 2009
From: jtc at acorntoolworks.com (J.T. Conklin)
Date: Fri, 09 Oct 2009 07:58:48 -0700
Subject: [Xorp-hackers] PATCH:  Logging improvements,
 fix artificial deal for xorpsh commit.
In-Reply-To: <4ACA7A19.30909@candelatech.com> (Ben Greear's message of "Mon,
	05 Oct 2009 15:58:33 -0700")
References: <4ACA7A19.30909@candelatech.com>
Message-ID: <871vlcbolz.fsf@orac.acorntoolworks.com>

Ben Greear <greearb at candelatech.com> writes:
> 2) Change some pass-by-value string arguments to const string& in
> router-mgr.  This will improve performance and a small bit of memory
> usage.

Hi Ben,

This, and passing string literals to functions/methods that expected
string parameters, was identified as being responsible for a huge
proportion of the static footprint bloat during the "XORP on a Diet"
project I did while at the company. Most, if not all, of the problems
I found and fixed made it back to the public sources.

I used rather rudimentary tools (grep, nm, perl scripts, etc.) to
identify the sources, so I'm not terribly surprised that others still
exist.

Fortunately, these tend to be uncontroversial and quite easy to fix.
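
(Illustrative sketch of why the signature matters. CountedString is a
hypothetical stand-in for std::string that counts copy constructions so the
difference is observable; it is not XORP code.)

```cpp
#include <cstddef>
#include <string>

// Counts copy constructions so the cost of pass-by-value is visible.
struct CountedString {
    static int copies;
    std::string value;
    explicit CountedString(const std::string& v) : value(v) {}
    CountedString(const CountedString& o) : value(o.value) { ++copies; }
};
int CountedString::copies = 0;

// Pass-by-value: every call with an existing object copy-constructs it.
std::size_t length_by_value(CountedString s) { return s.value.size(); }

// Pass-by-const-reference: no copy is made.
std::size_t length_by_ref(const CountedString& s) { return s.value.size(); }
```

With a real std::string parameter, passing a string literal additionally
makes the compiler emit code to construct and destroy a temporary at every
call site, which is where the static footprint goes.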

    --jtc

-- 
J.T. Conklin


From greearb at candelatech.com  Fri Oct  9 08:10:53 2009
From: greearb at candelatech.com (Ben Greear)
Date: Fri, 09 Oct 2009 08:10:53 -0700
Subject: [Xorp-hackers] PATCH:  Logging improvements,
 fix artificial deal for xorpsh commit.
In-Reply-To: <871vlcbolz.fsf@orac.acorntoolworks.com>
References: <4ACA7A19.30909@candelatech.com>
	<871vlcbolz.fsf@orac.acorntoolworks.com>
Message-ID: <4ACF527D.9070202@candelatech.com>

J.T. Conklin wrote:
> Ben Greear <greearb at candelatech.com> writes:
>   
>> 2) Change some pass-by-value string arguments to const string& in
>> router-mgr.  This will improve performance and a small bit of memory
>> usage.
>>     
>
> Hi Ben,
>
> This, and passing string literals to functions/methods that expected
> string parameters, was identified as being responsible for a huge
> proportion of the static footprint bloat during the "XORP on a Diet"
> project I did while at the company. Most, if not all, of the problems
> I found and fixed made it back to the public sources.
>
> I used rather rudimentary tools (grep, nm, perl scripts, etc.) to
> identify the sources, so I'm not terribly surprised that others still
> exist.
>
> Fortunately, these tend to be uncontroversial and quite easy to fix.
>   
Yeah, it was a trivial fix and almost certainly harmless.  I plan to 
continue fixing
such problems as I find them.  I'll remember to keep an eye out for 
passing string
literals too..haven't been watching for those.

I've found no problems with the removal of delays for xorpsh commit and 
the startup
logic in 3 days of heavy testing, by the way.  It's possible it's 
uncovered a few races
I wouldn't have noticed otherwise...but the bugs were there regardless 
(and I'm fixing
the bugs as I find them.)

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


From jtc at acorntoolworks.com  Fri Oct  9 08:30:41 2009
From: jtc at acorntoolworks.com (J.T. Conklin)
Date: Fri, 09 Oct 2009 08:30:41 -0700
Subject: [Xorp-hackers] Pending SCons configure change
Message-ID: <87pr8wa8ke.fsf@orac.acorntoolworks.com>

Hi,

I have a change to the SCons command line option/variable processing
pending in my workspace that should be ready to commit this weekend,
and I wanted to give everyone the heads up (and a chance to voice 
objections).

Currently SCons takes arch=..., os=..., cross=..., and rel=... 
command line options.  arch=, os=, and cross= are for (initial) 
support of cross compiling, and set the CPU architecture and OS 
for the host system.  rel= is supposedly used for the "release"
number, but is really only used to append to the build directory.

I'm planning on completely removing rel=.  Currently it defaults to 
"public17", so the build directory defaults to obj/<arch>-<os>-public17.
IMO, this doesn't add any value. In fact it could be considered harmful.
When we change the default, either for the 1.7 release candidate or
for 1.8 development, it will orphan the current build directory and
force everything to be rebuilt, even though no (or few) changes have
been made.

I'm planning on completely removing cross=.  This is currently not
used.  After this change, we'll be able to determine that we're cross
compiling if different build= and host= options were specified.

I'm planning on replacing the arch= and os= options with build= and host=,
which would take the GNU system triplet (<cpu>-<vendor>-<os>) or an alias,
just like the --build= and --host= options to a configure script.  The
build= option will allow the user to specify the build system, instead of
having it guessed as it is today.

While we could re-implement the function of the config.guess and
config.sub scripts in python and execute them within SCons, at least
for the time being I've added them to the build and have made the
SConstruct use them.  This ensures the behavior and the accepted
system triplets are the same as every other GNU project.

Like before, if no arguments are passed on the SCons command line,
build and host systems are guessed, and a native XORP installation
is built.

The default build directory will now be obj/<host>.  Since host will
now be the standard GNU system triplet, this may result in a rebuild
and a new object directory (orphaning any objdirs with the old name).

    --jtc

-- 
J.T. Conklin


From greearb at candelatech.com  Fri Oct  9 11:58:04 2009
From: greearb at candelatech.com (Ben Greear)
Date: Fri, 09 Oct 2009 11:58:04 -0700
Subject: [Xorp-hackers] Pending SCons configure change
In-Reply-To: <87pr8wa8ke.fsf@orac.acorntoolworks.com>
References: <87pr8wa8ke.fsf@orac.acorntoolworks.com>
Message-ID: <4ACF87BC.1060601@candelatech.com>

On 10/09/2009 08:30 AM, J.T. Conklin wrote:
> Hi,
>
> I have a change to the SCons command line option/variable processing
> pending in my workspace that should be ready to commit this weekend,
> and I wanted to give everyone the heads up (and a chance to voice
> objections).

This all sounds fine to me.  One small gripe about scons in general:

I liked the old ./configure method because you figured out your
configuration once, and then all you had to do was type 'make'
and not remember all of your options each build.

I wonder if you could set up scons to do something like:

scons config foo=bar blah=baz ...

This would write out a small config file with the supplied
options.

Then, when you run 'scons', it would read the config file
if exists and use that configuration.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From lizhaous2000 at yahoo.com  Fri Oct  9 13:05:16 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Fri, 9 Oct 2009 13:05:16 -0700 (PDT)
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <569389.38414.qm@web58706.mail.re1.yahoo.com>
Message-ID: <935513.83788.qm@web58707.mail.re1.yahoo.com>

Actually this is a generic question. For any new config coming from xorpsh, how are these xrl client functions sent to the target process from rtrmgr?

--- On Thu, 10/8/09, Li Zhao <lizhaous2000 at yahoo.com> wrote:

> From: Li Zhao <lizhaous2000 at yahoo.com>
> Subject: [Xorp-hackers] static xrl interface calls
> To: xorp-hackers at icir.org
> Date: Thursday, October 8, 2009, 11:22 AM
> As document said,
> XrlStaticRoutesV0p1Client::send_add_route4 is called from
> rtrmgr. But actually i do not see that symbol in rtrmgr.
> Actually i do not see any process is calling this method. On
> the other hand, target call
> XrlStaticRoutsNode::static_routes_0_1_add_route4 was called
> on xorp_static_routes. I do not know how was this triggered.
> Can any body explain to me? Thanks.
> 
> Li
> 
> 
> 
> _______________________________________________
> Xorp-hackers mailing list
> Xorp-hackers at icir.org
> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers
> 


From greearb at candelatech.com  Fri Oct  9 14:23:07 2009
From: greearb at candelatech.com (Ben Greear)
Date: Fri, 09 Oct 2009 14:23:07 -0700
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <935513.83788.qm@web58707.mail.re1.yahoo.com>
References: <935513.83788.qm@web58707.mail.re1.yahoo.com>
Message-ID: <4ACFA9BB.3030806@candelatech.com>

On 10/09/2009 01:05 PM, Li Zhao wrote:
> Actually this is a generic question. For any new config coming from xorpsh, how are these xrl client functions sent to the target process from rtrmgr?

Search for 'commit'.  There is some logic in that code to send updates to
modules through xrl commands.

I think programs also talk directly with fea...I don't understand it all that well
myself at this time.

Thanks,
Ben

>
> --- On Thu, 10/8/09, Li Zhao<lizhaous2000 at yahoo.com>  wrote:
>
>> From: Li Zhao<lizhaous2000 at yahoo.com>
>> Subject: [Xorp-hackers] static xrl interface calls
>> To: xorp-hackers at icir.org
>> Date: Thursday, October 8, 2009, 11:22 AM
>> As document said,
>> XrlStaticRoutesV0p1Client::send_add_route4 is called from
>> rtrmgr. But actually i do not see that symbol in rtrmgr.
>> Actually i do not see any process is calling this method. On
>> the other hand, target call
>> XrlStaticRoutsNode::static_routes_0_1_add_route4 was called
>> on xorp_static_routes. I do not know how was this triggered.
>> Can any body explain to me? Thanks.
>>
>> Li
>>
>>
>>
>>
>> _______________________________________________
>> Xorp-hackers mailing list
>> Xorp-hackers at icir.org
>> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers
>>
>
>
>
>
> _______________________________________________
> Xorp-hackers mailing list
> Xorp-hackers at icir.org
> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From jtc at acorntoolworks.com  Fri Oct  9 19:15:50 2009
From: jtc at acorntoolworks.com (J.T. Conklin)
Date: Fri, 09 Oct 2009 19:15:50 -0700
Subject: [Xorp-hackers] Pending SCons configure change
In-Reply-To: <4ACF87BC.1060601@candelatech.com> (Ben Greear's message of
	"Fri, 09 Oct 2009 11:58:04 -0700")
References: <87pr8wa8ke.fsf@orac.acorntoolworks.com>
	<4ACF87BC.1060601@candelatech.com>
Message-ID: <87vdio3sfd.fsf@orac.acorntoolworks.com>

Ben Greear <greearb at candelatech.com> writes:
> This all sounds fine to me.  One small gripe about scons in general:
>
> I liked the old ./configure method because you figured out your
> configuration once, and then all you had to do was type 'make'
> and not remember all of your options each build.
>
> I wonder if you could set up scons to do something like:
>
> scons config foo=bar blah=baz ...
>
> This would write out a small config file with the supplied
> options.
>
> Then, when you run 'scons', it would read the config file
> if exists and use that configuration.

Hi Ben,

SCons has the ability to cache command line variables that are set via
Variables().  Unfortunately, we are still using the older ARGUMENTS
array for most, including the new host= and build= variables I'll be
introducing in my upcoming patch.

There's still cleanup that must be done first, but I hope to convert
command line variable processing to use Variables() relatively soon.
When done, I'll definitely be adding code to cache variables between
scons invocations.

    --jtc

-- 
J.T. Conklin


From lizhaous2000 at yahoo.com  Mon Oct 12 07:10:26 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Mon, 12 Oct 2009 07:10:26 -0700 (PDT)
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <4ACFA9BB.3030806@candelatech.com>
Message-ID: <474379.59821.qm@web58708.mail.re1.yahoo.com>

I have used gdb and cscope to trace the code flow as follows:
commit_changes -> send_apply_config_change -> | rtrmgr_0_1_apply_config_change ->apply_config_change -> change_config -> commit_change_pass1 -> commit_change_pass2 -> commit_changes.

But I still cannot find the code in rtrmgr explicitly calling (ANY) xrl interface functions on any target module.
On the other hand, the target module did receive STCP I/O and the corresponding target functions were called.

I do not think that, in the case of adding a static route, rtrmgr can talk to fea directly. The only puzzle is how on earth rtrmgr called the function xrlStaticRouteV0p1Client::send_add_route4.

Thanks for your reply.

Li


--- On Fri, 10/9/09, Ben Greear <greearb at candelatech.com> wrote:

> From: Ben Greear <greearb at candelatech.com>
> Subject: Re: [Xorp-hackers] static xrl interface calls
> To: "Li Zhao" <lizhaous2000 at yahoo.com>
> Cc: xorp-hackers at icir.org
> Date: Friday, October 9, 2009, 5:23 PM
> On 10/09/2009 01:05 PM, Li Zhao wrote:
> > Actually this is a generic question. For any new config coming from xorpsh, how are these xrl client functions sent to the target process from rtrmgr?
> 
> Search for 'commit'.  There is some logic in that code to send updates to
> modules through xrl commands.
> 
> I think programs also talk directly with fea...I don't
> understand it all that well
> myself at this time.
> 
> Thanks,
> Ben
> 
> >
> > --- On Thu, 10/8/09, Li Zhao<lizhaous2000 at yahoo.com> wrote:
> >
> >> From: Li Zhao<lizhaous2000 at yahoo.com>
> >> Subject: [Xorp-hackers] static xrl interface calls
> >> To: xorp-hackers at icir.org
> >> Date: Thursday, October 8, 2009, 11:22 AM
> >> As the documentation says, XrlStaticRoutesV0p1Client::send_add_route4 is
> >> called from rtrmgr. But I do not actually see that symbol in rtrmgr,
> >> nor any process calling this method. On the other hand, the target method
> >> XrlStaticRoutesNode::static_routes_0_1_add_route4 was called
> >> on xorp_static_routes. I do not know how this was triggered.
> >> Can anybody explain it to me? Thanks.
> >>
> >> Li
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Xorp-hackers mailing list
> >> Xorp-hackers at icir.org
> >> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers
> >>
> >
> >
> >
> >
> > _______________________________________________
> > Xorp-hackers mailing list
> > Xorp-hackers at icir.org
> > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers
> 
> 
> -- 
> Ben Greear <greearb at candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
> 
> 


From greearb at candelatech.com  Mon Oct 12 08:44:23 2009
From: greearb at candelatech.com (Ben Greear)
Date: Mon, 12 Oct 2009 08:44:23 -0700
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <474379.59821.qm@web58708.mail.re1.yahoo.com>
References: <474379.59821.qm@web58708.mail.re1.yahoo.com>
Message-ID: <4AD34ED7.4090902@candelatech.com>

Li Zhao wrote:
> I have used gdb and cscope to trace the code flow, as follows:
> commit_changes -> send_apply_config_change -> | rtrmgr_0_1_apply_config_change -> apply_config_change -> change_config -> commit_change_pass1 -> commit_change_pass2 -> commit_changes.
>
> But I still cannot find the code in rtrmgr that explicitly calls any XRL interface function on any target module.
> On the other hand, the target module did receive STCP I/O, and the corresponding target functions were called.
>
> I do not think rtrmgr can talk to fea directly when adding a static route. The only puzzle is how on earth rtrmgr called the function XrlStaticRoutesV0p1Client::send_add_route4.
>
> Thanks for your reply.
>   

Damn...what complicated code.  Just spent an hour trying to follow the
commit logic.

Anyway, I think it comes down to TaskXrlItem.

An entry point to this code might be:

template_commands.cc:
int
XrlAction::execute(const MasterConfigTreeNode& ctn,
           TaskManager& task_manager,
           XrlRouter::XrlCallback cb) const

called from:
module_command.cc:
void
ModuleCommand::add_action(const list<string>& action, const XRLdb& xrldb)
    throw (ParseError)
{

I cannot figure out exactly how this ties back in, but I think all of this
must be called from:

master_conf_tree_node.cc:
bool
MasterConfigTreeNode::commit_changes(TaskManager& task_manager,
                     bool do_commit,
                     int depth, int last_depth,
                     string& error_msg,
                     bool& needs_activate,
                     bool& needs_update)
{


Commands are added directly by some parser, probably of the .xif files 
or something like that.
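
To make the parser guess a bit more concrete, here is a rough C++ sketch. It assumes action lines of the form `%create xrl "..."`, which is the shape of the template line quoted later in this thread (that line comes from a .tp template file rather than a .xif file). `SketchAction`, `parse_action_line` and `demo_parse` are hypothetical names made up for illustration; this is not rtrmgr's actual parser.

```cpp
// Sketch only: hypothetical names, not the real rtrmgr template parser.
#include <cassert>
#include <cstddef>
#include <string>

// One parsed action: the command keyword and the quoted XRL text.
struct SketchAction {
    std::string command;  // e.g. "%create"
    std::string xrl;      // text between the double quotes, kept verbatim
};

// Parse a line of the assumed form:   %command xrl "text"
// Returns false (leaving 'out' untouched) if the line does not match.
bool parse_action_line(const std::string& line, SketchAction& out)
{
    const std::size_t start = line.find('%');
    if (start == std::string::npos)
        return false;
    const std::size_t cmd_end = line.find(' ', start);
    if (cmd_end == std::string::npos)
        return false;
    const std::size_t kw = line.find("xrl", cmd_end);
    if (kw == std::string::npos)
        return false;
    const std::size_t q1 = line.find('"', kw);
    if (q1 == std::string::npos)
        return false;
    const std::size_t q2 = line.find('"', q1 + 1);
    if (q2 == std::string::npos)
        return false;
    out.command = line.substr(start, cmd_end - start);
    out.xrl = line.substr(q1 + 1, q2 - q1 - 1);
    return true;
}

// Small self-check using the template line from this thread.
void demo_parse()
{
    SketchAction a;
    const bool ok = parse_action_line(
        "%create xrl \"$(static.targetname)/static_routes/0.1/add_route4?...\"", a);
    assert(ok && a.command == "%create");
}
```

The XRL text is kept verbatim, including the `$(static.targetname)` variable and the elided argument list; expanding either would be a separate step that this sketch does not attempt.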

It would probably take enabling logging and then reading the logs very
carefully to figure out exactly how it actually works.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


From lizhaous2000 at yahoo.com  Mon Oct 12 09:50:31 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Mon, 12 Oct 2009 09:50:31 -0700 (PDT)
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <4AD34ED7.4090902@candelatech.com>
Message-ID: <457624.23473.qm@web58702.mail.re1.yahoo.com>

You are right. We are finding the same thing.
What I found is that the config tree nodes are traversed. Please look at etc/template/static_routes.tp:

route @: ipv4net {
      %create xrl "$(static.targetname)/static_routes/0.1/add_route4?..."

What happens is that when the leaf node is processed, the corresponding command calls Command::execute, which in turn calls XrlAction::execute. That adds an XRL to the task manager, so the task manager has a pending action. That is why I could not see an explicit call to the XrlStaticRoutesV0p1Client methods.
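
The deferral described above can be sketched roughly like this in C++. This is only an illustration of the "queue now, send later" pattern; `SketchTaskManager`, `add_xrl`, `run` and `demo_commit` are hypothetical names, not the real rtrmgr TaskManager interface, and the sender callback stands in for whatever finally ships the XRL.

```cpp
// Sketch only: hypothetical names, not the real rtrmgr TaskManager.
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class SketchTaskManager {
public:
    // Whatever finally ships an XRL; returns false on failure.
    typedef std::function<bool(const std::string&)> Sender;

    explicit SketchTaskManager(Sender sender) : _sender(std::move(sender)) {}

    // Called during the config tree walk: only records the request.
    void add_xrl(const std::string& request) { _pending.push_back(request); }

    std::size_t pending() const { return _pending.size(); }

    // Called later: sends the queued requests in order and stops at the
    // first failure, leaving the unsent ones queued.
    bool run()
    {
        while (!_pending.empty()) {
            if (!_sender(_pending.front()))
                return false;
            _pending.pop_front();
        }
        return true;
    }

private:
    Sender _sender;
    std::deque<std::string> _pending;
};

// Walk-then-run demo: nothing is sent until run() is called.
std::vector<std::string> demo_commit()
{
    std::vector<std::string> sent;
    SketchTaskManager tm([&sent](const std::string& r) -> bool {
        sent.push_back(r);
        return true;
    });
    tm.add_xrl("$(static.targetname)/static_routes/0.1/add_route4?...");
    assert(sent.empty());  // queued only, not yet sent
    tm.run();
    return sent;
}
```

With this shape, the code that walks the tree never calls a typed client method; it only hands text to the queue, which would match what gdb and cscope showed.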

I am now studying how the task manager maps from xrl->_action->_request to the real XRL calls.

I am getting much closer now.

Thanks.

Li


From lizhaous2000 at yahoo.com  Mon Oct 12 12:45:22 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Mon, 12 Oct 2009 12:45:22 -0700 (PDT)
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <457624.23473.qm@web58702.mail.re1.yahoo.com>
Message-ID: <3814.27208.qm@web58703.mail.re1.yahoo.com>

The last piece of the puzzle is solved. The task that was added to the task manager was a TaskXrlItem. When this task fired, the execute method in TaskXrlItem asked _xorp_client to send an unresolved XRL request.

I guess the reason the XrlStaticRoutesV0p1Client methods were not called is that rtrmgr needs to use its own task and task manager mechanism. A process that has no module, task, or task manager can use the XrlStaticRoutesV0p1Client methods directly, and will eventually follow a similar code flow.
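
As a rough illustration of why the two paths can end up in "the similar code flow", here is a C++ sketch where a typed stub and a plain text request both go through the same sender. Everything here is an assumption for the sake of the example: `SketchStaticRoutesClient`, `XrlTextSender` and `demo_same_request` are hypothetical, and the `network`/`nexthop` argument list is made up (the template line in this thread elides the real arguments). It is not XORP's generated client code.

```cpp
// Sketch only: hypothetical names and argument list, not XORP's generated stub.
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Generic path: anything that can ship an XRL given as text.
typedef std::function<bool(const std::string&)> XrlTextSender;

// Typed path: a stub in the spirit of a generated client class. It builds
// the request text itself and hands it to the same kind of sender.
class SketchStaticRoutesClient {
public:
    explicit SketchStaticRoutesClient(XrlTextSender sender)
        : _sender(std::move(sender)) {}

    // Build the request text; the argument names are invented for the sketch.
    static std::string build_add_route4(const std::string& target,
                                        const std::string& network,
                                        const std::string& nexthop)
    {
        return target + "/static_routes/0.1/add_route4?network=" + network
               + "&nexthop=" + nexthop;
    }

    bool send_add_route4(const std::string& target,
                         const std::string& network,
                         const std::string& nexthop)
    {
        return _sender(build_add_route4(target, network, nexthop));
    }

private:
    XrlTextSender _sender;
};

// Both paths hand the same text to a sender; returns true if they match.
bool demo_same_request()
{
    std::string via_stub;
    std::string via_text;
    SketchStaticRoutesClient client([&via_stub](const std::string& r) -> bool {
        via_stub = r;
        return true;
    });
    client.send_add_route4("static_routes", "10.0.0.0/24", "192.0.2.1");

    XrlTextSender generic = [&via_text](const std::string& r) -> bool {
        via_text = r;
        return true;
    };
    generic("static_routes/static_routes/0.1/add_route4"
            "?network=10.0.0.0/24&nexthop=192.0.2.1");

    assert(!via_stub.empty());
    return via_stub == via_text;
}
```

The point of the sketch is only that a typed stub is a convenience layer: a process without the task machinery could call it directly, while rtrmgr can queue the text form and send it later.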

Li


From greearb at candelatech.com  Tue Oct 13 10:15:58 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 13 Oct 2009 10:15:58 -0700
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <947168.479.qm@web58705.mail.re1.yahoo.com>
References: <947168.479.qm@web58705.mail.re1.yahoo.com>
Message-ID: <4AD4B5CE.1030602@candelatech.com>

On 10/12/2009 12:58 PM, Li Zhao wrote:
> The last piece of the puzzle is solved. The task that was added to the task manager was a TaskXrlItem. When this task fired, the execute method in TaskXrlItem asked _xorp_client to send an unresolved XRL request.
>
> I guess the reason the XrlStaticRoutesV0p1Client methods were not called is that rtrmgr needs to use its own task and task manager mechanism. A process that has no module, task, or task manager can use the XrlStaticRoutesV0p1Client methods directly, and will eventually follow a similar code flow.
>
> Li

So, did you get this working?  If you have a patch, please post it...

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From lizhaous2000 at yahoo.com  Tue Oct 13 11:37:31 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Tue, 13 Oct 2009 11:37:31 -0700 (PDT)
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <4AD4B5CE.1030602@candelatech.com>
Message-ID: <193113.5326.qm@web58707.mail.re1.yahoo.com>


I am still studying the code, so I have not coded anything yet. What I am working on is a control plane process which will add and delete some special static routes. These static routes can be redistributed by OSPF etc. The new daemon will use the XRL interface calls. I do not want this process to talk to rtrmgr, because the config tree structure adds unnecessary complexity. The new process can be started by rtrmgr when rtrmgr starts. Then I want the new process to update the static routes directly in xorp_static_routes. The problem then is how to start xorp_static_routes and the processes it depends on, like fea/fib/policy, and make them work properly with the XRL finder. This is really a pain for me because I only started learning XORP a few weeks ago.

I am wondering whether there is a simple API by which a process other than xorpsh can ask rtrmgr to start static_routes.

Another problem: commit takes an awkwardly long time.

Thanks.


From greearb at candelatech.com  Tue Oct 13 11:51:49 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 13 Oct 2009 11:51:49 -0700
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <193113.5326.qm@web58707.mail.re1.yahoo.com>
References: <193113.5326.qm@web58707.mail.re1.yahoo.com>
Message-ID: <4AD4CC45.80603@candelatech.com>

On 10/13/2009 11:37 AM, Li Zhao wrote:
>
> I am still studying the code, so I have not coded anything yet. What I am working on is a control plane process which will add and delete some special static routes. These static routes can be redistributed by OSPF etc. The new daemon will use the XRL interface calls. I do not want this process to talk to rtrmgr, because the config tree structure adds unnecessary complexity. The new process can be started by rtrmgr when rtrmgr starts. Then I want the new process to update the static routes directly in xorp_static_routes. The problem then is how to start xorp_static_routes and the processes it depends on, like fea/fib/policy, and make them work properly with the XRL finder. This is really a pain for me because I only started learning XORP a few weeks ago.

Can you just have the control plane process call xorpsh to have it update
routes in the existing static-routes logic?  I've used xorpsh in similar manner
to update IPs, interfaces, etc and it has worked reasonably well (after I fixed
a lot of bugs with dynamic interfaces!)

> I am wondering whether there is a simple API by which a process other than xorpsh can ask rtrmgr to start static_routes.
>
> Another problem: commit is taking an awkwardly long time.

I fixed the commit problem in my tree:

http://www.candelatech.com/oss/xorp-ct.html

I get commit times of about 0.10 to 0.20 seconds now (counting launching xorpsh).

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From lizhaous2000 at yahoo.com  Tue Oct 13 12:22:25 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Tue, 13 Oct 2009 12:22:25 -0700 (PDT)
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <4AD4CC45.80603@candelatech.com>
Message-ID: <267244.59385.qm@web58704.mail.re1.yahoo.com>

That was my first plan. But I did not want the unnecessary complexities related to config control, so I tried to first ask rtrmgr to start static_routes, then use the channel between my daemon and static_routes directly to update static routes. But a big problem is that if a user uses the xorpsh CLI to "delete protocol static", then my daemon will not only lose the channel to static_routes, which is terminated by the CLI, but will also lose all the static routes it installed. Basically, xorpsh CLI sessions cannot cooperate with my daemon.

I am still looking for a good design.

--- On Tue, 10/13/09, Ben Greear <greearb at candelatech.com> wrote:

> From: Ben Greear <greearb at candelatech.com>
> Subject: Re: [Xorp-hackers] static xrl interface calls
> To: "Li Zhao" <lizhaous2000 at yahoo.com>
> Cc: xorp-hackers at icir.org
> Date: Tuesday, October 13, 2009, 2:51 PM
> On 10/13/2009 11:37 AM, Li Zhao
> wrote:
> >
> > I am studying the code, so I have not coded anything yet.
> What I am working on is a control plane process
> which will add and delete some special static routes. These
> static routes can be redistributed by ospf etc. The new
> daemon will use the xrl interface calls. I do not want this
> process to talk to rtrmgr because the config tree structure
> adds unnecessary complexity. This new process can be
> started by rtrmgr when rtrmgr starts. Then I want this new
> process to update the static routes directly in
> xorp_static_routes. The problem then is how to start
> xorp_static_routes and the processes it depends on, like
> fea/fib/policy, and make them work properly with the xrl
> finder. This is really a pain for me because I started
> learning xorp only a few weeks ago.
> 
> Can you just have the control plane process call xorpsh to
> have it update
> routes in the existing static-routes logic?  I've used
> xorpsh in similar manner
> to update IPs, interfaces, etc and it has worked reasonably
> well (after I fixed
> a lot of bugs with dynamic interfaces!)
> 
> > I am wondering whether there is a simple API by which a
> process other than xorpsh can ask rtrmgr to start
> static_routes.
> >
> > Another problem: commit is taking an awkwardly long
> time.
> 
> I fixed the commit problem in my tree:
> 
> http://www.candelatech.com/oss/xorp-ct.html
> 
> I get commit times of about 0.10 to 0.20 seconds now
> (counting launching xorpsh).
> 
> Thanks,
> Ben
> 
> -- 
> Ben Greear <greearb at candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
> 
> 


From greearb at candelatech.com  Tue Oct 13 13:36:08 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 13 Oct 2009 13:36:08 -0700
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <267244.59385.qm@web58704.mail.re1.yahoo.com>
References: <267244.59385.qm@web58704.mail.re1.yahoo.com>
Message-ID: <4AD4E4B8.2030008@candelatech.com>

On 10/13/2009 12:22 PM, Li Zhao wrote:
> That was my first plan. But I did not want the unnecessary complexities related to config control, so I tried to first ask rtrmgr to start static_routes, then use the channel between my daemon and static_routes directly to update static routes. But a big problem is that if a user uses the xorpsh CLI to "delete protocol static", then my daemon will not only lose the channel to static_routes, which is terminated by the CLI, but will also lose all the static routes it installed. Basically, xorpsh CLI sessions cannot cooperate with my daemon.
>
> I am still looking for a good design.

If your daemon communicates to xorp through xorpsh, it seems like it would work OK.

A user could always screw something by manually messing with xorpsh (or
doing worse things on the linux command-line, for example).

Maybe you are worried about concurrent xorpsh usage by your script and
a user?  I'm not sure how that would work..but I can imagine it being
a problem.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From lizhaous2000 at yahoo.com  Tue Oct 13 18:49:54 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Tue, 13 Oct 2009 18:49:54 -0700 (PDT)
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <4AD4E4B8.2030008@candelatech.com>
Message-ID: <375084.43781.qm@web58703.mail.re1.yahoo.com>

Basically I am adding a new application process to the xorp linux router. That application requires xorp_static_routes to be running, and it periodically
updates the static routes through the xrl interface API. Because it is a router, an administrator can easily issue the CLI command "delete protocol static", which ends up terminating xorp_static_routes and removing the static routes from the rib.

--- On Tue, 10/13/09, Ben Greear <greearb at candelatech.com> wrote:

> From: Ben Greear <greearb at candelatech.com>
> Subject: Re: [Xorp-hackers] static xrl interface calls
> To: "Li Zhao" <lizhaous2000 at yahoo.com>
> Cc: xorp-hackers at icir.org
> Date: Tuesday, October 13, 2009, 4:36 PM
> On 10/13/2009 12:22 PM, Li Zhao
> wrote:
> > That was my first plan. But I thought I do not want
> unnecessary complexities related to config control, so I
> tried to first ask rtrmgr to start static_routes, then use
> the channel between daemon and static_routes directly to
> update static routes. But a big problem is that if a user
> uses xorpsh CLI to "delete protocol static", then my daemon
> will not only lose the channel to static_routes which is
> terminated by CLI, but also will lose all the static routes
> installed by my daemon. Basically xorpsh CLI sessions cannot
> cooperate with my daemon.
> >
> > I am still looking for a good design.
> 
> If your daemon communicates to xorp through xorpsh, it
> seems like it would work OK.
> 
> A user could always screw something by manually messing
> with xorpsh (or
> doing worse things on the linux command-line, for
> example).
> 
> Maybe you are worried about concurrent xorpsh usage by your
> script and
> a user?  I'm not sure how that would work..but I can
> imagine it being
> a problem.
> 
> Thanks,
> Ben
> 
> -- 
> Ben Greear <greearb at candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
> 
> 


From globalcouchsurfer at gmail.com  Mon Oct 19 03:14:52 2009
From: globalcouchsurfer at gmail.com (CouchSurfer)
Date: Mon, 19 Oct 2009 11:14:52 +0100
Subject: [Xorp-hackers] IPv4
Message-ID: <2148c4de0910190314m76ae69c1k6053e6a67bb37bf0@mail.gmail.com>

Hi guys,

Just wanted to say that I tried to disable IPv6 when compiling xorp on
a CentOS box and it produced a number of ipv6 error messages.

Thanks


From globalcouchsurfer at gmail.com  Mon Oct 19 03:23:07 2009
From: globalcouchsurfer at gmail.com (CouchSurfer)
Date: Mon, 19 Oct 2009 11:23:07 +0100
Subject: [Xorp-hackers] vlan Config
Message-ID: <2148c4de0910190323r50407e2ct35b2c247fb95c859@mail.gmail.com>

I am having trouble configuring vlan interfaces - or indeed vif's in
general - in xorp.

For example, if I have something like
interfaces {
  interface eth0 {
    vif eth0 {
      ...
    }
    vif xxx {
      [vlan {
        vlan-id: yyy
      }]
      ...
    }
  }
}
I get an error saying cannot create interface eth0/xxx regardless of
how I name the vif. (I have tried both including and excluding the clause
inside the square brackets.)

The only way I can get around this is if I create the interface
manually in the system (such as eth0.20 etc) and use these in the
config file without using the vlan clause.

Can anyone tell me whether doing the above would affect vlan tagging?

Thanks


From globalcouchsurfer at gmail.com  Mon Oct 19 03:37:22 2009
From: globalcouchsurfer at gmail.com (CouchSurfer)
Date: Mon, 19 Oct 2009 11:37:22 +0100
Subject: [Xorp-hackers] BGP Config
Message-ID: <2148c4de0910190337y5d1c39bctb3d8199c1b24373b@mail.gmail.com>

I am having a problem with my BGP configuration. So far I have
configured the basic essentials such as AS numbers, peer IPs, next-hop
and ipv4-unicast settings. On running xorp with this configuration, I
can see the routes from my BGP peers. However, my own routes
are apparently not being distributed.

Basically, I want to (in cisco terms) redistribute my static (and
connected) routes. I have created protocols->static. I have also
created two policies (and applied them as export and import
respectively) as follows:
policy
  policy-statement "to_bgp"
    term 0
      from
        protocol: static
      then
        accept
    term 1
      from
        protocol: bgp
      then
        accept
  policy-statement "from_bgp"
    term 0
      from
        protocol: static
      then
        accept
    term 1
      from
      then
        accept
However, my routes are still not being distributed. I was wondering if
anyone can help me on this matter.

Thanks


From bms at incunabulum.net  Mon Oct 19 07:59:36 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Mon, 19 Oct 2009 15:59:36 +0100
Subject: [Xorp-hackers] PATCH:  Logging improvements,
 fix artificial deal for xorpsh commit.
In-Reply-To: <4ACCC6F7.2020404@candelatech.com>
References: <4ACA7A19.30909@candelatech.com> <4ACB3B92.3050505@incunabulum.net>
	<4ACB84B2.2010909@candelatech.com>
	<4ACC5300.3010703@incunabulum.net>
	<4ACCC6F7.2020404@candelatech.com>
Message-ID: <4ADC7ED8.4020109@incunabulum.net>

Ben Greear wrote:
> ...
>>
>> %j and intmax_t is ISO C99 portable. It sucks because it means casting
>> to the widest integer type on the platform, but it's a known quantity.
>> 'long long' has been a problem since well before Sun brought out 
>> SPARCV9.
>
> From MS's page, they may not support %j (or %ll for that matter).  Maybe
> they just don't document it:

There are a number of places where MS don't fully comply with the ISO 
C99 spec in either their CL.EXE compiler or the runtime library 
MSVCRT.DLL, this is but one of them.

They have made more progress towards this in MSVC 7 and 8, but it's 
still far from ideal. The snprintf() behaviour took a bit of hacking to 
track down in the textual XRL code.

I'd still be much happier if intmax_t is used, because it's a portable 
code construct.
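
For illustration, a minimal sketch of the intmax_t approach described above. The helper name format_count is made up for this example and is not XORP code, and it assumes a runtime whose snprintf() understands %jd (which, as noted, is not a given with MSVCRT).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of XORP): format a signed integer
// portably by widening it to intmax_t and printing it with %jd.
template <typename T>
std::string
format_count(T value)
{
    char buf[32];
    // intmax_t is the widest signed integer type, so the cast is
    // lossless for every signed integer argument.
    std::snprintf(buf, sizeof(buf), "%jd", static_cast<std::intmax_t>(value));
    return std::string(buf);
}
```

The cost is the cast at every call site; the benefit is that one format string works regardless of how wide 'long' or 'long long' happen to be on the platform.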

cheers,
BMS


From bms at incunabulum.net  Mon Oct 19 08:04:55 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Mon, 19 Oct 2009 16:04:55 +0100
Subject: [Xorp-hackers] PATCH:  Allow delayed start of PIM vif
In-Reply-To: <4ACD6B16.6080500@candelatech.com>
References: <4ACD6B16.6080500@candelatech.com>
Message-ID: <4ADC8017.9010507@incunabulum.net>

Thanks for the patch.

If you can preserve existing code style, then it's more likely changes 
can be taken as-is (i.e. don't use camelCase if possible, opening brace 
of {} block on separate line for methods, etc). I'd probably call the 
flag 'start_is_pending'.
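
As a rough sketch of the style points above. The class and method names are invented for the example and do not come from the PIM code or from the patch; only the flag name 'start_is_pending' is taken from the suggestion.

```cpp
#include <cassert>

// Hypothetical class, for illustration only.
class ExampleVif {
public:
    ExampleVif() : _start_is_pending(false) {}

    // lower_case_with_underscores rather than camelCase.
    void set_start_is_pending(bool v);
    bool start_is_pending() const;

private:
    bool _start_is_pending;
};

// Opening brace of a method body goes on its own line.
void
ExampleVif::set_start_is_pending(bool v)
{
    _start_is_pending = v;
}

bool
ExampleVif::start_is_pending() const
{
    return _start_is_pending;
}
```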

What I'm likely to do, when I return (I'm catching up on email now, 
although I'm still on my break, and might have some social stuff going 
on when I return to London) is to flag patches for possible future 
inclusion. I really need to finish what I've started with XRL; it's 
probably easier to deal with stuff like this as a sweep during a 1.7-RC.

thanks,
BMS


From bms at incunabulum.net  Mon Oct 19 08:06:51 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Mon, 19 Oct 2009 16:06:51 +0100
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <935513.83788.qm@web58707.mail.re1.yahoo.com>
References: <935513.83788.qm@web58707.mail.re1.yahoo.com>
Message-ID: <4ADC808B.5020302@incunabulum.net>

Li Zhao wrote:
> Actually this is a generic question. For any new config coming from xorpsh, how are these xrl client functions sent to the target process from rtrmgr?
>   

The Router Manager uses the textual Finder protocol to make indirect XRL 
method calls, as it parses the configuration tree; it does not use the 
C++ bindings directly. Please see the '*.xrls' files generated as part 
of the XRL stubs.

thanks,
BMS


From bms at incunabulum.net  Mon Oct 19 08:32:52 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Mon, 19 Oct 2009 16:32:52 +0100
Subject: [Xorp-hackers] IPv4
In-Reply-To: <2148c4de0910190314m76ae69c1k6053e6a67bb37bf0@mail.gmail.com>
References: <2148c4de0910190314m76ae69c1k6053e6a67bb37bf0@mail.gmail.com>
Message-ID: <4ADC86A4.2020609@incunabulum.net>

CouchSurfer wrote:
> Hi guys,
>
> Just wanted to say that I tried to disable IPv6 when compiling xorp on
> a CentOS box and it produced a number of ipv6 error messages.
>   

Patches were recently committed to the tree to fix the IPv6 build, 
please try updating your SVN sources.

If this does not resolve the issue, can you please raise a Trac ticket 
on Sourceforge about this issue and someone can try to look at it during 
the 1.7-RC? Thanks.

regards,
BMS


From bms at incunabulum.net  Mon Oct 19 08:35:09 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Mon, 19 Oct 2009 16:35:09 +0100
Subject: [Xorp-hackers] Pending SCons configure change
In-Reply-To: <87pr8wa8ke.fsf@orac.acorntoolworks.com>
References: <87pr8wa8ke.fsf@orac.acorntoolworks.com>
Message-ID: <4ADC872D.3070003@incunabulum.net>

J.T. Conklin wrote:
> The default build directory will now be obj/<host>.  Since host will
> now be the standard GNU system triplet, this may result in a rebuild
> and a new object directory (orphaning any objdirs with the old name).
>   

I like this change, thanks for committing it. It does make us dependent 
on a POSIX shell, though, but since we've pretty much ditched Windows 
backwards compatibility, that's fine.

regards,
BMS


From lizhaous2000 at yahoo.com  Mon Oct 19 09:57:08 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Mon, 19 Oct 2009 09:57:08 -0700 (PDT)
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <4ADC808B.5020302@incunabulum.net>
Message-ID: <909406.86958.qm@web58703.mail.re1.yahoo.com>

Thanks for the reply. I have coded my prototype protocol process. Two things I am still working on. First, starting the dependent modules takes a long time. Second, if static routes has a depending module, I don't want the cli to delete xorp_static_routes. The C++ xrl interface functions are working fine. My process can use them to talk directly to static routes to update the routes.

--- On Mon, 10/19/09, Bruce Simpson <bms at incunabulum.net> wrote:

> From: Bruce Simpson <bms at incunabulum.net>
> Subject: Re: [Xorp-hackers] static xrl interface calls
> To: "Li Zhao" <lizhaous2000 at yahoo.com>
> Cc: xorp-hackers at icir.org
> Date: Monday, October 19, 2009, 11:06 AM
> Li Zhao wrote:
> > Actually this is a generic question. For any new
> config coming from xorpsh, how are these xrl client
> functions sent to the target process from rtrmgr?
> >
> 
> The Router Manager uses the textual Finder protocol to make
> indirect XRL method calls, as it parses the configuration
> tree; it does not use the C++ bindings directly. Please see
> the '*.xrls' files generated as part of the XRL stubs.
> 
> thanks,
> BMS
> 
> 


From greearb at candelatech.com  Mon Oct 19 10:29:38 2009
From: greearb at candelatech.com (Ben Greear)
Date: Mon, 19 Oct 2009 10:29:38 -0700
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <909406.86958.qm@web58703.mail.re1.yahoo.com>
References: <909406.86958.qm@web58703.mail.re1.yahoo.com>
Message-ID: <4ADCA202.3040509@candelatech.com>

On 10/19/2009 09:57 AM, Li Zhao wrote:
> Thanks for the reply. I have coded my prototype protocol process. Two things I am still working on. First, starting the dependent modules takes a long time. Second, if static routes has a depending module, I don't want the cli to delete xorp_static_routes. The C++ xrl interface functions are working fine. My process can use them to talk directly to static routes to update the routes.

I also have patches in my tree to start up modules quicker...(removes a 2-second sleep for each module, basically).

But, since this is a one-time cost, it shouldn't be too bad even w/out the patch?

Thanks,
Ben


>
> --- On Mon, 10/19/09, Bruce Simpson<bms at incunabulum.net>  wrote:
>
>> From: Bruce Simpson<bms at incunabulum.net>
>> Subject: Re: [Xorp-hackers] static xrl interface calls
>> To: "Li Zhao"<lizhaous2000 at yahoo.com>
>> Cc: xorp-hackers at icir.org
>> Date: Monday, October 19, 2009, 11:06 AM
>> Li Zhao wrote:
>>> Actually this is a generic question. For any new
>> config coming from xorpsh, how are these xrl client
>> functions sent to the target process from rtrmgr?
>>>
>>
>> The Router Manager uses the textual Finder protocol to make
>> indirect XRL method calls, as it parses the configuration
>> tree; it does not use the C++ bindings directly. Please see
>> the '*.xrls' files generated as part of the XRL stubs.
>>
>> thanks,
>> BMS
>>
>>
>
>
>
>
> _______________________________________________
> Xorp-hackers mailing list
> Xorp-hackers at icir.org
> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Mon Oct 19 14:04:50 2009
From: greearb at candelatech.com (Ben Greear)
Date: Mon, 19 Oct 2009 14:04:50 -0700
Subject: [Xorp-hackers] PATCH:  Allow delayed start of PIM vif
In-Reply-To: <4ADC8017.9010507@incunabulum.net>
References: <4ACD6B16.6080500@candelatech.com>
	<4ADC8017.9010507@incunabulum.net>
Message-ID: <4ADCD472.2020203@candelatech.com>

On 10/19/2009 08:04 AM, Bruce Simpson wrote:
> Thanks for the patch.
>
> If you can preserve existing code style, then it's more likely changes
> can be taken as-is (i.e. don't use camelCase if possible, opening brace
> of {} block on separate line for methods, etc). I'd probably call the
> flag 'start_is_pending'.
>
> What I'm likely to do, when I return (I'm catching up on email now,
> although I'm still on my break, and might have some social stuff going
> on when I return to London) is to flag patches for possible future
> inclusion. I really need to finish what I've started with XRL; it's
> probably easier to deal with stuff like this as a sweep during a 1.7-RC.

I can change the coding style, but this particular patch is useless
without a bunch of other fixes relating to transient interfaces,
since those hit before this one would be noticeable.

Probably best to wait until the next dev cycle when we can work
towards integrating more of my changes.

With regard to XRL, I've a question:

If an application makes 3 XRL calls:

do_a()
do_b()
commit_all()

Is there any guarantee that these are strictly delivered to
the peer process in the order called?  Code appears to expect
this to be true, but I'm suspicious that perhaps it does not.
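
To make the concern concrete: if ordering turned out not to be guaranteed, one generic workaround would be to issue each call only from the completion callback of the previous one. The sketch below shows that pattern only; AsyncCall and run_in_order are made-up names, not the XRL API, and it says nothing about what XRL itself guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an asynchronous call: it takes a completion
// callback and invokes it once the peer has handled the request.
typedef std::function<void(std::function<void()>)> AsyncCall;

// Issue calls[index] and, only when it completes, issue the next one.
// Ordering then no longer depends on what the transport guarantees.
// 'calls' must outlive the whole chain, since it is captured by reference.
inline void
run_in_order(const std::vector<AsyncCall>& calls, std::size_t index = 0)
{
    if (index >= calls.size())
        return;
    calls[index]([&calls, index]() { run_in_order(calls, index + 1); });
}
```

The trade-off is latency: each call waits a full round trip before the next is sent, so it only makes sense where strict ordering matters more than throughput.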

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From lizhaous2000 at yahoo.com  Tue Oct 20 05:53:03 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Tue, 20 Oct 2009 05:53:03 -0700 (PDT)
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <4ADCA202.3040509@candelatech.com>
Message-ID: <442246.87308.qm@web58702.mail.re1.yahoo.com>

If we pick 2 seconds as the sleep time, that might not be a good idea.

--- On Mon, 10/19/09, Ben Greear <greearb at candelatech.com> wrote:

> From: Ben Greear <greearb at candelatech.com>
> Subject: Re: [Xorp-hackers] static xrl interface calls
> To: "Li Zhao" <lizhaous2000 at yahoo.com>
> Cc: "Bruce Simpson" <bms at incunabulum.net>, xorp-hackers at icir.org
> Date: Monday, October 19, 2009, 1:29 PM
> On 10/19/2009 09:57 AM, Li Zhao
> wrote:
> > Thanks for the reply. I have coded my prototype
> protocol process. Two things I am still working on. First,
> starting the dependent modules takes a long time. Second,
> if static routes has a depending module, I don't want the
> cli to delete xorp_static_routes. The C++ xrl interface
> functions are working fine. My process can use them to talk
> directly to static routes to update the routes.
> 
> I also have patches in my tree to start up modules
> quicker...(removes a 2-second sleep for each module,
> basically).
> 
> But, since this is a one-time cost, it shouldn't be too bad
> even w/out the patch?
> 
> Thanks,
> Ben
> 
> 
> >
> > --- On Mon, 10/19/09, Bruce Simpson<bms at incunabulum.net>
> wrote:
> >
> >> From: Bruce Simpson<bms at incunabulum.net>
> >> Subject: Re: [Xorp-hackers] static xrl interface
> calls
> >> To: "Li Zhao"<lizhaous2000 at yahoo.com>
> >> Cc: xorp-hackers at icir.org
> >> Date: Monday, October 19, 2009, 11:06 AM
> >> Li Zhao wrote:
> >>> Actually this is a generic question. For any
> new
> >> config coming from xorpsh, how are these xrl
> client
> >> functions sent to the target process from rtrmgr?
> >>>
> >>
> >> The Router Manager uses the textual Finder
> protocol to make
> >> indirect XRL method calls, as it parses the
> configuration
> >> tree; it does not use the C++ bindings directly.
> Please see
> >> the '*.xrls' files generated as part of the XRL
> stubs.
> >>
> >> thanks,
> >> BMS
> >>
> >>
> 
> 
> -- 
> Ben Greear <greearb at candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
> 
> 


From greearb at candelatech.com  Tue Oct 20 08:26:20 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 20 Oct 2009 08:26:20 -0700
Subject: [Xorp-hackers] static xrl interface calls
In-Reply-To: <442246.87308.qm@web58702.mail.re1.yahoo.com>
References: <442246.87308.qm@web58702.mail.re1.yahoo.com>
Message-ID: <4ADDD69C.4080909@candelatech.com>

Li Zhao wrote:
> If we pick 2 seconds as the sleep time, that might not be a good idea.
>   
I managed to remove it entirely in my tree...with no bad effects so far,
but it requires a relatively large amount of (simple) changes.

Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


From lizhaous2000 at yahoo.com  Wed Oct 21 11:49:10 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Wed, 21 Oct 2009 11:49:10 -0700 (PDT)
Subject: [Xorp-hackers] set_command_map
Message-ID: <472319.96385.qm@web58708.mail.re1.yahoo.com>

set_command_map is only invoked in three files: test_fea_rawlink.cc,
test_xrl_sockets4_tcp.cc and test_xrl_sockets4_udp.cc. It is interesting that in these three test_main functions
wait_until_xrl_router_is_ready is never called, yet they work just fine.

I have a process with a class implementing the interface socket4_user/0.1.
I call wait_until_xrl_router_is_ready. But if I leave out set_command_map, then send_bind and send_listen do not work properly: the packets delivered to the sockets are not passed to my implementations of
socket4_user_0_1_inbound_connect_event or socket4_user_0_1_recv_event.

But olsr4 and rip don't call set_command_map. Maybe because they do
not register send_bind or send_listen?


From bms at incunabulum.net  Thu Oct 22 04:21:48 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Thu, 22 Oct 2009 12:21:48 +0100
Subject: [Xorp-hackers] set_command_map
In-Reply-To: <472319.96385.qm@web58708.mail.re1.yahoo.com>
References: <472319.96385.qm@web58708.mail.re1.yahoo.com>
Message-ID: <4AE0404C.4010209@incunabulum.net>

Li Zhao wrote:
> I have a process with a class implementing the interface socket4_user/0.1.
> I call wait_until_xrl_router_is_ready. But if I leave out set_command_map, then send_bind and send_listen do not work properly: the packets delivered to the sockets are not passed to my implementations of
> socket4_user_0_1_inbound_connect_event or socket4_user_0_1_recv_event.
>   

The command map is used implicitly by the XRL target stubs. All 
instances of XrlRouter embed a default implementation of it, which just 
shims to the basic functionality required by any process speaking XRL 
internally.

Normally set_command_map() is called 'behind the scenes' by the XRL 
target stub, and there's no need to override it. However, if you are 
making an XRL endpoint look like a target on the fly, or need to switch 
between multiple XRL target implementations *in the same process*, you 
will need to call this method directly.
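
A toy model of the idea, to show why inbound events would go nowhere if the router is still using its default map. Everything here (ToyRouter, ToyCommandMap, the command strings) is invented for illustration and does not reproduce the real XrlRouter classes or signatures.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy model only; these are not the real XORP classes or signatures.
typedef std::map<std::string, std::function<std::string()> > ToyCommandMap;

class ToyRouter {
public:
    ToyRouter() : _cmds(&_default_cmds)
    {
        // Default map: only the basic commands every endpoint answers.
        _default_cmds["common/get_status"] = []() { return std::string("ok"); };
    }

    // Normally done once by the target stub; call it directly only to
    // switch target implementations within the same process.
    void set_command_map(ToyCommandMap* cmds)
    {
        _cmds = cmds ? cmds : &_default_cmds;
    }

    // Returns false when the current map has no handler for 'name'.
    bool dispatch(const std::string& name, std::string& result) const
    {
        ToyCommandMap::const_iterator i = _cmds->find(name);
        if (i == _cmds->end())
            return false;
        result = i->second();
        return true;
    }

private:
    ToyCommandMap _default_cmds;
    ToyCommandMap* _cmds;
};
```

In this model, a call such as "socket4_user/0.1/recv_event" is dropped until a map containing that handler has been installed.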


From bms at incunabulum.net  Tue Oct 27 08:53:47 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Tue, 27 Oct 2009 15:53:47 +0000
Subject: [Xorp-hackers] Omitting XrlDB from Router Manager
Message-ID: <4AE7178B.9000709@incunabulum.net>

Hi all,

    I'm still looking at the XRL replacement since I got back from 
holiday, which is why I've been mostly silent on lists.

    Something came up in analysis, which broadly relates to Ben Greear's 
work on reducing Router Manager startup times, etc. and some of the 
questions Li Zhao has been asking in other threads on this list.

@Ben: It would be interesting to know what difference omitting the XRLDB 
code makes to your Router Manager startup times.
 * The XRLDB seems to exist pretty much to validate what's in the 
template files and how the Router Manager uses them, although this is 
done completely at run time.
 * I wonder if disabling this code would make a difference to performance.
 * To do this, I'd hack rtrmgr/template_commands.cc, and comment out the 
calls to the XRLdb methods.
 * The rtrmgr/xrldb.cc is the only place in the whole system where the 
'*.xrls' files are parsed and used. They are used only to validate the 
syntax and structure of potential XRL method calls.
 * It would mean that there is no up-front validation of the XRLs, but 
in practice, this validation step is probably only of interest to people 
developing XORP, to catch problems with template files.
 * It's probably best folded under a compile-time #define for developer use.
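
A minimal sketch of what folding the validation under a compile-time #define could look like. The macro VALIDATE_XRLS and both function names are hypothetical; nothing here is taken from rtrmgr/template_commands.cc or rtrmgr/xrldb.cc, and the "lookup" is a placeholder.

```cpp
#include <cassert>
#include <string>

// Hypothetical switch and function names. Build with -DVALIDATE_XRLS
// to enable the up-front checks in a developer build.
#ifdef VALIDATE_XRLS
// Stand-in for an XRLDB lookup: here it only checks the scheme prefix.
inline bool
xrl_is_known(const std::string& xrl)
{
    return xrl.compare(0, 9, "finder://") == 0;
}
#endif

// Called while parsing a template; returns false to reject the XRL.
inline bool
check_template_xrl(const std::string& xrl)
{
#ifdef VALIDATE_XRLS
    return xrl_is_known(xrl);   // developer build: validate up front
#else
    (void)xrl;                  // normal build: skip validation entirely
    return true;
#endif
}
```

With the macro undefined, the check compiles down to nothing, so the XRLDB never has to be loaded at startup.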

@Li: You were looking for information on how XRLs are sent by the Router 
Manager to the XORP routing processes.
  * I've been looking at this code with a view to replacement.
  * This uses an indirect method call and lookup from the finder:// XRLs 
in the *.xrls files.
  * Implementing Thrift directly affects the Router Manager: in 
particular, the core functionality which configures processes by sending 
XRLs to them, in rtrmgr/template_commands.cc, class XrlAction.
   * In any event, because the Router Manager is trying to do method 
calls without an IDL or C++ stubs, using the textual Finder protocol, a 
different mechanism would be needed in Thrift.
    * The '*.tp' template files explicitly identify all argument and 
result types used when configuring a XORP process via an XRL. If these 
are correct, then additional validation shouldn't be needed.
    * Therefore: it's possible to construct a binary blob at runtime, 
using exactly the same techniques as in the clnt-gen Thrifted code 
generator.

cheers,
BMS


From greearb at candelatech.com  Tue Oct 27 15:48:09 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 27 Oct 2009 15:48:09 -0700
Subject: [Xorp-hackers] oprofile reports
Message-ID: <4AE778A9.3020201@candelatech.com>

I'm running 100 xorp instances on a dual quad-core system (E5530, 2.4GHz).

I let them run a while, which tends to hide the xrl stuff that is mainly
a startup cost.

This is my patched tree, by the way.

FEA is the top entry on the entire OS (but there are 100 FEAs running, so that's not unexpected).

At least in my code tree, the get_ready_priority is a couple of for loops..and
could probably be optimized, to better ignore fds that are not in use.

samples  %        image name               app name                 symbol name
-------------------------------------------------------------------------------
   22        0.0766  xorp_fea                 xorp_fea                 EventLoop::do_work(bool)
   28687    99.9234  xorp_fea                 xorp_fea                 SelectorList::wait_and_dispatch(TimeVal&)
27759     6.5897  xorp_fea                 xorp_fea                 SelectorList::get_ready_priority(bool)
   27759    99.6196  xorp_fea                 xorp_fea                 SelectorList::get_ready_priority(bool) [self]
   54        0.1938  xorp_fea                 xorp_fea                 SelectorList::do_select(timeval*, bool)
   52        0.1866  xorp_fea                 xorp_fea                 std::vector<SelectorList::Node, std::allocator<SelectorList::Node> >::operator[](unsigned long)
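
One possible shape for the get_ready_priority optimization mentioned above: keep a record of which fds are actually registered and scan only those. This is a hypothetical sketch, not the SelectorList code; the class name is made up, and it ignores the per-priority handling the real function does.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical sketch: remember which fds are registered so the
// readiness scan visits only those, not every possible slot.
class ActiveFdScan {
public:
    void add(int fd)    { _in_use.insert(fd); }
    void remove(int fd) { _in_use.erase(fd); }

    // 'ready[fd]' says whether select() flagged fd. Returns the lowest
    // registered fd that is ready, or -1 if none is.
    int first_ready(const std::vector<bool>& ready) const
    {
        for (std::set<int>::const_iterator i = _in_use.begin();
             i != _in_use.end(); ++i) {
            if (*i >= 0 && static_cast<std::size_t>(*i) < ready.size()
                && ready[*i])
                return *i;
        }
        return -1;
    }

private:
    std::set<int> _in_use;
};
```

With only a handful of fds in use per process, the scan cost then follows the number of registered fds rather than the size of the fd table.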

Nothing else really stands out, except that we are probably creating and deleting a lot
of strings (or something else with an underlying vector in it), which calls memset.

I can't tell from oprofile what the call chain for the memset usage is though,
will look for other ways to get at that...

Thanks,
Ben


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Tue Oct 27 18:09:53 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 27 Oct 2009 18:09:53 -0700
Subject: [Xorp-hackers] Crash due to stale cached xrl sender pointer.
Message-ID: <4AE799E1.8010300@candelatech.com>

XRL caches a pointer to the resolved_sender, but when something
deletes a sender, it doesn't appear to clean up any existing XRLs.
This leads to a crash on a highly loaded system (where senders must be 
timing out
or something like that).

Looks like a good place for smart pointers.  I'm going to attempt that 
unless
someone has another idea...
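
Roughly what I have in mind, as a sketch only. ToySender and ToyXrl are stand-ins, not the libxipc classes, and std::shared_ptr/std::weak_ptr are used here just to show the idea; the real change would have to fit whatever smart pointer type the tree already uses.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; these are not the libxipc classes.
struct ToySender {
    bool alive() const { return true; }
};

class ToyXrl {
public:
    void set_resolved_sender(const std::shared_ptr<ToySender>& s)
    {
        _sender = s;
    }

    // Returns an owning pointer, or an empty one if the sender has been
    // destroyed since it was cached. No dangling pointer is handed out.
    std::shared_ptr<ToySender> resolved_sender() const
    {
        return _sender.lock();
    }

private:
    std::weak_ptr<ToySender> _sender;   // cache does not keep sender alive
};
```

The caller checks the returned pointer before calling alive(), so a deleted sender shows up as an empty pointer and a re-resolve instead of a crash.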

Thanks,
Ben

XrlPFSender*
XrlRouter::get_sender(const Xrl& xrl, FinderDBEntry* dbe)
{
    const Xrl& x = dbe->xrls().front();
    XrlPFSender* s = NULL;

    // Use the cache pointer to the sender.
    if (xrl.resolved()) {
        s = xrl.resolved_sender();

        // >>> CRASH HERE: s is pointing to bogus memory, probably
        // deleted and scribbled upon:
        if (s->alive())
            return s;

(gdb) bt
#0  0x00000000005e3eac in XrlRouter::get_sender (this=0x7fff13b9ae30, 
xrl=@0x18ca130, dbe=0x18caa38) at libxipc/xrl_router.cc:424
#1  0x00000000005e39f1 in XrlRouter::send_resolved (this=0x7fff13b9ae30, 
xrl=@0x18ca130, dbe=0x18caa38, cb=@0x7fff13b99730, direct_call=true)
    at libxipc/xrl_router.cc:391
#2  0x00000000005e4784 in XrlRouter::send (this=0x7fff13b9ae30, 
xrl=@0x18ca130, user_cb=@0x7fff13b99730) at libxipc/xrl_router.cc:630
#3  0x00000000005bdcc2 in 
XrlRawPacket4V0p1Client::send_register_receiver (this=0x7fff13b997b0, 
dst_xrl_target_name=0x189a938 "fea",
    
xrl_target_instance_name="ospfv2-4a020f2e6b53955e3362796be672a55e at 127.0.0.1", 
if_name="17.100.17", vif_name="17.100.17",
    ip_protocol=@0x7fff13b99808, 
enable_multicast_loopback=@0x7fff13b99807, cb=@0x7fff13b997c0)
    at obj/x86_64-linux-public17/xrl/interfaces/fea_rawpkt4_xif.cc:111
#4  0x00000000004a2956 in XrlIO<IPv4>::enable_interface_vif 
(this=0x7fff13b9abd0, interface="17.100.17", vif="17.100.17") at 
ospf/xrl_io.cc:215
#5  0x0000000000420f73 in Ospf<IPv4>::enable_interface_vif 
(this=0x7fff13b9a8a0, interface="17.100.17", vif="17.100.17") at 
ospf/ospf.cc:130
#6  0x000000000045e578 in PeerOut<IPv4>::start_receiving_packets 
(this=0x18cdb50) at ospf/peer.cc:635
#7  0x000000000045ead4 in PeerOut<IPv4>::bring_up_peering 
(this=0x18cdb50) at ospf/peer.cc:566
#8  0x000000000045c158 in PeerOut<IPv4>::peer_change (this=0x18cdb50) at 
ospf/peer.cc:316
#9  0x000000000045c032 in PeerOut<IPv4>::set_link_status 
(this=0x18cdb50, state=true) at ospf/peer.cc:297
#10 0x000000000043ae82 in PeerManager<IPv4>::vif_status_change 
(this=0x7fff13b9a978, interface="17.100.17", vif="17.100.17", state=true)
    at ospf/peer_manager.cc:789
#11 0x000000000045670e in XorpMemberCallback3B0<void, PeerManager<IPv4>, 
std::string const&, std::string const&, bool>::dispatch (
    this=0x18c4450, a1="17.100.17", a2="17.100.17", a3=true) at 
./libxorp/callback_nodebug.hh:6801
#12 0x00000000004a4ea0 in XrlIO<IPv4>::updates_made 
(this=0x7fff13b9abd0) at ospf/xrl_io.cc:1259
#13 0x0000000000596fa9 in IfMgrXrlMirror::do_updates 
(this=0x7fff13b9aca8) at libfeaclient/ifmgr_xrl_mirror.cc:1168
#14 0x0000000000596e21 in IfMgrXrlMirror::updates_made 
(this=0x7fff13b9aca8) at libfeaclient/ifmgr_xrl_mirror.cc:1145
#15 0x000000000059540e in 
IfMgrXrlMirrorTarget::fea_ifmgr_mirror_0_1_hint_updates_made 
(this=0x18b3c00) at libfeaclient/ifmgr_xrl_mirror.cc:927
#16 0x00000000005cae6a in 
XrlFeaIfmgrMirrorTargetBase::handle_fea_ifmgr_mirror_0_1_hint_updates_made 
(this=0x18b3c00, xa_inputs=@0x18bae28)
    at obj/x86_64-linux-public17/xrl/targets/fea_ifmgr_mirror_base.cc:1362
#17 0x00000000005cb57a in XorpMemberCallback2B0<XrlCmdError const, 
XrlFeaIfmgrMirrorTargetBase, XrlArgs const&, XrlArgs*>::dispatch (
    this=0x18b5f60, a1=@0x18bae28, a2=0x7fff13b99f80) at 
./libxorp/callback_nodebug.hh:4616
#18 0x00000000005f9692 in XrlCmdEntry::dispatch (this=0x18b6008, 
inputs=@0x18bae28, outputs=0x7fff13b99f80) at libxipc/xrl_cmd_map.hh:44
#19 0x000000000060032c in XrlDispatcher::dispatch_xrl_fast 
(this=0x18b3420, xi=@0x18bae10, outputs=@0x7fff13b99f80)
    at libxipc/xrl_dispatcher.cc:83
#20 0x000000000060114a in STCPRequestHandler::do_dispatch 
(this=0x18bf610, packed_xrl=0x7f2947ae1776 "", packed_xrl_bytes=0,
    response=@0x7fff13b99f80) at libxipc/xrl_pf_stcp.cc:288
#21 0x0000000000601237 in STCPRequestHandler::dispatch_request 
(this=0x18bf610, seqno=518, batch=false,
    packed_xrl=0x7f2947ae171f  <incomplete sequence \314>, 
packed_xrl_bytes=87) at libxipc/xrl_pf_stcp.cc:300
#22 0x0000000000600dc9 in STCPRequestHandler::read_event 
(this=0x18bf610, ev=BufferedAsyncReader::DATA, buffer=0x7f2947ae1707 
"STCP\1\1",
    buffer_bytes=111) at libxipc/xrl_pf_stcp.cc:234
#23 0x000000000060a12a in XorpMemberCallback4B0<void, 
STCPRequestHandler, BufferedAsyncReader*, BufferedAsyncReader::Event, 
unsigned char*, unsigned long>::dispatch (this=0x18b9410, a1=0x18bf620, 
a2=BufferedAsyncReader::DATA, a3=0x7f2947ae1707 "STCP\1\1", a4=111)
    at ./libxorp/callback_nodebug.hh:8966
#24 0x0000000000620728 in BufferedAsyncReader::announce_event 
(this=0x18bf620, ev=BufferedAsyncReader::DATA) at 
libxorp/buffered_asyncio.cc:261
#25 0x0000000000620600 in BufferedAsyncReader::io_event (this=0x18bf620, 
fd={_filedesc = 48}, type=IOT_READ) at libxorp/buffered_asyncio.cc:214
#26 0x0000000000620eda in XorpMemberCallback2B0<void, 
BufferedAsyncReader, XorpFd, IoEventType>::dispatch (this=0x18bde50, 
a1={_filedesc = 48},
    a2=IOT_READ) at ./libxorp/callback_nodebug.hh:4636
#27 0x0000000000634b46 in SelectorList::Node::run_hooks (this=0x1885990, 
m=SEL_RD, fd={_filedesc = 48}) at libxorp/selector.cc:200
#28 0x0000000000634004 in SelectorList::wait_and_dispatch 
(this=0x7fff13b9a540, timeout=@0x7fff13b9a320) at libxorp/selector.cc:523
#29 0x0000000000622be9 in EventLoop::do_work (this=0x7fff13b9a3b0, 
can_block=true) at libxorp/eventloop.cc:147
#30 0x0000000000622a7e in EventLoop::run (this=0x7fff13b9a3b0) at 
libxorp/eventloop.cc:100
#31 0x000000000040514a in main (argv=0x7fff13b9b098) at 
ospf/xorp_ospfv2.cc:77
(gdb)

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Tue Oct 27 22:14:03 2009
From: greearb at candelatech.com (Ben Greear)
Date: Tue, 27 Oct 2009 22:14:03 -0700
Subject: [Xorp-hackers] Crash due to stale cached xrl sender pointer.
In-Reply-To: <4AE799E1.8010300@candelatech.com>
References: <4AE799E1.8010300@candelatech.com>
Message-ID: <4AE7D31B.1040107@candelatech.com>

Ben Greear wrote:
> XRL caches a pointer to the resolved_sender, but when something
> deletes a sender, it doesn't appear to clean up any existing XRLs.
> This leads to a crash on a highly loaded system (where senders must be 
> timing out
> or something like that).
>
> Looks like a good place for smart pointers.  I'm going to attempt that 
> unless
> someone has another idea...
>   
The attached patch seems to fix the problem.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


-------------- next part --------------
A non-text attachment was scrubbed...
Name: xorp_xrlsender_ref_ptr.patch
Type: text/x-patch
Size: 13637 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091027/d06b4c87/attachment.bin 

From greearb at candelatech.com  Wed Oct 28 15:11:40 2009
From: greearb at candelatech.com (Ben Greear)
Date: Wed, 28 Oct 2009 15:11:40 -0700
Subject: [Xorp-hackers] OLSR:  Fix olsr/tools build problems.
Message-ID: <4AE8C19C.4040009@candelatech.com>

This patch, when layered on top of my previous OLSR related patches,
lets the olsr/tools build as expected.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: olsr_tools_scons.patch
Url: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091028/b2629258/attachment.ksh 

From greearb at candelatech.com  Wed Oct 28 16:48:57 2009
From: greearb at candelatech.com (Ben Greear)
Date: Wed, 28 Oct 2009 16:48:57 -0700
Subject: [Xorp-hackers] [Xorp-users] Xorp installation fails on Ubuntu
In-Reply-To: <6e49b4d40910281616k703782d1v37860204e0cbf96f@mail.gmail.com>
References: <6e49b4d40910281554u4e2932f9mfcf21142e38588d4@mail.gmail.com>	
	<4AE8CF0B.9070402@candelatech.com>
	<6e49b4d40910281616k703782d1v37860204e0cbf96f@mail.gmail.com>
Message-ID: <4AE8D869.1080304@candelatech.com>

On 10/28/2009 04:16 PM, mahendra nunna wrote:
> thanks ben. I got the source (xorp-1.6.tar.gz) from
> http://www.xorp.org/downloads.html....
>
> is there a newer version than 1.6?....
>
> and strangely ...... locate xorpsh gives me no result....
>
> but i have done ./configure and make.....
>
> thanks

They have later code on sourceforge.  I just put some binaries
I compiled on Fedora up at:

http://www.candelatech.com/oss/xorp_binaries/

They might require that you install some different libraries.  There is a
xorp_install.bash script in the package that attempts to fix up some of
the library issues and create proper users, etc.  The files
are meant to be un-tarred in /usr/local.

These are from our xorp tree, but should support everything that the vanilla
xorp does.

Info on our tree is at:  http://www.candelatech.com/oss/xorp-ct.html

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From mnunna0 at gmail.com  Wed Oct 28 22:36:51 2009
From: mnunna0 at gmail.com (mahendra nunna)
Date: Thu, 29 Oct 2009 01:36:51 -0400
Subject: [Xorp-hackers] Using Java Native Interference with XORP
Message-ID: <6e49b4d40910282236o4138d570ka7ab087ff2e75ea9@mail.gmail.com>

hi

I want to modify XORP. Considering the complexities involved in modifying
the native XORP code, it was proposed to put Java code on top of XORP, use
interfaces, and manage XORP's behaviour through the Java code.
This could be done either by
1. running the Java code and the native XORP code in the same process,
using the Java Native Interface (faster processing), or
2. running the Java code and the native XORP code in separate processes,
using inter-process communication.

Is it good to do this, or should we proceed with modifying the native XORP
code and compiling it?

Please advise us on this; we need your opinion.

thanks

mahen
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091029/2d84b961/attachment.html 

From greearb at candelatech.com  Wed Oct 28 23:17:16 2009
From: greearb at candelatech.com (Ben Greear)
Date: Wed, 28 Oct 2009 23:17:16 -0700
Subject: [Xorp-hackers] [Xorp-users] Using Java Native Interference with
	XORP
In-Reply-To: <6e49b4d40910282236o4138d570ka7ab087ff2e75ea9@mail.gmail.com>
References: <6e49b4d40910282236o4138d570ka7ab087ff2e75ea9@mail.gmail.com>
Message-ID: <4AE9336C.5090107@candelatech.com>

mahendra nunna wrote:
> hi
>
> I want to modify the xorp. Considering the complexities involved in 
> the modification of native XORP code, it was proposed to use Java code 
> on top of XORP, Use interfaces and manage the XORP behaviour through 
> Java code.
> It could either be done as
> 1. Implementing the Java code and the native XORP code in the same 
> process, using Java Native Interface (Faster Processing)
This seems like a bad idea...you'd have to understand Xorp well enough 
to bind to it, and then pay all the price of
making JNI work on top of that.
> 2. or having the java code and the native XORP code run in seperate 
> process, using Inter Process Communication.
That might work, but probably painful to integrate with XRL since I 
don't think there is any automatic
code generation for java.

I'd just copy something relatively simple (maybe rip?) and start hacking 
C++ code, but perhaps I'm biased!

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Wed Oct 28 23:20:28 2009
From: greearb at candelatech.com (Ben Greear)
Date: Wed, 28 Oct 2009 23:20:28 -0700
Subject: [Xorp-hackers] [Xorp-users] Xorp installation fails on Ubuntu
In-Reply-To: <4AE8D869.1080304@candelatech.com>
References: <6e49b4d40910281554u4e2932f9mfcf21142e38588d4@mail.gmail.com>		<4AE8CF0B.9070402@candelatech.com>	<6e49b4d40910281616k703782d1v37860204e0cbf96f@mail.gmail.com>
	<4AE8D869.1080304@candelatech.com>
Message-ID: <4AE9342C.2030904@candelatech.com>

Ben Greear wrote:
> They have later code on sourceforge.  I just put some binaries
> I compiled on Fedora up at:
>
> http://www.candelatech.com/oss/xorp_binaries/
>   

I just uploaded a lanforge-xorp .deb file to that directory.  No idea if 
it actually works...will
do some testing on it tomorrow if all goes well.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


From lizhaous2000 at yahoo.com  Thu Oct 29 07:54:05 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Thu, 29 Oct 2009 07:54:05 -0700 (PDT)
Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in
	task_done
Message-ID: <633162.71070.qm@web58707.mail.re1.yahoo.com>

I added a new protocol and I can start it in the CLI with the command "create protocol XXX", but the rtrmgr crashed after the command "delete protocol XXX".
I can also easily reproduce exactly the same crash via the following steps:

0. I am running the xorp processes on an embedded system.
1. Start rtrmgr from the Linux shell on the system.
2. Manually start xorp_static_routes from the Linux shell. This static process will hijack the XRL channels to rtrmgr.
3. Use the CLI command "create protocol static" to start a second xorp_static_routes.
4. Use the CLI command "delete protocol static" to stop static. Both xorp_static_routes processes were terminated. Dependent processes such as fea, rib and policy were also terminated. rtrmgr crashed.

I am attaching two stack traces. The first one is for my new protocol XXX case and the second is for the static-triggered case.

Does anybody have any clue? Thanks.

Li

Case 1:

(gdb) tar rem 10.65.1.117:6666
Remote debugging using 10.65.1.117:6666
0x0059a850 in _start () from /lib/ld-linux.so.2
Current language:  auto; currently c
(gdb) dis b
(gdb) c
Continuing.
[New Thread 0]

Program received signal SIGABRT, Aborted.
[Switching to Thread 0]
0xb80cd424 in ?? ()
(gdb) bt
#0  0xb80cd424 in ?? ()
#1  0xbffc2624 in ?? ()
#2  0x00000006 in ?? ()
#3  0x000017fe in ?? ()
#4  0x00a71450 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#5  0x00a72e18 in abort () at abort.c:88
#6  0x00aaefdd in __libc_message (do_abort=2, 
    fmt=0xb89bc8 "*** glibc detected *** %s: %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
#7  0x00ab5394 in malloc_printerr (action=2, 
    str=0xb86a88 "free(): invalid pointer", ptr=0x8d55238) at malloc.c:5994
#8  0x00ab7346 in __libc_free (mem=0x8d55238) at malloc.c:3625
#9  0x05438591 in operator delete (ptr=0x0)
    at ../../../../libstdc++-v3/libsupc++/del_op.cc:49
#10 0x080a2f5f in __gnu_cxx::new_allocator<std::_List_node<Task*> >::deallocate
    (this=0x8d55238, __p=0x8d55238)
    at /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../../include/c++/4.3.2/ext/new_allocator.h:98
#11 0x080a2f84 in std::_List_base<Task*, std::allocator<Task*> >::_M_put_node (
    this=0x8d55238, __p=0x8d55238)
    at /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_list.h:318
#12 0x080a6f39 in std::list<Task*, std::allocator<Task*> >::_M_erase (
    this=0x8d55238, __position={_M_node = 0x8d55238})
    at /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_list.h:1361
#13 0x080a6f6b in std::list<Task*, std::allocator<Task*> >::pop_front (
    this=0x8d55238)
    at /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_list.h:861
#14 0x08098c23 in TaskManager::task_done (this=0x8d55210, success=true, errmsg=
        {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x546ccd4 ""}}) at task.cc:2251
#15 0x080a5911 in XorpMemberCallback2B0<void, TaskManager, bool, std::string>::dispatch (this=0x8d60228, a1=true, a2=
        {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x546ccd4 ""}}) at ../libxorp/callback_nodebug.hh:4636
#16 0x08095bd1 in Task::step8_report (this=0x8d60460) at task.cc:1993
#17 0x080a22e7 in XorpMemberCallback0B0<void, Task>::dispatch (this=0x8d5fd90)
    at ../libxorp/callback_nodebug.hh:306
#18 0x0808b2c1 in Module::terminate_with_prejudice (this=0x8d58450, cb=
      {_M_ptr = 0x8d5fd90, _M_index = 110}) at module_manager.cc:218
#19 0x0808f36e in XorpMemberCallback0B1<void, Module, ref_ptr<XorpCallback0<void> > >::dispatch (this=0x8d60938) at ../libxorp/callback_nodebug.hh:598
#20 0x081af7da in OneoffTimerNode2::expire (this=0x8d5ff28) at timer.cc:167
#21 0x081ae8ed in TimerList::expire_one (this=0xbffcce4c, worst_priority=4)
    at timer.cc:441
#22 0x081aea48 in TimerList::run (this=0xbffcce4c) at timer.cc:389
#23 0x08198564 in EventLoop::do_work (this=0xbffcce48, can_block=true)
    at eventloop.cc:153
#24 0x08198828 in EventLoop::run (this=0xbffcce48) at eventloop.cc:99
#25 0x080682df in Rtrmgr::run (this=0xbffcd4b4) at main_rtrmgr.cc:418
#26 0x08069432 in main (argc=6, argv=0xbffcd5c4) at main_rtrmgr.cc:725
(gdb) 


Case 2:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0]
0xb80db424 in ?? ()
(gdb) bt
#0  0xb80db424 in ?? ()
#1  0xbffceeb4 in ?? ()
#2  0x00000006 in ?? ()
#3  0x00001802 in ?? ()
#4  0x00a71450 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#5  0x00a72e18 in abort () at abort.c:88
#6  0x00aaefdd in __libc_message (do_abort=2, 
    fmt=0xb89bc8 "*** glibc detected *** %s: %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
#7  0x00ab5394 in malloc_printerr (action=2, 
    str=0xb89bf4 "munmap_chunk(): invalid pointer", ptr=0x93ed238)
    at malloc.c:5994
#8  0x05438591 in operator delete (ptr=0x0)
    at ../../../../libstdc++-v3/libsupc++/del_op.cc:49
#9  0x080a2f5f in __gnu_cxx::new_allocator<std::_List_node<Task*> >::deallocate
    (this=0x93ed238, __p=0x93ed238)
    at /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../../include/c++/4.3.2/ext/new_allocator.h:98
#10 0x080a2f84 in std::_List_base<Task*, std::allocator<Task*> >::_M_put_node (
    this=0x93ed238, __p=0x93ed238)
    at /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_list.h:318
#11 0x080a6f39 in std::list<Task*, std::allocator<Task*> >::_M_erase (
    this=0x93ed238, __position={_M_node = 0x93ed238})
    at /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_list.h:1361
#12 0x080a6f6b in std::list<Task*, std::allocator<Task*> >::pop_front (
    this=0x93ed238)
    at /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../../include/c++/4.3.2/bits/stl_list.h:861
#13 0x08098c23 in TaskManager::task_done (this=0x93ed210, success=true, errmsg=
        {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x546ccd4 ""}}) at task.cc:2251
#14 0x080a5911 in XorpMemberCallback2B0<void, TaskManager, bool, std::string>::dispatch (this=0x93f4e80, a1=true, a2=
        {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x546ccd4 ""}}) at ../libxorp/callback_nodebug.hh:4636
#15 0x08095bd1 in Task::step8_report (this=0x93f3c78) at task.cc:1993
#16 0x080a22e7 in XorpMemberCallback0B0<void, Task>::dispatch (this=0x93f4ba0)
    at ../libxorp/callback_nodebug.hh:306
#17 0x0808b64b in Module::terminate (this=0x93f39a0, cb=
      {_M_ptr = 0x93f4ba0, _M_index = 284}) at module_manager.cc:166
#18 0x0808c0a5 in ModuleManager::kill_module (this=0xbffdbb68, 
    module_name=@0x93f3c80, cb={_M_ptr = 0x93f4ba0, _M_index = 284})
    at module_manager.cc:472
#19 0x08093e38 in Task::step7_kill (this=0x93f3c78) at task.cc:1983
#20 0x080a22e7 in XorpMemberCallback0B0<void, Task>::dispatch (this=0x93f3910)
    at ../libxorp/callback_nodebug.hh:306
#21 0x081af7da in OneoffTimerNode2::expire (this=0x942f198) at timer.cc:167
#22 0x081ae8ed in TimerList::expire_one (this=0xbffdb65c, worst_priority=4)
    at timer.cc:441
#23 0x081aea48 in TimerList::run (this=0xbffdb65c) at timer.cc:389
#24 0x08198564 in EventLoop::do_work (this=0xbffdb658, can_block=true)
    at eventloop.cc:153
#25 0x08198828 in EventLoop::run (this=0xbffdb658) at eventloop.cc:99
#26 0x080682df in Rtrmgr::run (this=0xbffdbcc4) at main_rtrmgr.cc:418
#27 0x08069432 in main (argc=6, argv=0xbffdbdd4) at main_rtrmgr.cc:725
(gdb) c
Continuing.

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.


From lizhaous2000 at yahoo.com  Thu Oct 29 08:16:32 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Thu, 29 Oct 2009 08:16:32 -0700 (PDT)
Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in
	task_done
In-Reply-To: <633162.71070.qm@web58707.mail.re1.yahoo.com>
Message-ID: <89697.2773.qm@web58705.mail.re1.yahoo.com>

I am puzzled by operator delete(ptr=0x0). Inside deallocate(this=0x8d55238, __p=0x8d55238), however, __p is not 0x0. pop_front means "removes and deletes". So was this list node already deleted somewhere else?
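
One possible reading of the traces, offered as a guess rather than a
confirmed diagnosis: in the pop_front and _M_erase frames, the list's
this pointer and the node being erased are the same address (0x8d55238
in case 1, 0x93ed238 in case 2). With libstdc++ that is what begin()
looks like on an empty std::list, because the sentinel node lives inside
the list object. pop_front() on an empty list is undefined behaviour and
would hand the allocator an address that never came from new, which fits
the "free(): invalid pointer" message. Note also that pop_front() on a
std::list<Task*> releases only the list node, never the Task it points
to. The sketch below shows a guarded pop; Task and pop_task are
hypothetical names, not the rtrmgr code.

```cpp
#include <cassert>
#include <list>

struct Task { int id; };   // hypothetical stand-in for rtrmgr's Task

// Pops the first task pointer if there is one. Returns null instead of
// calling pop_front() on an empty list, which is undefined behaviour.
// Only the list node is released; the Task object itself is untouched.
inline Task* pop_task(std::list<Task*>& tasks) {
    if (tasks.empty())
        return nullptr;
    Task* t = tasks.front();
    tasks.pop_front();
    return t;
}

// Usage: the second pop finds the list empty and does nothing harmful.
inline void example_pop_task() {
    Task a{1};
    std::list<Task*> tasks;
    tasks.push_back(&a);

    assert(pop_task(tasks) == &a);       // node removed, Task still valid
    assert(a.id == 1);
    assert(pop_task(tasks) == nullptr);  // empty list: guarded, no pop_front()
}
```

If this reading is right, the thing to look for is a second task_done()
callback arriving after the task list has already been drained.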



From bms at incunabulum.net  Thu Oct 29 08:30:29 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Thu, 29 Oct 2009 15:30:29 +0000
Subject: [Xorp-hackers] [Xorp-users] Using Java Native Interference with
	XORP
In-Reply-To: <6e49b4d40910282236o4138d570ka7ab087ff2e75ea9@mail.gmail.com>
References: <6e49b4d40910282236o4138d570ka7ab087ff2e75ea9@mail.gmail.com>
Message-ID: <4AE9B515.1060901@incunabulum.net>

mahendra nunna wrote:
> ...
> 1. Implementing the Java code and the native XORP code in the same 
> process, using Java Native Interface (Faster Processing)

Regardless of JNI, cross-language interop isn't happening until the 
Thrift drop of XORP is done. I am edging closer to this goal.

cheers,
BMS


From bms at incunabulum.net  Thu Oct 29 08:42:09 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Thu, 29 Oct 2009 15:42:09 +0000
Subject: [Xorp-hackers] Crash due to stale cached xrl sender pointer.
In-Reply-To: <4AE7D31B.1040107@candelatech.com>
References: <4AE799E1.8010300@candelatech.com>
	<4AE7D31B.1040107@candelatech.com>
Message-ID: <4AE9B7D1.803@incunabulum.net>

Ben Greear wrote:
> The attached patch seems to fix the problem.

Thanks for the patch, and the analysis.

This seems to introduce a ref_ptr -- a class I'm not 100% happy about. 
Are you sure that this patch does not leak any memory?

Passing a ref_ptr around is bad, because every time it crosses a C++ 
scope boundary, the refcount is bumped -- Boost at least has a weak_ptr 
and a shared_ptr, which cleanly separates the smart pointer semantics 
between 'I am passing this around' and 'I am sharing ownership of the 
pointed-to object'.
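
The distinction can be sketched as follows, using std::shared_ptr and
std::weak_ptr (C++11), the standardized descendants of the Boost classes.
Sender and Observer are hypothetical names for illustration only. The
observer can reach the object while an owner keeps it alive, but holding
the weak_ptr does not extend the object's lifetime.

```cpp
#include <cassert>
#include <memory>

struct Sender { int id = 0; };   // hypothetical payload

// An observer keeps a weak_ptr: it can reach the object while some
// owner keeps it alive, but it does not extend the lifetime itself.
struct Observer {
    std::weak_ptr<Sender> cached;

    // Returns the id, or -1 if the sender has already been destroyed.
    int sender_id() const {
        if (std::shared_ptr<Sender> s = cached.lock())
            return s->id;
        return -1;
    }
};

// Usage: the only owner goes away while the observer still holds on.
inline void example_observer() {
    auto owner = std::make_shared<Sender>();
    owner->id = 7;

    Observer obs;
    obs.cached = owner;
    assert(owner.use_count() == 1);   // the observer added no reference
    assert(obs.sender_id() == 7);

    owner.reset();                    // last owner releases the sender
    assert(obs.sender_id() == -1);    // observer notices, no dangling access
}
```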

Is there a simpler workaround possible for the issue? I'd rather not get 
too deep into reviewing a patch which cuts fairly deep into internals 
which are probably about to get rewritten.

thanks,
BMS


From greearb at candelatech.com  Thu Oct 29 08:55:17 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 29 Oct 2009 08:55:17 -0700
Subject: [Xorp-hackers] Crash due to stale cached xrl sender pointer.
In-Reply-To: <4AE9B7D1.803@incunabulum.net>
References: <4AE799E1.8010300@candelatech.com>
	<4AE7D31B.1040107@candelatech.com> <4AE9B7D1.803@incunabulum.net>
Message-ID: <4AE9BAE5.20406@candelatech.com>

Bruce Simpson wrote:
> Ben Greear wrote:
>> The attached patch seems to fix the problem.
>
> Thanks for the patch, and the analysis.
>
> This seems to introduce a ref_ptr -- a class I'm not 100% happy about. 
> Are you sure that this patch does not leak any memory?
If it does, then xorp leaks memory everywhere it uses this ref_ptr.  It 
does stop the crash...I haven't run
valgrind on it lately, but if ref_ptr was broken, earlier valgrind runs 
should have seen it.
>
> Passing a ref_ptr around is bad, because every time it crosses a C++ 
> scope boundary, the refcount is bumped -- Boost at least has a 
> weak_ptr and a shared_ptr, which cleanly separates the smart pointer 
> semantics between 'I am passing this around' and 'I am sharing 
> ownership of the pointed-to object'.
That's why I pass by reference...keeps ref counts from changing needlessly.
Either way, a bit of addition and subtraction is cheap..not like we're 
doing millions of
xrls a second!
>
> Is there a simpler workaround possible for the issue? I'd rather not 
> get too deep into reviewing a patch which cuts fairly deep into 
> internals which are probably about to get rewritten.
I doubt it...don't know where all the xrls are stored..would have to 
search all of them and clean out any with
pointers to the sender that is to be deleted.

In general, I dislike smart pointers, but in this case, they seem tailor 
made for the problem.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Thu Oct 29 09:53:44 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 29 Oct 2009 09:53:44 -0700
Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in
 task_done
In-Reply-To: <89697.2773.qm@web58705.mail.re1.yahoo.com>
References: <89697.2773.qm@web58705.mail.re1.yahoo.com>
Message-ID: <4AE9C898.9070100@candelatech.com>

On 10/29/2009 08:16 AM, Li Zhao wrote:
> I am puzzled by operator delete(prt=0x0). But inside deallocate(this=0x8d55238, __p=0x8d55238), the __p is not 0x0. pop_front means "removes and deletes". So somewhere else this list node was deleted again?
>
> --- On Thu, 10/29/09, Li Zhao<lizhaous2000 at yahoo.com>  wrote:
>
>> From: Li Zhao<lizhaous2000 at yahoo.com>
>> Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done
>> To: xorp-hackers at icir.org
>> Date: Thursday, October 29, 2009, 10:54 AM
>> I added a new protocol and I can
>> start it in CLI by command "create protocol XXX", but the
>> rtrmgr crashed after command "delete protocol XXX".
>> I can also easily reproduce the exactlt same crash via the
>> following steps:
>>
>> 0. I am running xorp processes on an embedded system.
>> 1. start rtrmgr from linux shell on the system;
>> 2. manually start xorp_static_routes from linux shell. This
>> static will hijack the xrl channels to rtrmgr;
>> 3. use cli command "create protocol static" to start a
>> second xorp_static_routes.
>> 4. use cli command "delete protocol static" to stop static.
>> both xorp_static_routes were terminated. depended process
>> like fea, rib and policy were also terminated. rtrmgr
>> crash.

I can reproduce it here..will take a quick look to see if
I can figure it out.

Thanks,
Ben


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From bms at incunabulum.net  Thu Oct 29 10:10:03 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Thu, 29 Oct 2009 17:10:03 +0000
Subject: [Xorp-hackers] Crash due to stale cached xrl sender pointer.
In-Reply-To: <4AE9BAE5.20406@candelatech.com>
References: <4AE799E1.8010300@candelatech.com>
	<4AE7D31B.1040107@candelatech.com>
	<4AE9B7D1.803@incunabulum.net> <4AE9BAE5.20406@candelatech.com>
Message-ID: <4AE9CC6B.9050302@incunabulum.net>

Hi Ben,

    Not really meant to be spending time on this at the moment, but I 
shall... it is not too far off from what I'm actually doing, and I think 
we probably do need to go over this ground again a bit, given that I am 
effectively rewriting the affected code right now.

Ben Greear wrote:
>
> In general, I dislike smart pointers, but in this case, they seem 
> tailor made for the problem.

    I would disagree that smart pointers are even necessarily the right 
answer to the issue you've found; in some cases, they can do more harm 
than good.

    I got the electronic equivalent of dirty looks, at first, when I had 
to work around a problem in template class Spt using ref_ptr<..>&. 
However, that was an existing, isolated use of ref_ptr within the tree. 
I'd prefer not to use ref_ptr for new code if at all possible.

    XrlPFSender is a stereotype of an object which should not be created 
and destroyed trivially. If something in libxipc is tripping over it, it 
is possibly a race condition, or updates not being propagated elsewhere.

    In this scenario, the Xrl blob contains a cached pointer to a 
transport channel (XrlPFSender) which has now potentially gone away. 
Given that Xrl instances are like confetti, it would be difficult to 
track them all, and I'm not sure a refcount is the most appropriate way 
to deal with that (see below).

It's a little trickier in this scenario because of how class Xrl is 
treated in the code base.

>>
>> Is there a simpler workaround possible for the issue? I'd rather not 
>> get too deep into reviewing a patch which cuts fairly deep into 
>> internals which are probably about to get rewritten.
> I doubt it...don't know where all the xrls are stored..would have to 
> search all of them and clean out any with
> pointers to the sender that is to be deleted.

There are several layers of indirection and caching going on in the XRL 
layer, but the important ones here are:-
  1) the cached FinderDBEntry used to hold the previous results of an 
indirect XRL call through the Finder. The most fundamental cache 
mechanism in libxipc.
  2) the cached resolved_sender() pointer in Xrl  -- what we're 
interested in here.

The Xrl instance involved, based on your backtrace, seems to be 
allocated by XrlRawPacket4V0p1Client::send_register_receiver(), and held 
in a statically declared pointer.

When an XRL is to be sent through the C++ bindings, it will call back 
into XrlRouter to see if there is a cached XrlPFSender for the given 
XRL. The lookup is done w/o arguments.

One glaring blot on the landscape beckons this question:
 * Are any processes sharing the segment that this 'static Xrl*' pointer 
happens to be in? The pointer looks like it should be in BSS and thus 
subject to copy-on-write, so this should not be an issue. However, if 
multiple entities in the same process are calling the C++ bindings, this 
COULD be a reentrancy issue.

If the Finder learns that an XRL target has gone away, it should blow 
away the FinderClient cache entries, and then the XrlRouter::send() 
method should notice this.

    Unfortunately, this may not help in the failure case we're 
examining. If this check is raced by the XRL being withdrawn and later 
re-advertised by its target (e.g. its host process got restarted), then 
obviously the cached XrlPFSender is going to be invalid in the XRL.

    It's not 100% impossible that this notification has been raced. If 
the code in your OSPF process which wants to send the XRL, is running 
from a timer callback, and this callback happens to collide with the 
FinderClient learning about the XRL target moving somewhere else in the 
system -- then where the XRL data is going to get sent, will be affected 
by which point in time it races the FinderClient cache update.

    In many ways, the fact that the problem exists, is an artefact of 
how method call resolution is working in the XRL RPC layer; it is 
per-method rather than per-service, and this is really one of the things 
I'm trying to address through the Thrift rewrite.

    What the code is trying to do, is to cache the transport pointer 
right next to the outgoing data. In principle this would be fine, were 
it not for the fact that the transport can go away for a variety of 
reasons. XrlPFSender has no knowledge of Xrl referencing it, and no 
meaningful way to convey the failure mode to Xrl. It's really 
XrlRouter's role to deal with this.

    In the situation above, even if we held a ref_ptr on an XrlPFSender, 
we wouldn't even know if the underlying network transport is still 
valid. The "right thing" to do would be to force the inner Xrl's cached 
resolved_sender pointer to be invalidated -- or validate the pointer 
upfront when it's used. Again, this is really XrlRouter's responsibility.
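
One way to read "validate the pointer upfront" is sketched below. The call
object caches an opaque handle instead of a raw pointer, and the router, which
owns the senders, checks the handle on every use. All names (Router,
CachedSender, resolve, drop, lookup) are hypothetical and are not part of
libxipc.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// All names here are hypothetical; this is not the libxipc API.
struct Sender { std::string target; };

// What the call object caches: a handle, never a raw pointer.
struct CachedSender {
    uint32_t id = 0;          // 0 means "nothing cached"
};

class Router {
public:
    // Create a sender and return a handle that stays valid until drop().
    CachedSender resolve(const std::string& target) {
        uint32_t id = _next_id++;
        _senders[id] = std::make_unique<Sender>(Sender{target});
        return CachedSender{id};
    }
    // The router destroys the sender; outstanding handles become stale.
    void drop(CachedSender h) { _senders.erase(h.id); }
    // Validate up front: returns nullptr for a stale or empty handle.
    Sender* lookup(CachedSender h) const {
        auto it = _senders.find(h.id);
        return it == _senders.end() ? nullptr : it->second.get();
    }
private:
    uint32_t _next_id = 1;    // ids are never reused in this sketch
    std::map<uint32_t, std::unique_ptr<Sender>> _senders;
};
```

The trade-off is a container lookup on every send, which is the cost a cached
raw pointer avoids. In exchange, a stale handle is detected rather than
dereferenced.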

    It's possible for the Xrl's target to be known, and its XRL method 
resolved, but its destination transport still unresolved, which is what 
XrlRouter::get_sender() is trying to deal with.

    For what it's worth, class Xrl largely exists because XORP RPC calls 
can't be expressed as simple binary blobs. There are *two* RPC protocols 
running in tandem inside libxipc, and one of them is textual. To my 
mind, Xrl should be a more lightweight class than it actually is.

    Caching the transport pointer (XrlPFSender) in the Xrl itself is 
just asking for trouble in situations like this, given that we have no 
means of telling the Xrl 'your resolved sender has gone away' -- it's 
buried in libfooxif.so's copy-on-write BSS segment.

    It's a non-trivial issue to fix. Using the ref_ptr seems deceptively 
simple -- we get a handle on the transport, and so the code doesn't blow 
up, but we probably don't fix the underlying issue (unless I'm missing 
something).

    Something interesting to try might be to modify clnt-gen to do a few 
things in the client shims:
%%%
return _sender->send(*x, callback(....));
%%%
to become
%%%
bool retval = _sender->send(*x, callback(....));
x->set_resolved(false);
return retval;
%%%

    This, however, doesn't fix the root problem either - it just makes 
it possible to work around the issue without changing the allocation 
semantics for XrlPFSender, by deprecating one of the cache mechanisms in 
libxipc.
    The current cache mechanism is fubar, because it apparently can't 
deal with something in the XrlPFSender life cycle which causes it to be 
deleted.

In summary:
    I strongly believe that what you're actually seeing is a race which 
class Xrl is not able to defend itself against... because the 
responsibility for it belongs in class XrlRouter.

    It would be good to get a handle first on who introduced the 
secondary caching mechanism, and why. Most likely this was to avoid any 
STL container traversals when an XRL is actually being sent, but given 
that you've probably run into a race which blows this mechanism up, it 
needs revisiting. (Yes, this has fallen under the axe in the Thrift 
branch...!)

cheers,
BMS


From greearb at candelatech.com  Thu Oct 29 10:26:49 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 29 Oct 2009 10:26:49 -0700
Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in
 task_done
In-Reply-To: <89697.2773.qm@web58705.mail.re1.yahoo.com>
References: <89697.2773.qm@web58705.mail.re1.yahoo.com>
Message-ID: <4AE9D059.20102@candelatech.com>

On 10/29/2009 08:16 AM, Li Zhao wrote:
> I am puzzled by operator delete(prt=0x0). But inside deallocate(this=0x8d55238, __p=0x8d55238), the __p is not 0x0. pop_front means "removes and deletes". So somewhere else this list node was deleted again?
>
> --- On Thu, 10/29/09, Li Zhao<lizhaous2000 at yahoo.com>  wrote:
>
>> From: Li Zhao<lizhaous2000 at yahoo.com>
>> Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done
>> To: xorp-hackers at icir.org
>> Date: Thursday, October 29, 2009, 10:54 AM
>> I added a new protocol and I can
>> start it in CLI by command "create protocol XXX", but the
>> rtrmgr crashed after command "delete protocol XXX".
>> I can also easily reproduce the exactlt same crash via the
>> following steps:
>>
>> 0. I am running xorp processes on an embedded system.
>> 1. start rtrmgr from linux shell on the system;
>> 2. manually start xorp_static_routes from linux shell. This
>> static will hijack the xrl channels to rtrmgr;
>> 3. use cli command "create protocol static" to start a
>> second xorp_static_routes.
>> 4. use cli command "delete protocol static" to stop static.
>> both xorp_static_routes were terminated. depended process
>> like fea, rib and policy were also terminated. rtrmgr
>> crash.

I ran under valgrind, and saw this info:

==27820== Invalid free() / delete / delete[]
==27820==    at 0x4A05E3F: operator delete(void*) (vg_replace_malloc.c:342)
==27820==    by 0x463531: __gnu_cxx::new_allocator<std::_List_node<Task*> >::deallocate(std::_List_node<Task*>*, unsigned long) (new_allocator.h:95)
==27820==    by 0x462427: std::_List_base<Task*, std::allocator<Task*> >::_M_put_node(std::_List_node<Task*>*) (stl_list.h:320)
==27820==    by 0x46143B: std::list<Task*, std::allocator<Task*> >::_M_erase(std::_List_iterator<Task*>) (stl_list.h:1431)
==27820==    by 0x45FF0B: std::list<Task*, std::allocator<Task*> >::pop_front() (stl_list.h:906)
==27820==    by 0x45DB73: TaskManager::task_done(bool, std::string const&) (task.cc:2256)
==27820==    by 0x465970: XorpMemberCallback2B0<void, TaskManager, bool, std::string const&>::dispatch(bool, std::string const&) (callback_nodebug.hh:4636)
==27820==    by 0x45C540: Task::step8_report() (task.cc:1998)
==27820==    by 0x4659DF: XorpMemberCallback0B0<void, Task>::dispatch() (callback_nodebug.hh:306)
==27820==    by 0x449613: Module::terminate_with_prejudice(ref_ptr<XorpCallback0<void> >) (module_manager.cc:218)
==27820==    by 0x44F63C: XorpMemberCallback0B1<void, Module, ref_ptr<XorpCallback0<void> > >::dispatch() (callback_nodebug.hh:598)
==27820==    by 0x549D72: OneoffTimerNode2::expire(XorpTimer&, void*) (timer.cc:167)
==27820==  Address 0x50c9340 is 80 bytes inside a block of size 200 alloc'd
==27820==    at 0x4A06FFC: operator new(unsigned long) (vg_replace_malloc.c:230)
==27820==    by 0x42C81F: MasterConfigTree::MasterConfigTree(std::string const&, MasterTemplateTree*, ModuleManager&, XorpClient&, bool, bool) (master_conf_tree.cc:119)
==27820==    by 0x406ED6: Rtrmgr::run() (main_rtrmgr.cc:319)
==27820==    by 0x407E57: main (main_rtrmgr.cc:665)


It appears to me that the TaskManager object (this) is already deleted when
the TaskManager::task_done() method is called.

Could probably add some debugging to the destructors and constructors of TaskManager
to verify.  I have some other things to do first..but will look at this a bit later
if no one beats me to it.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Thu Oct 29 10:43:55 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 29 Oct 2009 10:43:55 -0700
Subject: [Xorp-hackers] Crash due to stale cached xrl sender pointer.
In-Reply-To: <4AE9CC6B.9050302@incunabulum.net>
References: <4AE799E1.8010300@candelatech.com>
	<4AE7D31B.1040107@candelatech.com>
	<4AE9B7D1.803@incunabulum.net> <4AE9BAE5.20406@candelatech.com>
	<4AE9CC6B.9050302@incunabulum.net>
Message-ID: <4AE9D45B.1090003@candelatech.com>

On 10/29/2009 10:10 AM, Bruce Simpson wrote:
> Hi Ben,
>
> Not really meant to be spending time on this at the moment, but I
> shall... it is not too far off from what I'm actually doing, and I think
> we probably do need to go over this again ground a bit, given that I am
> effectively rewriting the affected code right now.
>
> Ben Greear wrote:
>>
>> In general, I dislike smart pointers, but in this case, they seem
>> tailor made for the problem.
>
> I would disagree that smart pointers are even necessarily the right
> answer to the issue you've found; in some cases, they can do more harm
> than good.
>
> I got the electronic equivalent of dirty looks, at first, when I had to
> work around a problem in template class Spt using ref_ptr<..>&. However,
> that was an existing, isolated use of ref_ptr within the tree. I'd
> prefer not to use ref_ptr for new code if at all possible.
>
> XrlPFSender is a stereotype of an object which should not be created and
> destroyed trivially. If something in libxipc is tripping over it, it is
> possibly a race condition, or updates not being propagated elsewhere.
>
> In this scenario, the Xrl blob contains a cached pointer to a transport
> channel (XrlPFSender) which has now potentially gone away. Given that
> Xrl instances are like confetti, it would be difficult to track them
> all, and I'm not sure a refcount is the most appropriate way to deal
> with that (see below).

The refcount just keeps the sender object from being destroyed until
all xrls referencing it are cleaned up.  The sender was probably destroyed
because it timed out (I was starting 100 virtual router processes...loads
the system very heavily).

Please note that the sender will be marked inactive, so the XRL will not actually
try to use it, but if the memory is gone, then it can't even check the foo->active()
flag w/out crashing.

It seems a pretty simple use-after-free bug, and the fix seems pretty
trivial to me.
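
The argument can be sketched as follows, with std::shared_ptr standing in for
ref_ptr (an assumption; the actual patch is not reproduced here). Sender, Call
and owner_discards are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; std::shared_ptr plays the role of ref_ptr here.
struct Sender {
    bool active() const { return _active; }
    void shut_down() { _active = false; }
private:
    bool _active = true;
};

struct Call {
    std::shared_ptr<Sender> cached;   // shared ownership keeps memory valid

    // True only if a cached sender exists and is still marked active.
    bool can_use_cached() const { return cached && cached->active(); }
};

// The owner shuts the sender down and drops its own reference.
// The object itself survives until the last Call lets go of it.
void owner_discards(std::shared_ptr<Sender>& owned) {
    owned->shut_down();
    owned.reset();
}
```

After owner_discards(), the Call can still safely ask active() and get false,
which is the point at which it would look for a fresh sender.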


> Caching the transport pointer (XrlPFSender) in the Xrl itself is just
> asking for trouble in situations like this, given that we have no means
> of telling the Xrl 'your resolved sender has gone away' -- it's buried
> in libfooxif.so's copy-on-write BSS segment.
>
> It's a non-trivial issue to fix. Using the ref_ptr seems deceptively
> simple -- we get a handle on the transport, and so the code doesn't blow
> up, but we probably don't fix the underlying issue (unless I'm missing
> something).

Assuming a new sender is created, the Xrl will notice the cached one
is inactive and search for a new one.  Seems like it all works out to me.

> It would be good to get a handle first on who introduced the secondary
> caching mechanism, and why. Most likely this was to avoid any STL
> container traversals when an XRL is actually being sent, but given that
> you've probably run into a race which blows this mechanism up, it needs
> revisiting. (Yes, this has fallen under the axe in the Thrift branch...!)

I think you are over-thinking this one!

Thanks,
Ben


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Thu Oct 29 10:57:18 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 29 Oct 2009 10:57:18 -0700
Subject: [Xorp-hackers] PATCH:  Small memory leak in CliCommand
Message-ID: <4AE9D77E.4050502@candelatech.com>

Found this while using valgrind.  I think the patch will work, but I haven't
actually tested it yet.

==27835== 6,184 (240 direct, 5,944 indirect) bytes in 1 blocks are definitely lost in loss record 48 of 59
==27835==    at 0x4A06FFC: operator new(unsigned long) (vg_replace_malloc.c:230)
==27835==    by 0x53D5DD: CliCommand::add_pipes(std::string&) (cli_command.cc:426)
==27835==    by 0x5215C3: CliNode::CliNode(int, xorp_module_id, EventLoop&) (cli_node.cc:94)
==27835==    by 0x40714F: XrlFeaNode::XrlFeaNode(EventLoop&, std::string const&, std::string const&, std::string const&, unsigned short, bool) (xrl_fea_node.cc:79)
==27835==    by 0x40638C: fea_main(std::string const&, unsigned short) (xorp_fea.cc:97)
==27835==    by 0x406681: main (xorp_fea.cc:181)
==27835==


[greearb at ben-dt2 xorp.ct]$ git diff
diff --git a/cli/cli_command.cc b/cli/cli_command.cc
index 99a003b..256157f 100644
--- a/cli/cli_command.cc
+++ b/cli/cli_command.cc
@@ -95,6 +95,7 @@ CliCommand::~CliCommand()
  {
      // Delete recursively all child commands
      delete_pointers_list(_child_command_list);
+    delete_pipes();
  }

  //
@@ -428,6 +429,7 @@ CliCommand::add_pipes(string& error_msg)
      if (com0 == NULL) {
         return (XORP_ERROR);
      }
+    delete_pipes(); // be sure to not leak memory if one is already set.
      set_cli_command_pipe(com0);

      cli_pipe = new CliPipe("count");
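
The shape of the two hunks above can be illustrated in isolation. This is a
sketch with hypothetical types (Pipe, Command), not the real CliCommand code:
an owning raw pointer has to be freed both in the destructor and before it is
overwritten.

```cpp
#include <cassert>

// Hypothetical types that mirror the shape of the patch, not the real CLI code.
struct Pipe {
    static int live;              // number of Pipe objects currently allocated
    Pipe() { ++live; }
    ~Pipe() { --live; }
};
int Pipe::live = 0;

class Command {
public:
    Command() = default;
    Command(const Command&) = delete;             // sole owner of _pipe
    Command& operator=(const Command&) = delete;
    ~Command() { delete_pipe(); }                 // first hunk: free on destruction

    void add_pipe() {
        Pipe* p = new Pipe();
        delete_pipe();                            // second hunk: free any old one
        _pipe = p;
    }
private:
    void delete_pipe() { delete _pipe; _pipe = nullptr; }
    Pipe* _pipe = nullptr;
};
```

Calling add_pipe() twice leaves exactly one live Pipe, and none remain once
the Command goes out of scope.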


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From bms at incunabulum.net  Thu Oct 29 11:02:07 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Thu, 29 Oct 2009 18:02:07 +0000
Subject: [Xorp-hackers] XRL call serialization
In-Reply-To: <4ADCD472.2020203@candelatech.com>
References: <4ACD6B16.6080500@candelatech.com>
	<4ADC8017.9010507@incunabulum.net>
	<4ADCD472.2020203@candelatech.com>
Message-ID: <4AE9D89F.7080406@incunabulum.net>

Hi Ben,

Just saw this... sorry for the delay.

Ben Greear wrote:
> With regard to XRL, I've a question:
>
> If an application makes 3 XRL calls:
>
> do_a()
> do_b()
> commit_all()
>
> Is there any guarantee that these are strictly delivered to
> the peer process in the order called?  Code appears to expect
> this to be true, but I'm suspicious that perhaps it does not.

XORP processes are intended to be asynchronous; this is realized using 
explicit coroutines, with no additional C++ runtime support other than 
UNIX system calls.

XRL is intended to be an asynchronous IPC layer. The calls in your 
example won't be guaranteed to be serialized, unless you explicitly 
serialize them in your process.

You can see this in XORP processes and tools in the form of a ping-pong 
between callback routines.
    An example of forcing serialization can be found in 
contrib/olsr/tools/print_databases.cc, where you can see the EventLoop 
being run whilst the Getter does its thing. get() is called, this fires 
off an XRL, and the list_cb() will successively be called for each 
fetch, until Getter::_done is set to true by the final fetch.
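
The ping-pong pattern can be sketched without any XORP code. Below, a toy
event loop and a pretend send() are used (both hypothetical, not the EventLoop
or XRL client API); each call is issued only from the completion callback of
the one before it, which is what forces the ordering.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A toy event loop: callbacks are queued and run later, never inline.
// Hypothetical names throughout; this is not the XORP EventLoop or XRL API.
class Loop {
public:
    void post(std::function<void()> fn) { _q.push_back(std::move(fn)); }
    void run() {
        while (!_q.empty()) {
            auto fn = std::move(_q.front());
            _q.pop_front();
            fn();
        }
    }
private:
    std::deque<std::function<void()>> _q;
};

// Pretend remote call: records the call, then completes asynchronously.
void send(Loop& loop, std::vector<std::string>& log,
          const std::string& name, std::function<void()> done) {
    loop.post([&log, name, done]() {
        log.push_back(name);
        done();
    });
}

// Each call is issued only from the completion callback of the previous one.
void run_serialized(Loop& loop, std::vector<std::string>& log) {
    send(loop, log, "do_a", [&loop, &log]() {
        send(loop, log, "do_b", [&loop, &log]() {
            send(loop, log, "commit_all", []() {});
        });
    });
}
```

Nothing is sent until the loop runs, and commit_all cannot be issued before
do_b has completed, regardless of how the transport behaves.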

    BTW: One example of what NOT to do would be the 
XrlIO::register_rib() function in contrib/olsr/xrl_io.cc. The semantics 
behind those two XRL calls are co-dependent, and will be different 
depending on whether or not an OLSR origin table is already registered 
with the RIB. But you can see that two different XRLs can be fired off 
'in parallel'.

Class Xrl has no notion of call/reply sequence numbers, which are 
necessary in order to deal with out-of-order delivery, as well as 
identifying individual method calls on-the-wire.

    However, the XORP application code is written with the expectation 
that XRL is async. The fact that a few things 'under the hood' in XRL 
prevent it being fully async, is largely academic -- the tutorial 
materials are pretty clear you shouldn't assume serial method call 
returns, etc.

    In practice, what happens is that the XRL transport(s) themselves 
will stamp each call with a sequence ID. You can see this happening in 
XrlPFSTCPSender::send(). Although it *does* expect delivery in sequence 
(you can see this in XrlPFSTCPSender::read_event()), this is purely how 
it's been done here.

    In this respect, XRL is totally tied to TCP semantics in its 
implementation, and RPCs should not be reordered, given that their 
dispatch in the XRL target is synchronous with their delivery -- there 
is no intermediate queueing, apart from the kernel's socket buffers.
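
A rough sketch of that behaviour, under the assumption that replies arrive on
an order-preserving stream: each request is stamped with a sequence number,
and a reply is accepted only if it matches the oldest outstanding request.
StreamSender and its methods are hypothetical and only loosely modelled on the
description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch of a stream-based sender; not XrlPFSTCPSender itself.
class StreamSender {
public:
    using ReplyCb = std::function<void(bool ok)>;

    // Stamp the request with the next sequence number and remember its callback.
    uint32_t send(ReplyCb cb) {
        uint32_t seqno = _next_seqno++;
        _pending.emplace_back(seqno, std::move(cb));
        return seqno;
    }

    // A reply must match the oldest outstanding request, because the
    // underlying byte stream is assumed to preserve order.
    bool on_reply(uint32_t seqno) {
        if (_pending.empty() || _pending.front().first != seqno)
            return false;                 // out of sequence: treated as an error
        ReplyCb cb = std::move(_pending.front().second);
        _pending.pop_front();
        cb(true);
        return true;
    }

    std::size_t outstanding() const { return _pending.size(); }
private:
    uint32_t _next_seqno = 0;
    std::deque<std::pair<uint32_t, ReplyCb>> _pending;
};
```

The sequence number here only confirms what the stream already guarantees; it
does not let the sender cope with replies that arrive in a different order.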

    But there should be no expectation of this by application 
developers. Indeed, if you look at the stubs which Thrift generates, the 
client code only allows 1 request in-flight; it always sets the sequence 
number to 0. In practice, this isn't a problem, because in Thrift, the 
servers tell clients apart per session.

    In XRL, we tell calls apart by method name. Something tells me this 
gets really interesting if we try to thread the RIB or otherwise move it 
into another process. I should point out that XRL targets never actually 
get to see the Xrl itself -- they just get passed a bunch of arguments 
by the XrlRouter, and their handler function invoked.

On a Grim Code Reaper's note:
    This makes it pretty much impossible, using the existing code, to 
implement any serialization or parallelism policy within each XORP 
process, as well as making it impossible to decentralize the method call 
disposition, because it's tied to TCP streams.

Therefore:
    Synchronous dispatch of method calls doesn't change in a Thrifted 
XORP to begin with -- too much of the existing router code is written 
around this expectation.

cheers,
BMS


From greearb at candelatech.com  Thu Oct 29 11:15:33 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 29 Oct 2009 11:15:33 -0700
Subject: [Xorp-hackers] XRL call serialization
In-Reply-To: <4AE9D89F.7080406@incunabulum.net>
References: <4ACD6B16.6080500@candelatech.com>
	<4ADC8017.9010507@incunabulum.net>
	<4ADCD472.2020203@candelatech.com>
	<4AE9D89F.7080406@incunabulum.net>
Message-ID: <4AE9DBC5.8050600@candelatech.com>

On 10/29/2009 11:02 AM, Bruce Simpson wrote:
> Hi Ben,
>
> Just saw this... sorry for the delay.
>
> Ben Greear wrote:
>> With regard to XRL, I've a question:
>>
>> If an application makes 3 XRL calls:
>>
>> do_a()
>> do_b()
>> commit_all()
>>
>> Is there any guarantee that these are strictly delivered to
>> the peer process in the order called? Code appears to expect
>> this to be true, but I'm suspicious that perhaps it does not.
>
> XORP processes are intended to be asynchronous; this is realized using
> explicit coroutines, with no additional C++ runtime support other than
> UNIX system calls.

It seems that the router-mgr *might* read and queue several xrl
requests, and then possibly answer them out of order.  (Been a few
days since I poked at the router mgr code, not sure I fully understood
it when I did).

OSPF, at least, seems to expect the XRL calls (and responses) are
serialized, at least in a few places.

Considering TCP is the transport, if the rtr-mgr was made to be strictly
serialized in handling requests for each client, that should do the trick.

> You can see this in XORP processes and tools in the form of a ping-pong
> between callback routines.

Yes, I've seen this..but in other cases, it seems programmers got lazy
and made assumptions that are *almost* always right.  If I can reproduce
the problems I saw in OSPF, I'll keep this async'ness in mind while
debugging...

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From bms at incunabulum.net  Thu Oct 29 11:34:08 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Thu, 29 Oct 2009 18:34:08 +0000
Subject: [Xorp-hackers] Crash due to stale cached xrl sender pointer.
In-Reply-To: <4AE9D45B.1090003@candelatech.com>
References: <4AE799E1.8010300@candelatech.com>
	<4AE7D31B.1040107@candelatech.com>
	<4AE9B7D1.803@incunabulum.net> <4AE9BAE5.20406@candelatech.com>
	<4AE9CC6B.9050302@incunabulum.net>
	<4AE9D45B.1090003@candelatech.com>
Message-ID: <4AE9E020.4060403@incunabulum.net>

Ben Greear wrote:
> ...
> Please note that the sender will be marked in-active, so the XRL will 
> not actually
> try to use it, but if the memory is gone, then it can't even check the 
> foo->active()
> flag w/out crashing.
>
> It seems a pretty simple use-after-free bug, and the fix seems pretty
> trivial to me.

I'm pleased that you've found an issue, and come up with a fix that 
appears to work for you in the here and now. I would also class part of 
the issue you've run into as a design bug in XRL, and have tried to 
explain (as best I can) why I believe that is the case.

I would prefer to know what the root cause of the transport pointer 
being invalidated is; this is mostly so that I can avoid introducing a 
similar situation in new code.

However, I'm concerned that the suggested fix, actually makes the code 
more difficult to read than it already is. I'm not happy with ref_ptr, 
and it has been a source of problems for me in the past.

Of course, it's worth bearing in mind that I am looking at this from a 
very critical viewpoint at the moment. ;-)

cheers,
BMS


From bms at incunabulum.net  Thu Oct 29 12:09:16 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Thu, 29 Oct 2009 19:09:16 +0000
Subject: [Xorp-hackers] XRL call serialization
In-Reply-To: <4AE9DBC5.8050600@candelatech.com>
References: <4ACD6B16.6080500@candelatech.com>
	<4ADC8017.9010507@incunabulum.net>
	<4ADCD472.2020203@candelatech.com>
	<4AE9D89F.7080406@incunabulum.net>
	<4AE9DBC5.8050600@candelatech.com>
Message-ID: <4AE9E85C.60509@incunabulum.net>

Hi Ben,

Time for more devil's advocate action.

Ben Greear wrote:
>
> It seems that the router-mgr *might* could read and queue several xrl
> requests, and then possibly answer them out of order.

Based on my recent footprints in XrlAction, that is very likely. But not 
because XRL is doing anything wrong.

One of the things I wanted to mention in my previous reply on this 
thread: if you keep calling different XRL methods, reentrancy in the 
client isn't a problem -- you can tell your own requests apart just 
fine, they're for different methods.

But but but: if we have multiple XRL calls in-flight, for the same 
method, this breaks down. Now the dispatch of the callback ('here's the 
answer to my question') will be on a per-call basis. And so the only 
guarantee you get of in-order dispatch, is the fact that XRL transport 
is using a stream (TCP out of the box).

If you mix possibly co-dependent operations and fire them off, problems 
may happen. [Although the XRLs in these scenarios aren't being batched.]

This is why the Router Manager is pretty tight about its timings, and 
keeping the XRL actions tied down to particular commit steps, is pretty 
critical to making sure stuff doesn't go out of control.

    Again, it might be worth revisiting Pavlin's original idea, that we 
teach the routing processes to keep their own snapshots of state and 
implement commit/rollback there. The more I stare at Thrift and XRL, the 
more I believe that's a good idea. It simplifies the Router Manager 
interface with the other processes.

    Although as you point out, we still need to keep those snapshots 
around in the Router Manager so that the process can restart OK -- 
either that or we give processes some abstract form of non-volatile 
storage we can easily propagate back to the management point at the 
point of commit.
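
As a sketch of what per-process commit/rollback could look like, assuming a
simple key/value view of configuration: ConfigStore is a hypothetical class,
not something in the XORP tree, and the key name in the usage note is only an
example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-process configuration store; not an existing XORP class.
class ConfigStore {
public:
    // Changes go to the working copy only.
    void set(const std::string& key, const std::string& value) {
        _working[key] = value;
    }
    // Reads reflect the last committed state, not pending edits.
    std::string get(const std::string& key) const {
        auto it = _committed.find(key);
        return it == _committed.end() ? std::string() : it->second;
    }
    void commit()   { _committed = _working; }   // pending edits take effect
    void rollback() { _working = _committed; }   // pending edits are discarded
private:
    std::map<std::string, std::string> _committed;
    std::map<std::string, std::string> _working;
};
```

For example, set("admin-distance", "230") is invisible to get() until
commit(), and a later uncommitted set() is dropped by rollback(). Persisting
the committed map across a process restart is outside this sketch.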

However, you're quite right -- I see no reason why you can't introduce 
funk into the system from the Router Manager, the same way that olsr's 
register_rib() method might.

Consider this scenario -- let's imagine that xorp_olsr has crashed. It 
left a whole bunch of OLSR routes in the RIB. It is using a non-default 
admin distance. For whatever reason, this was configured on-the-fly, and 
was an uncommitted change. That process is restarted.

Along comes the existing register_rib() function. Let's assume the 
set-admin-distance step modifies the old origin table from the previous 
incarnation of xorp_olsr. Let's also assume that there is a 
redistribution policy in effect for OSPF, which is redistributing routes 
above a given admin distance on another interface to an OSPF backbone area.

You can see how that gets really interesting. As soon as the call to 
change the admin distance has fired, the routes will be rewritten to 
contain the new admin distance, the RIB will redistribute the routes 
(via policy) to xorp_ospf, and we've got a fair amount of system 
activity going on, just due to a process restart.

Fortunately, the RIB method to set the admin distance does not rewrite 
existing routes at the moment, and that was deliberately left unfinished 
(although not for this reason). So this scenario, whilst it's been 
elaborated on somewhat, isn't possible just now with the mechanisms I've 
described.

But it does point towards the need to either have a configurable policy 
for method disposition, or strong guarantees about the RPC layer 
behaving in-order.

You end up having to rely on a reliable network transport. You can 
assume that the XRL request you just got is to be executed right away, 
but only insofar as the transport you read from, has not re-ordered 
anything in transit.

Reliability doesn't imply in-order delivery to the user process. If you 
receive XRL requests out of order, you'll need to buffer them. If your 
transport isn't reliable, you have no way of knowing that you won't get 
an earlier message -- without implementing the concept of a time-out; 
i.e. if Mr Server don't see an out-of-order message within N time units, 
I will time it out and send you a NACK, to stop blocking all other 
access to the resource. [Sounds like kernel driver locking to me...]
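
A minimal sketch of that buffer-and-time-out idea, assuming each request carries a sender-assigned sequence number starting at 0 (XRL itself is not claimed to do this). ReorderBuffer and its methods are hypothetical names, and time is an abstract tick counter supplied by the caller:

```cpp
#include <cassert>
#include <map>
#include <stdint.h>
#include <string>
#include <vector>

// Hypothetical sketch: none of these names exist in XORP.
class ReorderBuffer {
public:
    explicit ReorderBuffer(uint64_t timeout_ticks)
        : _timeout(timeout_ticks), _next(0), _waiting_since(0) {}

    // Accept a request that arrived at time 'now'. Returns the requests
    // that are now safe to dispatch, in sequence order (possibly none).
    std::vector<std::string> receive(uint64_t seq, const std::string& req,
                                     uint64_t now) {
        std::vector<std::string> ready;
        if (seq < _next)
            return ready;            // duplicate, or already given up on
        if (_held.empty())
            _waiting_since = now;    // a gap (if any) starts blocking now
        _held[seq] = req;
        drain(ready, now);
        return ready;
    }

    // Call periodically. Once the gap in front of the oldest held request
    // has lasted 'timeout' ticks, give up on the missing requests: list
    // them in 'nacked' and release whatever has become deliverable.
    std::vector<std::string> expire(uint64_t now,
                                    std::vector<uint64_t>& nacked) {
        std::vector<std::string> ready;
        if (_held.empty() || now - _waiting_since < _timeout)
            return ready;
        uint64_t first_held = _held.begin()->first;
        assert(first_held >= _next);  // held entries are never behind _next
        for (uint64_t s = _next; s < first_held; ++s)
            nacked.push_back(s);
        _next = first_held;
        drain(ready, now);
        return ready;
    }

private:
    // Move every consecutively numbered request out of the holding area.
    void drain(std::vector<std::string>& ready, uint64_t now) {
        bool progressed = false;
        std::map<uint64_t, std::string>::iterator it;
        while ((it = _held.find(_next)) != _held.end()) {
            ready.push_back(it->second);
            _held.erase(it);
            ++_next;
            progressed = true;
        }
        if (progressed)
            _waiting_since = now;    // any remaining gap is timed from here
    }

    uint64_t _timeout;
    uint64_t _next;                  // next sequence number to dispatch
    uint64_t _waiting_since;         // when the current gap began
    std::map<uint64_t, std::string> _held;
};
```

The caller feeds receive() from the transport and calls expire() from a timer; anything listed in nacked is what the server would report back to the client.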

Up until now, we have relied on TCP to do all of this for us behind the 
scenes. The price we pay for that is some inefficiency in the 
implementation: head-of-line blocking, and being unable to preserve RPC 
method boundaries.

(This is why the AMQP guys have the hots for SCTP, but the SCTP guys 
can't do much about pushing the model forward until Microsoft sit up and 
take notice -- no-one's shipping SCTP as a Windows 7 NDIS/TDI driver, as 
far as I know.)

You can see why stuff like TIPC happens. But I seriously disagree with 
their approach. Pushing all asynchrony into the kernel isn't the answer, 
and it limits your client uptake -- Linux is not the only game in town, 
and there are very good reasons for that which I won't go into here. 
Just using the existing Berkeley Sockets API is cute, but far from 
perfect -- it has holes of its own.

Also, they never really tackled the cross-language interop issue the way 
Thrift has.

...


So I guess it boils down to: caveat implementor. If you use XRL, don't 
rely on call serialization from the API. If you need to cross road after 
pushing button, do so. Otherwise, you might end up in a traffic 
accident. :-)
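
One way to read "push button, then cross" in code: if call B depends on call A, issue B from A's completion callback instead of firing both and trusting the layer underneath to keep them in order. The sketch below uses a made-up FakeRpc class, not the XRL API, which deliberately completes queued calls newest-first to mimic a layer with no ordering guarantee. The method names are just labels.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an asynchronous RPC layer; not the XRL API.
class FakeRpc {
public:
    typedef std::function<void(bool)> Callback;

    // Queue a call; 'done' is invoked with the outcome when it completes.
    void send(const std::string& method, Callback done) {
        assert(done && "a completion callback is required");
        _pending.push_back(Pending{method, done});
    }

    // Complete queued calls newest-first, including any calls that the
    // callbacks themselves queue while we are running.
    void run() {
        while (!_pending.empty()) {
            Pending p = _pending.back();
            _pending.pop_back();
            _log.push_back(p.method);
            p.done(true);
        }
    }

    // Order in which calls were actually executed.
    const std::vector<std::string>& log() const { return _log; }

private:
    struct Pending { std::string method; Callback done; };
    std::vector<Pending> _pending;
    std::vector<std::string> _log;
};

// Risky: both calls are in flight at once, so either may execute first.
inline void fire_and_hope(FakeRpc& rpc) {
    rpc.send("set_admin_distance", [](bool) {});
    rpc.send("add_route", [](bool) {});
}

// Safe: the second call is only issued once the first has completed.
inline void push_button_then_cross(FakeRpc& rpc) {
    rpc.send("set_admin_distance", [&rpc](bool ok) {
        if (ok)
            rpc.send("add_route", [](bool) {});
    });
}
```

With this fake layer, fire_and_hope() ends up executing add_route before set_admin_distance, while push_button_then_cross() always executes them in the intended order.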

>   (Been a few
> days since I poked at the router mgr code, not sure I fully understood
> it when I did).

There is a lot going on in there.

XRLs should be dispatched in the order in which they are received. 
However, there are actually no guarantees for this behaviour -- it is 
'best effort'.

    When an XRL call is received, for example, STCPRequestHandler will 
attempt to dispatch it immediately, in line with further reception.
    XRL targets are internally synchronous. The method call dispatch 
happens in the context of XrlRouter's event I/O callbacks, which are 
registered with the outer EventLoop.

So from the server's point of view, XRL is pretty much synchronous. But, 
even on the same host, that dispatch could happen on another CPU. [As 
I've probably mentioned elsewhere, most of XORP's inter-process sync in 
the time domain, is actually pinioned on the host's socket buffer locks.]

The uncertainty in the whole system to do with time and call dispatch is 
however localized:
 * When/how did that XRL get fired off?
 * How are my socket buffers?
 * How many cores do I have?
 * How's my scheduling?

    Just out of interest, I will reveal that, as of this week, I 
have written most of the code generator needed to shim XRL calls 
directly into Thrift ones.

    This is so that adopting Thrift does not mean a dragnet across all 
400+ KLOCs of XORP, but should make it a mostly drop-in replacement for 
XRL in the existing code.

    I have yet to write most of the new libxipc, though, which is why 
I'm feeling that space out just now, and being pretty conservative in 
what I'm disclosing (people have got a whiff of what I'm doing; 
knowledge breeds expectations; expectations pump up the volume).

    Thrift's C++ RPC libraries are actually pretty well written. They make it 
possible to pull off a few tricks for making the method calls a bit more 
scalable, and for providing guarantees about call serialization in a 
scalable system.

    However, making that work requires some additional movement. As you 
can see, there are a few assumptions about how the whole system actually 
behaves, which are incorrect in some places.

cheers,
BMS


From greearb at candelatech.com  Thu Oct 29 12:16:41 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 29 Oct 2009 12:16:41 -0700
Subject: [Xorp-hackers] Crash due to stale cached xrl sender pointer.
In-Reply-To: <4AE9E020.4060403@incunabulum.net>
References: <4AE799E1.8010300@candelatech.com>
	<4AE7D31B.1040107@candelatech.com>
	<4AE9B7D1.803@incunabulum.net> <4AE9BAE5.20406@candelatech.com>
	<4AE9CC6B.9050302@incunabulum.net>
	<4AE9D45B.1090003@candelatech.com>
	<4AE9E020.4060403@incunabulum.net>
Message-ID: <4AE9EA19.5030702@candelatech.com>

On 10/29/2009 11:34 AM, Bruce Simpson wrote:
> Ben Greear wrote:
>> ...
>> Please note that the sender will be marked in-active, so the XRL will
>> not actually
>> try to use it, but if the memory is gone, then it can't even check the
>> foo->active()
>> flag w/out crashing.
>>
>> It seems a pretty simple use-after-free bug, and the fix seems pretty
>> trivial to me.
>
> I'm pleased that you've found an issue, and come up with a fix that
> appears to work for you in the here and now. I would also class part of
> the issue you've run into as a design bug in XRL, and have tried to
> explain (as best I can) why I believe that is the case.

If anything can ever delete a sender, and if we don't clean up outstanding
XRLs when we delete the sender, then the bug exists.  Grep for 'destroy_sender'
to see how xrl_router.cc can destroy them..because they are no longer 'live'.
Base-class doesn't handle setting 'aliveness', so no idea what object is actually
no longer thinking it is alive.

We either need to clean up those XRLs by invalidating their cache,
remove the xrl sender cache entirely, or make sure the sender can't
be deleted while XRLs referencing it exist.

The first is liable to be difficult.

The second a performance penalty.

The third used ref-ptrs and changed very little actual logic.
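
A minimal sketch of the third option, under stated assumptions: Sender and CachedCall are hypothetical stand-ins for XrlPFSender and Xrl, and std::shared_ptr stands in for XORP's own ref_ptr. The point is only that a counted reference keeps the memory valid, so the active() flag can still be read after the router has let go of the sender.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a transport sender.
class Sender {
public:
    Sender() : _active(true) {}
    bool active() const { return _active; }
    void deactivate() { _active = false; }
private:
    bool _active;
};

// Hypothetical stand-in for an XRL that caches its resolved sender.
class CachedCall {
public:
    // Holding a counted reference means the Sender cannot be freed while
    // this object still points at it.
    void cache(const std::shared_ptr<Sender>& s) {
        assert(s && "cache() expects a live sender");
        _sender = s;
    }

    // Safe even after the router has dropped its own reference: the
    // object still exists, so the flag can be checked.
    bool can_send() const { return _sender && _sender->active(); }

private:
    std::shared_ptr<Sender> _sender;
};
```

The cost is that a dead sender lingers until the last cached reference goes away, which is the trade-off against invalidating the caches explicitly.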

> I would prefer to know what the root cause of the transport pointer
> being invalidated is; this is mostly so that I can avoid introducing a
> similar situation in new code.
>
> However, I'm concerned that the suggested fix, actually makes the code
> more difficult to read than it already is. I'm not happy with ref_ptr,
> and it has been a source of problems for me in the past.

Xorp is a royal pain in the arse to read, with all its typedefs, deep class inheritance,
auto-generated templated code (try to read the callback code some day..impossible),
timers, xrl black hole, chained (and unchained, for that matter) callbacks, etc.

It is one of the most hard to read pieces of code I've ever looked at (only the original
Vocal project was just barely worse, primarily because it was threaded and full of bugs).
Maybe Thrift will help..but if it's just yet another indirection or black hole of
magic, I doubt it.

> Of course, it's worth bearing in mind that I am looking at this from a
> very critical viewpoint at the moment. ;-)

I think if I spent more than 2 days looking at XRL I'd rip its guts out and
re-implement it entirely.  I don't envy your task, and I hope it works out well.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


From bms at incunabulum.net  Thu Oct 29 13:07:47 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Thu, 29 Oct 2009 20:07:47 +0000
Subject: [Xorp-hackers] More on XRL and Thrift.
In-Reply-To: <4AE9EA19.5030702@candelatech.com>
References: <4AE799E1.8010300@candelatech.com>
	<4AE7D31B.1040107@candelatech.com>
	<4AE9B7D1.803@incunabulum.net> <4AE9BAE5.20406@candelatech.com>
	<4AE9CC6B.9050302@incunabulum.net>
	<4AE9D45B.1090003@candelatech.com>
	<4AE9E020.4060403@incunabulum.net>
	<4AE9EA19.5030702@candelatech.com>
Message-ID: <4AE9F613.8080504@incunabulum.net>

Ben Greear wrote:
>
> Xorp is a royal pain in the arse to read, with all its typedefs, deep 
> class inheritance,
> auto-generated templated code (try to read the callback code some 
> day..impossible),
> timers, xrl black hole, chained (and unchained, for that matter) 
> callbacks, etc.
>
> It is one of the most hard to read pieces of code I've ever looked at 
> (only the original
> Vocal project was just barely worse, primarily because it was threaded 
> and full of bugs).
> Maybe Thrift will help..but if it's just yet another indirection or 
> black hole of
> magic, I doubt it.

XRL and Thrift bear some comparison.

    The one that sprang to mind just as I put my dinner on the hob, was 
this one: Thrift draws a clean distinction between the RPC transport, 
and the representation used on the transport. XRL does have such a 
distinction, but it isn't as clear cut for the transport, or the 
representation. So there is quite a bit of bleed-through across the code 
for each XrlPF.

In places this leads to some dire performance problems due to the level 
of indirection involved. Thrift, on the other hand, is pretty lean and 
mean in its C++ library.

    What I'll aim to do is to release parts of the Thrifted XORP tree 
I'm comfortable with and which could use further review.

    Earlier in this thread, we saw a situation which could only arise 
because in XRL, we attempt to cache a pointer to the transport endpoint, 
in the client-side RPC stub itself, with no way that pointer could be 
cleanly invalidated. Xrl's use of XrlPFSender here is strictly as a 
cache -- class Xrl does not participate in the life cycle of 
XrlPFSender, beyond being used by it.

    This is actually a really good use of a Boost weak_ptr. It is quite 
literally an observation of a shared_ptr. That pointer can happen to be 
invalid. But the situation arose in the first place because of XRL's 
granularity being per-method only, which is why I'd argue it's a design bug.
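
A minimal sketch of the weak_ptr idea, using std::weak_ptr (the standard-library counterpart of boost::weak_ptr). Transport and ClientStub are hypothetical names, and the endpoint string in any usage is made up:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical transport endpoint, owned elsewhere (e.g. by the router).
struct Transport {
    std::string endpoint;
    explicit Transport(const std::string& e) : endpoint(e) {}
};

// Hypothetical client-side stub that merely observes the transport.
class ClientStub {
public:
    // Remember the transport without taking part in its life cycle.
    void observe(const std::shared_ptr<Transport>& t) {
        assert(t && "observe() expects a live transport");
        _cached = t;
    }

    // Returns true and fills 'out' if the cached transport still exists.
    // lock() yields an empty pointer once the owner has destroyed it, so
    // a stale cache entry is detected rather than dereferenced.
    bool cached_endpoint(std::string& out) const {
        std::shared_ptr<Transport> t = _cached.lock();
        if (!t)
            return false;
        out = t->endpoint;
        return true;
    }

private:
    std::weak_ptr<Transport> _cached;
};
```

When cached_endpoint() returns false the stub would fall back to resolving the endpoint again, which is exactly the invalidation step the raw cached pointer could not express.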

    But let's flip back to how callback stubs are generated, and end up 
in libfubarxif.so. The XrlFooClient object is instantiated. Whilst it's 
associated with an XrlRouter to begin with, this is in fact an 
association with the stereotype XrlSender (a thing which can send XRLs). 
We don't interact with this object much beyond invoking its send() 
method, when the client calls XrlFooClient::send_foo().

    Part of the problem here is that XRL attempts to do call resolution 
per-method.

    In a Thrifted world, the XrlSender can inform the XrlFooClient 
object that the endpoint changed, but this need only be on a per-service 
basis; there's no need to cache every single method call resolution, as 
the XRL foo_xif.cc stubs currently attempt to, and as you saw, this just 
caused problems when the endpoints themselves were possibly subject to a 
race.

    Broadly, in Thrift, the equivalent of that Xrl::resolved_sender() 
pointer is the output transport pointer. Because the transport is just 
something which can be written to, to issue an RPC call, we can deal 
with the semantics of moving to a new endpoint outside of the scope of 
that call.
    In fact, we may be best off providing the XORP apps a TMemoryBuffer 
to scribble the shimmed XRL calls into. This means a transport is always 
available.

    Because we're then dealing with a binary blob, rather than a local 
copy of an Xrl frankenblob, dispatching it is really easy, and we can 
cache the endpoint to our heart's content, probably using a 
boost::weak_ptr to boot.
    We can then let libxipc decide how to route it, in a runtime scope 
where we are more in control of the endpoint situation, rather than 
being at the mercy of a lone pointer.

>
> I think if I spent more than 2 days looking at XRL I'd rip its guts 
> out and
> re-implement it entirely.  I don't envy your task, and I hope it works 
> out well.

No comment. ;-)

I am generally pleased with how it's going, and got much closer to the 
action this week.

It took a lot of reading to figure out that what is being attempted is 
in fact possible, but it's going to take a bit of effort to get it off 
the ground.

thanks,
BMS


From greearb at candelatech.com  Thu Oct 29 20:24:42 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 29 Oct 2009 20:24:42 -0700
Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in
 task_done
In-Reply-To: <633162.71070.qm@web58707.mail.re1.yahoo.com>
References: <633162.71070.qm@web58707.mail.re1.yahoo.com>
Message-ID: <4AEA5C7A.9070807@candelatech.com>

Li Zhao wrote:
> I added a new protocol and I can start it in CLI by command "create protocol XXX", but the rtrmgr crashed after command "delete protocol XXX".
> I can also easily reproduce the exactly same crash via the following steps:
>
> 0. I am running xorp processes on an embedded system.
> 1. start rtrmgr from linux shell on the system;
> 2. manually start xorp_static_routes from linux shell. This static will hijack the xrl channels to rtrmgr;
> 3. use cli command "create protocol static" to start a second xorp_static_routes.
> 4. use cli command "delete protocol static" to stop static. both xorp_static_routes were terminated. depended process like fea, rib and policy were also terminated. rtrmgr crash.
>   
Ok, the crash is because if you do a pop_front() on an empty list, it's 
going to crash.
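
For reference, calling pop_front() on an empty std::list is undefined behaviour, so the defensive version checks first. This is a generic sketch with a hypothetical helper name, not the contents of the attached patch:

```cpp
#include <cassert>
#include <list>

// Hypothetical helper: remove the first element only if there is one.
// Returns false when the list was already empty, so the caller can log
// the unexpected state instead of corrupting the list.
template <typename T>
bool pop_front_if_any(std::list<T>& l)
{
    if (l.empty())
        return false;
    l.pop_front();
    return true;
}
```

A guard like this stops the crash but says nothing about why the list was empty in the first place.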

I'm not sure why the list is empty here.  Seems to indicate task-manager 
logic is busted
with regard to task list management and/or callbacks are being called 
against a wrong
task-manager.

Do you actually need to do this operation for your project?  If so, you 
probably will want
to investigate task-manager logic in detail to figure out why this is 
happening.

The attached patch fixes the crash, but the underlying bug persists.  
Most of the patch is debugging
code, but I'm leaving it in my tree because it will help next time we 
hit a similar problem.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch0.patch
Type: text/x-patch
Size: 3306 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091029/eb2b6df3/attachment.bin 

From greearb at candelatech.com  Thu Oct 29 22:30:25 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 29 Oct 2009 22:30:25 -0700
Subject: [Xorp-hackers] PATCH:   Enable libxipc tests
Message-ID: <4AEA79F1.5050507@candelatech.com>

FYI:  Here are stats from my 2.4GHz E5530 system:

Patch to enable these tests is attached (and pushed to my tree).

[root at i7-dqc-1 tests]# ./test_xrl_receiver&
[root at i7-dqc-1 tests]# ./test_xrl_sender
XrlAtoms per call = 1
Send method = pipeline
start_transmission_cb 100 Okay
Received 10000 XRLs; delta_time = 0.738458 secs; speed = 13541.731554 XRLs/s
start_transmission_cb 100 Okay
Received 10000 XRLs; delta_time = 0.439395 secs; speed = 22758.565755 XRLs/s
start_transmission_cb 100 Okay
Received 10000 XRLs; delta_time = 0.408516 secs; speed = 24478.845382 XRLs/s
start_transmission_cb 100 Okay
Received 10000 XRLs; delta_time = 0.407115 secs; speed = 24563.084141 XRLs/s

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


-------------- next part --------------
A non-text attachment was scrubbed...
Name: xorp_xipc_tests.patch
Type: text/x-patch
Size: 5858 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091029/7127b404/attachment-0001.bin 

From greearb at candelatech.com  Thu Oct 29 22:47:43 2009
From: greearb at candelatech.com (Ben Greear)
Date: Thu, 29 Oct 2009 22:47:43 -0700
Subject: [Xorp-hackers] PATCH:   Enable libxipc tests
In-Reply-To: <4AEA79F1.5050507@candelatech.com>
References: <4AEA79F1.5050507@candelatech.com>
Message-ID: <4AEA7DFF.5050002@candelatech.com>

Ben Greear wrote:
> FYI:  Here are stats from my 2.4GHz E5530 system:

Here's oprofile output for a similar test (test_xrl_sender)

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: xrl_test_oprofile_summary.txt
Url: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091029/c4a6c46e/attachment.txt 

From lizhaous2000 at yahoo.com  Fri Oct 30 07:23:25 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Fri, 30 Oct 2009 07:23:25 -0700 (PDT)
Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in
	task_done
In-Reply-To: <4AEA5C7A.9070807@candelatech.com>
Message-ID: <150870.8613.qm@web58705.mail.re1.yahoo.com>

I have three cases in which this crash occurred. The one you set up is one of them.
I used your fix. It did prevent rtrmgr from crashing in all three cases. That is good.
But I am afraid that is not the root cause, because the task manager always checks that the task list
is not empty before it runs any task.
I will keep debugging to look for the root cause and will let you know if
I find anything.

Thank you for spending time on this.

Li

--- On Thu, 10/29/09, Ben Greear <greearb at candelatech.com> wrote:

> From: Ben Greear <greearb at candelatech.com>
> Subject: Re: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done
> To: "Li Zhao" <lizhaous2000 at yahoo.com>
> Cc: xorp-hackers at icir.org
> Date: Thursday, October 29, 2009, 11:24 PM
> Li Zhao wrote:
> > I added a new protocol and I can start it in CLI by
> command "create protocol XXX", but the rtrmgr crashed after
> command "delete protocol XXX".
> > I can also easily reproduce the exactly same crash via
> the following steps:
> > 
> > 0. I am running xorp processes on an embedded system.
> > 1. start rtrmgr from linux shell on the system;
> > 2. manually start xorp_static_routes from linux shell.
> This static will hijack the xrl channels to rtrmgr;
> > 3. use cli command "create protocol static" to start a
> second xorp_static_routes.
> > 4. use cli command "delete protocol static" to stop
> static. both xorp_static_routes were terminated. depended
> process like fea, rib and policy were also terminated.
> rtrmgr crash.
> >
> Ok, the crash is because if you do a pop_front() on an
> empty list, it's going to crash.
> 
> I'm not sure why the list is empty here.  Seems to
> indicate task-manager logic is busted
> with regard to task list management and/or callbacks are
> being called against a wrong
> task-manager.
> 
> Do you actually need to do this operation for your
> project?  If so, you probably will want
> to investigate task-manager logic in detail to figure out
> why this is happening.
> 
> The attached patch fixes the crash, but the underlying bug
> persists.  Most of the patch is debugging
> code, but I'm leaving it in my tree because it will help
> next time we hit a similar problem.
> 
> Thanks,
> Ben
> 
> -- Ben Greear <greearb at candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
> 
> 
> 


From greearb at candelatech.com  Fri Oct 30 07:29:55 2009
From: greearb at candelatech.com (Ben Greear)
Date: Fri, 30 Oct 2009 07:29:55 -0700
Subject: [Xorp-hackers] Omitting XrlDB from Router Manager
In-Reply-To: <4AE7178B.9000709@incunabulum.net>
References: <4AE7178B.9000709@incunabulum.net>
Message-ID: <4AEAF863.7000500@candelatech.com>

Bruce Simpson wrote:
> Hi all,
>
>    I'm still looking at the XRL replacement since I got back from 
> holiday, which is why I've been mostly silent on lists.
>
>    Something came up in analysis, which broadly relates to Ben 
> Greear's work on reducing Router Manager startup times, etc. and some 
> of the questions Li Zhao has been asking in other threads on this list.
>
> @Ben: It would be interesting to know what difference omitting the 
> XRLDB code makes to your Router Manager startup times.
> * The XRLDB seems to exist pretty much to validate what's in the 
> template files and how the Router Manager uses them, although this is 
> done completely at run time.
> * I wonder if disabling this code would make a difference to performance.
> * To do this, I'd hack rtrmgr/template_commands.cc, and comment out 
> the calls to the XRLdb methods.
> * The rtrmgr/xrldb.cc is the only place in the whole system where the 
> '*.xrls' files are parsed and used. They are used only to validate the 
> syntax and structure of potential XRL method calls.
> * It would mean that there is no up-front validation of the XRLs, but 
> in practice, this validation step is probably only of interest to 
> people developing XORP, to catch problems with template files.
> * It's probably best folded under a compile-time #define for developer 
> use.

Something like the attached patch?
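
The attachment was scrubbed from this archive, so what follows is not that patch. It is only a minimal sketch of the kind of compile-time switch described in the quoted list, with a made-up macro name (XRLDB_VALIDATE), a made-up function, and a toy check standing in for the real XRLDB lookup.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: neither the macro nor the function comes from XORP.
// Build with -DXRLDB_VALIDATE to enable the up-front check.
inline bool validate_xrl_syntax(const std::string& xrl)
{
#ifdef XRLDB_VALIDATE
    // Developer build: do a (toy) structural check before accepting.
    return xrl.find("://") != std::string::npos;
#else
    // Production build: skip validation and trust the template files.
    (void)xrl;
    return true;
#endif
}
```

In a default build the function accepts everything at no run-time cost; only developer builds pay for the check.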

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


-------------- next part --------------
A non-text attachment was scrubbed...
Name: xorp_xrldb_verification.patch
Type: text/x-patch
Size: 4454 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091030/28957ef2/attachment.bin 

From lizhaous2000 at yahoo.com  Fri Oct 30 07:30:30 2009
From: lizhaous2000 at yahoo.com (Li Zhao)
Date: Fri, 30 Oct 2009 07:30:30 -0700 (PDT)
Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in
	task_done
In-Reply-To: <4AE9D059.20102@candelatech.com>
Message-ID: <982408.82472.qm@web58702.mail.re1.yahoo.com>

I thought task manager was fine. But it might be that the first node was deleted twice, one of which is this pop_front and another hidden one.

--- On Thu, 10/29/09, Ben Greear <greearb at candelatech.com> wrote:

> From: Ben Greear <greearb at candelatech.com>
> Subject: Re: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done
> To: "Li Zhao" <lizhaous2000 at yahoo.com>
> Cc: xorp-hackers at icir.org
> Date: Thursday, October 29, 2009, 1:26 PM
> On 10/29/2009 08:16 AM, Li Zhao
> wrote:
> > I am puzzled by operator delete(prt=0x0). But inside
> deallocate(this=0x8d55238, __p=0x8d55238), the __p is not
> 0x0. pop_front means "removes and deletes". So somewhere
> else this list node was deleted again?
> >
> > --- On Thu, 10/29/09, Li Zhao<lizhaous2000 at yahoo.com> wrote:
> >
> >> From: Li Zhao<lizhaous2000 at yahoo.com>
> >> Subject: [Xorp-hackers] rtrmgr crash on SIGABRT
> because of pop_front in task_done
> >> To: xorp-hackers at icir.org
> >> Date: Thursday, October 29, 2009, 10:54 AM
> >> I added a new protocol and I can
> >> start it in CLI by command "create protocol XXX",
> but the
> >> rtrmgr crashed after command "delete protocol
> XXX".
> >> I can also easily reproduce the exactly same crash
> via the
> >> following steps:
> >>
> >> 0. I am running xorp processes on an embedded
> system.
> >> 1. start rtrmgr from linux shell on the system;
> >> 2. manually start xorp_static_routes from linux
> shell. This
> >> static will hijack the xrl channels to rtrmgr;
> >> 3. use cli command "create protocol static" to
> start a
> >> second xorp_static_routes.
> >> 4. use cli command "delete protocol static" to
> stop static.
> >> both xorp_static_routes were terminated. depended
> process
> >> like fea, rib and policy were also terminated.
> rtrmgr
> >> crash.
> 
> I ran under valgrind, and saw this info:
> 
> ==27820== Invalid free() / delete / delete[]
> ==27820==    at 0x4A05E3F: operator delete(void*) (vg_replace_malloc.c:342)
> ==27820==    by 0x463531: __gnu_cxx::new_allocator<std::_List_node<Task*> >::deallocate(std::_List_node<Task*>*, unsigned long) (new_allocator.h:95)
> ==27820==    by 0x462427: std::_List_base<Task*, std::allocator<Task*> >::_M_put_node(std::_List_node<Task*>*) (stl_list.h:320)
> ==27820==    by 0x46143B: std::list<Task*, std::allocator<Task*> >::_M_erase(std::_List_iterator<Task*>) (stl_list.h:1431)
> ==27820==    by 0x45FF0B: std::list<Task*, std::allocator<Task*> >::pop_front() (stl_list.h:906)
> ==27820==    by 0x45DB73: TaskManager::task_done(bool, std::string const&) (task.cc:2256)
> ==27820==    by 0x465970: XorpMemberCallback2B0<void, TaskManager, bool, std::string const&>::dispatch(bool, std::string const&) (callback_nodebug.hh:4636)
> ==27820==    by 0x45C540: Task::step8_report() (task.cc:1998)
> ==27820==    by 0x4659DF: XorpMemberCallback0B0<void, Task>::dispatch() (callback_nodebug.hh:306)
> ==27820==    by 0x449613: Module::terminate_with_prejudice(ref_ptr<XorpCallback0<void> >) (module_manager.cc:218)
> ==27820==    by 0x44F63C: XorpMemberCallback0B1<void, Module, ref_ptr<XorpCallback0<void> > >::dispatch() (callback_nodebug.hh:598)
> ==27820==    by 0x549D72: OneoffTimerNode2::expire(XorpTimer&, void*) (timer.cc:167)
> ==27820==  Address 0x50c9340 is 80 bytes inside a block of size 200 alloc'd
> ==27820==    at 0x4A06FFC: operator new(unsigned long) (vg_replace_malloc.c:230)
> ==27820==    by 0x42C81F: MasterConfigTree::MasterConfigTree(std::string const&, MasterTemplateTree*, ModuleManager&, XorpClient&, bool, bool) (master_conf_tree.cc:119)
> ==27820==    by 0x406ED6: Rtrmgr::run() (main_rtrmgr.cc:319)
> ==27820==    by 0x407E57: main (main_rtrmgr.cc:665)
> 
> 
> It appears to me that the task-manager object (this) is
> already deleted when
> the taskmanager::task_done() method is called.
> 
> Could probably add some debugging to the destructors and
> constructors of TaskManager
> to verify.  I have some other things to do first..but
> will look at this a bit later
> if no one beats me to it.
> 
> Thanks,
> Ben
> 
> -- 
> Ben Greear <greearb at candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
> 
> 


From greearb at candelatech.com  Fri Oct 30 07:48:44 2009
From: greearb at candelatech.com (Ben Greear)
Date: Fri, 30 Oct 2009 07:48:44 -0700
Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in
 task_done
In-Reply-To: <982408.82472.qm@web58702.mail.re1.yahoo.com>
References: <982408.82472.qm@web58702.mail.re1.yahoo.com>
Message-ID: <4AEAFCCC.8070802@candelatech.com>

Li Zhao wrote:
> I thought task manager was fine. But it might be that the first node was deleted twice, one of which is this pop_front and another hidden one.
>
>   
The task-manager is fine.  (See the assert_not_deleted() check in my patch).

I bet if you added print statements around adding/deleting tasks, and 
print out the 'this'
pointer, you'd learn something interesting...
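
A minimal sketch of that kind of tracing, assuming a toy class rather than the real TaskManager in rtrmgr/task.cc. TinyTaskManager and its members are hypothetical; the idea is simply that every mutation logs 'this' together with the list size, so calls landing on a different instance stand out in the trace.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical miniature of a task list owner.
class TinyTaskManager {
public:
    explicit TinyTaskManager(std::ostream& log) : _log(log) {}

    void add_task(int id) {
        _tasks.push_back(id);
        _log << "add_task this=" << static_cast<const void*>(this)
             << " id=" << id << " size=" << _tasks.size() << "\n";
    }

    void task_done() {
        // Log before touching the list, so an empty list is visible.
        _log << "task_done this=" << static_cast<const void*>(this)
             << " size=" << _tasks.size() << "\n";
        if (!_tasks.empty())
            _tasks.pop_front();
    }

    std::size_t size() const { return _tasks.size(); }

private:
    std::ostream& _log;
    std::list<int> _tasks;
};
```

If add_task and task_done lines show different 'this' values, or task_done reports size=0, the callback has reached an instance other than the one the task was queued on.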

Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


From greearb at candelatech.com  Fri Oct 30 11:47:43 2009
From: greearb at candelatech.com (Ben Greear)
Date: Fri, 30 Oct 2009 11:47:43 -0700
Subject: [Xorp-hackers] PATCH:  Enable compiling with gprof support
Message-ID: <4AEB34CF.5050500@candelatech.com>

Also, this allows writing config vars to a file so an external program
(maybe a packager or installer), can use them automatically.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: xorp_gprof.patch
Url: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091030/d4c0d162/attachment.ksh 

From bms at incunabulum.net  Sat Oct 31 09:53:53 2009
From: bms at incunabulum.net (Bruce Simpson)
Date: Sat, 31 Oct 2009 16:53:53 +0000
Subject: [Xorp-hackers] Omitting XrlDB from Router Manager
In-Reply-To: <4AEAF863.7000500@candelatech.com>
References: <4AE7178B.9000709@incunabulum.net>
	<4AEAF863.7000500@candelatech.com>
Message-ID: <4AEC6BA1.3010007@incunabulum.net>

Ben Greear wrote:
>>
>> * The rtrmgr/xrldb.cc is the only place in the whole system where the 
>> '*.xrls' files are parsed and used. They are used only to validate 
>> the syntax and structure of potential XRL method calls.
>> * It would mean that there is no up-front validation of the XRLs, but 
>> in practice, this validation step is probably only of interest to 
>> people developing XORP, to catch problems with template files.
>> * It's probably best folded under a compile-time #define for 
>> developer use.
>
> Something like the attached patch?

Great stuff :-) Does it work for you? Have you seen any measurable 
increase in performance for production systems?

I have actually chopped the entire Router Manager from my dev branch. 
There are parts of libxipc which are neither used nor needed by anything 
but the Finder or Router Manager, and aren't essential for knitting 
processes together. I'll be merging it back on a piecemeal basis once 
I've actually got Thrift protocol working.


From greearb at candelatech.com  Sat Oct 31 15:51:52 2009
From: greearb at candelatech.com (Ben Greear)
Date: Sat, 31 Oct 2009 15:51:52 -0700
Subject: [Xorp-hackers] Omitting XrlDB from Router Manager
In-Reply-To: <4AEC6BA1.3010007@incunabulum.net>
References: <4AE7178B.9000709@incunabulum.net>
	<4AEAF863.7000500@candelatech.com>
	<4AEC6BA1.3010007@incunabulum.net>
Message-ID: <4AECBF88.7030704@candelatech.com>

Bruce Simpson wrote:
> Ben Greear wrote:
>>>
>>> * The rtrmgr/xrldb.cc is the only place in the whole system where 
>>> the '*.xrls' files are parsed and used. They are used only to 
>>> validate the syntax and structure of potential XRL method calls.
>>> * It would mean that there is no up-front validation of the XRLs, 
>>> but in practice, this validation step is probably only of interest 
>>> to people developing XORP, to catch problems with template files.
>>> * It's probably best folded under a compile-time #define for 
>>> developer use.
>>
>> Something like the attached patch?
>
> Great stuff :-) Does it work for you? Have you seen any measurable 
> increase in performance for production systems?
>
> I have actually chopped the entire Router Manager from my dev branch. 
> There are parts of libxipc which are neither used nor needed by 
> anything but the Finder or Router Manager, and aren't essential for 
> knitting processes together. I'll be merging it back on a piecemeal 
> basis once I've actually got Thrift protocol working.
It can't hurt, but I didn't do any performance tests specifically for 
this change.  It does seem to function fine,
however.

My bigger problem is an N^2 problem with routes and number of routers: 
with 100 routers, and 300 routes each,
I get extreme numbers of netlink route update messages on each router.  
I'm patching the kernel to allow netlink to bind to
a particular routing table, so I should get rid of all the un-needed 
route updates for other routers' tables.  Hope to test
this in a day or two.

Do you have an estimate for when you plan to post your changes?

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com