[Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done

Li Zhao lizhaous2000 at yahoo.com
Mon Nov 2 09:31:42 PST 2009


Here is a link that might be relevant:

http://www.ece.ucsb.edu/~kastner/labyrinth/bug1.txt


--- On Fri, 10/30/09, Li Zhao <lizhaous2000 at yahoo.com> wrote:

> From: Li Zhao <lizhaous2000 at yahoo.com>
> Subject: Re: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done
> To: "Ben Greear" <greearb at candelatech.com>
> Cc: xorp-hackers at icir.org
> Date: Friday, October 30, 2009, 10:30 AM
> I thought the task manager was fine. But it might be that the first node
> was deleted twice: once by this pop_front and once somewhere hidden.
> 
> --- On Thu, 10/29/09, Ben Greear <greearb at candelatech.com> wrote:
> 
> > From: Ben Greear <greearb at candelatech.com>
> > Subject: Re: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done
> > To: "Li Zhao" <lizhaous2000 at yahoo.com>
> > Cc: xorp-hackers at icir.org
> > Date: Thursday, October 29, 2009, 1:26 PM
> > On 10/29/2009 08:16 AM, Li Zhao wrote:
> > > I am puzzled by operator delete(prt=0x0). But inside
> > > deallocate(this=0x8d55238, __p=0x8d55238), the __p is not 0x0.
> > > pop_front means "removes and deletes". So was this list node
> > > deleted again somewhere else?
> > >
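One possible reading of the puzzle above, offered as a guess rather than a diagnosis: in libstdc++ the end-of-list sentinel node is stored inside the std::list object itself, and `this` and `__p` in deallocate() have the same value, which is what one would expect if pop_front() ran on an empty list and tried to free that sentinel. Calling pop_front() on an empty list is undefined behavior. A minimal sketch of a guard follows; `Task` here is a stand-in and `pop_front_if_any` and `demo_guard` are hypothetical helpers, not the actual rtrmgr code.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for rtrmgr's Task class (assumption, not the real type).
struct Task { int id; };

// Remove and return the front element, or return nullptr when the list is
// empty. The emptiness check is the point: std::list::pop_front() on an
// empty list is undefined behavior.
Task* pop_front_if_any(std::list<Task*>& tasks) {
    if (tasks.empty())
        return nullptr;
    Task* t = tasks.front();
    tasks.pop_front();
    return t;
}

// Small usage demo: the first pop succeeds, the second finds the list empty.
bool demo_guard() {
    std::list<Task*> tasks;
    Task a{1};
    tasks.push_back(&a);
    Task* first = pop_front_if_any(tasks);
    Task* second = pop_front_if_any(tasks);
    return first == &a && second == nullptr && tasks.empty();
}
```

If the guard ever returns nullptr in task_done, the list was already empty at that point, which would point at a task being reported as done twice rather than at the list itself.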
> > > --- On Thu, 10/29/09, Li Zhao <lizhaous2000 at yahoo.com> wrote:
> > >
> > >> From: Li Zhao <lizhaous2000 at yahoo.com>
> > >> Subject: [Xorp-hackers] rtrmgr crash on SIGABRT because of pop_front in task_done
> > >> To: xorp-hackers at icir.org
> > >> Date: Thursday, October 29, 2009, 10:54 AM
> > >> I added a new protocol and I can start it in the CLI with the
> > >> command "create protocol XXX", but the rtrmgr crashed after the
> > >> command "delete protocol XXX".
> > >> I can also easily reproduce exactly the same crash via the
> > >> following steps:
> > >>
> > >> 0. I am running xorp processes on an embedded system.
> > >> 1. Start rtrmgr from the linux shell on the system.
> > >> 2. Manually start xorp_static_routes from the linux shell. This
> > >>    static will hijack the xrl channels to rtrmgr.
> > >> 3. Use the cli command "create protocol static" to start a
> > >>    second xorp_static_routes.
> > >> 4. Use the cli command "delete protocol static" to stop static.
> > >>    Both xorp_static_routes were terminated. Dependent processes
> > >>    like fea, rib and policy were also terminated. rtrmgr crashed.
> > 
> > I ran under valgrind, and saw this info:
> > 
> > ==27820== Invalid free() / delete / delete[]
> > ==27820==    at 0x4A05E3F: operator delete(void*) (vg_replace_malloc.c:342)
> > ==27820==    by 0x463531: __gnu_cxx::new_allocator<std::_List_node<Task*> >::deallocate(std::_List_node<Task*>*, unsigned long) (new_allocator.h:95)
> > ==27820==    by 0x462427: std::_List_base<Task*, std::allocator<Task*> >::_M_put_node(std::_List_node<Task*>*) (stl_list.h:320)
> > ==27820==    by 0x46143B: std::list<Task*, std::allocator<Task*> >::_M_erase(std::_List_iterator<Task*>) (stl_list.h:1431)
> > ==27820==    by 0x45FF0B: std::list<Task*, std::allocator<Task*> >::pop_front() (stl_list.h:906)
> > ==27820==    by 0x45DB73: TaskManager::task_done(bool, std::string const&) (task.cc:2256)
> > ==27820==    by 0x465970: XorpMemberCallback2B0<void, TaskManager, bool, std::string const&>::dispatch(bool, std::string const&) (callback_nodebug.hh:4636)
> > ==27820==    by 0x45C540: Task::step8_report() (task.cc:1998)
> > ==27820==    by 0x4659DF: XorpMemberCallback0B0<void, Task>::dispatch() (callback_nodebug.hh:306)
> > ==27820==    by 0x449613: Module::terminate_with_prejudice(ref_ptr<XorpCallback0<void> >) (module_manager.cc:218)
> > ==27820==    by 0x44F63C: XorpMemberCallback0B1<void, Module, ref_ptr<XorpCallback0<void> > >::dispatch() (callback_nodebug.hh:598)
> > ==27820==    by 0x549D72: OneoffTimerNode2::expire(XorpTimer&, void*) (timer.cc:167)
> > ==27820==  Address 0x50c9340 is 80 bytes inside a block of size 200 alloc'd
> > ==27820==    at 0x4A06FFC: operator new(unsigned long) (vg_replace_malloc.c:230)
> > ==27820==    by 0x42C81F: MasterConfigTree::MasterConfigTree(std::string const&, MasterTemplateTree*, ModuleManager&, XorpClient&, bool, bool) (master_conf_tree.cc:119)
> > ==27820==    by 0x406ED6: Rtrmgr::run() (main_rtrmgr.cc:319)
> > ==27820==    by 0x407E57: main (main_rtrmgr.cc:665)
> > 
> > 
> > It appears to me that the task-manager object (this) is already
> > deleted when the TaskManager::task_done() method is called.
> > 
> > Could probably add some debugging to the destructors and
> > constructors of TaskManager to verify.  I have some other things
> > to do first, but will look at this a bit later if no one beats me
> > to it.
> > 
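The constructor/destructor debugging suggested above could look roughly like the sketch below. `TracedTaskManager` and its `live()` counter are hypothetical names for illustration only; the real TaskManager in task.cc has other members and a different task_done() signature. The idea is just to log `this` at construction, destruction and in task_done(), so a task_done() call that arrives after the destructor line for the same address stands out in the log.

```cpp
#include <cstdio>

// Hypothetical stand-in for TaskManager (assumption: only the tracing is
// shown, none of the real class's members or logic).
class TracedTaskManager {
public:
    TracedTaskManager() {
        ++live_;
        std::fprintf(stderr, "TaskManager ctor this=%p live=%d\n",
                     static_cast<void*>(this), live_);
    }

    ~TracedTaskManager() {
        --live_;
        std::fprintf(stderr, "TaskManager dtor this=%p live=%d\n",
                     static_cast<void*>(this), live_);
    }

    // Copying is disabled so the live-instance count stays meaningful.
    TracedTaskManager(const TracedTaskManager&) = delete;
    TracedTaskManager& operator=(const TracedTaskManager&) = delete;

    // Log the object address on every callback; compare against the
    // ctor/dtor lines to see whether the object was still alive.
    void task_done(bool success) {
        std::fprintf(stderr, "task_done this=%p success=%d\n",
                     static_cast<void*>(this), success ? 1 : 0);
    }

    // Number of currently constructed instances.
    static int live() { return live_; }

private:
    static int live_;
};

int TracedTaskManager::live_ = 0;
```

Running the reproduction steps with tracing like this in place should show whether the destructor line for a given address appears before the last task_done line for that address.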
> > Thanks,
> > Ben
> > 
> > -- 
> > Ben Greear <greearb at candelatech.com>
> > Candela Technologies Inc  http://www.candelatech.com
> > 
> > 