< Previous by Date Date Index Next by Date >
< Previous in Thread Thread Index Next in Thread >

Re: [reSIProcate] NULL pointer crash in DialogSet::dispatch withmAppDialogSet


I given this a read through once, but I need more time to understand what the issue is.  Hopefully I will get some time this week to think about it.  : )

 

From: Byron Campen [mailto:bcampen@xxxxxxxxxxxx]
Sent: October 1, 2007 6:31 PM
To: resiprocate-devel
Cc: Scott Godin
Subject: Re: [reSIProcate] NULL pointer crash in DialogSet::dispatch withmAppDialogSet

 

            No comments? So, I take this to mean that nobody would be sore at me if I were to remove every use of AppDialogSet::reuse() that looks suspicious to me (ie, all of them)? Pipe up!

 

Best regards,

Byron Campen

 

 



            So, I've given this a look, and I have a theory. Aron said that the crash happened sometime after the onRequestRetry() callback returned 0...

 

From ClientSubscription.cxx, starting at line 165

*snip*

      if (msg.header(h_StatusLine).statusCode() == 408)

      {

         InfoLog (<< "Received 408 to SUBSCRIBE "

                  << mLastRequest->header(h_To));

         retry = handler->onRequestRetry(getHandle(), 0, msg);

      }

      else

      {

         InfoLog (<< "Received non-408 retriable to SUBSCRIBE "

                  << mLastRequest->header(h_To));

         retry = handler->onRequestRetry(getHandle(), msg.header(h_RetryAfter).value(), msg);

      }

 

      if (retry < 0)

      {

         DebugLog(<< "Application requested failure on Retry-After");

         mEnded = true;

         handler->onTerminated(getHandle(), msg);

         delete this;

         return;

      }

      else if (retry == 0)

      {

         DebugLog(<< "Application requested immediate retry on Retry-After");

         //!dcm! -- why isn't this just a refresh--retry after might be

         //middle element and not indicate dialog destruction      

         if (mDialog.mRemoteTarget.uri().host().empty())

         {

            SharedPtr<SipMessage> sub = mDum.makeSubscription(mLastRequest->header(h_To), getEventType());

            mDum.send(sub);

         }

         else

         {

            SharedPtr<SipMessage> sub = mDum.makeSubscription(mDialog.mRemoteTarget, getEventType(), getAppDialogSet()->reuse());

            mDum.send(sub);

         }

         //!dcm! -- new sub created above, when does this usage get destroyed?

         //return;

      }

      else

      {

         // leave the usage around until the timeout

         // !dlb! would be nice to set the state to something dead, but not used

         mDum.addTimer(DumTimeout::SubscriptionRetry,

                       retry,

                       getBaseHandle(),

                       ++mTimerSeq);

         // leave the usage around until the timeout

         return;

      }

           

      delete this;

      return;

 

*snip*

 

     So, returning 0 on the onRequestRetry() callback causes the AppDialogSet* to be unset (See impl of AppDialogSet::reuse()), and the ClientSubscription to be deleted. Then, in ~DialogUsage() (from which ClientSubscription is derived)...

 

*snip*

DialogUsage::~DialogUsage()

{

   mDialog.possiblyDie();

}

*snip*

 

When the Dialog gets this call, it calls DialogUsageManager::destroy() on itself, like so:

 

*snip*

void Dialog::possiblyDie()

{

   // !slg! Note:  dialogs should really stick around for 32s, in order to ensure that all 2xx retransmissions get Ack'd, then BYE'd correctly

   if (!mDestroying)

   {

      if (mClientSubscriptions.empty() &&

          mServerSubscriptions.empty() &&

          !mInviteSession)

      {

         mDestroying = true;

         mDum.destroy(this);

      }

   }

}

*snip*

 

Which in turn causes the following to happen:

 

*snip*

void

DialogUsageManager::destroy(Dialog* d)

{

   if (mShutdownState != Destroying)

   {

      post(new DestroyUsage(d));

   }

   else

   {

      InfoLog(<< "DialogUsageManager::destroy() not posting to stack");

   }

}

*snip*

 

     So, we asynchronously schedule the destruction of the Dialog. The DialogSet is still around. If some messaging comes in, and beats the DestroyUsage message to the front of the queue (not an unlikely scenario at all, and made much more likely by sitting at a breakpoint), the messaging will be sent to the DialogSet, which will then core in exactly the manner you're seeing.

 

     Ultimately, if the AppDialogSet is reused, we have to be completely sure that the DialogSet will be gone before any further messaging can arrive.

 

     On top of this, we have the issue of what happens when we have more than one Dialog in this DialogSet (say, the SUBSCRIBE got forked). If one of the forks decides it needs to start a new subscription in the manner above, the other subscriptions in the DialogSet will find themselves in trouble very shortly. This isn't even a race. It's basically guaranteed to fail.

 

     My inclination is to turn AppDialogSet::reuse() into a clone(), or into a call to the AppDialogSetFactory. (Or, we need to just be tolerant if the AppDialogSet is null. But I imagine we have opened a rather large can or worms here, and there are probably other nasty race-conditions that we'll encounter as a result of the non-atomicity of DialogSet teardown.)

 

Best regards,

Byron Campen



It’s possible you’ve exposed a race condition by using the debugger – I can’t think of what it would be offhand though. : (  The Debug log should provide more insight.

 

Scott

 

From: Aron Rosenberg [mailto:arosenberg@xxxxxxxxxxxxxx]
Sent: Friday, July 27, 2007 2:48 PM
To: Scott Godin; resiprocate-devel@xxxxxxxxxxxxxxxxxxxx
Subject: RE: [reSIProcate] NULL pointer crash in DialogSet::dispatch withmAppDialogSet

 

One thing that might be going on is that when I sit in a breakpoint, a 408 transaction is created by the resip stack because of the “timeout”. Could this because the reason for the crash, onRemoved gets called, but a 408 is still active…

 

-Aron

 


From: Scott Godin [mailto:slgodin@xxxxxxxxxxxx]
Sent: Friday, July 27, 2007 10:24 AM
To: Aron Rosenberg; resiprocate-devel@xxxxxxxxxxxxxxxxxxxx
Subject: RE: [reSIProcate] NULL pointer crash in DialogSet::dispatch withmAppDialogSet

 

Are you calling all DUM API’s from the same thread (including the process loop)?

 

From: Aron Rosenberg [mailto:arosenberg@xxxxxxxxxxxxxx]
Sent: Friday, July 27, 2007 12:39 PM
To: Scott Godin; resiprocate-devel@xxxxxxxxxxxxxxxxxxxx
Subject: RE: [reSIProcate] NULL pointer crash in DialogSet::dispatch withmAppDialogSet

 

DUM controls everything. This crash only happens if I have hit a breakpoint and let it sit for a couple seconds before continuing. For example, If I have a breakpoint in the onTerminated or onRequestRetry for ClientSubscriptionHandler implementation. It will crash after the continue. This never happens except in debug mode and only if breakpoints were used on the resiprocate thread.

 

-Aron

 


From: Scott Godin [mailto:slgodin@xxxxxxxxxxxx]
Sent: Friday, July 27, 2007 6:48 AM
To: Aron Rosenberg; resiprocate-devel@xxxxxxxxxxxxxxxxxxxx
Subject: RE: [reSIProcate] NULL pointer crash in DialogSet::dispatch withmAppDialogSet

 

If the DialogSet is still around the AppDialogSet should also be around.  Are you deleting your AppDialogSet instances or letting DUM control their lifetime?

 

Scott

 

From: resiprocate-devel-bounces@xxxxxxxxxxxxxxxxxxxx [mailto:resiprocate-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Aron Rosenberg
Sent: Thursday, July 26, 2007 12:20 PM
To: resiprocate-devel@xxxxxxxxxxxxxxxxxxxx
Subject: [reSIProcate] NULL pointer crash in DialogSet::dispatch withmAppDialogSet

 

So I have been debugging some resip issues with ClientSubscriptions and keep getting a NULL pointer crash after I have been sitting at some breakpoints in the code. The crash occurs in:

 

DialogSet.cxx line 775

      DebugLog ( << "### Calling CreateAppDialog ###: " << std::endl << std::endl <<msg);

--->  AppDialog* appDialog = mAppDialogSet->createAppDialog(msg);

      dialog->mAppDialog = appDialog;

      appDialog->mDialog = dialog;

      dialog->dispatch(msg);

 

mAppDialogSet is NULL. The call stack for this crash is as follows:

 

>          resip::DialogSet::dispatch(const resip::SipMessage & msg={...})  Line 775 + 0x1c  C++

            resip::DialogUsageManager::processRequest(const resip::SipMessage & request={...})  Line 1560   C++

            resip::DialogUsageManager::incomingProcess(std::auto_ptr<resip::Message> msg={...})  Line 1281 C++

            resip::DialogUsageManager::internalProcess(std::auto_ptr<resip::Message> msg={...})  Line 1134   C++

            resip::DialogUsageManager::process()  Line 1300 + 0x49  C++

            SipEP::run()  Line 2404 + 0xb     C++ // Is our thread loop

 

 

mState == Established

msg is a NOTIFY

 

The crash happens after I have requested the ->end() of the ClientSubscription and a NOTIFY comes in from the wire and I call rejectUpdate(404) since the user has “ended” the Client Subscription. Then I usually get a call to onRequestRetry where we return 0. The crash generally occurs after that. I will try to get a resip debug log up to the crash.

 

Does this appear to be a resip issue or are we improperly implementing our own AppDialog and AppDialogSet instances.

 

-Aron

 

 

---------------------------------------------

Aron Rosenberg

Founder and CTO

SightSpeed - http://www.sightspeed.com/

 

918 Parker St, Suite A14

Berkeley, CA 94710

 

Email: arosenberg@xxxxxxxxxxxxxx

Phone: 510-665-2920

Cell: 510-847-7389

Fax: 510-649-9569

SightSpeed Video Link: http://aron.sightspeed.com

 

 

 

_______________________________________________

resiprocate-devel mailing list

 

_______________________________________________

resiprocate-devel mailing list