[reSIProcate] NULL pointer crash in DialogSet::dispatch withmAppDialogSet

Byron Campen bcampen at estacado.net
Wed Sep 26 16:04:16 CDT 2007


	So, I've given this a look, and I have a theory. Aron said that the  
crash happened sometime after the onRequestRetry() callback returned  
0...

 From ClientSubscription.cxx, starting at line 165
*snip*
       if (msg.header(h_StatusLine).statusCode() == 408)
       {
          InfoLog (<< "Received 408 to SUBSCRIBE "
                   << mLastRequest->header(h_To));
          retry = handler->onRequestRetry(getHandle(), 0, msg);
       }
       else
       {
          InfoLog (<< "Received non-408 retriable to SUBSCRIBE "
                   << mLastRequest->header(h_To));
          retry = handler->onRequestRetry(getHandle(), msg.header 
(h_RetryAfter).value(), msg);
       }

       if (retry < 0)
       {
          DebugLog(<< "Application requested failure on Retry-After");
          mEnded = true;
          handler->onTerminated(getHandle(), msg);
          delete this;
          return;
       }
       else if (retry == 0)
       {
          DebugLog(<< "Application requested immediate retry on Retry- 
After");
          //!dcm! -- why isn't this just a refresh--retry after might be
          //middle element and not indicate dialog destruction
          if (mDialog.mRemoteTarget.uri().host().empty())
          {
             SharedPtr<SipMessage> sub = mDum.makeSubscription 
(mLastRequest->header(h_To), getEventType());
             mDum.send(sub);
          }
          else
          {
             SharedPtr<SipMessage> sub = mDum.makeSubscription 
(mDialog.mRemoteTarget, getEventType(), getAppDialogSet()->reuse());
             mDum.send(sub);
          }
          //!dcm! -- new sub created above, when does this usage get  
destroyed?
          //return;
       }
       else
       {
          // leave the usage around until the timeout
          // !dlb! would be nice to set the state to something dead,  
but not used
          mDum.addTimer(DumTimeout::SubscriptionRetry,
                        retry,
                        getBaseHandle(),
                        ++mTimerSeq);
          // leave the usage around until the timeout
          return;
       }

       delete this;
       return;

*snip*

	So, returning 0 on the onRequestRetry() callback causes the  
AppDialogSet* to be unset (See impl of AppDialogSet::reuse()), and  
the ClientSubscription to be deleted. Then, in ~DialogUsage() (from  
which ClientSubscription is derived)...

*snip*
DialogUsage::~DialogUsage()
{
    mDialog.possiblyDie();
}
*snip*

When the Dialog gets this call, it calls DialogUsageManager::destroy 
() on itself, like so:

*snip*
void Dialog::possiblyDie()
{
    // !slg! Note:  dialogs should really stick around for 32s, in  
order to ensure that all 2xx retransmissions get Ack'd, then BYE'd  
correctly
    if (!mDestroying)
    {
       if (mClientSubscriptions.empty() &&
           mServerSubscriptions.empty() &&
           !mInviteSession)
       {
          mDestroying = true;
          mDum.destroy(this);
       }
    }
}
*snip*

Which in turn causes the following to happen:

*snip*
void
DialogUsageManager::destroy(Dialog* d)
{
    if (mShutdownState != Destroying)
    {
       post(new DestroyUsage(d));
    }
    else
    {
       InfoLog(<< "DialogUsageManager::destroy() not posting to stack");
    }
}
*snip*

	So, we asynchronously schedule the destruction of the Dialog. The  
DialogSet is still around. If some messaging comes in, and beats the  
DestroyUsage message to the front of the queue (not an unlikely  
scenario at all, and made much more likely by sitting at a  
breakpoint), the messaging will be sent to the DialogSet, which will  
then core in exactly the manner you're seeing.

	Ultimately, if the AppDialogSet is reused, we have to be completely  
sure that the DialogSet will be gone before any further messaging can  
arrive.

	On top of this, we have the issue of what happens when we have more  
than one Dialog in this DialogSet (say, the SUBSCRIBE got forked). If  
one of the forks decides it needs to start a new subscription in the  
manner above, the other subscriptions in the DialogSet will find  
themselves in trouble very shortly. This isn't even a race. It's  
basically guaranteed to fail.

	My inclination is to turn AppDialogSet::reuse() into a clone(), or  
into a call to the AppDialogSetFactory. (Or, we need to just be  
tolerant if the AppDialogSet is null. But I imagine we have opened a  
rather large can or worms here, and there are probably other nasty  
race-conditions that we'll encounter as a result of the non-atomicity  
of DialogSet teardown.)

Best regards,
Byron Campen

> It’s possible you’ve exposed a race condition by using the debugger  
> – I can’t think of what it would be offhand though. : (  The Debug  
> log should provide more insight.
>
>
>
> Scott
>
>
>
> From: Aron Rosenberg [mailto:arosenberg at sightspeed.com]
> Sent: Friday, July 27, 2007 2:48 PM
> To: Scott Godin; resiprocate-devel at list.resiprocate.org
> Subject: RE: [reSIProcate] NULL pointer crash in  
> DialogSet::dispatch withmAppDialogSet
>
>
>
> One thing that might be going on is that when I sit in a  
> breakpoint, a 408 transaction is created by the resip stack because  
> of the “timeout”. Could this because the reason for the crash,  
> onRemoved gets called, but a 408 is still active…
>
>
>
> -Aron
>
>
>
> From: Scott Godin [mailto:slgodin at icescape.com]
> Sent: Friday, July 27, 2007 10:24 AM
> To: Aron Rosenberg; resiprocate-devel at list.resiprocate.org
> Subject: RE: [reSIProcate] NULL pointer crash in  
> DialogSet::dispatch withmAppDialogSet
>
>
>
> Are you calling all DUM API’s from the same thread (including the  
> process loop)?
>
>
>
> From: Aron Rosenberg [mailto:arosenberg at sightspeed.com]
> Sent: Friday, July 27, 2007 12:39 PM
> To: Scott Godin; resiprocate-devel at list.resiprocate.org
> Subject: RE: [reSIProcate] NULL pointer crash in  
> DialogSet::dispatch withmAppDialogSet
>
>
>
> DUM controls everything. This crash only happens if I have hit a  
> breakpoint and let it sit for a couple seconds before continuing.  
> For example, If I have a breakpoint in the onTerminated or  
> onRequestRetry for ClientSubscriptionHandler implementation. It  
> will crash after the continue. This never happens except in debug  
> mode and only if breakpoints were used on the resiprocate thread.
>
>
>
> -Aron
>
>
>
> From: Scott Godin [mailto:slgodin at icescape.com]
> Sent: Friday, July 27, 2007 6:48 AM
> To: Aron Rosenberg; resiprocate-devel at list.resiprocate.org
> Subject: RE: [reSIProcate] NULL pointer crash in  
> DialogSet::dispatch withmAppDialogSet
>
>
>
> If the DialogSet is still around the AppDialogSet should also be  
> around.  Are you deleting your AppDialogSet instances or letting  
> DUM control their lifetime?
>
>
>
> Scott
>
>
>
> From: resiprocate-devel-bounces at list.resiprocate.org  
> [mailto:resiprocate-devel-bounces at list.resiprocate.org] On Behalf  
> Of Aron Rosenberg
> Sent: Thursday, July 26, 2007 12:20 PM
> To: resiprocate-devel at list.resiprocate.org
> Subject: [reSIProcate] NULL pointer crash in DialogSet::dispatch  
> withmAppDialogSet
>
>
>
> So I have been debugging some resip issues with ClientSubscriptions  
> and keep getting a NULL pointer crash after I have been sitting at  
> some breakpoints in the code. The crash occurs in:
>
>
>
> DialogSet.cxx line 775
>
>       DebugLog ( << "### Calling CreateAppDialog ###: " <<  
> std::endl << std::endl <<msg);
>
> --->  AppDialog* appDialog = mAppDialogSet->createAppDialog(msg);
>
>       dialog->mAppDialog = appDialog;
>
>       appDialog->mDialog = dialog;
>
>       dialog->dispatch(msg);
>
>
>
> mAppDialogSet is NULL. The call stack for this crash is as follows:
>
>
>
> >          resip::DialogSet::dispatch(const resip::SipMessage & msg= 
> {...})  Line 775 + 0x1c  C++
>
>             resip::DialogUsageManager::processRequest(const  
> resip::SipMessage & request={...})  Line 1560   C++
>
>             resip::DialogUsageManager::incomingProcess 
> (std::auto_ptr<resip::Message> msg={...})  Line 1281 C++
>
>             resip::DialogUsageManager::internalProcess 
> (std::auto_ptr<resip::Message> msg={...})  Line 1134   C++
>
>             resip::DialogUsageManager::process()  Line 1300 + 0x49   
> C++
>
>             SipEP::run()  Line 2404 + 0xb     C++ // Is our thread  
> loop
>
>
>
>
>
> mState == Established
>
> msg is a NOTIFY
>
>
>
> The crash happens after I have requested the ->end() of the  
> ClientSubscription and a NOTIFY comes in from the wire and I call  
> rejectUpdate(404) since the user has “ended” the Client  
> Subscription. Then I usually get a call to onRequestRetry where we  
> return 0. The crash generally occurs after that. I will try to get  
> a resip debug log up to the crash.
>
>
>
> Does this appear to be a resip issue or are we improperly  
> implementing our own AppDialog and AppDialogSet instances.
>
>
>
> -Aron
>
>
>
>
>
> ---------------------------------------------
>
> Aron Rosenberg
>
> Founder and CTO
>
> SightSpeed - http://www.sightspeed.com/
>
>
>
> 918 Parker St, Suite A14
>
> Berkeley, CA 94710
>
>
>
> Email: arosenberg at sightspeed.com
>
> Phone: 510-665-2920
>
> Cell: 510-847-7389
>
> Fax: 510-649-9569
>
> SightSpeed Video Link: http://aron.sightspeed.com
>
>
>
>
>
>
>
> _______________________________________________
> resiprocate-devel mailing list
> resiprocate-devel at list.resiprocate.org
> https://list.resiprocate.org/mailman/listinfo/resiprocate-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://list.resiprocate.org/pipermail/resiprocate-devel/attachments/20070926/dff727ab/attachment.htm>


More information about the resiprocate-devel mailing list