
Re: [reSIProcate] [Fwd: Re: [reSIProcate-users]Helper::computeCallId returns the same value]


On Fri, Nov 7, 2008 at 11:55 AM, Aron Rosenberg
<arosenberg@xxxxxxxxxxxxxx> wrote:
> Glibc rand uses a seed, how is the seed accessed or protected?
>
> -Aron

If you look at the source I linked to earlier, random() is implemented as:
long int
__random ()
{
  int32_t retval;

  __libc_lock_lock (lock);

  (void) __random_r (&unsafe_state, &retval);

  __libc_lock_unlock (lock);

  return retval;
}

So the state is in "unsafe_state", and the variable "lock" is used to
protect it.  The "lock cmpxchg" assembly from before is what
__libc_lock_lock(lock) compiles into.  Anywhere unsafe_state is set or
accessed, it's protected by that lock as far as I can tell.
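
To make the pattern concrete, here is a minimal C++ sketch with the same
shape: a shared generator state, an unlocked step function, and a wrapper
that takes a lock around the step.  This is not glibc's code.  UnsafeState,
stepUnlocked and lockedRandom are names I made up for illustration, and the
generator is a toy LCG rather than glibc's actual algorithm.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for glibc's unsafe_state; not the real structure.
struct UnsafeState
{
   std::uint32_t value;
};

static UnsafeState unsafeState = { 1u };
static std::mutex stateLock;   // plays the role of glibc's "lock"

// Unlocked step, in the role of __random_r(): a read-modify-write of the
// shared state, which is exactly what must not interleave between threads.
static std::uint32_t stepUnlocked(UnsafeState& s)
{
   s.value = s.value * 1103515245u + 12345u;   // toy LCG, illustration only
   return s.value;
}

// Locked wrapper, in the shape of __random(): take the lock, advance the
// state, release the lock on scope exit, return the result.
std::uint32_t lockedRandom()
{
   std::lock_guard<std::mutex> guard(stateLock);
   return stepUnlocked(unsafeState);
}
```

As long as every reader and writer of the state goes through the wrapper,
two threads can never start from the same state and produce the same run
of values.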

So looking at this code, I don't see how random() itself could be
causing this problem, unless the contract on what __libc_lock_lock
ensures is different from what I think it is.
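
If someone who can reproduce the problem wants to check random() in
isolation, something like the following might help.  It is only a sketch:
it hammers random() from several threads, groups each thread's results
into runs of four consecutive values (the number of calls I estimated a
Call-ID needs), and counts runs that show up more than once.  The names
Quad, drawQuads and countDuplicateQuads are mine, not anything in resip.

```cpp
#include <stdlib.h>     // random() (POSIX)
#include <array>
#include <cstddef>
#include <functional>   // std::ref
#include <set>
#include <thread>
#include <vector>

typedef std::array<long, 4> Quad;   // four consecutive random() results

// Worker: draw `count` groups of four consecutive random() values.
static void drawQuads(std::size_t count, std::vector<Quad>& out)
{
   out.reserve(count);
   for (std::size_t i = 0; i < count; ++i)
   {
      Quad q;
      for (std::size_t j = 0; j < q.size(); ++j)
      {
         q[j] = random();
      }
      out.push_back(q);
   }
}

// Runs `threads` workers in parallel and returns how many groups were
// identical to a group already seen (in any thread).
std::size_t countDuplicateQuads(std::size_t threads, std::size_t perThread)
{
   std::vector<std::vector<Quad> > results(threads);
   std::vector<std::thread> workers;
   for (std::size_t t = 0; t < threads; ++t)
   {
      workers.push_back(std::thread(drawQuads, perThread, std::ref(results[t])));
   }
   for (std::size_t t = 0; t < workers.size(); ++t)
   {
      workers[t].join();
   }

   std::set<Quad> seen;
   std::size_t duplicates = 0;
   for (std::size_t t = 0; t < results.size(); ++t)
   {
      for (std::size_t i = 0; i < results[t].size(); ++i)
      {
         if (!seen.insert(results[t][i]).second)
         {
            ++duplicates;
         }
      }
   }
   return duplicates;
}
```

With a libc that locks the state the way glibc appears to, I would expect
countDuplicateQuads(8, 100000) to come back as zero.  A non-zero count
would point at the C library rather than at anything in resip.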

Bruce





>
> -----Original Message-----
> From: Bruce Lowekamp [mailto:bbl@xxxxxxxxxxxx]
> Sent: Friday, November 07, 2008 8:23 AM
> To: Aron Rosenberg
> Cc: Adam Roach; resiprocate-devel
> Subject: Re: [reSIProcate] [Fwd: Re: [reSIProcate-users]Helper::computeCallId 
> returns the same value]
>
> I'm baffled by how this could be happening.  Looking at random.o in
> libc on the fedora and ubuntu machines I have handy right now (neither
> ia64, but both smp's), the lock in random() is implemented:
>  17:   f0 0f b1 0d 00 00 00    lock cmpxchg %ecx,0x0
> My understanding is that this operation is considered SMP-safe.  So unless
> I'm missing something very basic about what guarantees that
> operation provides (which is entirely possible, I don't claim to be an
> expert on low-level memory operations), I don't understand how random
> could be causing the problem.  I don't see how anything else coming in
> from makeInviteSession could be causing it, either.  I'd be interested
> in whether a collision is seen if you logged the callId in
> BaseCreator.cxx right after computeCallId is called, but of course
> that might change behavior...
>
> There are some possible race conditions that have never been fixed in
> Condition.cxx, and it's possible to do some stupid things with
> pointers to temporaries with some of the code in rutil (Data::c_str
> being the best example), but I don't see any of that involved in
> makeInviteSession.
>
> Bruce
>
>
>
> On Thu, Nov 6, 2008 at 5:53 PM, Aron Rosenberg
> <arosenberg@xxxxxxxxxxxxxx> wrote:
>> We see the issue on a gentoo stock glibc 2.6.1 version on a dual
>> quad-core Intel server.
>>
>> -Aron
>>
>> Aron Rosenberg
>>
>> -----Original Message-----
>> From: resiprocate-devel-bounces@xxxxxxxxxxxxxxx
>> [mailto:resiprocate-devel-bounces@xxxxxxxxxxxxxxx] On Behalf Of Bruce
>> Lowekamp
>> Sent: Wednesday, November 05, 2008 11:53 AM
>> To: Adam Roach
>> Cc: resiprocate-devel
>> Subject: Re: [reSIProcate] [Fwd: Re:
>> [reSIProcate-users]Helper::computeCallId returns the same value]
>>
>> I spent a little bit of time looking at this, but it's left me more
>> confused than I was before.
>>
>> Have you determined what platforms people are actually seeing the
>> CallID problem with?  In particular, what libc are they using?  To get
>> a duplicate callid, it looks like you would have to get 4 consecutive
>> calls to random() to return the same result.  The only way I can see
>> that would happen would be if two threads run their calls in parallel
>> starting with the same state, but without sharing any updates to the
>> random state.
>>
>> With glibc, I believe this is virtually impossible.  The glibc
>> implementation of rand and random imposes a mutex around all of the
>> calls that access the static state.
>> http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/random.c?rev=1.18&content-type=text/x-cvsweb-markup&cvsroot=glibc
>> so unless there's something I'm not seeing like a peculiar cache
>> setting being used for the lock and memory random() uses, I don't see
>> how this problem is possible there.
>>
>> Based on that, I'm wondering if a different libc implementation is
>> being used here, and the reason switching to SSL fixes the problem is
>> that the openssl implementation actually forces thread safety
>> (ssleay_rand_bytes does locking, and it ultimately is the default rand
>> function in openssl).  My conclusion would be that the right thing to
>> do is to add a mutex to getRandom() that is used if an unsafe C
>> library is being used (not entirely sure how to check for that, but
>> could probably identify a set of known-safe C libraries that can be
>> detected).  That way, the concern about other uses of Random that
>> aren't being detected goes away.
>>
>> Bruce
>>
>>
>> 2008/10/13 Adam Roach <adam@xxxxxxxxxxx>:
>>> As we've seen in the past, the Call-ID generation code that DUM uses
>>> (resip/stack/Helper.cxx:625 on head) can generate colliding Call-IDs
>>> under high-load conditions. The current code looks like this:
>>>
>>>   Data
>>>   Helper::computeCallId()
>>>   {
>>>      static Data hostname = DnsUtil::getLocalHostName();
>>>      Data hostAndSalt(hostname + Random::getRandomHex(16));
>>>   #ifndef USE_SSL // .bwc. None of this is necessary if we're using openssl
>>>   #if defined(__linux__) || defined(__APPLE__)
>>>      pid_t pid = getpid();
>>>      hostAndSalt.append((char*)&pid,sizeof(pid));
>>>   #endif
>>>   #ifdef __APPLE__
>>>      pthread_t thread = pthread_self();
>>>      hostAndSalt.append((char*)&thread,sizeof(thread));
>>>   #endif
>>>   #ifdef WIN32
>>>      DWORD proccessId = ::GetCurrentProcessId();
>>>      DWORD threadId = ::GetCurrentThreadId();
>>>      hostAndSalt.append((char*)&proccessId,sizeof(proccessId));
>>>      hostAndSalt.append((char*)&threadId,sizeof(threadId));
>>>   #endif
>>>   #endif // of USE_SSL
>>>      return hostAndSalt.md5().base64encode(true);
>>>   }
>>>
>>> I spoke to Byron just now, and he thinks the comment about "USE_SSL"
>>> is not accurate. (It would be if the code under getRandomHex() called
>>> into OpenSSL -- currently, it does not).
>>>
>>> To help refresh memories, we've visited this problem in detail before,
>>> most recently here:
>>>
>>> http://list.resiprocate.org/archive/resiprocate-devel/msg06605.html
>>>
>>> The conclusion of that thread left me confused -- Alan demonstrated
>>> that we'll have collisions (albeit rarely) on just about any
>>> architecture, and that such collisions don't require multithreading to
>>> occur. From my read of things, Aron's problem (and Ilana's; see
>>> http://list.resiprocate.org/archive/resiprocate-users/msg00642.html)
>>> occurs more frequently than the collisions in Alan's test program.
>>>
>>> It seems to me that there are a few things we can do to try and
>>> address this:
>>>
>>>  1. If we're using OpenSSL, make computeCallId call through to OpenSSL
>>>     for its random numbers (there are a few paths to get there, so
>>>     I'm just throwing out the general idea at this point).
>>>  2. Remove the "#ifndef USE_SSL" guards from computeCallId() -- is
>>>     this sufficient?
>>>  3. Do #2, but also salt in a 32-bit thread-local serial number to
>>>     prevent intra-thread collisions.
>>>
>>> Thoughts? (If no one expresses an opinion in a reasonable amount of
>>> time, I'll probably do #3).
>>>
>>> [It occurs to me that we must have a similar problem with tags and
>>> branch IDs, albeit without any assert()s being triggered -- I would
>>> presume that any fix made to Call-ID should also be made to them as
>>> well, in Helper::computeUniqueBranch() and Helper::computeTag()]
>>>
>>> /a
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Adam Roach <adam@xxxxxxxxxxx>
>>> To: Ilana Polyak <Ilana.Polyak@xxxxxxxxxxxxxx>
>>> Date: Mon, 13 Oct 2008 09:49:28 -0500
>>> Subject: Re: [reSIProcate-users] Helper::computeCallId returns the
>>> same value
>>> This issue has been previously seen, but we haven't been able to pin
>>> it down.
>>>
>>> Previous reports can be found here:
>>>
>>> http://list.resiprocate.org/archive/resiprocate-devel-old/msg03200.html
>>> http://list.resiprocate.org/archive/resiprocate-devel/msg06605.html
>>>
>>> Aron's solution -- shunting "getRandom" over to "getCryptoRandom" --
>>> worked for him. Of course, you impose a higher load on your CPU when
>>> you do so, so you may want to try tracking the problem down and
>>> addressing it in a more efficient way.
>>>
>>> The problem does not seem to surface except when using DUM.
>>>
>>> /a
>>>
>>>
>>> Ilana Polyak wrote:
>>>>
>>>> Hello
>>>>
>>>> I have just started to use DUM in our application and noticed that if
>>>> I run calls at a very high rate, the Call-ID repeats itself.
>>>>
>>>> What am I doing wrong? I have a separate thread that calls buildFdSet,
>>>> stack process, and dum process. There is a semaphore before it and a
>>>> semaphore for all the API calls that come from my application.
>>>>
>>>> I have run a call to computeCallId from the same thread (the thread
>>>> that runs the dum and stack) and the value returned seems to be fine.
>>>> But when it gets called from the API makeInviteSession, which is
>>>> called from the context of my application thread, the value repeats
>>>> itself for around 8 calls.
>>>>
>>>> The calls are created one after another at a very high volume. If the
>>>> calls are created at a low volume (let's say one per second),
>>>> everything is fine.
>>>>
>>>> Has anyone seen this problem?
>>>>
>>>> Thanks
>>>>
>>>> **_Ilana Polyak_**
>>>>
>>>> Senior Software Engineer, Protocol Group
>>>>
>>>> Blade Business Line
>>>>
>>>> **_AudioCodes USA, Inc._**
>>>>
>>>> 27 World's Fair Drive
>>>>
>>>> Somerset, NJ 08873
>>>>
>>>> Tel: 732-469-0880 ext. 137
>>>>
>>>> Fax: 732-469-2298
>>>>
>>>> Direct: 732-652-4677
>>>>
>>>> Corporate URL: http://www.audiocodes.com <http://www.audiocodes.com/>
>>>>
>>>> Blade Business Line URL: http://www.audiocodes.com/blades
>>>>
>>>> _______________________________________________
>>>> resiprocate-users mailing list
>>>> resiprocate-users@xxxxxxxxxxxxxxx
>>>> List Archive: http://list.resiprocate.org/archive/resiprocate-users/
>>>
>>> _______________________________________________
>>> resiprocate-devel mailing list
>>> resiprocate-devel@xxxxxxxxxxxxxxx
>>> https://list.resiprocate.org/mailman/listinfo/resiprocate-devel
>>>
>
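
One more sketch, for option 3 in Adam's original message quoted above
(salting in a thread-local serial number).  This only shows the salt
itself, under the assumption that it would be appended to hostAndSalt
before the md5 step.  makeThreadSalt is a hypothetical helper, not an
existing resip function, and it uses std::thread identifiers rather than
the platform-specific calls in the current code.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical helper: returns a string that differs on every call.
// Within one thread the serial number changes each time; across threads
// the thread id differs while both threads are alive.
std::string makeThreadSalt()
{
   thread_local std::uint32_t serial = 0;   // one counter per thread, no lock needed
   std::ostringstream salt;
   salt << std::this_thread::get_id() << '-' << serial++;
   return salt.str();
}
```

Because the counter is thread-local, this adds no contention, and it
keeps two Call-IDs from the same thread distinct even if the random part
were to repeat (until the 32-bit counter wraps).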