Re: [reSIProcate] [Fwd: Re: [reSIProcate-users]Helper::computeCallId returns the same value]
On Fri, Nov 7, 2008 at 11:55 AM, Aron Rosenberg
<arosenberg@xxxxxxxxxxxxxx> wrote:
> Glibc rand uses a seed, how is the seed accessed or protected?
>
> -Aron
If you look at the source I linked to earlier, random() is implemented as:
    long int
    __random ()
    {
      int32_t retval;
      __libc_lock_lock (lock);
      (void) __random_r (&unsafe_state, &retval);
      __libc_lock_unlock (lock);
      return retval;
    }
So the state lives in "unsafe_state", and the variable "lock" is used to
protect it. The "lock cmpxchg" assembly from before is what
__libc_lock_lock (lock) compiles into. Anywhere unsafe_state is set or
accessed, it's protected by that lock as far as I can tell.
So looking at this code, I don't see how random() itself could be
causing this problem, unless the contract on what __libc_lock_lock
ensures is different from what I think it is.
Bruce
>
> -----Original Message-----
> From: Bruce Lowekamp [mailto:bbl@xxxxxxxxxxxx]
> Sent: Friday, November 07, 2008 8:23 AM
> To: Aron Rosenberg
> Cc: Adam Roach; resiprocate-devel
> Subject: Re: [reSIProcate] [Fwd: Re: [reSIProcate-users]Helper::computeCallId
> returns the same value]
>
> I'm baffled by how this could be happening. Looking at random.o in
> libc on the fedora and ubuntu machines I have handy right now (neither
> ia64, but both smp's), the lock in random() is implemented:
> 17: f0 0f b1 0d 00 00 00 lock cmpxchg %ecx,0x0
> My understanding is that this operation is considered SMP-safe. So unless
> I'm missing something very basic about what guarantees that
> operation provides (which is entirely possible, I don't claim to be an
> expert on low-level memory operations), I don't understand how random
> could be causing the problem. I don't see how anything else coming in
> from makeInviteSession could be causing it, either. I'd be interested
> in whether a collision is seen if you logged the callId in
> BaseCreator.cxx right after computeCallId is called, but of course
> that might change behavior...
>
> There are some possible race conditions that have never been fixed in
> Condition.cxx, and it's possible to do some stupid things with
> pointers to temporaries with some of the code in rutil (Data::c_str
> being the best example), but I don't see any of that involved in
> makeInviteSession.
>
> Bruce
>
>
>
> On Thu, Nov 6, 2008 at 5:53 PM, Aron Rosenberg
> <arosenberg@xxxxxxxxxxxxxx> wrote:
>> We see the issue on a gentoo stock glibc 2.6.1 version on a dual
>> quad-core Intel server.
>>
>> -Aron
>>
>> Aron Rosenberg
>>
>> -----Original Message-----
>> From: resiprocate-devel-bounces@xxxxxxxxxxxxxxx
>> [mailto:resiprocate-devel-bounces@xxxxxxxxxxxxxxx] On Behalf Of Bruce
>> Lowekamp
>> Sent: Wednesday, November 05, 2008 11:53 AM
>> To: Adam Roach
>> Cc: resiprocate-devel
>> Subject: Re: [reSIProcate] [Fwd: Re:
>> [reSIProcate-users]Helper::computeCallId returns the same value]
>>
>> I spent a little bit of time looking at this, but it's left me more
>> confused than I was before.
>>
>> Have you determined what platforms people are actually seeing the
>> CallID problem with? In particular, what libc are they using? To get
>> a duplicate callid, it looks like you would have to get 4 consecutive
>> calls to random() to return the same result. The only way I can see
>> that would happen would be if two threads run their calls in parallel
>> starting with the same state, but without sharing any updates to the
>> random state.
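
The scenario described above can be illustrated with a minimal sketch. It is not reSIProcate or glibc code: std::minstd_rand merely stands in for the libc generator, and the function names are hypothetical. It shows that two callers working from the same stale snapshot of the generator state produce the same four draws, while callers that share updates do not.

```cpp
#include <cstdio>
#include <random>
#include <string>

// Hypothetical stand-in for four consecutive random() calls rendered as hex.
// std::minstd_rand is used only for illustration; it is not glibc's algorithm.
std::string fourDrawsAsHex(std::minstd_rand& state)
{
    std::string out;
    for (int i = 0; i < 4; ++i)
    {
        char buf[9];
        std::snprintf(buf, sizeof(buf), "%08x", static_cast<unsigned>(state()));
        out += buf;
    }
    return out;
}

// Models two threads that each start from the same snapshot of the
// generator state and never publish their updates to one another.
bool staleCopiesCollide(unsigned seed)
{
    std::minstd_rand shared(seed);
    std::minstd_rand threadA = shared;  // private copy of the state
    std::minstd_rand threadB = shared;  // second private copy of the same state
    return fourDrawsAsHex(threadA) == fourDrawsAsHex(threadB);
}

// Models properly shared state: the second caller continues where the
// first one stopped, so it sees a different part of the sequence.
bool sharedStateCollides(unsigned seed)
{
    std::minstd_rand shared(seed);
    std::string first = fourDrawsAsHex(shared);
    std::string second = fourDrawsAsHex(shared);
    return first == second;
}
```

Under these assumptions, staleCopiesCollide() is true for any seed and sharedStateCollides() is false, which is why a correctly locked random() should not be able to produce the duplicate on its own.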
>>
>> With glibc, I believe this is virtually impossible. The glibc
>> implementation of rand and random imposes a mutex around all of the
>> calls that access the static state.
>> http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/random.c?rev=1.18&content-type=text/x-cvsweb-markup&cvsroot=glibc
>> so unless there's something I'm not seeing like a peculiar cache
>> setting being used for the lock and memory random() uses, I don't see
>> how this problem is possible there.
>>
>> Based on that, I'm wondering if a different libc implementation is
>> being used here, and the reason switching to SSL fixes the problem is
>> that the openssl implementation actually forces thread safety
>> (ssleay_rand_bytes does locking, and it ultimately is the default rand
>> function in openssl). My conclusion would be that the right thing to
>> do is to add a mutex to getRandom() that is used if an unsafe C
>> library is being used (not entirely sure how to check for that, but
>> could probably identify a set of known-safe C libraries that can be
>> detected). That way, the concern about other uses of Random that
>> aren't being detected goes away.
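
A rough sketch of the mutex idea proposed above, under stated assumptions: unsafeNext() is a hypothetical stand-in for a C library random() with unprotected static state (a plain LCG, not any real libc's algorithm), and lockedNext() plays the role that a guarded getRandom() would play. None of these names exist in reSIProcate.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a C library random() with unprotected static state.
static std::uint32_t g_state = 12345u;

std::uint32_t unsafeNext()
{
    g_state = g_state * 1664525u + 1013904223u;  // simple LCG step
    return g_state;
}

// Sketch of the proposed fix: serialize every access to the shared state.
std::uint32_t lockedNext()
{
    static std::mutex m;
    std::lock_guard<std::mutex> guard(m);
    return unsafeNext();
}

// Draws perThread values on each of numThreads threads and returns them sorted.
std::vector<std::uint32_t> drawConcurrently(unsigned numThreads, unsigned perThread)
{
    std::vector<std::vector<std::uint32_t>> results(numThreads);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t)
    {
        threads.emplace_back([&results, t, perThread] {
            results[t].reserve(perThread);
            for (unsigned i = 0; i < perThread; ++i)
                results[t].push_back(lockedNext());
        });
    }
    for (auto& th : threads) th.join();

    std::vector<std::uint32_t> all;
    for (const auto& r : results) all.insert(all.end(), r.begin(), r.end());
    std::sort(all.begin(), all.end());
    return all;
}
```

With the lock in place, the values drawn across all threads are exactly the first numThreads * perThread values of the sequential sequence, each consumed once. Every caller pays for the lock, which is the reason for applying it only when the C library is not known to be safe.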
>>
>> Bruce
>>
>>
>> 2008/10/13 Adam Roach <adam@xxxxxxxxxxx>:
>>> As we've seen in the past, the Call-ID generation code that DUM uses
>>> (resip/stack/Helper.cxx:625 on head) can generate colliding Call-IDs
>>> under high-load conditions. The current code looks like this:
>>>
>>> Data
>>> Helper::computeCallId()
>>> {
>>>    static Data hostname = DnsUtil::getLocalHostName();
>>>    Data hostAndSalt(hostname + Random::getRandomHex(16));
>>> #ifndef USE_SSL // .bwc. None of this is neccessary if we're using openssl
>>> #if defined(__linux__) || defined(__APPLE__)
>>>    pid_t pid = getpid();
>>>    hostAndSalt.append((char*)&pid,sizeof(pid));
>>> #endif
>>> #ifdef __APPLE__
>>>    pthread_t thread = pthread_self();
>>>    hostAndSalt.append((char*)&thread,sizeof(thread));
>>> #endif
>>> #ifdef WIN32
>>>    DWORD proccessId = ::GetCurrentProcessId();
>>>    DWORD threadId = ::GetCurrentThreadId();
>>>    hostAndSalt.append((char*)&proccessId,sizeof(proccessId));
>>>    hostAndSalt.append((char*)&threadId,sizeof(threadId));
>>> #endif
>>> #endif // of USE_SSL
>>>    return hostAndSalt.md5().base64encode(true);
>>> }
>>>
>>> I spoke to Byron just now, and he thinks the comment about "USE_SSL"
>>> is not accurate. (It would be if the code under getRandomHex() called
>>> into OpenSSL -- currently, it does not).
>>>
>>> To help refresh memories, we've visited this problem in detail before,
>>> most recently here:
>>>
>>> http://list.resiprocate.org/archive/resiprocate-devel/msg06605.html
>>>
>>> The conclusion of that thread left me confused -- Alan demonstrated
>>> that we'll have collisions (albeit rarely) on just about any
>>> architecture, and that such collisions don't require multithreading to
>>> occur. From my read of things, Aron's problem (and Ilana's; see
>>> http://list.resiprocate.org/archive/resiprocate-users/msg00642.html)
>>> occurs more frequently than the collisions in Alan's test program.
>>>
>>> It seems to me that there are a few things we can do to try and
>>> address this:
>>>
>>> 1. If we're using OpenSSL, make computeCallId call through to OpenSSL
>>> for its random numbers (there are a few paths to get there, so
>>> I'm just throwing out the general idea at this point).
>>> 2. Remove the "#ifndef USE_SSL" guards from computeCallId() -- is
>>> this sufficient?
>>> 3. Do #2, but also salt in a 32-bit thread-local serial number to
>>> prevent intra-thread collisions.
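
Option 3 could look roughly like the following. This is a sketch only: nextThreadLocalSerial(), appendBytes() and saltWithSerial() are hypothetical names, std::string stands in for resip's Data class, and the process and thread identifiers and the MD5/base64 step from computeCallId() are left out.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: a per-thread counter that increments on every call.
std::uint32_t nextThreadLocalSerial()
{
    thread_local std::uint32_t serial = 0;
    return serial++;
}

// Appends the raw bytes of a value, mirroring the append((char*)&x, sizeof(x))
// pattern used in computeCallId().
template <typename T>
void appendBytes(std::string& out, const T& value)
{
    out.append(reinterpret_cast<const char*>(&value), sizeof(value));
}

// Builds the pre-hash salt from a random part plus the serial. The caller
// would still mix in the process and thread identifiers and hash the result.
std::string saltWithSerial(const std::string& randomHex)
{
    std::string salt(randomHex);
    appendBytes(salt, nextThreadLocalSerial());
    return salt;
}
```

Even if the random part repeats, two consecutive calls on one thread yield different salts until the 32-bit counter wraps. The serial alone does nothing for collisions between threads, since every thread's counter starts at zero; that case depends on the thread identifier salted in by option 2.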
>>>
>>> Thoughts? (If no one expresses an opinion in a reasonable amount of
>>> time, I'll probably do #3).
>>>
>>> [It occurs to me that we must have a similar problem with tags and
>>> branch IDs, albeit without any assert()s being triggered -- I would
>>> presume that any fix made to Call-ID should also be made to them, in
>>> Helper::computeUniqueBranch() and Helper::computeTag()]
>>>
>>> /a
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Adam Roach <adam@xxxxxxxxxxx>
>>> To: Ilana Polyak <Ilana.Polyak@xxxxxxxxxxxxxx>
>>> Date: Mon, 13 Oct 2008 09:49:28 -0500
>>> Subject: Re: [reSIProcate-users] Helper::computeCallId returns the same value
>>> This issue has been previously seen, but we haven't been able to pin
>>> it down.
>>>
>>> Previous reports can be found here:
>>>
>>> http://list.resiprocate.org/archive/resiprocate-devel-old/msg03200.html
>>> http://list.resiprocate.org/archive/resiprocate-devel/msg06605.html
>>>
>>> Aron's solution -- shunting "getRandom" over to "getCryptoRandom" --
>>> worked for him. Of course, you impose a higher load on your CPU when
>>> you do so, so you may want to try tracking the problem down and
>>> addressing it in a more efficient way.
>>>
>>> The problem does not seem to surface except when using DUM.
>>>
>>> /a
>>>
>>>
>>> Ilana Polyak wrote:
>>>>
>>>> Hello
>>>>
>>>> I have just started to use dum in our application and noticed that if
>>>> I run calls at a very high rate, the call id repeats itself.
>>>>
>>>> What am I doing wrong? I have a separate thread that calls buildFdSet,
>>>> stack process and dum process. There is a semaphore before it and a
>>>> semaphore for all the api calls that come from my application.
>>>>
>>>> I have run a call to computeCallId from the same thread (the thread
>>>> that runs the dum and stack) and the value returned seems to be fine.
>>>> But when it gets called from the api makeInviteSession, which is called
>>>> from the context of my application thread, the value repeats itself for
>>>> around 8 calls.
>>>>
>>>> The calls are created one after another in a very high volume. If the
>>>> calls are created in a low volume (let's say one per second),
>>>> everything is fine.
>>>>
>>>> Has anyone seen this problem?
>>>>
>>>> Thanks
>>>>
>>>> Ilana Polyak
>>>> Senior Software Engineer, Protocol Group
>>>> Blade Business Line
>>>> AudioCodes USA, Inc.
>>>> 27 World's Fair Drive
>>>> Somerset, NJ 08873
>>>> Tel: 732-469-0880 ext. 137
>>>> Fax: 732-469-2298
>>>> Direct: 732-652-4677
>>>> Corporate URL: http://www.audiocodes.com
>>>> Blade Business Line URL: http://www.audiocodes.com/blades
>>>>
>>>> _______________________________________________
>>>> resiprocate-users mailing list
>>>> resiprocate-users@xxxxxxxxxxxxxxx
>>>> List Archive: http://list.resiprocate.org/archive/resiprocate-users/
>>>
>>> _______________________________________________
>>> resiprocate-devel mailing list
>>> resiprocate-devel@xxxxxxxxxxxxxxx
>>> https://list.resiprocate.org/mailman/listinfo/resiprocate-devel
>>>
>