[reSIProcate] Random.cxx and MultiCore systems

Alan Hawrylyshen alan at polyphase.ca
Wed Mar 19 15:04:21 CDT 2008


Can we boil it down to a simple test driver that can confirm random()  
is returning the same value twice?
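
Something along these lines might serve as the core of such a driver. It is only a sketch: countRepeatedBlocks is a hypothetical helper, not anything in rutil, and the block length of 4 suggested below is an arbitrary assumption. It compares groups of consecutive values rather than single values, since the report describes whole sequences repeating.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Hypothetical helper (not part of rutil): draws `blocks` groups of
// `blockLen` consecutive values from `next` and returns how many groups
// exactly match a group that was drawn earlier.
std::size_t countRepeatedBlocks(const std::function<long()>& next,
                                std::size_t blocks,
                                std::size_t blockLen)
{
   std::set<std::vector<long> > seen;
   std::size_t repeats = 0;
   for (std::size_t i = 0; i < blocks; ++i)
   {
      std::vector<long> block(blockLen);
      for (std::size_t j = 0; j < blockLen; ++j)
      {
         block[j] = next();
      }
      // insert().second is false when an identical block is already stored.
      if (!seen.insert(block).second)
      {
         ++repeats;
      }
   }
   return repeats;
}
```

To point it at glibc, include <cstdlib> and pass a callable that wraps random(), for example countRepeatedBlocks([] { return random(); }, 100000, 4), run on the affected machine. Any non-zero result would be worth reporting.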
A


--
Sorry this is terse; it is from my handheld device.

On 19 Mar 2008, at 12:55, Byron Campen <bcampen at estacado.net> wrote:

> 	Upon some discussion, it seems that this is happening _in a single  
> thread_, so using a thread-id will not help with this specific  
> problem (although it is probably a good idea anyway). It does seem  
> likely that we are running into a caching problem, although I am not  
> sure what can be done about this.
>
> 	Anyone have any ideas?
>
> Best regards,
> Byron Campen
>
>
>> More to Byron's point. If you substitute the crypto random  
>> function, you will notice a significant increase in latency. A  
>> cheap pseudo-random solution that doesn't exhibit this effect is  
>> needed.  We stumbled across something similar years ago when we  
>> ported to a then-exotic dual-processor machine. The problem  
>> occurred very rarely then, but with 4+ cores and today's speeds I  
>> believe the problem will happen predictably, as you have seen, Aron.
>>
>> Alan
>> --
>> Sorry this is terse; it is from my handheld device.
>>
>> On 19 Mar 2008, at 10:14, Byron Campen <bcampen at estacado.net> wrote:
>>
>>> 	I wouldn't re-implement Random::getRandom() with  
>>> getCryptoRandom(), since the contract on it is for providing cheap,  
>>> pseudo-random numbers. It would be more reasonable to change the code that  
>>> generates transaction-ids and tags (in fact, the code that  
>>> generates Call-Ids has been tweaked to help with this very problem  
>>> that you're seeing). The tweak in the Call-Id generation code  
>>> involves throwing the thread-id into the generated bits, which  
>>> solves the collision issue you're seeing. Maybe we could alter  
>>> Random::getRandom() to xor the current thread-id with everything  
>>> it returned (this would be in keeping with "cheap, pseudo-random  
>>> numbers")? Or maybe we could add a Random::getRandomReentrant()  
>>> function?
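
A minimal sketch of the xor idea, for illustration only. The names threadSalt, mixWithSalt and getRandomWithThreadId are hypothetical and do not exist in rutil; hashing std::this_thread::get_id() is an assumed stand-in for whatever thread-id the Call-Id code actually uses, and the result is masked to 31 bits on the assumption that callers expect the same range random() returns.

```cpp
#include <cstdint>
#include <cstdlib>      // random() (POSIX)
#include <functional>   // std::hash
#include <thread>

// Hypothetical: reduce the calling thread's id to a 32-bit salt.
inline std::uint32_t threadSalt()
{
   std::hash<std::thread::id> hasher;
   return static_cast<std::uint32_t>(hasher(std::this_thread::get_id()));
}

// Hypothetical: xor a value with a salt, keeping the result in 0..2^31-1.
inline int mixWithSalt(long value, std::uint32_t salt)
{
   std::uint32_t mixed = static_cast<std::uint32_t>(value) ^ salt;
   return static_cast<int>(mixed & 0x7fffffffu);
}

// Hypothetical replacement body for a cheap, thread-salted getRandom().
inline int getRandomWithThreadId()
{
   return mixWithSalt(random(), threadSalt());
}
```

The salt is constant within a thread, so this only separates values drawn on different threads. If random() repeats a value on one thread, the salted result repeats as well.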
>>>
>>> 	Anyone have an opinion on this?
>>>
>>> Best regards,
>>> Byron Campen
>>>
>>>> So this bug report concerns a very strange issue that we noticed  
>>>> on our brand-new dual quad-core machine (8 CPUs) involving  
>>>> duplicate Call-Ids, Transaction-IDs and Tags being generated  
>>>> for independent INVITEs. This behavior would then result in  
>>>> assert failures all over the stack.
>>>>
>>>>
>>>>
>>>> We have a single instance of DUM/Resiprocate running on its own  
>>>> thread. Our application generates 4 independent INVITE requests  
>>>> at the same exact time, which results in sequential calls  
>>>> eventually being made to Random.cxx and then glibc’s random()  
>>>> function. Of the four calls we get the following random values  
>>>> returned:
>>>>
>>>>
>>>>
>>>> Call 1: aaaaaaaaaaa
>>>>
>>>> Call 2: bbbbbbbbbb
>>>>
>>>> Call 3: aaaaaaaaaaa   (same exact sequence of random values as  
>>>> the first call)
>>>>
>>>> Call 4: bbbbbbbbbb  (same exact sequence of random values as the  
>>>> second call)
>>>>
>>>>
>>>>
>>>> Sometime later, various assert failures would occur due to  
>>>> duplicate TID values and all sorts of other issues.
>>>>
>>>>
>>>>
>>>> If we pause or sleep the thread for 1 ms then the problem  
>>>> disappears. So what the heck is going on….
>>>>
>>>>
>>>>
>>>> We think that the DUM thread is being migrated across CPUs between  
>>>> the different invocations of glibc’s random() function and the  
>>>> “seed” value is stale in one of the CPU caches.
>>>>
>>>>
>>>>
>>>> So how do we fix this? When we dug into the resiprocate Random.cxx  
>>>> code we noticed that although we had linked against OpenSSL, the  
>>>> OpenSSL random functions were not being used at all. They would be  
>>>> used to initialize the seed but not to actually generate the random  
>>>> values.
>>>>
>>>>
>>>>
>>>> If we used the crypto versions of the functions, the repetition  
>>>> issue went away completely.
>>>>
>>>>
>>>>
>>>> Here is a small patch which will use the crypto version if  
>>>> USE_OPENSSL is defined
>>>>
>>>>
>>>>
>>>> --- rutil/Random.cxx.orig              2008-03-14 23:21:29.000000000 -0700
>>>> +++ rutil/Random.cxx    2008-03-15 00:26:59.000000000 -0700
>>>> @@ -149,8 +149,9 @@
>>>>  Random::getRandom()
>>>>  {
>>>>     initialize();
>>>> -
>>>> -#ifdef WIN32
>>>> +#if USE_OPENSSL
>>>> +   return getCryptoRandom();
>>>> +#elif WIN32
>>>>     assert( RAND_MAX == 0x7fff );
>>>>     int r1 = rand();
>>>>     int r2 = rand();
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -Aron
>>>>
>>>>
>>>>
>>>> Aron Rosenberg
>>>>
>>>> SightSpeed
>>>>
>>>> _______________________________________________
>>>> resiprocate-devel mailing list
>>>> resiprocate-devel at resiprocate.org
>>>> https://list.re