[reSIProcate] Random.cxx and MultiCore systems

Byron Campen bcampen at estacado.net
Wed Mar 19 14:55:10 CDT 2008


	Upon some discussion, it seems that this is happening _in a single  
thread_, so using a thread-id will not help with this specific  
problem (although it is probably a good idea anyway). It does seem  
likely that we are running into a caching problem, although I am not  
sure what can be done about this.
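
	For reference, the thread-id idea from my earlier message (quoted below) might look roughly like the following. This is only a sketch under assumptions: mixWithThreadBits() and cheapRandomWithThreadId() are hypothetical names that do not exist in rutil/Random.cxx, and std::hash over std::thread::id is a stand-in for whatever thread-id primitive the stack would really use. It also shows why this cannot help here: with one thread the mixed-in bits are constant, so a repeated raw value still yields a repeated result.

```cpp
#include <stdlib.h>    // random() (POSIX)
#include <functional>  // std::hash
#include <thread>      // std::this_thread::get_id

// Hypothetical helper: folds thread-derived bits into a raw random value.
// The mask keeps the result in [0, 2^31 - 1], the range of random().
inline int mixWithThreadBits(long raw, unsigned int tidBits)
{
   return static_cast<int>((static_cast<unsigned int>(raw) ^ tidBits) & 0x7fffffffu);
}

// Hypothetical wrapper, not the real Random::getRandom(). The hashed
// std::thread::id is an assumption standing in for the stack's own thread id.
inline int cheapRandomWithThreadId()
{
   static thread_local const unsigned int tidBits = static_cast<unsigned int>(
      std::hash<std::thread::id>()(std::this_thread::get_id()));
   return mixWithThreadBits(random(), tidBits);
}
```

	Two threads that happen to draw the same raw value would get different results, but two draws of the same raw value on one thread would not.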

	Anyone have any ideas?
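
	One possible shape for the Random::getRandomReentrant() idea (also from the quoted message below) is a generator whose state is owned by the caller, so nothing depends on hidden global state inside the C library. A minimal sketch, assuming C++11 <random> is available; the class name ReentrantRandom and the choice of std::mt19937 are my assumptions, not anything in reSIProcate, and I have not verified that this would avoid the problem being reported.

```cpp
#include <cstdint>
#include <random>

// Hypothetical class, not part of rutil. Each instance carries its own
// generator state, so callers never share or race on a global seed.
class ReentrantRandom
{
   public:
      explicit ReentrantRandom(std::uint32_t seed) : mEngine(seed) {}

      // Returns a cheap pseudo-random value in [0, 2^31 - 1],
      // matching the range of glibc's random().
      int next()
      {
         return static_cast<int>(mEngine() & 0x7fffffffu);
      }

   private:
      std::mt19937 mEngine;  // engine choice is an assumption
};
```

	Each instance would still need a distinct seed (for example, one taken from the crypto source once at startup), since two instances seeded identically produce identical sequences.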

Best regards,
Byron Campen


> More to Byron's point: if you substitute the crypto random
> function, you will notice a significant increase in latency. A
> cheap pseudo-random solution that doesn't exhibit this effect is
> needed. We stumbled across something similar years ago when we
> ported to a then-exotic dual-processor machine. The problem
> occurred very rarely then, but with 4+ cores and today's speeds I
> believe it will happen predictably, as you have seen, Aron.
>
> Alan
> --
> Sorry this is terse; it is from my handheld device.
>
> On 19 Mar 2008, at 10:14, Byron Campen <bcampen at estacado.net> wrote:
>
>> 	I wouldn't re-implement Random::getRandom() with
>> getCryptoRandom(), since its contract is to provide cheap,
>> pseudo-random numbers. It would be more reasonable to change the
>> code that generates transaction-ids and tags (in fact, the code
>> that generates Call-Ids has been tweaked to help with this very
>> problem that you're seeing). The tweak in the Call-Id generation
>> code involves throwing the thread-id into the generated bits,
>> which solves the collision issue you're seeing. Maybe we could
>> alter Random::getRandom() to xor the current thread-id with
>> everything it returns (this would be in keeping with "cheap,
>> pseudo-random numbers")? Or maybe we could add a
>> Random::getRandomReentrant() function?
>>
>> 	Anyone have an opinion on this?
>>
>> Best regards,
>> Byron Campen
>>
>>> So this bug report concerns a very strange issue that we noticed
>>> on our brand-new dual quad-core machine (8 CPUs) involving
>>> duplicate Call-Ids, Transaction-IDs and Tags being generated
>>> for independent INVITEs. This behavior would then result in
>>> assert failures all over the stack.
>>>
>>>
>>>
>>> We have a single instance of DUM/Resiprocate running on its own
>>> thread. Our application generates 4 independent INVITE requests
>>> at the exact same time, which results in sequential calls
>>> eventually being made to Random.cxx and then glibc’s random()
>>> function. Of the four calls, we get the following random values
>>> returned:
>>>
>>>
>>>
>>> Call 1: aaaaaaaaaaa
>>>
>>> Call 2: bbbbbbbbbb
>>>
>>> Call 3: aaaaaaaaaaa   (same exact sequence of random values as  
>>> the first call)
>>>
>>> Call 4: bbbbbbbbbb  (same exact sequence of random values as the  
>>> second call)
>>>
>>>
>>>
>>> Sometime later, various assert failures would occur due to  
>>> duplicate TID values and all sorts of other issues.
>>>
>>>
>>>
>>> If we pause or sleep the thread for 1 ms, then the problem
>>> disappears. So what the heck is going on?
>>>
>>>
>>>
>>> We think that the DUM thread is being migrated across CPUs between
>>> the different invocations of glibc’s random() function and the
>>> “seed” value is stale in one of the CPU caches.
>>>
>>>
>>>
>>> So how do we fix this? When we dug into the resiprocate
>>> Random.cxx code, we noticed that although we had linked against
>>> OpenSSL, the OpenSSL random functions were not being used at all
>>> to generate values. They were used to initialize the seed but not
>>> to actually generate the random values.
>>>
>>>
>>>
>>> If we used the crypto versions of the functions, the repetition
>>> issue went away completely.
>>>
>>>
>>>
>>> Here is a small patch which will use the crypto version if  
>>> USE_OPENSSL is defined
>>>
>>>
>>>
>>> --- rutil/Random.cxx.orig    2008-03-14 23:21:29.000000000 -0700
>>> +++ rutil/Random.cxx    2008-03-15 00:26:59.000000000 -0700
>>> @@ -149,8 +149,9 @@
>>>  Random::getRandom()
>>>  {
>>>     initialize();
>>> -
>>> -#ifdef WIN32
>>> +#if USE_OPENSSL
>>> +   return getCryptoRandom();
>>> +#elif WIN32
>>>     assert( RAND_MAX == 0x7fff );
>>>     int r1 = rand();
>>>     int r2 = rand();
>>>
>>>
>>>
>>>
>>>
>>> -Aron
>>>
>>>
>>>
>>> Aron Rosenberg
>>>
>>> SightSpeed
>>>
>> _______________________________________________
>> resiprocate-devel mailing list
>> resiprocate-devel at resiprocate.org
>> https://list.resiprocate.org/mailman/listinfo/resiprocate-devel
