Alan Hawrylyshen alan at polyphase.ca
Wed Mar 19 14:30:27 CDT 2008

More to Byron's point. If you substitute the crypto random function,
you will notice a significant increase in latency. A cheap pseudo-random
solution that doesn't exhibit this effect is needed. We stumbled
across something similar years ago when we ported to a then-exotic
dual-processor machine. The problem occurred very rarely then, but with
4+ cores and today's speeds I believe the problem will happen
predictably, as you have seen, Aron.
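
As a rough illustration of what "cheap" could look like, here is a minimal sketch that keeps generator state per thread, so nothing is shared between threads and no crypto call sits on the hot path. The function name cheapThreadLocalRandom() is hypothetical and is not part of rutil; it assumes a C++11 compiler with thread_local and std::random_device available.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper, not part of rutil/Random.cxx.
// Each thread owns its own generator, so there is no shared seed state
// and no lock or crypto call on the hot path.
inline std::uint32_t cheapThreadLocalRandom()
{
    // Assumption: std::random_device is usable for a one-time seed.
    // The engine is constructed once per thread, on first use.
    thread_local std::mt19937 engine{std::random_device{}()};
    return static_cast<std::uint32_t>(engine());
}
```

After the one-time seeding, the per-call cost is a single Mersenne Twister step.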

Sorry this is terse; it is from my handheld device.

On 19 Mar 2008, at 10:14, Byron Campen <bcampen at estacado.net> wrote:

> 	I wouldn't re-implement Random::getRandom() with getCryptoRandom(),  
> since the contract on it is for providing cheap, pseudo-random  
> numbers. It would be more reasonable to change the code that  
> generates transaction-ids and tags (in fact, the code that generates  
> Call-Ids has been tweaked to help with this very problem that you're  
> seeing). The tweak in the Call-Id generation code involves throwing  
> the thread-id into the generated bits, which solves the collision  
> issue you're seeing. Maybe we could alter Random::getRandom() to xor
> the current thread-id with everything it returns (this would be in
> keeping with "cheap, pseudo-random numbers")? Or maybe we could add
> a Random::getRandomReentrant() function?
> 	Anyone have an opinion on this?
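
The xor idea above could be sketched roughly as follows. This is only an illustration under assumptions: mixThreadId() and getRandomWithThreadId() are hypothetical names, std::rand() stands in for whatever generator Random::getRandom() actually calls, and std::hash over std::thread::id (C++11) stands in for the platform thread-id.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <thread>

// Hypothetical helper: fold a thread-id hash into a raw PRNG value.
// Only the low 31 bits of the hash are used, so a non-negative input
// stays non-negative.
inline int mixThreadId(int raw, std::size_t threadHash)
{
    return raw ^ static_cast<int>(threadHash & 0x7fffffff);
}

// Hypothetical stand-in for a thread-aware getRandom().
// Assumption: std::rand() replaces the real underlying generator.
inline int getRandomWithThreadId()
{
    const std::size_t h =
        std::hash<std::thread::id>{}(std::this_thread::get_id());
    return mixThreadId(std::rand(), h);
}
```

Two threads that draw the same raw value get different results whenever their hashes differ in the low 31 bits; values drawn from within a single thread are all xored with the same constant, so duplicates within one thread remain duplicates.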
> Best regards,
> Byron Campen
>> So this bug report concerns a very strange issue that we noticed on
>> our brand-new dual quad-core machine (8 CPUs) involving duplicate
>> Call-Ids, Transaction-IDs and Tags being generated for
>> independent INVITEs. This behavior would then result in assert
>> failures all over the stack.
>> We have a single instance of DUM/Resiprocate running on its own
>> thread. Our application generates 4 independent INVITE requests at
>> the exact same time, which results in sequential calls eventually
>> being made to Random.cxx and then glibc’s random() function. Of the
>> four calls, we get the following random values returned:
>> Call 1: aaaaaaaaaaa
>> Call 2: bbbbbbbbbb
>> Call 3: aaaaaaaaaaa   (same exact sequence of random values as the first call)
>> Call 4: bbbbbbbbbb  (same exact sequence of random values as the second call)
>> Some time later, various assert failures would occur due to
>> duplicate TID values and all sorts of other issues.
>> If we pause or sleep the thread for 1 ms, then the problem
>> disappears. So what the heck is going on?
>> We think that the DUM thread is being migrated across CPUs between the
>> different invocations of glibc’s random() function and the
>> “seed” value is stale in one of the CPU caches.
>> So how do we fix this? When we dug into the resiprocate Random.cxx
>> code, we noticed that although we had linked against OpenSSL, the
>> OpenSSL random functions were not being used at all. They would be
>> used to initialize the seed but not to actually generate the
>> random values.
>> If we used the crypto versions of the functions, the repetition
>> issue went away completely.
>> Here is a small patch which will use the crypto version if
>> USE_OPENSSL is defined:
>> --- rutil/Random.cxx.orig              2008-03-14 23:21:29.000000000 -0700
>> +++ rutil/Random.cxx    2008-03-15 00:26:59.000000000 -0700
>> @@ -149,8 +149,9 @@
>>  Random::getRandom()
>>  {
>>     initialize();
>> -
>> -#ifdef WIN32
>> +#ifdef USE_OPENSSL
>> +   return getCryptoRandom();
>> +#elif WIN32
>>     assert( RAND_MAX == 0x7fff );
>>     int r1 = rand();
>>     int r2 = rand();
>> -Aron
>> Aron Rosenberg
>> SightSpeed
> _______________________________________________
> resiprocate-devel mailing list
> resiprocate-devel at resiprocate.org
> https://list.resiprocate.org/mailman/listinfo/resiprocate-devel
