[reSIProcate] Random.cxx and MultiCore systems
Alan Hawrylyshen
alan at polyphase.ca
Wed Mar 19 15:04:21 CDT 2008
Can we boil it down to a simple test driver that can confirm random()
is returning the same value twice?
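
A minimal sketch of such a driver follows, assuming a POSIX system where random() is declared in <cstdlib>. The names countDuplicateBatches and runRandomRepeatDriver are hypothetical and not part of reSIProcate. It draws back-to-back batches of values and counts any batch that exactly repeats an earlier one, which is the pattern described in the report. It does nothing to force CPU migration, so a clean run would not rule the problem out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>     // random() is POSIX, not ISO C++
#include <functional>
#include <set>
#include <vector>

// Hypothetical helper: draw `batches` back-to-back sequences of `len`
// values from `gen` and count how many sequences exactly duplicate an
// earlier one.
std::size_t countDuplicateBatches(const std::function<long()>& gen,
                                  std::size_t batches,
                                  std::size_t len)
{
   assert(len > 0);
   std::set<std::vector<long> > seen;
   std::size_t duplicates = 0;
   for (std::size_t b = 0; b < batches; ++b)
   {
      std::vector<long> seq(len);
      for (std::size_t i = 0; i < len; ++i)
      {
         seq[i] = gen();
      }
      // insert() reports false in .second when the sequence was already seen
      if (!seen.insert(seq).second)
      {
         ++duplicates;
      }
   }
   return duplicates;
}

// Hypothetical entry point: call this from main() and use the result as
// the exit status. Batch count and length are arbitrary choices.
int runRandomRepeatDriver()
{
   const std::size_t dups =
      countDuplicateBatches([] { return random(); }, 100000, 4);
   std::printf("duplicate batches: %zu\n", dups);
   return dups == 0 ? 0 : 1;
}
```

Passing the generator in as a callable means the same check could be pointed at Random::getRandom() instead of random() without changing the counting logic.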
A
--
Sorry this is terse; it is from my handheld device.
On 19 Mar 2008, at 12:55, Byron Campen <bcampen at estacado.net> wrote:
> Upon some discussion, it seems that this is happening _in a single
> thread_, so using a thread-id will not help with this specific
> problem (although it is probably a good idea anyway). It does seem
> likely that we are running into a caching problem, although I am not
> sure what can be done about this.
>
> Anyone have any ideas?
>
> Best regards,
> Byron Campen
>
>
>> More to Byron's point: if you substitute the crypto random
>> function, you will notice a significant increase in latency. A
>> cheap pseudo-random solution that doesn't exhibit this effect is
>> needed. We stumbled across something similar years ago when we
>> ported to a then-exotic dual-processor machine. The problem
>> occurred very rarely then, but with 4+ cores and today's speeds I
>> believe the problem will happen predictably, as you have seen, Aron.
>>
>> Alan
>> --
>> Sorry this is terse; it is from my handheld device.
>>
>> On 19 Mar 2008, at 10:14, Byron Campen <bcampen at estacado.net> wrote:
>>
>>> I wouldn't re-implement Random::getRandom() with getCryptoRandom(),
>>> since the contract on it is for providing cheap, pseudo-random
>>> numbers. It would be more reasonable to change the code that
>>> generates transaction-ids and tags (in fact, the code that
>>> generates Call-Ids has been tweaked to help with this very problem
>>> that you're seeing). The tweak in the Call-Id generation code
>>> involves throwing the thread-id into the generated bits, which
>>> solves the collision issue you're seeing. Maybe we could alter
>>> Random::getRandom() to XOR the current thread-id with everything
>>> it returns (this would be in keeping with "cheap, pseudo-random
>>> numbers")? Or maybe we could add a Random::getRandomReentrant()
>>> function?
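
A rough sketch of the XOR idea, assuming C++11 <thread> is available. The names mixWithThreadId and getRandomMixed are hypothetical and are not the actual Random.cxx or Call-Id code. Note that within one thread the mask is constant, so two identical values drawn on the same thread remain identical after mixing; the mask only separates values drawn on different threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>     // random() is POSIX, not ISO C++
#include <functional>
#include <thread>

// Hypothetical helper: fold a hash of the thread id into 32 bits and
// XOR it into the value. The split shift avoids shifting by the full
// width of std::size_t on 32-bit platforms.
inline unsigned int mixWithThreadId(unsigned int value, std::thread::id tid)
{
   const std::size_t h = std::hash<std::thread::id>()(tid);
   return value ^ static_cast<unsigned int>(h ^ (h >> 16 >> 16));
}

// Hypothetical wrapper: a cheap pseudo-random value mixed with the id
// of the calling thread.
inline unsigned int getRandomMixed()
{
   return mixWithThreadId(static_cast<unsigned int>(random()),
                          std::this_thread::get_id());
}
```

The extra cost per call is one hash and one XOR, which stays within the "cheap" contract, but the result is no more random than the underlying generator.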
>>>
>>> Anyone have an opinion on this?
>>>
>>> Best regards,
>>> Byron Campen
>>>
>>>> So this bug report concerns a very strange issue that we noticed
>>>> on our brand-new dual quad-core machine (8 CPUs) involving
>>>> duplicate Call-Ids, Transaction-IDs and Tags being generated
>>>> for independent INVITEs. This behavior would then result in
>>>> assert failures all over the stack.
>>>>
>>>>
>>>>
>>>> We have a single instance of DUM/reSIProcate running on its own
>>>> thread. Our application generates 4 independent INVITE requests
>>>> at exactly the same time, which results in sequential calls
>>>> eventually being made to Random.cxx and then glibc’s random()
>>>> function. Of the four calls, we get the following random values
>>>> returned:
>>>>
>>>> Call 1: aaaaaaaaaaa
>>>> Call 2: bbbbbbbbbb
>>>> Call 3: aaaaaaaaaaa (same exact sequence of random values as
>>>> the first call)
>>>> Call 4: bbbbbbbbbb (same exact sequence of random values as the
>>>> second call)
>>>>
>>>> Sometime later, various assert failures would occur due to
>>>> duplicate TID values and all sorts of other issues.
>>>>
>>>>
>>>>
>>>> If we pause or sleep the thread for 1 ms, then the problem
>>>> disappears. So what the heck is going on?
>>>>
>>>>
>>>>
>>>> We think that the DUM thread is being migrated across CPUs between
>>>> the different invocations of glibc’s random() function and the
>>>> “seed” value is stale in one of the CPU caches.
>>>>
>>>>
>>>>
>>>> So how do we fix this? When we dug into the reSIProcate
>>>> Random.cxx code, we noticed that although we had linked against
>>>> OpenSSL, the OpenSSL random functions were not being used at all.
>>>> They would be used to initialize the seed but not used to
>>>> actually generate the random values.
>>>>
>>>>
>>>>
>>>> If we used the crypto versions of the functions, the repetition
>>>> issue went away completely.
>>>>
>>>>
>>>>
>>>> Here is a small patch which will use the crypto version if
>>>> USE_OPENSSL is defined:
>>>>
>>>>
>>>>
>>>> --- rutil/Random.cxx.orig 2008-03-14 23:21:29.000000000 -0700
>>>> +++ rutil/Random.cxx 2008-03-15 00:26:59.000000000 -0700
>>>> @@ -149,8 +149,9 @@
>>>> Random::getRandom()
>>>> {
>>>> initialize();
>>>> -
>>>> -#ifdef WIN32
>>>> +#if USE_OPENSSL
>>>> + return getCryptoRandom();
>>>> +#elif WIN32
>>>> assert( RAND_MAX == 0x7fff );
>>>> int r1 = rand();
>>>> int r2 = rand();
>>>>
>>>>
>>>> -Aron
>>>>
>>>>
>>>>
>>>> Aron Rosenberg
>>>>
>>>> SightSpeed
>>>>
>>>> _______________________________________________
>>>> resiprocate-devel mailing list
>>>> resiprocate-devel at resiprocate.org
>>>> https://list.re