[reSIProcate] Random.cxx and MultiCore systems
Aron Rosenberg
arosenberg at sightspeed.com
Thu Mar 20 13:15:05 CDT 2008
There is only a single thread running. Keep in mind that there are a lot
of other calls happening in the call stack between successive
invocations of dum->makeInvite.
In our initial tests it was 100% repeatable that duplicate Call-IDs and
transaction IDs would be generated. If we stuck a usleep(1) between
calls, the problem went away. What I think is happening is that the OS
is moving the thread to a different physical CPU between successive
calls, but the cache isn't updated.
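A minimal sketch of the workaround, assuming plain random() draws (the
draw_with_pause helper is hypothetical; in our application the draws are
buried inside the stack):

#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: a one-microsecond sleep between successive
 * draws. In our tests, inserting this pause made the duplicates
 * disappear. */
long draw_with_pause(void)
{
    long r = random();
    usleep(1); /* forces a scheduling point between draws */
    return r;
}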
If we used sched_setaffinity to pin the process, the initial location of
the duplicate would move, but other assert and random() failures would
occur later.
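For reference, this is roughly how one would pin to a single CPU (a
Linux-only sketch; pin_to_cpu0 is a hypothetical name, and the call does
not exist on OS X):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to CPU 0 so every random() draw happens on
 * the same core. A pid of 0 means "the calling thread". */
int pin_to_cpu0(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}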
-Aron
From: Alan Hawrylyshen [mailto:alan at polyphase.ca]
Sent: Thursday, March 20, 2008 1:02 PM
To: Aron Rosenberg
Cc: Byron Campen; resiprocate-devel
Subject: Re: [reSIProcate] Random.cxx and MultiCore systems
Now that I've slept on it, I suspect that some level of parallelism
could be causing this problem. Is there any chance that your application
is calling the Helper functions that generate the Call-IDs from more
than one thread? Your results below are exactly in line with what I see
on my system. I'm not convinced this is a problem (specifically, that
these results illustrate the problem). More consideration is required.
Can you rule out that you call random() (via a routine that makes a
Call-ID) from more than one thread?
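One quick way to rule it out is to serialize the draws (a minimal
sketch, assuming a POSIX build; locked_random is a hypothetical
wrapper). If the duplicates vanish behind the lock, cross-thread calls
were the culprit:

#include <pthread.h>
#include <stdlib.h>

/* Serialize every draw behind one mutex so no two threads can touch
 * the generator state at the same time. */
static pthread_mutex_t rand_lock = PTHREAD_MUTEX_INITIALIZER;

long locked_random(void)
{
    pthread_mutex_lock(&rand_lock);
    long r = random();
    pthread_mutex_unlock(&rand_lock);
    return r;
}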
Thanks
Alan
On 20-Mar-08, at 10:44, Aron Rosenberg wrote:
First run - Count was around 70
mp-test ~ # ./a.out
tot: 778262873
l1: 2060261465
l2: 2060261465
Aborted
Second run - Count was at 400
mp-test ~ # ./a.out
tot: 4033371507
l1: 1314891622
l2: 1314891622
Aborted
Third run - Count was at 130
mp-test ~ # ./a.out
tot: 1427405301
l1: 475005228
l2: 475005228
Aborted
mp-test ~ # ./a.out
tot: 1309167503
l1: 71029242
l2: 71029242
Aborted
-Aron
From: Alan Hawrylyshen [mailto:alan at polyphase.ca]
Sent: Thursday, March 20, 2008 11:39 AM
To: Aron Rosenberg
Cc: Byron Campen; resiprocate-devel
Subject: Re: [reSIProcate] Random.cxx and MultiCore systems
I am still quite tempted to prove what the failure is with a minimal
test driver. I fear that it might be something slightly more insidious.
So, once we can cause this to happen at will, we can address the
appropriate root cause. Is this something that can be checked easily?
Anyone?
I have a test driver that fails on a dual-core Intel platform: gcc
4.0.1, Mac OS X 10.5.2.
This will fail around the 100 mark in the progress output (but I have
waited much longer).
Let it run for a while and see.
This will abort when two successive calls to random() match.
I would expect this to be unlikely, but should we check this on a
single-processor / single-core system?
Does it happen more often on dual core or SMP systems?
Aron - can you try this on your platform?
Please run it a LOT and see if the time-to-run varies greatly or if it
fails reliably.
Thanks
Alan
--
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

int
main()
{
    unsigned long long t = 0;       /* total draws so far */

    srandom(time(0));               /* seed before the first draw */
    unsigned long l1 = (unsigned long)random();
    unsigned long l2 = 0UL;

    while (1)
    {
        l2 = (unsigned long)random();

        /* Two successive draws should essentially never match. */
        if (l1 == l2) {
            printf("tot: %llu\nl1: %lu\nl2: %lu\n", t, l1, l2);
            abort();
        }
        l1 = l2;
        t++;

        /* Progress marker every ten million draws. */
        const long modulator = 10000000L;
        if (!(t % modulator)) {
            printf("%llu...\r", (t / modulator));
            fflush(stdout);
        }
    }
    return 0;
}
Alan
On 19-Mar-08, at 15:56, Aron Rosenberg wrote:
The only thing that I could think of is to use the new random_r and
srandom_r functions instead of random and srandom. The glibc _r variants
force the application to keep the generator state itself, which might
make them immune to the caching problem.
The issue with this approach is that the entire Random class is static,
although you could add a class-wide static variable to hold the new
userland state.
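A minimal sketch of what that would look like (assuming glibc; random_r
and initstate_r are GNU extensions):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    static char statebuf[128];  /* entropy pool owned by the caller */
    struct random_data data;
    int32_t value;

    /* glibc requires the random_data struct to be zeroed before
     * initstate_r is called. */
    memset(&data, 0, sizeof(data));
    if (initstate_r((unsigned int)time(NULL), statebuf,
                    sizeof(statebuf), &data) != 0) {
        perror("initstate_r");
        return 1;
    }

    /* Each draw reads and updates only the state we own, so nothing
     * is shared behind the application's back. */
    if (random_r(&data, &value) != 0) {
        perror("random_r");
        return 1;
    }
    printf("%ld\n", (long)value);
    return 0;
}

In the Random class, the statebuf/random_data pair could live in the
class-wide static (or per-thread) storage mentioned above.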