[reSIProcate] Random.cxx and MultiCore systems

Thu Mar 20 13:01:54 CDT 2008

Now that I've slept on it, I suppose that some level of parallelism
could be behind this problem. Is there any chance that your application
is calling the Helper functions that generate the CallIDs from more than
one thread? Your results below are exactly in line with what I see on
my system. I'm not convinced this is a problem (specifically, that
these results illustrate the problem). More consideration is required.

Can you rule out calling random() (via a routine that makes a
CallID) from more than one thread?

Thanks
Alan

On 20-Mar-08, at 10:44 , Aron Rosenberg wrote:

> First run – Count was around 70
>
> mp-test ~ # ./a.out
> tot: 778262873
> l1: 2060261465
> l2: 2060261465
> Aborted
>
> Second Run – Count was at 400
> mp-test ~ # ./a.out
> tot: 4033371507
> l1: 1314891622
> l2: 1314891622
> Aborted
>
> Third Run – Count was at 130
> mp-test ~ # ./a.out
> tot: 1427405301
> l1: 475005228
> l2: 475005228
> Aborted
>
> Fourth run
> mp-test ~ # ./a.out
> tot: 1309167503
> l1: 71029242
> l2: 71029242
> Aborted
>
> -Aron
>
> From: Alan Hawrylyshen [mailto:alan at polyphase.ca]
> Sent: Thursday, March 20, 2008 11:39 AM
> To: Aron Rosenberg
> Cc: Byron Campen; resiprocate-devel
> Subject: Re: [reSIProcate] Random.cxx and MultiCore systems
>
> I am still quite tempted to pin down the failure with a minimal
> test driver. I fear that it might be something slightly more
> insidious, so once we can make this happen at will, we can
> address the actual root cause. Can anyone check this easily?
>
> I have a test driver that fails on a dual-core Intel platform (gcc
> 4.0.1, Mac OS X 10.5.2).
> It usually fails around the 100 mark in the progress output (though
> I have also waited much longer).
> Let it run for a while and see.
> It aborts when two successive calls to random() return the same value.
>
> I would expect this to be unlikely, but should we check this on a
> single-processor / single-core system?
> Does it happen more often on dual core or SMP systems?
> Aron - can you try this on your platform?
> Please run it a LOT and see if the time-to-run varies greatly or if  
> it fails reliably.
>
> Thanks
> Alan
>
> --
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
>
> /* Abort as soon as two successive calls to random() return the same
>  * value, reporting how many draws that took. */
> int
> main(void)
> {
>     unsigned long long t = 0;            /* draws without a repeat so far */
>
>     srandom((unsigned int)time(NULL));   /* seed before the first draw */
>     unsigned long l1 = (unsigned long)random();
>     unsigned long l2 = 0UL;
>
>     while (1)
>         {
>             l2 = (unsigned long)random();
>
>             if ( l1 == l2 ){
>                 printf("tot: %llu\nl1: %lu\nl2: %lu\n",t,l1,l2);
>                 abort();
>             }
>             l1 = l2;
>             t++;
>
>             /* Progress: print t / 10^7 every ten million draws. */
>             const unsigned long long modulator = 10000000ULL;
>             if (!(t % modulator)) {
>                 printf("%llu...\r",(t/modulator));
>                 fflush(stdout);
>             }
>         }
>
>     return 0;   /* not reached */
> }
>
>
> Alan
>
> On 19-Mar-08, at 15:56 , Aron Rosenberg wrote:
>
>
> The only thing I could think of is to use the reentrant random_r and
> srandom_r functions instead of random and srandom. The glibc _r
> variants force the application to keep the generator state (the
> "seed") itself, which might make it immune to the caching problem.
>
> The issue with this approach is that the entire Random class is
> static, although you could just add a class-wide static variable to
> hold the new user-space state.
>
