< Previous by Date Date Index Next by Date >
< Previous in Thread Thread Index Next in Thread >

Re: [reSIProcate] [Fwd: Re: [reSIProcate-users] Helper::computeCallId returns the same value]


Adam,

Attached is the patch we currently use to implement the SSL crypto stuff
for all resip code. On our multi-core systems it adds unperceivable
overhead and fixes the "random" random issues.

For the cases where SSL is not defined i.e. (3), the getpid call on
Linux should already do that since pids on Linux are thread specific.
When I wrote the first patch which added the current thread local stuff,
I didn't find any other value that was a good per-thread id on linux,
pthread_self always seemed to return the same value on all threads in a
given process. The current code (non SSL) worked when we had n DUM
instances on n separate threads, but it broke once we switched to a
single DUM instance handling all traffic on a single thread. My guess is
that the thread is being moved around CPU cores during context switches
and a cache issue keeps glibc from generating good random data.

If you wanted to write some assembly code you could salt the value with
the rdtsc counter (ticks), although that might not even be multi-core
safe. (See http://www.mcs.anl.gov/~kazutomo/rdtsc.html for code)

-Aron

-----Original Message-----
From: resiprocate-devel-bounces@xxxxxxxxxxxxxxx
[mailto:resiprocate-devel-bounces@xxxxxxxxxxxxxxx] On Behalf Of Adam
Roach
Sent: Monday, October 13, 2008 1:23 PM
To: resiprocate-devel
Subject: [reSIProcate] [Fwd: Re: [reSIProcate-users]
Helper::computeCallId returns the same value]

As we've seen in the past, the Call-ID generation code that DUM uses
(resip/stack/Helper.cxx:625 on head) can generate colliding Call-IDs
under high-load conditions. The current code looks like this:

    Data
    Helper::computeCallId()
    {
       static Data hostname = DnsUtil::getLocalHostName();
       Data hostAndSalt(hostname + Random::getRandomHex(16));
    #ifndef USE_SSL // .bwc. None of this is neccessary if we're using
    openssl
    #if defined(__linux__) || defined(__APPLE__)
       pid_t pid = getpid();
       hostAndSalt.append((char*)&pid,sizeof(pid));
    #endif
    #ifdef __APPLE__
       pthread_t thread = pthread_self();
       hostAndSalt.append((char*)&thread,sizeof(thread));
    #endif
    #ifdef WIN32
       DWORD proccessId = ::GetCurrentProcessId();
       DWORD threadId = ::GetCurrentThreadId();
       hostAndSalt.append((char*)&proccessId,sizeof(proccessId));
       hostAndSalt.append((char*)&threadId,sizeof(threadId));
    #endif
    #endif // of USE_SSL
       return hostAndSalt.md5().base64encode(true);
    }

I spoke to Byron just now, and he thinks the comment about "USE_SSL" is
not accurate. (It would be if the code under getRandomHex() called into
OpenSSL -- currently, it does not).

To help refresh memories, we've visited this problem in detail before,
most recently here:

http://list.resiprocate.org/archive/resiprocate-devel/msg06605.html

The conclusion of that thread left me confused -- Alan demonstrated that
we'll have collisions (albeit rarely) on just about any architecture,
and that such collisions don't require multithreading to occur. From my
read of things, Aron's problem (and Ilana's; see
http://list.resiprocate.org/archive/resiprocate-users/msg00642.html)
occurs more frequently than Alan's test program.

It seems to me that there are a few things we can do to try and address
this:

   1. If we're using OpenSSL, make computeCallId call through to OpenSSL
      for its random numbers (there area a few paths to get there, so
      I'm just throwing out the general idea at this point).
   2. Remove the "#ifndef USE_SSL" guards from computeCallId() -- is
      this sufficent?
   3. Do #2, but also salt in a 32-bit thread-local serial number to
      prevent intra-thread collisions

Thoughts? (If no one expresses an opinion in a reasonable amount of
time, I'll probably do #3).

[It occurs to me that we must have a similar problem with tags and
branch IDs, albeit without any assert()s being triggered -- I would
presume that any fix made to Call-ID should also be made to them as
well, in Helper::computeUniqueBranch() and Helper::computeTag()]

/a

Attachment: resiprocate_random_usessl.diff
Description: resiprocate_random_usessl.diff