< Previous by Date | Date Index | Next by Date > |
< Previous in Thread | Thread Index | Next in Thread > |
Adam, Attached is the patch we currently use to implement the SSL crypto stuff for all resip code. On our multi-core systems it adds unperceivable overhead and fixes the "random" random issues. For the cases where SSL is not defined i.e. (3), the getpid call on Linux should already do that since pids on Linux are thread specific. When I wrote the first patch which added the current thread local stuff, I didn't find any other value that was a good per-thread id on linux, pthread_self always seemed to return the same value on all threads in a given process. The current code (non SSL) worked when we had n DUM instances on n separate threads, but it broke once we switched to a single DUM instance handling all traffic on a single thread. My guess is that the thread is being moved around CPU cores during context switches and a cache issue keeps glibc from generating good random data. If you wanted to write some assembly code you could salt the value with the rdtsc counter (ticks), although that might not even be multi-core safe. (See http://www.mcs.anl.gov/~kazutomo/rdtsc.html for code) -Aron -----Original Message----- From: resiprocate-devel-bounces@xxxxxxxxxxxxxxx [mailto:resiprocate-devel-bounces@xxxxxxxxxxxxxxx] On Behalf Of Adam Roach Sent: Monday, October 13, 2008 1:23 PM To: resiprocate-devel Subject: [reSIProcate] [Fwd: Re: [reSIProcate-users] Helper::computeCallId returns the same value] As we've seen in the past, the Call-ID generation code that DUM uses (resip/stack/Helper.cxx:625 on head) can generate colliding Call-IDs under high-load conditions. The current code looks like this: Data Helper::computeCallId() { static Data hostname = DnsUtil::getLocalHostName(); Data hostAndSalt(hostname + Random::getRandomHex(16)); #ifndef USE_SSL // .bwc. None of this is neccessary if we're using openssl #if defined(__linux__) || defined(__APPLE__) pid_t pid = getpid(); hostAndSalt.append((char*)&pid,sizeof(pid)); #endif #ifdef __APPLE__ pthread_t thread = pthread_self(); hostAndSalt.append((char*)&thread,sizeof(thread)); #endif #ifdef WIN32 DWORD proccessId = ::GetCurrentProcessId(); DWORD threadId = ::GetCurrentThreadId(); hostAndSalt.append((char*)&proccessId,sizeof(proccessId)); hostAndSalt.append((char*)&threadId,sizeof(threadId)); #endif #endif // of USE_SSL return hostAndSalt.md5().base64encode(true); } I spoke to Byron just now, and he thinks the comment about "USE_SSL" is not accurate. (It would be if the code under getRandomHex() called into OpenSSL -- currently, it does not). To help refresh memories, we've visited this problem in detail before, most recently here: http://list.resiprocate.org/archive/resiprocate-devel/msg06605.html The conclusion of that thread left me confused -- Alan demonstrated that we'll have collisions (albeit rarely) on just about any architecture, and that such collisions don't require multithreading to occur. From my read of things, Aron's problem (and Ilana's; see http://list.resiprocate.org/archive/resiprocate-users/msg00642.html) occurs more frequently than Alan's test program. It seems to me that there are a few things we can do to try and address this: 1. If we're using OpenSSL, make computeCallId call through to OpenSSL for its random numbers (there area a few paths to get there, so I'm just throwing out the general idea at this point). 2. Remove the "#ifndef USE_SSL" guards from computeCallId() -- is this sufficent? 3. Do #2, but also salt in a 32-bit thread-local serial number to prevent intra-thread collisions Thoughts? (If no one expresses an opinion in a reasonable amount of time, I'll probably do #3). [It occurs to me that we must have a similar problem with tags and branch IDs, albeit without any assert()s being triggered -- I would presume that any fix made to Call-ID should also be made to them as well, in Helper::computeUniqueBranch() and Helper::computeTag()] /a
Attachment:
resiprocate_random_usessl.diff
Description: resiprocate_random_usessl.diff