[reSIProcate] Data::*hash() not working on binary data
We tracked down a bug in Data that prevents it from calculating correct
hashes on binary data (really, any Data containing characters above 0x7f).
rawHash and rawCaseInsensitiveHash take a const char* (signed on these
platforms), XOR each character with a hash byte, and use the result as
an array subscript. The operands are promoted to int before the XOR, so
a negative character produces a negative subscript, and the hash winds
up reading randomPermutation at indices anywhere from -256 to 255. On
Mac OS, the system happens to leave the memory region preceding
randomPermutation zeroed, so if the last character in the Data was
negative, the hash would always be zero (which is what started Tiffany
looking at this problem). On an x86 FC5 platform, the bug isn't as
noticeable unless you add something like:
    cout << (*c ^ bytes[0]) << " "
         << (*c ^ bytes[1]) << " "
         << (*c ^ bytes[2]) << " "
         << (*c ^ bytes[3]) << endl;
to the inner loop of rawHash().
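The promotion can also be reproduced outside of Data. What follows is only a minimal sketch, not code from Data.cxx: signedIndex and unsignedIndex are hypothetical helpers, and it assumes the hash bytes are unsigned char.

```cpp
// Hypothetical helpers for illustration only; not part of Data.cxx.

// Subscript as the unpatched code computes it: the signed character is
// sign-extended to int before the XOR, so bytes above 0x7f go negative.
constexpr int signedIndex(signed char c, unsigned char hashByte)
{
   return c ^ hashByte;
}

// Subscript after converting the character to unsigned char first:
// the result always stays within 0..255.
constexpr int unsignedIndex(signed char c, unsigned char hashByte)
{
   return static_cast<unsigned char>(c) ^ hashByte;
}
```

With c = -23 (the byte 0xE9) and hashByte = 0x12, signedIndex yields -5, while unsignedIndex yields 251.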
Anyway, the patch below fixes the problem by simply casting the
char* c parameter to an unsigned char* before calculating the hash.
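To show the idea of the fix in isolation, here is a rough single-byte Pearson hash. It is a sketch under stated assumptions, not the Data.cxx code: perm() is a made-up stand-in for randomPermutation, pearson8 is a hypothetical name, and it converts each byte to unsigned char rather than casting the pointer, which has the same effect on the subscript.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in permutation of 0..255 (assumption: NOT reSIProcate's
// randomPermutation table). An odd multiplier modulo 256 is a bijection.
unsigned char perm(unsigned char i)
{
   return static_cast<unsigned char>(i * 167u + 13u);
}

// Hypothetical one-byte Pearson hash; Data::rawHash runs four such lanes.
unsigned char pearson8(const char* cx, std::size_t size)
{
   assert(cx != nullptr || size == 0);
   unsigned char h = perm(0);   // seed, analogous to randomPermutation[0]
   for (std::size_t i = 0; i < size; ++i)
   {
      // Convert to unsigned char before the XOR so the subscript
      // can never be negative, even for bytes above 0x7f.
      const unsigned char c = static_cast<unsigned char>(cx[i]);
      h = perm(static_cast<unsigned char>(c ^ h));
   }
   return h;
}
```

Because every subscript now stays within 0..255, the result depends only on the input bytes and the table, not on whatever memory happens to precede it.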
Bruce
-----------------
Index: Data.cxx
===================================================================
--- Data.cxx (revision 610)
+++ Data.cxx (working copy)
@@ -1837,10 +1837,11 @@
};
size_t
-Data::rawHash(const char* c, size_t size)
+Data::rawHash(const char* cx, size_t size)
{
// 4 byte Pearson's hash
// essentially random hashing
+ const unsigned char* c = (const unsigned char*) cx;
union
{
@@ -1853,7 +1854,7 @@
bytes[2] = randomPermutation[2];
bytes[3] = randomPermutation[3];
- const char* end = c + size;
+ const unsigned char* end = c + size;
for ( ; c != end; ++c)
{
bytes[0] = randomPermutation[*c ^ bytes[0]];
@@ -1868,8 +1869,9 @@
// use only for ascii characters!
size_t
-Data::rawCaseInsensitiveHash(const char* c, size_t size)
+Data::rawCaseInsensitiveHash(const char* cx, size_t size)
{
+ const unsigned char* c = (const unsigned char*) cx;
union
{
size_t st;
@@ -1881,10 +1883,10 @@
bytes[2] = randomPermutation[2];
bytes[3] = randomPermutation[3];
- const char* end = c + size;
+ const unsigned char* end = c + size;
for ( ; c != end; ++c)
{
- char cc = tolower(*c);
+ unsigned char cc = tolower(*c);
bytes[0] = randomPermutation[cc ^ bytes[0]];
bytes[1] = randomPermutation[cc ^ bytes[1]];
bytes[2] = randomPermutation[cc ^ bytes[2]];