Friday, October 8, 2010

Where's my bits?

Recently on StackOverflow a question was asked and a colleague of mine answered it with a mostly correct solution. You can read about the question and multiple solutions on the link above, but what got me thinking was the real solution, which was to use prime numbers instead of normalized values from 0 or 1 to n. The reason here is because in my colleague's solution addition and multiplication operations were used and unless you are working with primes you are going to have some collision with factors.

Primes however were not going to work. Not because of any mathematically flaw but because of bits. There were not enough bits to represent a unique hash using primes for the range the OP (original poster) was asking about (At this point you should read the proposed solution to get a gist of what is going on). Why is this, you may ask. Well for some reason Microsoft treats the long double datatype as a synonym for a 64-bit double. This did not help because my colleague needed at least 67 bits, which if Microsoft did like most compilers, long double would have been 80 bits or more. Dismayed at this, pseudo-code was put up instead of an actual working solution.

Ironically I had been complaining about this very problem about 3 weeks prior and the same colleague asked 'Why would you ever need a datatype that wide?' Before anyone jumps all over him, he is mostly right. You have the 128-bit SSE registers and the up and coming 256-bit AVX register so you can do all the work you want with intrinsics, hopefully, and of course how often do you need a value that big? All that aside, I was determined to figure out some way to perform a multiplication with 2 64-bit operands with a result that would need more than 64-bits. I took to C++ and wrote a function template for that very purpose. The below function template takes two operands and returns the result in upper and lower referenced arguments if they will not fit in one of the result arguments. This is not optimized so that the logic can be followed but because C++ templates are so amazing, you would not believe how reduced this code becomes after compilation.

template<typename T, typename R>
void LargeIntegerMultiply(T operand1, T operand2, R& upperResult, R& lowerResult)
unsigned int bitShift = sizeof(T) * 4;
T mask = ~(((T) -1) << bitShift);

T x1 = operand1 & mask;
T y1 = operand2 & mask;
T x2 = (operand1 >> bitShift) & mask;
T y2 = (operand2 >> bitShift) & mask;

T x1y1 = x1 * y1;
T x1y2 = x1 * y2;
T x2y1 = x2 * y1;
T x2y2 = x2 * y2;

lowerResult = x1y1 & mask;

R temp = (x1y1 >> bitShift) + x1y2 + x2y1;

T upperTemp = temp >> bitShift;
T lowerTemp = temp & mask;

lowerResult += (lowerTemp << bitShift);

upperResult = x2y2 + upperTemp;