Friday, October 8, 2010

Where's my bits?

Recently on StackOverflow a question was asked and a colleague of mine answered it with a mostly correct solution. You can read about the question and multiple solutions on the link above, but what got me thinking was the real solution, which was to use prime numbers instead of normalized values from 0 or 1 to n. The reason here is because in my colleague's solution addition and multiplication operations were used and unless you are working with primes you are going to have some collision with factors.

Primes however were not going to work. Not because of any mathematically flaw but because of bits. There were not enough bits to represent a unique hash using primes for the range the OP (original poster) was asking about (At this point you should read the proposed solution to get a gist of what is going on). Why is this, you may ask. Well for some reason Microsoft treats the long double datatype as a synonym for a 64-bit double. This did not help because my colleague needed at least 67 bits, which if Microsoft did like most compilers, long double would have been 80 bits or more. Dismayed at this, pseudo-code was put up instead of an actual working solution.

Ironically I had been complaining about this very problem about 3 weeks prior and the same colleague asked 'Why would you ever need a datatype that wide?' Before anyone jumps all over him, he is mostly right. You have the 128-bit SSE registers and the up and coming 256-bit AVX register so you can do all the work you want with intrinsics, hopefully, and of course how often do you need a value that big? All that aside, I was determined to figure out some way to perform a multiplication with 2 64-bit operands with a result that would need more than 64-bits. I took to C++ and wrote a function template for that very purpose. The below function template takes two operands and returns the result in upper and lower referenced arguments if they will not fit in one of the result arguments. This is not optimized so that the logic can be followed but because C++ templates are so amazing, you would not believe how reduced this code becomes after compilation.

template<typename T, typename R>
void LargeIntegerMultiply(T operand1, T operand2, R& upperResult, R& lowerResult)
unsigned int bitShift = sizeof(T) * 4;
T mask = ~(((T) -1) << bitShift);

T x1 = operand1 & mask;
T y1 = operand2 & mask;
T x2 = (operand1 >> bitShift) & mask;
T y2 = (operand2 >> bitShift) & mask;

T x1y1 = x1 * y1;
T x1y2 = x1 * y2;
T x2y1 = x2 * y1;
T x2y2 = x2 * y2;

lowerResult = x1y1 & mask;

R temp = (x1y1 >> bitShift) + x1y2 + x2y1;

T upperTemp = temp >> bitShift;
T lowerTemp = temp & mask;

lowerResult += (lowerTemp << bitShift);

upperResult = x2y2 + upperTemp;

Saturday, August 7, 2010

Don't be scared...

Reading is an empowering thing. It allows the individual to speak almost directly with a person from another time and place. Words on paper or rather words that are read, offer a way to set down an idea and allows others to take from it what they will at their leisure. It can however be a scary thing.

It was not too long ago when people were banned from reading the Bible and in some places in the world some are still banned from reading the Qur'an. The reasons for this are that if you read someone's ideas then you have the opportunity to interpret them for yourself and make a decision regarding your agreement or not. In the past, religious leaders did not want their congregations to be making those decisions. As an aside, this is also why much of the Catholic mass was spoken in Latin until the Vatican II. For religion however this has largely changed and we are free to read as we like and interpret without fear of being strung up or burned at the stake for our viewpoints. This is mostly true.

Thankfully, Government bills and laws are different in that they are in the public domain for viewing, yet similar to the Catholic mass being done in Latin. Government documents are heavy on legal procedure and rhetorical flourish, might as well be Latin. This has forced many people to seek commentators to give a brief rundown of what is in bills. We turn to news channels or newspapers for the gist of the bill and base our opinions on that information. This is a good way of doing it actually. It allows us to be in the law making process without having to speak in a legal dialect, gives commentators something to do, and lets the Government be as precise as possible in the laws they write.

The downside to this is that you get a different viewpoint and interpretation based on who explains the bill to you. See the beginning of the entry for why. Many times it is helpful just to be able to see what was written and reference that against what is being explained. In my high school we had a decent civic's class, an endangered species. We discussed the role of Government, how it works, and what we as citizens are expected to do. The internet was just coming up at this time and I was lucky to have an instructor that utilized the fledgling technology. I was directed to what has become my goto URL for all things related to United States law, Thomas. It is named after Thomas Jefferson for his pivotal role in the legal foundation of our country. This site allows the user to search sessions of Congress as well as look at actual bills being currently debated and look at the precise wording, layout, implications, and all that resides in bill XYZ.

It may take some time to understand bill layout and parse verbiage, don't be scared, but when you hear somewhere that bill XYZ says that "The Government will kill your dog." Now you can go check, if they really will.

Sunday, June 27, 2010

History: Cum grano salis

Recently I found this article on the BBC and was quite intrigued because biased historical perspective is still apparently an interesting topic. This seems to be a topic that gets a great deal of publicity whenever a political group begins to gain momentum.

For example in reading the article the historical archetype for who does this is the Soviet Union, more specifically Joseph Stalin. It appears as a general rallying cry by whatever party is against historical perspective to compare that other party to the Soviet Union or Stalin specifically. This type of comparison is the precise issue I have a problem with. It almost seems that political parties refuse to accept that history is inherently based on perspective.

At this point I am sure many people will be up in arms and preparing scathing rebuttals but it is true. Let us examine any conflict throughout history. For the sake of ruffling as few feathers as possible consider Genghis Khan, or for those on the Asian continent, Chinggis Khaan. This person is highly revered in Mongolia and in many indirect ways by those in Europe. He started what would become the great trade route from China to Europe during his conquest. Who can argue this is a bad thing, besides the most staunch anti-globalisationists? Obviously I am looking at one aspect of him... aha! There is the precise issue. Perspective. There are those in India or modern day Persia (Iran) who argue that because of Genghis Khan we are hundreds of years behind in technology because of the destruction of knowledge brought on by his campaign. Many others will bring up the senseless torture and debauchery inflicted upon their respective peoples. One country's hero is another's dictator.

This argument is well known and nothing new. I guess my point is that complaining about what goes into history books is a constant battle and one that will never truly be done to everyone's liking. The fact that Texas and possibly other states are engaging in this enterprise is not too surprising, considering the recent upheaval of political perspectives in the United States. I submit to all that fighting for unbiased historical accuracy is a battle that no one will ever win, outside of their own political group anyways and we should take anyone's account of history, cum grano salis.

Sunday, June 6, 2010

RDTSC, RDTSCP and QueryPerformanceCounter

During a recent project to implement the ICorProfilerCallback2 interface, supplied by Microsoft to profile .NET applications, I hit a small snag. I wanted a fine grained time stamp. Fine grained enough to see all distinct function call times to say something simple like the Fibonacci algorithm.

The Fibonacci algorithm is quite small and done recursively using C# with the Environment.TickCount property the times overlap and according to the time generated it would appear that many operations occur at the same time. Of course this is nonsense. It is also obvious that collecting this time through the Environment class requires going through many layers of the .NET runtime to actually get to the underlying mechanism that collects that time. This is where the previously mentioned interface comes in.

Implementing the ICorProfilerCallback2 interface, the most recent profiler interface is ICorProfilerCallback3, allows the user to stay in native and gather a lot of data about the runtime needed by a profiler. Being in native then ensures that I will not have to go through the layers of the runtime and can access more precise data. This would of course mean making a call to the WINAPI GetTickCount(). Unfortunately this also did not offer the fine grained resolution desired.

Enter assembly. Regardless of what my final resolution was, I highly recommend anyone who has any interest in assembly to goto the Intel site and either download and\or request the Intel Architecture manual. They are quite helpful and although did not get me to the best solution for this problem have answer many questions since. Reading through the vast instruction lists, I stumbled upon rdtsc and rdtscp.

The first instruction, rdtsc, queries the on chip counter for an ever increasing QWORD. The previously mentioned functions all returned a DWORD. Using rdtsc however requires doing some inline assembly or with Visual C++ the intrinsic call __rdtsc can be used. The assembly code though is quite simple and can be done with some help from your favorite search engine or with the previously mentioned manuals. There are some caveats with using this instruction and they are based heavily on power management settings and with CPU frequency throttling. I am not going to discuss them here, but be wary of this.

After changing over my code to use the rdtsc instruction, resolution was granted and the world good, until I decided to try out the profiler on a multi-processor machine. Then the world was not so good and time did not seem to always be going forward. Timestamps would show the system leaving a function before it entered it. I assumed this was my own error but after some lengthy research realized that the counters being used were not synchronized across all CPUs. This was a terrible discovery and one that has a long history, that for another time. My solution was to use the rdtscp instruction that also returned the processor ID. This information is contained in those manuals mentioned above.

The rdtscp instruction proved to be an issue on another front however, not all CPUs come with this instruction. It is rather new and after looking at all of my machines and several of my colleagues it was clear none of us had the instruction present. Regardless of all this, the solution would still not solve the issue because there was no way to query a specific CPU only to know which CPU it came from. Which means I could not tell how out of sync the processors were and would just have to throw away the data anyways, which is evil by anyones standards. What the problem reduced to was I needed a way to tell the program to run on a specific CPU, which can be done with SetThreadAffinityMask() and SetProcessAffinityMask(), but then the application cannot be properly profiled because the application cannot run naturally on the target machine. Therefore rdtscp was also going to causes issues in the multi-processor case.

The problem was becoming obnoxious. All was not lost however, I knew that Windows must have some solution and resolved to find it. It came in the form of QueryPerformanceCounter(). This function abstracts away CPUs and provides the needed synchronization of CPUs or in many cases uses another counter. What this other counter is, I will not go into here. This function does have its limitations since it is a system call instead of a simple instruction, but does ensure that on multi-processor applications that values returned are increasing. It also returns the count in the LARGE_INTEGER datatype which happens to be a QWORD.

I found that this function does not have the resolution of rdtsc but it is close and we can assume it will only get better. In many instances it worked just fine, even though it was a system call. At this point I decided to settle on the supplied WINAPI call and have been happy. I have however written the following code to use the rdtscp instruction in instances when it exists because I still have not had a chance to use it. That code is below.

__declspec(naked) BOOL WINAPI
MyProfiler::GetCurrentTick(PLARGE_INTEGER time, PDWORD procId)
    mov eax,FALSE
    cmp [esp + 0x4],NULL;; Check for NULL pointer - time
    jz DONE
    mov eax,0x80000001  ;; Argument for CPUID
    bt edx,0x1B         ;; Check for RDTSCP in bit 27 of EDX
    rdtscp  ;; ReaD Time Stamp Counter [EDX:EAX] and Processor id [ECX]
    mov ebx,[esp + 0x4] ;; - time
    mov [ebx],eax       ;; Set lower order bits - time
    mov [ebx + 0x4],edx ;; Set higher order bits - time
    mov eax,TRUE        ;; Return TRUE
    jmp PROCID
    mov eax,[esp + 0x4] ;; - time
    push eax            ;; Push time on stack for QueryPerformanceCounter()
    call dword ptr QueryPerformanceCounter ;; EAX set in function,TRUE
    mov ecx,0x0         ;; Set ECX to 0 for Processor id
    cmp [esp + 0x8],NULL;; Check for NULL pointer - procId
    jz DONE
    mov ebx,[esp + 0x8] ;; - procId
    mov [ebx],ecx       ;; Set - procId
    ret 8 ;; Clean up stack

Saturday, June 5, 2010

Hello World!

As is apropos with anyone who programs or creates something, it is necessary to begin simply. This is the beginning of what I hope will be the documentation of bits of knowledge I have acquired over my years in multiple disciplines.

It has become a unique event when people have knowledge about multiple areas. I have observed this in working with people from other countries as well as with my close friends. Societally it seems we are taught that we should focus on one area and know it well. That is all that we are expected to know. If you are a mechanic you should know how to rebuild a transmission, but if you know nothing about gardening that is okay, because that is not 'your field'.

This to me is a way of convincing ourselves that we can stop being curious about the world and limit our gaze to that narrow slice of the world that is 'our field'. I challenge myself to investigate new disciplines regularly. It has allowed me to see how truly interconnected knowledge is and appreciate the gift of asking the simple questions of 'why' and 'how'.

I call this search the 'Terran Comedy' after the epic poem 'The Divine Comedy' by Dante Alighieri. I am not looking to investigate the divine, although that is something I enjoy, but am merely interested in the world in which I reside. I begin with confusion over a topic and subsequently offer what I discover to be an understanding. I welcome all to comment and offer their own insights.