Sunday, June 27, 2010

History: Cum grano salis

Recently I found this article on the BBC and was quite intrigued because biased historical perspective is still apparently an interesting topic. This seems to be a topic that gets a great deal of publicity whenever a political group begins to gain momentum.

For example in reading the article the historical archetype for who does this is the Soviet Union, more specifically Joseph Stalin. It appears as a general rallying cry by whatever party is against historical perspective to compare that other party to the Soviet Union or Stalin specifically. This type of comparison is the precise issue I have a problem with. It almost seems that political parties refuse to accept that history is inherently based on perspective.

At this point I am sure many people will be up in arms and preparing scathing rebuttals but it is true. Let us examine any conflict throughout history. For the sake of ruffling as few feathers as possible consider Genghis Khan, or for those on the Asian continent, Chinggis Khaan. This person is highly revered in Mongolia and in many indirect ways by those in Europe. He started what would become the great trade route from China to Europe during his conquest. Who can argue this is a bad thing, besides the most staunch anti-globalisationists? Obviously I am looking at one aspect of him... aha! There is the precise issue. Perspective. There are those in India or modern day Persia (Iran) who argue that because of Genghis Khan we are hundreds of years behind in technology because of the destruction of knowledge brought on by his campaign. Many others will bring up the senseless torture and debauchery inflicted upon their respective peoples. One country's hero is another's dictator.

This argument is well known and nothing new. I guess my point is that complaining about what goes into history books is a constant battle and one that will never truly be done to everyone's liking. The fact that Texas and possibly other states are engaging in this enterprise is not too surprising, considering the recent upheaval of political perspectives in the United States. I submit to all that fighting for unbiased historical accuracy is a battle that no one will ever win, outside of their own political group anyways and we should take anyone's account of history, cum grano salis.

Sunday, June 6, 2010

RDTSC, RDTSCP and QueryPerformanceCounter

During a recent project to implement the ICorProfilerCallback2 interface, supplied by Microsoft to profile .NET applications, I hit a small snag. I wanted a fine grained time stamp. Fine grained enough to see all distinct function call times to say something simple like the Fibonacci algorithm.

The Fibonacci algorithm is quite small and done recursively using C# with the Environment.TickCount property the times overlap and according to the time generated it would appear that many operations occur at the same time. Of course this is nonsense. It is also obvious that collecting this time through the Environment class requires going through many layers of the .NET runtime to actually get to the underlying mechanism that collects that time. This is where the previously mentioned interface comes in.

Implementing the ICorProfilerCallback2 interface, the most recent profiler interface is ICorProfilerCallback3, allows the user to stay in native and gather a lot of data about the runtime needed by a profiler. Being in native then ensures that I will not have to go through the layers of the runtime and can access more precise data. This would of course mean making a call to the WINAPI GetTickCount(). Unfortunately this also did not offer the fine grained resolution desired.

Enter assembly. Regardless of what my final resolution was, I highly recommend anyone who has any interest in assembly to goto the Intel site and either download and\or request the Intel Architecture manual. They are quite helpful and although did not get me to the best solution for this problem have answer many questions since. Reading through the vast instruction lists, I stumbled upon rdtsc and rdtscp.

The first instruction, rdtsc, queries the on chip counter for an ever increasing QWORD. The previously mentioned functions all returned a DWORD. Using rdtsc however requires doing some inline assembly or with Visual C++ the intrinsic call __rdtsc can be used. The assembly code though is quite simple and can be done with some help from your favorite search engine or with the previously mentioned manuals. There are some caveats with using this instruction and they are based heavily on power management settings and with CPU frequency throttling. I am not going to discuss them here, but be wary of this.

After changing over my code to use the rdtsc instruction, resolution was granted and the world good, until I decided to try out the profiler on a multi-processor machine. Then the world was not so good and time did not seem to always be going forward. Timestamps would show the system leaving a function before it entered it. I assumed this was my own error but after some lengthy research realized that the counters being used were not synchronized across all CPUs. This was a terrible discovery and one that has a long history, that for another time. My solution was to use the rdtscp instruction that also returned the processor ID. This information is contained in those manuals mentioned above.

The rdtscp instruction proved to be an issue on another front however, not all CPUs come with this instruction. It is rather new and after looking at all of my machines and several of my colleagues it was clear none of us had the instruction present. Regardless of all this, the solution would still not solve the issue because there was no way to query a specific CPU only to know which CPU it came from. Which means I could not tell how out of sync the processors were and would just have to throw away the data anyways, which is evil by anyones standards. What the problem reduced to was I needed a way to tell the program to run on a specific CPU, which can be done with SetThreadAffinityMask() and SetProcessAffinityMask(), but then the application cannot be properly profiled because the application cannot run naturally on the target machine. Therefore rdtscp was also going to causes issues in the multi-processor case.

The problem was becoming obnoxious. All was not lost however, I knew that Windows must have some solution and resolved to find it. It came in the form of QueryPerformanceCounter(). This function abstracts away CPUs and provides the needed synchronization of CPUs or in many cases uses another counter. What this other counter is, I will not go into here. This function does have its limitations since it is a system call instead of a simple instruction, but does ensure that on multi-processor applications that values returned are increasing. It also returns the count in the LARGE_INTEGER datatype which happens to be a QWORD.

I found that this function does not have the resolution of rdtsc but it is close and we can assume it will only get better. In many instances it worked just fine, even though it was a system call. At this point I decided to settle on the supplied WINAPI call and have been happy. I have however written the following code to use the rdtscp instruction in instances when it exists because I still have not had a chance to use it. That code is below.

__declspec(naked) BOOL WINAPI
MyProfiler::GetCurrentTick(PLARGE_INTEGER time, PDWORD procId)
{
  __asm
  {
    mov eax,FALSE
    cmp [esp + 0x4],NULL;; Check for NULL pointer - time
    jz DONE
    mov eax,0x80000001  ;; Argument for CPUID
    cpuid
    bt edx,0x1B         ;; Check for RDTSCP in bit 27 of EDX
    jc USE_RDTSCP
    jmp USE_QUERYPERFORMANCECOUNTER
USE_RDTSCP:
    rdtscp  ;; ReaD Time Stamp Counter [EDX:EAX] and Processor id [ECX]
    mov ebx,[esp + 0x4] ;; - time
    mov [ebx],eax       ;; Set lower order bits - time
    mov [ebx + 0x4],edx ;; Set higher order bits - time
    mov eax,TRUE        ;; Return TRUE
    jmp PROCID
USE_QUERYPERFORMANCECOUNTER:
    mov eax,[esp + 0x4] ;; - time
    push eax            ;; Push time on stack for QueryPerformanceCounter()
    call dword ptr QueryPerformanceCounter ;; EAX set in function,TRUE
    mov ecx,0x0         ;; Set ECX to 0 for Processor id
PROCID:
    cmp [esp + 0x8],NULL;; Check for NULL pointer - procId
    jz DONE
    mov ebx,[esp + 0x8] ;; - procId
    mov [ebx],ecx       ;; Set - procId
DONE:
    ret 8 ;; Clean up stack
  }
}

Saturday, June 5, 2010

Hello World!

As is apropos with anyone who programs or creates something, it is necessary to begin simply. This is the beginning of what I hope will be the documentation of bits of knowledge I have acquired over my years in multiple disciplines.

It has become a unique event when people have knowledge about multiple areas. I have observed this in working with people from other countries as well as with my close friends. Societally it seems we are taught that we should focus on one area and know it well. That is all that we are expected to know. If you are a mechanic you should know how to rebuild a transmission, but if you know nothing about gardening that is okay, because that is not 'your field'.

This to me is a way of convincing ourselves that we can stop being curious about the world and limit our gaze to that narrow slice of the world that is 'our field'. I challenge myself to investigate new disciplines regularly. It has allowed me to see how truly interconnected knowledge is and appreciate the gift of asking the simple questions of 'why' and 'how'.

I call this search the 'Terran Comedy' after the epic poem 'The Divine Comedy' by Dante Alighieri. I am not looking to investigate the divine, although that is something I enjoy, but am merely interested in the world in which I reside. I begin with confusion over a topic and subsequently offer what I discover to be an understanding. I welcome all to comment and offer their own insights.