Saturday, January 22, 2011

MS Assembler does the right thing!

While working on OS373, I discovered something of note during the interrupt work: the MS assembler is just 'following orders'.  Take, for instance, the following naked function written for CL:


void __declspec(naked) ISR()
{
       __asm pushad

       // Handle interrupt

       __asm popad
       __asm iretd
}


This is basically the simplest Interrupt Service Routine (ISR) that can be constructed.  We know that CL and LINK output 32-bit images by default, which means that the opcodes generated by the compiler will use 32-bit operands.  However, that may not always be the case.  Take the GCC inline assembler, for instance.


void ISR()
{
       //
       // Ignore Prolog
       //

       __asm__ ("pushad");

       // Handle interrupt

       __asm__ ("popad");
       __asm__ ("iret");

       //
       // Ignore Epilog
       //
}



Both code examples do the same thing.  In fact, both will output the same opcodes, apart from the prologue and epilogue in the GCC case (GCC does not support naked functions for the x86 architecture).  If you look closely, you will notice that the GCC version has an "iret" instruction instead of "iretd".  So how can they generate the same opcodes?  Many assemblers treat the two mnemonics as the same thing, even though "iret" is the 16-bit instruction and "iretd" the 32-bit one, because the assembler infers which one to use from the bit-ness of the resulting binary.

If you look at the Intel Architecture guide vol 2a, you will read that even the underlying opcode is the same, 0xCF.  However, if I replace "iretd" with "iret" in the CL example, something very fishy happens.  MS takes the perspective that if you are writing in assembly you know what you want, and it respects that; GCC makes some assumptions.  The resulting encoding for CL using the "iret" instruction is actually 0x66 0xCF, which is very different from 0xCF.  The 0x66 prefix overrides the default operand size (Intel Architecture guide vol 2a: chapter 2.1.1); in this case the instruction works on 16-bit values instead of 32-bit ones, so it pops a 16-bit IP, CS, and FLAGS rather than the 32-bit frame the CPU pushed when the interrupt was taken.  The same works in the other direction: in 16-bit code, prefixing an opcode with 0x66 makes the CPU use 32-bit operands.

I had originally written the "iret" instruction because I ‘assumed’ that the MS assembler was doing what the software guide suggested.  This caused all my interrupts to blow up.  The only way to track down the issue was to actually step through the kernel and examine the assembly generated for the ISR.
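When stepping through generated code like this, it helps to know what to look for in the raw bytes. The following is a minimal sketch, not part of OS373: the helper name iret_operand_bits is hypothetical, and it only recognizes the two encodings discussed here (0xCF, with or without a single 0x66 prefix).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: effective operand size, in bits, of an IRET encoding.
 * default_bits is the default operand size of the code segment (16 or 32).
 * Returns 0 if the bytes are not a (possibly 0x66-prefixed) IRET. */
int iret_operand_bits(const uint8_t *code, size_t len, int default_bits)
{
    int bits = default_bits;
    size_t i = 0;

    if (i < len && code[i] == 0x66) {        /* operand-size override prefix */
        bits = (default_bits == 32) ? 16 : 32;
        i++;
    }
    if (i < len && code[i] == 0xCF)          /* IRET and IRETD share opcode 0xCF */
        return bits;
    return 0;
}
```

In a 32-bit image, the bytes 0xCF report 32 and the bytes 0x66 0xCF report 16, which is the mismatch that broke the ISR.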

Sunday, January 9, 2011

GDT Expand Down entries

Currently I am working on the development of an x86 operating system, OS373. It is something that I have thought about for many years, and I now feel that I have developed the skills and basic knowledge to begin the endeavor. The experience has been enlightening, and many criticisms I have developed over the years of various operating systems have now relaxed because I understand why certain decisions were made. I may not always agree with those decisions, but now I can at least appreciate the difficulty of making those trade-offs.

While reading various sources on OS development, I found one aspect of protected mode operating systems that was not explained very well. Many sites mention it and a few attempt to explain the mechanism, but all seem to fall short of actually conveying how it works. See here for an example of an explanation that is correct but overcomplicates the logic.

The mechanism being alluded to is the Expand Down option for data segment entries in the Global Descriptor Table (GDT). Unless you have done some reading on the GDT and data segments, and/or looked at an OS implementation of the GDT, some background is probably needed at this point.

The mechanism that puts the 'protected' in protected mode is the GDT. This table contains entries that break up the memory accessible by an OS into specific segments with certain permissions. Each entry contains an address (BASE), a length (LIMIT), and various flags; together they define a segment of memory, how it can be accessed, and who can access it.
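To make the shape of an entry concrete, here is a minimal sketch of how BASE, LIMIT, and the flags are packed into one 8-byte descriptor, following the segment descriptor layout in the Intel manuals. The function name gdt_pack and the SEG_* constants are hypothetical names chosen for this sketch; they are not taken from OS373 or any library.

```c
#include <stdint.h>

/* Access-byte bits for a data segment (names are illustrative). */
#define SEG_PRESENT     0x80u  /* P: segment is present */
#define SEG_CODE_DATA   0x10u  /* S: code/data descriptor, not a system one */
#define SEG_EXPAND_DOWN 0x04u  /* E: data segment expands down */
#define SEG_WRITABLE    0x02u  /* W: data segment is writable */

/* Pack a 32-bit BASE, a 20-bit LIMIT, the access byte, and the 4-bit
 * flags nibble (G, D/B, L, AVL) into an 8-byte descriptor. */
uint64_t gdt_pack(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);              /* bits  0..15: LIMIT 15:0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;       /* bits 16..39: BASE 23:0   */
    d |= (uint64_t)access << 40;                   /* bits 40..47: access byte */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;   /* bits 48..51: LIMIT 19:16 */
    d |= (uint64_t)(flags & 0xFu) << 52;           /* bits 52..55: flags       */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;   /* bits 56..63: BASE 31:24  */
    return d;
}
```

Passing SEG_PRESENT | SEG_CODE_DATA | SEG_WRITABLE gives an ordinary writable data segment; adding SEG_EXPAND_DOWN to the access byte selects the Expand Down behaviour this post is about.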

In its simplest form, a segment register is populated with an index into the GDT. All general purpose registers are associated with a segment register. When an address (OFFSET) held in a general purpose register is referenced, the CPU converts the OFFSET to a linear address (LA) using the BASE and LIMIT of the GDT entry that the associated segment register selects. The equation is quite simple:

LA = BASE + OFFSET

If the OFFSET is greater than the LIMIT defined, a General Protection fault is triggered by the CPU. The permissions in the entry are also verified, but that is outside the scope of this post. Using real numbers will help to clarify. The following is defined:

BASE = 0x1000
LIMIT = 0x100
OFFSET = 0x10

The OFFSET is less than the LIMIT so no issue there.

0x1000 : BASE
  0x10 : OFFSET
------
0x1010 : LA

The actual memory location accessed through the OFFSET is 0x1010. This is how Expand Up data segments work, and the expected LA is now obvious. However, among the 'various flags' mentioned above is one that sets the data segment to Expand Down.
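The Expand Up case can be sketched in a few lines of C. This is a simplified model under stated assumptions: the function name translate_expand_up is hypothetical, only a single-byte access is checked, and the permission checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified Expand Up translation: OFFSET must not exceed LIMIT.
 * Returns false where the CPU would raise a General Protection fault. */
bool translate_expand_up(uint32_t base, uint32_t limit, uint32_t offset,
                         uint32_t *la)
{
    if (offset > limit)
        return false;
    *la = base + offset;   /* LA = BASE + OFFSET */
    return true;
}
```

With BASE = 0x1000, LIMIT = 0x100, and OFFSET = 0x10 the function stores 0x1010 in la; an OFFSET of 0x101 is rejected.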

Setting the Expand Down flag makes the LA less than the BASE, hence Expand Down; however, this setting does not change the above equation. The OFFSET is still added to the BASE, and the OFFSET is still compared against the LIMIT. The subtle trick that lets the CPU use addition and yet yield a decreasing LA is arithmetic overflow.

Consider the following. Let us assume a series of registers exist that are only 4 bits in width. This means the registers can contain a maximum value of 15 and a minimum of 0. For example:
B3 B2 B1 B0
 0  0  0  0 : 0
 1  0  0  1 : 9
 1  1  1  1 : 15

Let us consider addition using these registers.

B3 B2 B1 B0
       1    : Carry
 1  0  0  1 : 9
 0  1  0  1 : 5
-----------
 1  1  1  0 : 14

In binary arithmetic each column is summed on its own, with any carry passed to the next column on the left. Looking at this example, a solution presents itself if we take the second operand to its furthest extreme.

B3 B2 B1 B0
 1  1  1    : Carry
 1  0  0  1 : 9
 1  1  1  1 : 15
-----------
 1  0  0  0 : 8

Notice that B3 also produces a bit that needs to be carried, but there is no B4 to receive it. In the overflow case the CPU simply discards that bit (it is recorded only in the carry flag), so the result wraps around: adding the maximum value, 15, yields 8, one less than the first operand. Decreasing the second operand by 1 in turn decreases the result by 1 as well.

B3 B2 B1 B0
            : Carry (none between columns; the carry out of B3 is discarded)
 1  0  0  1 : 9
 1  1  1  0 : 14
-----------
 0  1  1  1 : 7

Using arithmetic overflow, we have now decreased the value of the first operand using addition: the value is expanding down. The Expand Down data segment works the same way.
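On x86, an unsigned addition that overflows drops the carry out of the top bit (it is recorded only in the carry flag), so a 4-bit register behaves like addition modulo 16. A minimal sketch of that behaviour, using a hypothetical add4 helper that masks the sum to four bits:

```c
#include <stdint.h>

/* Add two values as if they lived in 4-bit registers: the carry out of
 * B3 is discarded, so the result is (a + b) modulo 16. */
uint8_t add4(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) & 0x0Fu);
}
```

add4(9, 5) returns 14 as in the first table, while adding 15 behaves like subtracting 1 and adding 14 like subtracting 2.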

Let's use the same BASE as the previous example but compute the LA as if the data segment were of the Expand Down variety. The following is defined:

BASE = 0x1000
LIMIT = 0xfeff
OFFSET = 0xffef

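Since we are decreasing from the BASE, we need to start at the other extreme. Where the LIMIT is 0 + the length for upward expansion, the LIMIT is now the maximum register value - the length; with the 16-bit values used in this example that is 0xffff - 0x100 = 0xfeff. For an Expand Down segment the valid OFFSETs are those greater than the LIMIT, and 0xffef is greater than 0xfeff, so no issue there.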

0x1000 : BASE
0xffef : OFFSET
------
0x0fef : LA

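The full sum is 0x10fef, but the carry out of the 16-bit value is discarded, leaving 0x0fef. As in the initial example, the LA lies close to the BASE, but it is now 0x11 below the BASE rather than 0x10 above it, because adding 0xffef in 16 bits is the same as subtracting 0x11. The LA is expanding down instead of up.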
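The Expand Down example can be sketched the same way. This models the 16-bit values used in the example rather than a real 32-bit descriptor, and the function name translate_expand_down16 is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified 16-bit Expand Down translation: valid OFFSETs are those
 * greater than LIMIT, and BASE + OFFSET wraps around at 16 bits.
 * Returns false where the CPU would raise a General Protection fault. */
bool translate_expand_down16(uint16_t base, uint16_t limit, uint16_t offset,
                             uint16_t *la)
{
    if (offset <= limit)
        return false;
    *la = (uint16_t)(base + offset);   /* carry out of bit 15 is discarded */
    return true;
}
```

With BASE = 0x1000, LIMIT = 0xfeff, and OFFSET = 0xffef the function stores 0x0fef in la, matching the sum above; an OFFSET of 0x10, which was valid for the Expand Up segment, is now rejected.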