Solution to Challenge

The clock has ticked past midnight, so it’s now time to reveal the solution to my previous challenge. When I say “Solution” I mean what I and others are aware to be the currently best method. Nobody else has found anything better, and the two “winners” have presented the same solution (which Windows itself uses).

Since the question originally came to me from a developer at Microsoft, and I mentionned this, it was safe to assume that the method Windows used was probably “the right answer”. However, the hard part was explaining what exactly it was doing.

Correct solutions came, in order, from Matt Miller, Razvan Hobeanu and Ken Johnson. These are some of my favorite blogs to read and people I respect most, so I was honoured that they took the time to write up a solution (thanks to everyone else as well!). I will present a “full” solution, including the 64-bit implementation, and the actual code in the kernel responsible for this hack.

Before I start however, there’s one esoteric solution from Myria which I thought was funny enough to be shared. She proposed, roughly: 1) SetThreadAffinityMask(GetCurrentThread(), 1); 2) return 0;

This cute answer will first force the thread to run on CPU 0, then return… CPU 0. Technically this is true, but it’s also completely useless for the actual purpose on why you’d want to know the CPU number in the first place.

Which brings us to the actual correct solution. Most people correctly identified the routine responsible for the code, RtlGetCurrentProcessorNumber, which is what kernel32’s GetCurrentProcessorNumber forwards to. Note that the WOW64 version actually forwards to NtGetCurrentProcessorNumber, and that this Native API also does exist on 32-bit versions of Windows, and reads the value stored in the PCR. While this is a simple solution, it involves an expensive system call. So let’s go back to the user-mode Rtl routine. The raw assembly code is as follows:

mov ecx, 03Bh
lsl eax, ecx
shr eax, 0Eh

When I first saw this code, I didn’t even know what the LSL instruction did, as I had never encountered it. The Intel Manual explains that LSL stands for “Load Segment Limit”, which is a nice way to get the limit for a selector in the GDT without actually having access to the GDT itself. 0x3B is a rather weird selector, but I recognized it as 0x38 masked with 0x3. The former is the selector for the TEB, and the latter is called the RPL Mask, and selects the proper ring level (User-Mode is Ring 3, so RPL is 3). Converting this to nice C code using MSVC 2005’s intrinsics and the NDK (which has internal definitions), this function looks something like:

ULONG SegmentLimit;

// Get the current segment limit of the TEB
SegmentLimit = __segmentlimit(KGDT_R3_TEB | RPL_MASK);

// Get the CPU number from the limit. Each processor has its TEB
// selector with a limit composed of the CPU number in the 14th to 19th bits.
return (SegmentLimit >> 14);

This explains what the code does, and in some sense, how it does it. However, what exactly is the CPU number doing there? Is this some sort of x86 feature? Is it added during each context switch, at boot-up, etc?

The answer lies in the KeStartAllProcessors routine in the kernel, where the following piece of assembly executes:

mov     ebx, [ebp-2Ch]
mov     eax, [ebp-328h]
shl     eax, 0Eh
mov     [ebx+38h], ax
mov     eax, [ebp-328h]
shl     eax, 0Eh
xor     eax, [ebx+3Ch]
and     eax, 0F0000h
xor     [ebx+3Ch], eax

With some help from IDA, we can make this a bit nicer and update some lines:

INIT:008F6605                 mov     ebx, [ebp+ProcessorState.SpecialRegisters.Gdtr.HighWord]
INIT:008F66D6                 mov     eax, [ebp+i]

And of course, [ebx+38h] is the KGDT_R3_TEB entry in the GDT. Because this routine initializes all processors, it loops them, and i contains the current CPU number in the loop. The processor state contains the pointer to the actual GDT for this processor. Therefore, this is a specific hack that was added, and is fully dependent on the OS, which has to be Windows 2003 or newer.

Finally, on x64 versions, the selector used is actually 0x53, based on the 0x50 TEB selector in 64-bit mode. In WOW64 however, a fake WOW system call to NtGetCurrentProcessorNumber is done instead.

Full credit for this hack and the code behind it should go to Neill Clift, who came up with it.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.