Behind Windows x64’s 44-bit Virtual Memory Addressing Limit

The era of 64-bit computing is finally upon the consumer market, and what was once a rare hardware architecture has become the latest commodity in today’s processors. 64-bit processors promise not only a larger amount of registers and internal optimizations, but, perhaps most importantly, access to a full 64-bit address space, increasing the maximum number of addressable memory from 32-bits to 64-bits, or from 4GB to 16EB (Exabytes, about 17 billion GBs). Although previous solutions such as PAE enlarged the physically addressable limit to 36-bits, they were architectural “patches” and not real solutions for increasing the memory capabilities of hungry workloads or applications.

Although 16EB is a copious amount of memory, today’s computers, as well as tomorrow’s foreseeable machines (at least in the consumer market) are not yet close to requiring support for that much memory. For these reasons, as well as to simplify current chip architecture, the AMD64 specification (which Intel used for its own implementation of x64 processors, but not Itanium) currently only supports 48 bits of virtual address space — requiring all other 16 bits to be set to the same value as the “valid” or “implemented” bits, resulting in canonical addresses: the bottom half of the address space starts at 0x0000000000000000 with only 12 of those zeroes being part of an actual address (resulting in an end at 0x00007FFFFFFFFFFF), while the top half of the address space starts at 0xFFFF800000000000, ending at 0xFFFFFFFFFFFFFFFF.

As you can realize, as newer processors support more of the addressing bits, the lower-half of memory will expand upward, towards 0x7FFFFFFFFFFFFFFF, while the upper-half of memory will expand downward, toward 0x8000000000000000 (a similar split to today’s memory space, but with 32 more bits). Anyone working with 64-bit code ought to be very familiar with this implementation, since it can have subtle effects on code when the number of implemented bits will grow. Even in the 32-bit world, a number of Windows applications (including system code in Windows itself) assume the most significant bit is zero and use it as a flag — clearly the address would become kernel-mode, so the application would mask this bit off when using it as an address. Now developers get a shot at 16 bits to abuse as flags, sequence numbers and other optimizations that the CPU won’t even know about (on current x64 processors), on top of the usual bits that can be assumed due to alignment or user vs kernel-mode code location. Compiling the 64-bit application for Itanium and testing it would reveal such bugs, but this is beyond the testing capabilities of most developers.

Examples within Microsoft’s Windows are prevalent — pushlocks, fast references, Patchguard DPC contexts, and singly-linked lists are only some of the common Windows mechanisms which utilize bits within a pointer for non-addressing purposes. It is the latter of these which is of interest to us, due to the memory addressing limit it imposed on Windows x64 due to a lack of a CPU instruction (in the initial x64 processors) that the implementation required. First, let’s have a look at the data structure and functionality on 32-bits. If you’re unsure on what exactly a singly-linked list is, I suggest a quick read in an algorithm book or Google.

Here is the SLIST_HEADER, the data structure Windows uses to represent an entry inside the list:

typedef union _SLIST_HEADER {
    ULONGLONG Alignment;
    struct {
        SLIST_ENTRY Next;
        USHORT Depth;
        USHORT Sequence;
    } DUMMYSTRUCTNAME;
} SLIST_HEADER, *PSLIST_HEADER;

Here we have an 8-byte structure, guaranteed to be aligned as such, composed of three elements: the pointer to the next entry (32-bits, or 4 bytes), and depth and sequence numbers, each 16-bits (or 2 bytes). Striving to create lock-free push and pop operations, the developers realized that they could make use of an instruction present on Pentium processors or higher — CMPXCHG8B (Compare and Exchange 8 bytes). This instruction allows the atomic modification of 8 bytes of data, which typically, on a 486, would’ve required two operations (and thus subjected these operations to race conditions requiring a spinlock). By using this native CPU instruction, which also supports the LOCK prefix (guaranteeing atomicity on a multi-processor system), the need for a spinlock is eliminated, and all operations on the list become lock-free (increasing speed).

On 64-bit computers, addresses are 64-bits, so the pointer to the next entry must be 64-bits. If we keep the depth and sequence numbers within the same parameters, we require a way to modify at minimum 64+32 bits of data — or better yet, 128. Unfortunately, the first processors did not implement the essential CMPXCHG16B instruction to allow this. The developers had to find a variety of clever ways to squeeze as much information as possible into only 64-bits, which was the most they could modify atomically at once. The 64-bit SLIST_HEADER was born:

struct {  // 8-byte header
        ULONGLONG Depth:16;
        ULONGLONG Sequence:9;
        ULONGLONG NextEntry:39;
} Header8;

The first sacrifice to make was to reduce the space for the sequence number to 9 bits instead of 16 bits, reducing the maximum sequence number the list could achieve. This still only left 39 bits for the pointer — a mediocre improvement over 32 bits. By forcing the structure to be 16-byte aligned when allocated, 4 more bits could be won, since the bottom bits could now always be assumed to be 0. This gives us 43-bits for addresses — we can still do better. Because the implementation of linked-lists is used *either* in kernel-mode or user-mode, but cannot be used across address spaces, the top bit can be ignored, just as on 32-bit machines: the code will assume the address to be kernel-mode if called in kernel-mode, and vice-versa. This allows us to address up to 44-bits of memory in the NextEntry pointer, and is the defining constraint of Windows’ addressing limit.

44 bits is nothing to laugh at — they allow 16TB of memory to be described, and thus splits Windows into somewhat two even chunks of 8TB for user-mode and kernel-mode memory. Nevertheless, this is still 16 times smaller then the CPU’s own limit (48 bits is 256TB), and even farther still from the maximum 64-bits. So, with scalability in mind, there do exist some other bits in the SLIST_HEADER which define the type of header that is being dealt with — because yes, there is an official 16-bit header, written for the day when x64 CPUs would support 128-bit Compare and Exchange (which they now do). First, a look at the full 8-byte header:

 struct {  // 8-byte header
        ULONGLONG Depth:16;
        ULONGLONG Sequence:9;
        ULONGLONG NextEntry:39;
        ULONGLONG HeaderType:1; // 0: 8-byte; 1: 16-byte
        ULONGLONG Init:1;       // 0: uninitialized; 1: initialized
        ULONGLONG Reserved:59;
        ULONGLONG Region:3;
 } Header8;

Notice how the “HeaderType” bit is overlaid with the Depth bits, and allows the implementation and developers to deal with 16-byte headers whenever they will be put into use. This is what they look like:

 struct {  // 16-byte header
        ULONGLONG Depth:16;
        ULONGLONG Sequence:48;
        ULONGLONG HeaderType:1; // 0: 8-byte; 1: 16-byte
        ULONGLONG Init:1;       // 0: uninitialized; 1: initialized
        ULONGLONG Reserved:2;
        ULONGLONG NextEntry:60; // last 4 bits are always 0’s
 } Header16;

Note how the NextEntry pointer has now become 60-bits, and because the structure is still 16-byte aligned, with the 4 free bits, leads to the full 64-bits being addressable. As for supporting both these headers and giving the full address-space to compatible CPUs, one could probably expect Windows to use a runtime “hack” similar to the “486 Compatibility Spinlock” used in the old days of NT, when CMPXCHG8B couldn’t always be assumed to be present (although Intel implemented it on the 586 Pentium, Cyrix did not). As of now, I’m not aware of this 16-byte header being used.

So there you have it — an important lesson not only on Windows 64-bit programming, but also on the importance of thinking ahead when making potentially non-scalable design decisions. Windows did a good job at maintaining capability and still allowing expansion, but the consequences of attempting to use parts of the non-implemented bits in current CPUs as secondary data may be hard to detect once your software evolves to those platforms — tread carefully.

Purple Pill: What Happened

Two weeks ago, I posted and published about a tool I wrote called Purple Pill, which used a bug in the ATI Vista x64 Video Driver to allow unsigned drivers to load. Within an hour, I pulled off the tool and the post, on my own accord, due to the fact I discovered ATI was not made aware of this particular flaw, and I wanted to follow responsible disclosure and allow ATI to fix their driver.

This flaw was especially severe, as it could be used for other purposes — including allowing a guest user access to a SYSTEM token, if the user was using a computer with an ATI Video Card. Therefore, it could have significant security implications beyond my original goal of bypassing Driver Signing on 64-bit Vista. On systems without the driver, administrative privileges would be required for any kind of attack. I originally thought this flaw had been reported to ATI since it was disclosed publically at Blackhat — something that’s usually only done once the presenter has notified the company. In this case, it seems this was not done, and I made an incorrect assumption. I should’ve checked the status of the flaw on my own, before posting the tool and I apologize for not having done so.

As for the act of bypassing driver signing, while I still disagree with Microsoft’s policy of not allowing users to set a permanent policy to explicitly allow this (at their own risk, perhaps even voiding support/warranty options), I have come to realize that attacking this policy isn’t the way to go. Microsoft has decided that Code Integrity is a new part of their security model and it’s not about to go away. Using kernel bugs to subvert it means that these measures would eventually be fixed, while exploiting 3rd party drivers potentially allows malware to figure out how to do this as well, and use the 3rd party driver maliciously. It is also a method that can be protected against, since Vista does have the ability to do per-driver blocking, and once the 3rd party vendor has upgraded all the customers, the older driver can be killed (or even shimmed against, since various kernel infrastructure in Vista allows for this kind of real-time patching).

I am currently exploring other avenues for allowing open source drivers to function on 64-bit Vista without requiring developers to pay for a certificate and deal with the  code signing authorities, while still respecting Vista’s KMCS policy, and continuing to protect against malicious drivers using such a method for their own gain. It is my hope to find a solution which will both please Microsoft and the KMCS policy, as well as make life easy for open source developers (and other non-commercial hobbyists) which for whatever reason don’t want to, or cannot, pay for a certificate.

Secrets of the Application Compatilibity Database (SDB) – Part 4

My apologies for the long delay until this fourth part was published. I have been teaching in Seattle for the previous two weeks, and have just started to settle in Cupertino for my Apple internship, and I had very few spare moments in my hands.

In Part 3, we discussed how generic shims modify key parts of the system, usually through API hooking or undocumented flags, in order to provide compatibility with a variety of applications. We looked at shims such as the Windows 9x Heap Manager implementation in NT, and several re-direction and reflection APIs, as well as even some security bypassing shims. Today, we’ll take a look at how certain applications have specific shims implemented specifically just for them. We can find these with CDD easily, by noticing that the Shim name is usually a program name, as well as looking in the DLL which implements it. Finally, specific shims never have any descriptive text describing them. While looking through the Shim dump, I’ve chosen this one (arbitrarly):

Dumping Entry:

SHIMNAME="CorelSiteBuilder"
DLLFILE="AcSpecfc.DLL"

Any continued analysis on this shim must be done through reverse engineering, since we have no hint as to what this shim is attempting to do. By using IDA on the DLL specified, one can notice it is a series of C++ classes, each which represent a specific shim (there are of course other classes such as CString and the generic Shim Engine initialization classes). The prefix for the specific shims seems to be “NS”, so it was easy to locate our target of interest: NS_CorelSiteBuilder. Every shim class also has an initialization function that gets called, and is responsible for initializing the class and its hooks. This is usually called IniitalizeHooksMulti. In the disassembly of this function, pay special attention to loc_714F3691. This is where this class initializes the API hooks that make up this specific shim (other specific shims can also have other types of hooks, such as patches or COM hooks). The tagHOOKAPI structure contains the information required to patch an API, and one can clearly see that SetWindowTextA inside user32.dll is being hooked, and re-directed to NS_CorelSiteBuilder::APIHook_SetWindowTextA.

Now the actual hook can be looked at, and I’ve provided an analyzed and commented disassembly here. This is a pretty simple hook, and seems to check on whether the window handlw and window text that are being sent as arguments match the previous window handle and window text that the shim had saved durinng the last call. If they do match, it will simply return TRUE (success) without actually calling the original API, otherwise, the hook will save the window text that’s being set as the “old” window text (so that when the hook is called again, it will compare against this name now), and then perform a call to the original API (in tagAPIHOOK+0xC) with the unmodified arguments.

In other words, the whole point of this shim is to “absorb” SetWindowTextA calls to the Corel Site Builder window if the new text that’s being set matches the previous text, and simply return success. The reason on why such a shim would be necessary is left as an excercise to the reader.

In the next article, I will release the first version of the CDD utility which I’ve used when showing some of the Shims available, and document some of its uses.

New Object Manager Filtering APIs

The new bits of the WDK have been released, and it seems that finally, we are starting to see a glimpse of some of the new filtering technologies that were promised in Vista SP1 to help with incompatibilities due to PatchGuard. Although Vista added powerful Registry filtering support on top of the existing File filtering architecture, hooking some native calls would still have been necessary in order to filter other kinds of system behaviour. The first which seems to have been addressed are handles, and Vista SP1 now supports Object Manager Filters.

Currently, only create and duplicate can be filtered, but the ability for both pre and post notifications exist. As with the new filter model, Object Manager Filters also support Altitudes, and are fully versionned. Unfortunately, this new set of APIs seems rather disappointing to me. For starters, this functionality was already available, even behind Patchguard’s back, through native Object Manager callbacks present in the OBJECT_TYPE’s ObjectTypeInitializer structure which contains all the callbacks for the object type. This interface seems to do nothing more but expose in a more public and accessible way the same ObCreateMethod interface that has existed since NT4, except that it only works for create and duplicate (while the internal interface allows for open and inherit as well).

Nevertheless, this new filtering mechanism is clearly written to be extensible for other Object Manager actions, so hopefully we’ll see some new improvements before SP1 actually ships. For the curious, here are some of the new toys to play with:

//
// Registration version for Vista SP1 and Windows Server 2007
//
#define OB_FLT_REGISTRATION_VERSION_0100  0x0100

//
// This value should be used by filters for registration
//
#define OB_FLT_REGISTRATION_VERSION OB_FLT_REGISTRATION_VERSION_0100

typedef ULONG OB_OPERATION;

#define OB_OPERATION_HANDLE_CREATE              0x00000001
#define OB_OPERATION_HANDLE_DUPLICATE           0x00000002

typedef struct _OB_PRE_CREATE_HANDLE_INFORMATION {
    __inout ACCESS_MASK         DesiredAccess;
    __in ACCESS_MASK            OriginalDesiredAccess;
} OB_PRE_CREATE_HANDLE_INFORMATION, *POB_PRE_CREATE_HANDLE_INFORMATION;

typedef struct _OB_PRE_DUPLICATE_HANDLE_INFORMATION {
    __inout ACCESS_MASK         DesiredAccess;
    __in ACCESS_MASK            OriginalDesiredAccess;
    __in PVOID                  SourceProcess;
    __in PVOID                  TargetProcess;
} OB_PRE_DUPLICATE_HANDLE_INFORMATION, * POB_PRE_DUPLICATE_HANDLE_INFORMATION;

typedef union _OB_PRE_OPERATION_PARAMETERS {
    __inout OB_PRE_CREATE_HANDLE_INFORMATION        CreateHandleInformation;
    __inout OB_PRE_DUPLICATE_HANDLE_INFORMATION     DuplicateHandleInformation;
} OB_PRE_OPERATION_PARAMETERS, *POB_PRE_OPERATION_PARAMETERS;

typedef struct _OB_PRE_OPERATION_INFORMATION {
    __in OB_OPERATION           Operation;
    union {
        __in ULONG Flags;
        struct {
            __in ULONG KernelHandle:1;
            __in ULONG Reserved:31;
        };
    };
    __in PVOID                         Object;
    __in POBJECT_TYPE                  ObjectType;
    __out PVOID                        CallContext;
    __in POB_PRE_OPERATION_PARAMETERS  Parameters;
} OB_PRE_OPERATION_INFORMATION, *POB_PRE_OPERATION_INFORMATION;

typedef struct _OB_POST_CREATE_HANDLE_INFORMATION {
    __in ACCESS_MASK            GrantedAccess;
} OB_POST_CREATE_HANDLE_INFORMATION, *POB_POST_CREATE_HANDLE_INFORMATION;

typedef struct _OB_POST_DUPLICATE_HANDLE_INFORMATION {
    __in ACCESS_MASK            GrantedAccess;
} OB_POST_DUPLICATE_HANDLE_INFORMATION, * POB_POST_DUPLICATE_HANDLE_INFORMATION;

typedef union _OB_POST_OPERATION_PARAMETERS {
    __in OB_POST_CREATE_HANDLE_INFORMATION       CreateHandleInformation;
    __in OB_POST_DUPLICATE_HANDLE_INFORMATION    DuplicateHandleInformation;
} OB_POST_OPERATION_PARAMETERS, *POB_POST_OPERATION_PARAMETERS;

typedef struct _OB_POST_OPERATION_INFORMATION {
    __in OB_OPERATION  Operation;
    union {
        __in ULONG Flags;
        struct {
            __in ULONG KernelHandle:1;
            __in ULONG Reserved:31;
        };
    };
    __in PVOID                          Object;
    __in POBJECT_TYPE                   ObjectType;
    __in PVOID                          CallContext;
    __in NTSTATUS                       ReturnStatus;
    __in POB_POST_OPERATION_PARAMETERS  Parameters;
} OB_POST_OPERATION_INFORMATION,*POB_POST_OPERATION_INFORMATION;

typedef enum _OB_PREOP_CALLBACK_STATUS {
    OB_PREOP_SUCCESS
} OB_PREOP_CALLBACK_STATUS, *POB_PREOP_CALLBACK_STATUS;

typedef OB_PREOP_CALLBACK_STATUS
(*POB_PRE_OPERATION_CALLBACK) (
    __in PVOID RegistrationContext,
    __inout POB_PRE_OPERATION_INFORMATION OperationInformation
    );

typedef VOID
(*POB_POST_OPERATION_CALLBACK) (
    __in PVOID RegistrationContext,
    __in POB_POST_OPERATION_INFORMATION OperationInformation
    );

typedef struct _OB_OPERATION_REGISTRATION {
    __in POBJECT_TYPE                *ObjectType;
    __in OB_OPERATION                Operations;
    __in POB_PRE_OPERATION_CALLBACK  PreOperation;
    __in POB_POST_OPERATION_CALLBACK PostOperation;
} OB_OPERATION_REGISTRATION, *POB_OPERATION_REGISTRATION;

typedef struct _OB_CALLBACK_REGISTRATION {
    __in USHORT                     Version;
    __in USHORT                     OperationRegistrationCount;
    __in UNICODE_STRING             Altitude;
    __in PVOID                      RegistrationContext;
    __in OB_OPERATION_REGISTRATION  *OperationRegistration;
} OB_CALLBACK_REGISTRATION, *POB_CALLBACK_REGISTRATION;

#if (NTDDI_VERSION >= NTDDI_VISTASP1)
NTKERNELAPI
NTSTATUS
ObRegisterCallbacks (
    __in POB_CALLBACK_REGISTRATION CallbackRegistration,
    __deref_out PVOID *RegistrationHandle
    );

NTKERNELAPI
VOID
ObUnRegisterCallbacks (
    __in PVOID RegistrationHandle
    );

NTKERNELAPI
USHORT
ObGetFilterVersion ();
#endif

Why Protected Processes Are A Bad Idea

If you haven’t read or heard about Protected Processes yet, start by familiarizing yourself with the whitepaper here. MarkR also covered them in his 3-part series on Vista enhancements.

But basically, they’re another part of the next-generation high-definition audio/video support present into Vista, and related to the Protected Media Path, which I had covered a bit earlier, much to people’s attention. Before continuing, let me make clear that this post isn’t related to any previous PMP stuff I have posted, is not about the so-called crack or idea I had (Which, if you haven’t read previously, turned out to be false). This entry is strictly related to Protected Processes and their non-PMP use.

· Inject a thread into a protected process

· Access the virtual memory of a protected process

· Debug an active protected process

· Duplicate a handle from a protected process

· Change the quota or working set of a protected process

· Set or retrieve context information

· Impersonate the thread

Which means that all applications such as virus scanners, malware protectors, and any other kind of application that hooks all system processes, injects threads into them or even discretely reads their memory doesn’t work on Vista when it hits a protected process. For example, Warden (the application that World of Warcraft uses to catch cheaters) can’t determine if a protected process is evil or not, because it can’t go peek inside it. To help offset this dillema, protected processes can only be loaded if they are signed, and with a special license which comes with heavy restrictions on what the process can do, how it can behave, and so on. Because of this, only true media applications will ever be protected, and legitimate applications which were used to scanning address spaces will simply skip the process, inherently assuming that the DeCSS descrambler in Windows Media Player isn’t trying to hack their MMORPG.

Unforunately, it is trivial to make a process protected or unprotected by bypassing all the Code Integrity checks and sandbox in which protected processes are supposed to run. I wrote a small application which I called D-Pin Purr which does exactly this. I tried it on the only two protected processes I know on Vista (audiodg.exe and mfpmp.exe). While ProcessXP usually shows only limited information for them, after using my tool, I could see all the information. WinDBG attached to it fine:

ChildEBP RetAddr  Args to Child
01b4fbd4 770706a0 76f777d4 000000f0 00000000 ntdll!KiFastSystemCallRet
01b4fbd8 76f777d4 000000f0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
01b4fc48 6fd82e54 000000f0 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xbe
01b4fc6c 6fd82da1 000ea420 01b4fcac 00000000 MFPlat!LFQueueGetWaitEx+0xec
01b4fc8c 6fd82d63 000ea400 01b4fcac 00000000 MFPlat!LFQueueGetWait+0x22
01b4fcb4 6fd82887 01b4fdb8 00000000 00000000 MFPlat!CCompletionPortQ::Get+0x1f
01b4fdbc 6fd889d7 01b4fdfc 761f62b6 001022e0 MFPlat!CWorkQueue::CThread::ThreadMain+0x80
01b4fdc4 761f62b6 001022e0 3da0e0d2 00000000 MFPlat!CWorkQueue::CThread::ThreadFunc+0xd
01b4fdfc 761f63de 01b4fe10 76f73833 00029420 msvcrt!_endthreadex+0x44
01b4fe04 76f73833 00029420 01b4fe50 7704a9bd msvcrt!_endthreadex+0xce
01b4fe10 7704a9bd 00029420 01b462fd 00000000 kernel32!BaseThreadInitThunk+0xe
01b4fe50 00000000 761f639b 00029420 00000000 ntdll!_RtlUserThreadStart+0x23

Here’s a simple overview of the application itself:

c:\>dpinpurr
DPINPURR processid [/P]
Description:
    This tool is used to remove or add protection on a process.
Parameter List:
        processid       Specifies the PID of the process to be unprotected.
   /P                  Specifies to protect the process instead.
c:\>dpinpurr 312 /p

[C0000156] – STATUS_TOO_MANY_SECRETS:
         Process modified successfully!
c:\>

Being able to play with the PMP application isn’t really what I was interested in, since most of the high-level security is in the kernel anyway. The intersting thing is that I can make any application of my choosing protected, and thus undebuggable, uninjectable and with its address space secure. I’ll add dpinpurr to the download area soon, and provide a link.

While I don’t want to condone writing more powerul malware or MMORPG hacking tools (or whatever else can benefit from being protected), I think it’s time to signal a wakeup call to all the developers who were counting on simply ignoring protected processes and assuming they’re legitimate media applications.

Vista DRM Issue Aftermath

I received word from Microsoft today on the status of the Vista DRM Issue that I talked about earlier. It seems that the final consensus from their internal investigation is that my method does not constitute a viable means of exploting the driver signing/DRM model. In other words, the theory I came up with that might allow PMP to be subverted seems to have been proven false.

My original idea was to use boot Vista with the /DEBUG flag and then use the internal, undocumented Kernel-Mode Debug API to load executable code in kernel-memory or to overwrite existing code (as well as to disable PatchGuard). My rationale was that PMP wouldn’t detect any issues, since no unsigned code was running in the kernel, instead, you would have code hidden in Non Paged Pool or as part of \Driver\Null’s IOCTL routine (similarly to how Johanna loaded code using the pagefile.sys). However, it seems this won’t work, I’m assuming because PMP will actually detect that you’ve booted in Debug Mode, and it will enter reduced functionality mode (Which was the hypothesis on which the entire idea depended on). Since I don’t know more about PMP, I’m not sure if this is what happens, but that’s my personal guess. Either ways, it seems DRM is here to stay for now.

Speaking of reduced functionality mode, if you turn of the Secured Licensing Service (SLsvc) in Vista, the Control Panel and Windows Update stop working. I was disabling services to get a minimalstic Vista desktop (I don’t like booting with 50 processes on startup), and I didn’t care about this service, disabling it and assuming PMP would block me from playing BluRay/HDDVD (Which I don’t have)… but I never guessed it would kill the Control Panel. Seems kinda weird.

When I get back home, I”ll post a list of the only services that I’m running on Vista. It’s got all the functionality I need (Internet, Printing, Audio). I’m getting a new hard drive for my server tonight, as well as ugprading my main desktop CPU from an AMD64 X2 3800+ to an Opteron 185. That’s a jump from 2x2GHz, 1MB Cache to 2×2.6GHz, 2MB cache. I’m hoping to overclock to 2.8GHz. Do NOT get an FX-60. They’re the exact same chip, but they cost twice as more.

Rebooting from Kernel Mode

I see this question posted on OSR Online a lot: “How do I force a reboot of the computer from kernel mode?”. The clean solution always being recommended is to have a user-mode service that talks to the driver and does the appropriate ExitWindowsEx API call. But what if you really want to do it from kernel-mode? Well, you could use HalReturnToFirmware or NtShutdownSystem, but those functions are undocumented, and you probably won’t get WHQLed if you try using them. So I’ll show you a sneaky way that does the same, but uses a fully documented kernel API. Don’t use it unless you really know what you’re doing; I personally recommend using a service as well.

Rebooting a machine from kernel mode:

KeBugCheck(POWER_FAILURE_SIMULATE);

Now, I know what you’re thinking, but you’re wrong. This will *not* bugcheck the machine. It will actually call HalReturnToFirmware(HalRebootMachine), right after processing bugcheck callbacks. No BSOD, no crash dump, just a clean, simple, immediate reboot.

Enjoy 😉

Recent Events

After initially being slashdotted, my blog post below got linked across the blogosphere, hit Digg, the Inquirer, BoingBoing and other major news sites, and I’ve reached some 60 000 visitors in less than 24 hours…

Since most of you are therefore new visitors, I just wanted to post a short introduction/information paragraph. First of all, I suggest you visit the About page of the blog, as well as my Wiki page on the ReactOS website. This is just to clear up any confusion on where I currently reside, age, education, etc. If you are interested in my other publications/works as a security researcher, you should visit the Publications page, as well as OpenRCE, where I usually post my latest articles. You can also find a recording of my REcon 2006 talk on Archive.Org. Search for my name; the PDF is available on the Publications page as well. Finally, my project, ReactOS, is having a donation fund; if you’d like to donate some money, that would be very appreciated.

As for the DRM post, I never expected that it would get the kind of attention it has; to be fair, I had completely forgotten that today was Vista’s launch date (being a beta tester, I’ve had RTM for months now); I certaintly don’t want to make it seem like I was specifically targetting this day to release anything. Later this week I will release some safe, generic, proof of concept code that targets what I believe is a flaw in the Code Integrity/Driver Signing model. My 64-bit VM is running extremly slow, so it will take me some time to test the code. Because this code will require an initial reboot, Microsoft does not consider it to be a flaw from a security standpoint. And because it’s so generic, it has absolutely nothing to do with DRM or PMP. That being said, I’m sure someone with knowledge of the PMP implementation might be able to use this as a very smart building block of the entire code that would be required; but that would be like arresting every knife manufacturer because knives can kill people.

Finally, if any of you would like more information about ReactOS or would like to meet in person, I will be giving a talk at the SOCAL5X conference on February 9th, and I will be around LA on the 10th as well.

Update on Driver Signing Bypass

I apologize for the lack of news, but after attending CUSEC, I had to spend my time on catching up the two weeks of school and work that I had missed, and exploiting Vista ended up going on the backburner, especially as I had to re-install VMWare 6.0 (which wasn’t being helpful with me) and a new Vista 64-bit image.

That being said, it turns out the code I’ve written does not work out of the box on a Vista RTM system. Although it can be effective when combined with a reboot, this doesn’t provide any advantage of any of the myriad other ways that this could be done (including booting with the disable integrity checks BCD option or the /TESTSIGN flag).

However, it does bypass DRM. As part of the Protected Media Path, (PMP), Windows Vista sets up a number of requirements for A/V software and drivers in order to ensure it complies with the demandes of the media companies. One of these features, which has been heavily criticized as being the actual reason behind driver signing, is that “some premium content may be unavailable” if test signing mode is used. Originally, I assumed that this meant that the kernel would set some sort of variable, but this didn’t make sense: once your unsigned driver could load, it could disable this check. After reading the PMP documentation however, it seems to me that the “feature” explained is more likely the cause of this warning on premium content.

This feature is the ability of the PMP to notify A/V applications that there are unsigned drivers on the system, as well as provide a list of unsigned drivers. The idea is that the application can either outright refuse to play content, or that it can scan for known anti-DRM drivers which might be attempting to hook onto the unencrypted stream. This leads me to believe that it’s up to applications, not the OS, to enforce this DRM check.

The great thing about the code I’ve written is that it does NOT use test signing mode and it does NOT load an unsigned driver into the system. Therefore, to any A/V application running, the system seems totally safe — when in fact, it’s not. Now, because I’m still booting with a special flag, it’s possible for Microsoft to patch the PMP and have it report that this flag is set, thereby disabling premium content. However, beause I already have kernel-mode code running at this point, I can disable this flag in memory, and PMP will never know that it was enabled. Again, Microsoft could fight this by caching the value, or obfuscating it somewhere inside PMP’s kernel-mode code, but as long as it’s in kernel-mode, and I’ve got code in kernel-mode, I can patch it.

To continue this game, Microsoft could then use Patchguard on the obfuscated value…but that would only mean that I can simply disable Patchguard using the numerous methods that Skywing documented in his latest paper.

In the end, the only way that PMP is going to work is with a Hypervisor, and even that will probably fail.

Unfortunately, with almost 0% use for the open source community (which can use test signing mode for their drivers), documenting my method and/or releasing a sample might be viewed as an anti-DRM tool, and defintely a DMCA violation. Although used on its own, this POC doesn’t do anything or go anywhere near the PMP (I don’t even have Protected Media, HDMI, HD-DVD, nor do I know where PMP lives or how someone can intercept decrypted steams), a particularly nasty group of lawyers could still somehow associate the DMCA to it, so I’m not going to take any chances.

It’s quite ironic — Microsoft claims driver signing is to fight malware and increase system stability, so if I get sued under DMCA, wouldn’t that be an admission that driver signing is a “anti-copyright infringment tool”?.

I’d really love to release this tool to the public though, so I will look into my options — perhaps emphasizing the research aspect of it and crippling the binary would be a safe way.

Windows Vista 64-bit Driver Signing/PatchGuard Workaround

I’ve been sitting on this one for a while (over a year), awaiting confirmation of a final key component in the procedure, but I’ve now been able to test my method.
I will be spending tomorrow finishing up the paper and exploit code on my test Virtual PC image. Before you get all excited, please keep in mind this is a local, administrative-account-required workaround for the driver-signing requirement in Vista 64-bit and has no security implications what so ever.

Since I wasn’t able to get a working POC until now, I haven’t made a lot of noise about it… if I get it working right tomorrow, I will probably send a little note to Microsoft to make sure they don’t go medieval on my ass — it has zero customer impact so I don’t think they will, but I apologize if I’ll have to can it.