While developping the Native Development Library (NDL) that I’m working on, I attempted to play with a very undocumented feature of the Rtl Heap APIs: Tagging.
If you’ve used the familiar ExAllocatePool APIs in kernel-mode, then you’re already familiar with tagging. The Heap Manager supports the same idea, but allows you to define your own string tags of arbitrary size. This is done by a rather complex set of global flags, special APIs with strange string formatting (RtlCreateTagHeap), and a hidden little macro in winnt.h. Here’s how heap tagging works in the NDL:
A function called NdlpAllocateMemoryInternal allows the caller (the NDL) to allocate memory from the NDL Heap with a specific size, flags, and tag. The tag here is an index that we can define ourselves, such as NDL_STRING_TAG which is 0x2. Then, the NDL has other internal and/or external functions which allocate memory. For example, the LPC routines need to allocate PORT_MESSAGEs or other structures, so NDL_COMMUNICATIONS_TAG is used when calling NdlpAllocateMemoryInternal. There is also NdlpAllocateString, which uses NDL_STRING_TAG. Finally, users of the NDL (your application itself) gets an API called NdlAllocateMemory. You only provide the size and flags, and internally the NDL will set the NDL_USER_TAG to your allocation.
So far so good.
Now there’s two cool things we can do. First, the RtlQueryTagHeap API allows you to obtain statistics on each tag. Allocations, frees, and bytes allocated. This can give you a nice memory map of the NDL’s current memory usage. Even better however, by using RtlWalkHeap, the NDL can scan for all active NDL_USER_TAG allocations. This is useful, since when your native application returns, an internal call to NdlUnregisterApplication is made. When this happens, the assumption is made that your code is done executing (unless you’ve registered as a “resident” application), so in order to promote good programming and to catch leaks, RtlWalkHeap is called, and all active heap entries are scanned. If a block with the NDL_USER_TAG tag index is found, a debug message is printed out, saying that a heap entry at 0xFOO of size 0xBAR is leaking. We can then use the User-Mode Stack Trace Database support and the AllocatorBackTraceIndex of the heap entry to give a complete stack trace on where this allocation was made.
So far so good. Or Not.
Turns out I was getting Tag Indeces such as 0x8007, 0x8004, etc. It seems that all heap allocations were instead indexed with 0x8000 | CurrentAllocationIndex. This wasn’t helpful at all, so I started analyzing the problem.
The first one is the way in which heap tags are generated and then saved. To generate a tag, you use the MAKE_HEAP_TAG macro in winnt.h. This macro takes a “Tag base”, which is what RtlCreateTagHeap returns to you, as well as a tag index, which you define yourself, for example 0x2. The operation that’s done is Base | (Index << 18). So for index 2, with a base of 0x40000, this gives us 0xC0000. The problem is that when RtlpUpdateTagEntry is done, the code does the following: shr ebx, 12h and ebx, 0FFFF0FFFh EBX contains the heap flags, which are the actual HEAP_XXX flags ORed with the tag. Suppose we didn't use any flags, and are just sending our heap tag, 0xC0000. The result of this operation will be 3, not 2, because nothing is done to take into account the heap tag base. However, this bug should cause us to get tag indeces that are off-by-one, not in the 0x8000 range. So more must be going on. Recall that ebx also contains the typical heap flags. Some heap flags are as small as 0x8, others are bigger such as 0x100, and others yet are as high as 0x40000000. You can start seeing how this can corrupt this check. To make matters worse, when using a stack trace database, the heap understands that it's working in "debugging mode", so it calls a different set of APIs, such as RtlAllocateHeapSlowly and RtlDebugAllocateHeap. The latter ORs in some flags by default, such as Heap->ForceFlags, as well as HEAP_DISABLE_VALIDATION_CHECKS and HEAP_USER_SETTABLE_FLAGS. In my case, the total mask of the flags being ORed in was 0x50100000. Let’s bring in our heap tag, and the total becomes 501C0000. Let’s do the broken EBX code again, and the tag index becomes 0x407. Now RtlpUpdateTagEntry will check if 0x407 is above Heap->HighestTagIndex, and since I’ve created a lot less then 1031 tags, it will think this is a “pseudo-tag”. A pseudo-tag is the combinaiton of HEAP_PSEUDO_TAG_MASK and the curent allocaition index…and you’ve gussed it, that mask is 0x8000.
Thankfully, I was able to find a workaround for the NDL, although not with a small (but not critical) loss of functionality. First, I disabled support for stack backtraces. It makes finding your leak a big harder, but it’s not the end of the world, since this functionality is provided as a small benefit anyway. Since the stack trace functions are exported by Rtl, I will simply modify NdlAllocateMemory to capture the trace by itself. I can then use RtlSetUserFlagsHeap to associate the backtrace index or another similar device. If I want to get more evil, I can probably also play with the _HEAP_ENTRY structure itself and set the backtrace index myself.
The second “fix” was not to use the MAKE_HEAP_TAG macro at all, and ignore the “Tag base”. This solves the off-by-one problem but won’t work very reliably because it can conflict with actual heap flags.
This problem is on Win 2000 and XP. I haven’t checked Windows 2003 or Vista yet, but it’s possible that Vista fixed it after Adrian’s rewrite of code for higher security.