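I have recently been studying yoda 1.2 and found that I did not really understand how it handles TLS. yoda simply copies IMAGE_TLS_DIRECTORY32 to another location, and packed programs seem to run without problems.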
IMAGE_TLS_DIRECTORY32 STRUCT
StartAddressOfRawData dd ?
EndAddressOfRawData dd ?
AddressOfIndex dd ?
AddressOfCallBacks dd ?
SizeOfZeroFill dd ?
Characteristics dd ?
IMAGE_TLS_DIRECTORY32 ENDS
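To make the layout concrete, here is a minimal Python sketch that decodes the six DWORD fields of the structure above from raw bytes. The helper name `parse_tls_directory32` and the sample values are illustrative assumptions; only 0x4036D0 and 0x406010 are addresses that appear later in this post.

```python
import struct
from collections import namedtuple

# Field order follows the IMAGE_TLS_DIRECTORY32 layout shown above.
TlsDirectory32 = namedtuple(
    "TlsDirectory32",
    "StartAddressOfRawData EndAddressOfRawData AddressOfIndex "
    "AddressOfCallBacks SizeOfZeroFill Characteristics",
)

TLS_DIR32_FORMAT = "<6I"  # six little-endian 32-bit fields, 24 bytes in total


def parse_tls_directory32(data: bytes) -> TlsDirectory32:
    """Decode the first 24 bytes of `data` as an IMAGE_TLS_DIRECTORY32."""
    size = struct.calcsize(TLS_DIR32_FORMAT)
    if len(data) < size:
        raise ValueError("need at least %d bytes" % size)
    return TlsDirectory32(*struct.unpack(TLS_DIR32_FORMAT, data[:size]))


# Hypothetical sample: the start/end VAs are made up for illustration.
sample = struct.pack("<6I", 0x405000, 0x405008, 0x4036D0, 0x406010, 0, 0)
tls = parse_tls_directory32(sample)
print(hex(tls.AddressOfIndex), hex(tls.AddressOfCallBacks))
```

A packer that copies the directory elsewhere could decode it this way first, so that each address field can be inspected before deciding what to do with the data it points to.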
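Later I added aPLib on top of yoda and the trouble started: the packed program would no longer run. Debugging showed that the problem lay in TLS. Apparently just copying IMAGE_TLS_DIRECTORY32 is not enough, and handling TLS perfectly when packing seems rather hard. Do any of the packer-writing experts here have a good approach? Fortunately, programs that use TLS seem to be rare.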
Below is some material on TLS that I found in MSDN:
6.7. The .tls Section
The .tls section provides direct PE/COFF support for static Thread Local Storage (TLS). TLS is a special storage class supported by Windows NT, in which a data object is not an automatic (stack) variable, yet it is local to each individual thread that runs the code. Thus, each thread can maintain a different value for a variable declared using TLS.
Note that any amount of TLS data can be supported by using the API calls TlsAlloc, TlsFree, TlsSetValue, and TlsGetValue. The PE/COFF implementation is an alternative approach to using the API, and it has the advantage of being simpler from the high-level-language programmer’s point of view. This implementation enables TLS data to be defined and initialized in a manner similar to ordinary static variables in a program. For example, in Microsoft Visual C++, a static TLS variable can be defined as follows, without using the Windows API:
__declspec (thread) int tlsFlag = 1;
To support this programming construct, the PE/COFF .tls section specifies the following information: initialization data, callback routines for per-thread initialization and termination, and the TLS index explained in the following discussion.
Note: Statically declared TLS data objects can be used only in statically loaded image files. This fact makes it unreliable to use static TLS data in a DLL unless you know that the DLL, or anything statically linked with it, will never be loaded dynamically with the LoadLibrary API function.
Executable code accesses a static TLS data object through the following steps:
1. At link time, the linker sets the Address of Index field of the TLS Directory. This field points to a location where the program will expect to receive the TLS index.
The Microsoft run-time library facilitates this process by defining a memory image of the TLS Directory and giving it the special name “__tls_used” (Intel x86 platforms) or “_tls_used” (other platforms). The linker looks for this memory image and uses the data there to create the TLS Directory. Other compilers that support TLS and work with the Microsoft linker must use this same technique.
2. When a thread is created, the loader communicates the address of the thread’s TLS array by placing the address of the Thread Environment Block (TEB) in the FS register. A pointer to the TLS array is at the offset of 0x2C from the beginning of TEB. This behavior is Intel x86 specific.
3. The loader assigns the value of the TLS index to the place indicated by the Address of Index field.
4. The executable code retrieves the TLS index and also the location of the TLS array.
5. The code uses the TLS index and the TLS array location (multiplying the index by four and using it as an offset to the array) to get the address of the TLS data area for the given program and module. Each thread has its own TLS data area, but this is transparent to the program, which doesn’t need to know how data is allocated for individual threads.
6. An individual TLS data object is accessed as some fixed offset into the TLS data area.
The TLS array is an array of addresses that the system maintains for each thread. Each address in this array gives the location of TLS data for a given module (.EXE or DLL) within the program. The TLS index indicates which member of the array to use. (The index is a number, meaningful only to the system that identifies the module).
6.7.1. The TLS Directory
The TLS Directory has the following format:
Offset (PE32/PE32+) | Size (PE32/PE32+) | Field | Description
0 | 4/8 | Raw Data Start VA | Starting address of the TLS template. The template is a block of data used to initialize TLS data. The system copies all this data each time a thread is created, so it must not be corrupted. Note that this address is not an RVA; it is an address for which there should be a base relocation in the .reloc section.
4/8 | 4/8 | Raw Data End VA | Address of the last byte of the TLS, except for the zero fill. As with the Raw Data Start VA, this is a virtual address, not an RVA.
8/16 | 4/8 | Address of Index | Location to receive the TLS index, which the loader assigns. This location is in an ordinary data section, so it can be given a symbolic name accessible to the program.
12/24 | 4/8 | Address of Callbacks | Pointer to an array of TLS callback functions. The array is null-terminated, so if there is no callback function supported, this field points to four bytes set to zero. The prototype for these functions is given below, in “TLS Callback Functions.”
16/32 | 4 | Size of Zero Fill | The size in bytes of the template, beyond the initialized data delimited by Raw Data Start VA and Raw Data End VA. The total template size should be the same as the total size of TLS data in the image file. The zero fill is the amount of data that comes after the initialized nonzero data.
20/36 | 4 | Characteristics | Reserved for possible future use by TLS flags.
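The offset column can be cross-checked with a few lines of Python. This is only a sketch: `field_offsets` is a hypothetical helper, and it assumes the four address fields are pointer-sized while the last two fields are always 4 bytes, as the size column states.

```python
import struct

FIELDS = ["Raw Data Start VA", "Raw Data End VA", "Address of Index",
          "Address of Callbacks", "Size of Zero Fill", "Characteristics"]


def field_offsets(pointer_code: str) -> list:
    """Return the byte offset of each TLS Directory field.

    pointer_code is "I" (4-byte addresses, PE32) or "Q" (8-byte, PE32+).
    The four address fields use that width; the last two are always 4 bytes.
    """
    codes = [pointer_code] * 4 + ["I", "I"]
    offsets, position = [], 0
    for code in codes:
        offsets.append(position)
        position += struct.calcsize("<" + code)
    return offsets


# Print each field with its PE32/PE32+ offset pair, as in the table.
for name, o32, o64 in zip(FIELDS, field_offsets("I"), field_offsets("Q")):
    print("%-22s %2d/%-2d" % (name, o32, o64))
```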
6.7.2. TLS Callback Functions
The program can provide one or more TLS callback functions (though Microsoft compilers do not currently use this feature) to support additional initialization and termination for TLS data objects. A typical reason to use such a callback function would be to call constructors and destructors for objects.
Although there is typically no more than one callback function, a callback is implemented as an array to make it possible to add additional callback functions if desired. If there is more than one callback function, each function is called in the order its address appears in the array. A null pointer terminates the array. It is perfectly valid to have an empty list (no callback supported), in which case the callback array has exactly one member—a null pointer.
The prototype for a callback function (pointed to by a pointer of type PIMAGE_TLS_CALLBACK) has the same parameters as a DLL entry-point function:
typedef VOID (NTAPI *PIMAGE_TLS_CALLBACK) ( PVOID DllHandle, DWORD Reason, PVOID Reserved );
The Reserved parameter should be left set to 0. The Reason parameter can take the following values:
Setting | Value | Description
DLL_PROCESS_ATTACH | 1 | New process has started, including the first thread.
DLL_THREAD_ATTACH | 2 | New thread has been created (this notification sent for all but the first thread).
DLL_THREAD_DETACH | 3 | Thread is about to be terminated (this notification sent for all but the first thread).
DLL_PROCESS_DETACH | 0 | Process is about to terminate, including the original thread.
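The calling convention described above can be modeled in a few lines of Python. This is a simulation under stated assumptions, not loader code: `run_tls_callbacks` and `on_tls_event` are hypothetical names, `None` stands in for the terminating null pointer, and 0x400000 is used as the module handle.

```python
DLL_PROCESS_DETACH = 0
DLL_PROCESS_ATTACH = 1
DLL_THREAD_ATTACH = 2
DLL_THREAD_DETACH = 3


def run_tls_callbacks(callbacks, module_handle, reason):
    """Call every callback in array order, stopping at the terminating None."""
    called = 0
    for callback in callbacks:
        if callback is None:                 # stands in for the null pointer
            break
        callback(module_handle, reason, 0)   # Reserved is passed as 0
        called += 1
    return called


log = []


def on_tls_event(handle, reason, reserved):  # hypothetical callback
    log.append((hex(handle), reason))


# One callback followed by the terminator, notified at start and at exit.
run_tls_callbacks([on_tls_event, None], 0x400000, DLL_PROCESS_ATTACH)
run_tls_callbacks([on_tls_event, None], 0x400000, DLL_PROCESS_DETACH)
print(log)
```

An empty list, that is an array holding only the terminator, simply results in no calls.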
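I. Did you follow the passage above? I was thoroughly confused by it, and it only started to make sense once I got my hands dirty.
I will paraphrase it briefly using a concrete example.
Create a minimal file test.dpr with Delphi 6; compiling it produces test.exe: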
Program test;
{$APPTYPE CONSOLE}
Begin
end.
Let's look at its TLS:
1. At link time, the linker sets the Address of Index field of the TLS Directory. This field points to a location where the program will expect to receive the TLS index.
At link time the linker fills in AddressOfIndex; the program reads its TLS index through this pointer.
Here it is 4036D0. If TLS were used through the API instead, you would need int dwTlsIndex = TlsAlloc();
2. When a thread is created, the loader communicates the address of the thread’s TLS array by placing the address of the Thread Environment Block (TEB) in the FS register. A pointer to the TLS array is at the offset of 0x2C from the beginning of TEB. This behavior is Intel x86 specific.
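FS:[2Ch] holds a pointer to the array of pointers to TLS data areas; as far as I can tell, this array has at most 64 entries and is terminated by 0.
Note that FS is different for every thread.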
typedef struct _TEB {
NT_TIB Tib;
PVOID EnvironmentPointer;
CLIENT_ID Cid;
PVOID ActiveRpcInfo;
PVOID ThreadLocalStoragePointer; ; 2ch
PPEB Peb; ; 30h
ULONG LastErrorValue; ; 34h
…}
3. The loader assigns the value of the TLS index to the place indicated by the Address of Index field.
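When the program is loaded, the loader stores the index at [4036D0]. It is usually 0, which selects the first entry of the array. Note that all threads share [4036D0].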
5. The code uses the TLS index and the TLS array location (multiplying the index by four and using it as an offset to the array) to get the address of the TLS data area for the given program and module. Each thread has its own TLS data area, but this is transparent to the program, which doesn’t need to know how data is allocated for individual threads.
Mov ecx, [4036d0] ; tls index
Shl ecx, 2 ; *4
Mov eax, FS:[2ch]
Add eax, ecx
Mov eax, [eax] ; eax -> TLS data area
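Because each thread has a different FS, each one ends up pointing at a different data area, so the threads do not interfere with one another.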
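The same lookup can be modeled in Python to show why threads do not disturb each other. This is a toy model under stated assumptions: `FakeThread` and `tls_data_area` are hypothetical names, a Python list stands in for the array that FS:[2Ch] points to, and the template bytes are the "Tls str." string used later in this post.

```python
class FakeThread:
    """Stand-in for one thread: tls_array models what FS:[2Ch] points to."""

    def __init__(self, template: bytes, module_count: int, tls_index: int):
        self.tls_array = [None] * module_count
        # Each thread gets its own private copy of the module's template.
        self.tls_array[tls_index] = bytearray(template)


def tls_data_area(thread: FakeThread, tls_index: int) -> bytearray:
    """Mirror of the assembly: array base + index*4, then dereference."""
    return thread.tls_array[tls_index]


TLS_INDEX = 0                       # the value the loader stored at [4036D0]
template = b"Tls str."
thread_a = FakeThread(template, 1, TLS_INDEX)
thread_b = FakeThread(template, 1, TLS_INDEX)

# Writing through thread A's pointer changes only thread A's copy.
tls_data_area(thread_a, TLS_INDEX)[0:3] = b"AAA"
print(bytes(tls_data_area(thread_a, TLS_INDEX)))
print(bytes(tls_data_area(thread_b, TLS_INDEX)))
```

The index is shared by all threads, but the array it indexes is per thread, which is the whole point of the indirection.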
One thing puzzles me here: I traced through TlsGetValue and found that it does not match the description above. Why?
7C59C0FB > 55 PUSH EBP
7C59C0FC 8BEC MOV EBP,ESP
7C59C0FE 64:A1 18000000 MOV EAX,DWORD PTR FS:[18] ; TEB
7C59C104 8B4D 08 MOV ECX,DWORD PTR SS:[EBP+8] ; Tls index
7C59C107 83F9 40 CMP ECX,40 ; at most 64
7C59C10A 73 0D JNB SHORT kernel32.7C59C119
7C59C10C 8360 34 00 AND DWORD PTR DS:[EAX+34], 0 ; LastErrorValue
7C59C110 8B8488 100E0000 MOV EAX,DWORD PTR DS:[EAX+ECX*4+E10] ; slightly different?
7C59C117 EB 14 JMP SHORT kernel32.7C59C12D
7C59C119 81F9 40040000 CMP ECX,440
7C59C11F 72 10 JB SHORT kernel32.7C59C131
7C59C121 68 0D0000C0 PUSH C000000D
7C59C126 E8 0EC1FDFF CALL kernel32.7C578239
7C59C12B 33C0 XOR EAX,EAX
7C59C12D 5D POP EBP
7C59C12E C2 0400 RETN 4
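One way to read this listing is that TlsGetValue never goes through FS:[2Ch] at all: it indexes a slot array embedded in the TEB itself at offset 0xE10, which belongs to the TlsAlloc family of APIs and is separate from the static TLS array. The Python sketch below models only what the listing shows. The constants are taken from the disassembly; the `teb` dictionary and the handling of indexes between 0x40 and 0x440 are assumptions, since the code at 7C59C131 is not shown.

```python
TLS_SLOTS_OFFSET = 0xE10                # from MOV EAX,[EAX+ECX*4+E10]
INLINE_SLOT_COUNT = 0x40                # from CMP ECX,40
SLOT_LIMIT = 0x440                      # from CMP ECX,440
STATUS_INVALID_PARAMETER = 0xC000000D   # from PUSH C000000D


def tls_get_value(teb: dict, index: int):
    """Return (value, status) roughly as the traced routine would."""
    if index < INLINE_SLOT_COUNT:
        teb["last_error"] = 0           # AND DWORD PTR [EAX+34],0
        return teb["slots"][index], 0
    if index < SLOT_LIMIT:
        # The listing jumps to 7C59C131, which is not shown in the post.
        # Assumption: a second, separately allocated slot table is used.
        expansion = teb.get("expansion_slots")
        if expansion is None:
            return 0, 0
        return expansion[index - INLINE_SLOT_COUNT], 0
    return 0, STATUS_INVALID_PARAMETER


# A fake TEB with 64 inline slots; slot 3 holds an arbitrary pointer value.
teb = {"slots": [0] * INLINE_SLOT_COUNT, "last_error": 5}
teb["slots"][3] = 0x401EF8
print(tls_get_value(teb, 3))
print(tls_get_value(teb, SLOT_LIMIT))
```

Under this reading, the index stored at AddressOfIndex and the index returned by TlsAlloc select entries in two different tables, so passing one to the other mechanism's API would not retrieve the static TLS data.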
II.
StartAddressOfRawData dd ?
EndAddressOfRawData dd ?
SizeOfZeroFill dd ?
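These three fields describe the initial TLS data: StartAddressOfRawData is where the data starts, EndAddressOfRawData is where the nonzero data ends, and SizeOfZeroFill is the number of zero bytes that follow.
When a thread is created, its data is copied from here.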
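As a rough sketch of that copy, the Python function below builds one thread's block from the template and the zero fill. `build_thread_tls_block` is a hypothetical helper, the end address is treated as exclusive (an assumption), and the end VA and zero-fill size in the example are made up; only the "Tls str." bytes come from the patched example later in this post.

```python
def build_thread_tls_block(memory: bytes, memory_base: int,
                           start_va: int, end_va: int,
                           size_of_zero_fill: int) -> bytearray:
    """Copy the initialized template and append the zero fill.

    `memory` holds the bytes mapped at `memory_base`, so VA - memory_base
    is a direct offset into it. end_va is assumed to be exclusive.
    """
    start = start_va - memory_base
    end = end_va - memory_base
    if not 0 <= start <= end <= len(memory):
        raise ValueError("TLS template lies outside the supplied memory")
    return bytearray(memory[start:end]) + bytearray(size_of_zero_fill)


# Bytes as they appear at 00402110 in the patched test.exe.
memory = bytes.fromhex("546C73207374722E") + bytes(8)
block = build_thread_tls_block(memory, 0x402110, 0x402110, 0x402118, 4)
print(bytes(block))
```

If a packer compresses the section holding the template, the loader would copy the compressed bytes instead, so the first thread would start with garbage in its TLS block.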
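III. TLS Callback Functions
These are the callbacks invoked when a thread is created and when it exits, for the main thread as well as for other threads.
AddressOfCallBacks is a pointer to an array of function pointers, terminated by 0.
Here [406010]=0, which means there are no callbacks.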
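Reading such an array out of a memory dump is straightforward. The sketch below is one possible way to do it in Python; `read_callback_array` and its `limit` safety cap are assumptions, and the second example uses the two callback addresses that are patched into test.exe in section IV.

```python
import struct


def read_callback_array(memory: bytes, memory_base: int,
                        address_of_callbacks: int, limit: int = 64) -> list:
    """Return the callback VAs stored at address_of_callbacks.

    Reads 32-bit little-endian pointers until a zero terminator. `limit`
    is a hypothetical safety cap so corrupted data cannot loop forever.
    """
    offset = address_of_callbacks - memory_base
    if offset < 0:
        raise ValueError("address lies below the supplied memory")
    callbacks = []
    for _ in range(limit):
        (pointer,) = struct.unpack_from("<I", memory, offset)
        if pointer == 0:
            return callbacks
        callbacks.append(pointer)
        offset += 4
    raise ValueError("no terminator within %d entries" % limit)


# The unmodified test.exe: four zero bytes at 406010, so no callbacks.
print(read_callback_array(bytes(4), 0x406010, 0x406010))

# The patched array at 00402100: two callbacks, then the terminator.
patched = bytes.fromhex("001F4000201F400000000000")
print([hex(va) for va in read_callback_array(patched, 0x402100, 0x402100)])
```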
IV. Now let's hand-patch test.exe (PE DIY style). The changes are as follows.
Use OllyDbg (OD) to patch the program:
00402100 00 1F 40 00 20 1F 40 00 00 00 00 00 00 00 00 00 .¬@. ¬@......... ; CallBack
00402110 54 6C 73 20 73 74 72 2E 00 00 00 00 00 00 00 00 Tls str......... ; Tls Initial Data
00401EF0 52 65 61 64 54 4C 53 00 53 65 74 20 54 4C 53 00 ReadTLS.Set TLS.
00401F00 ? 60 PUSHAD
00401F01 ? A1 D0364000 MOV EAX, DWORD PTR DS:[4036D0]
00401F06 ? 50 PUSH EAX
00401F07 ? E8 F0FDFFFF CALL <JMP.&kernel32.TlsGetValue>
00401F0C . 6A 00 PUSH 0 ; /Style = MB_OK|MB_APPLMODAL
00401F0E . 68 F01E4000 PUSH test.00401EF0 ; |Title = "ReadTLS"
00401F13 . 50 PUSH EAX ; |Text
00401F14 . 6A 00 PUSH 0 ; |hOwner = NULL
00401F16 . E8 15F1FFFF CALL <JMP.&user32.MessageBoxA> ; \MessageBoxA
00401F1B . 61 POPAD
00401F1C > 33C0 XOR EAX, EAX
00401F1E . C3 RETN
00401F1F 00 DB 00
00401F20 . 60 PUSHAD
00401F21 . 68 F81E4000 PUSH test.00401EF8 ; /pValue = test.00401EF8
00401F26 . A1 D0364000 MOV EAX,DWORD PTR DS:[4036D0] ; |
00401F2B . 50 PUSH EAX ; |TlsIndex => 0
00401F2C . E8 D3FDFFFF CALL <JMP.&kernel32.TlsSetValue> ; \TlsSetValue
00401F31 . 61 POPAD
00401F32 . 33C0 XOR EAX,EAX
00401F34 . C3 RETN
In OD, set Debugging options / Events / System breakpoint.
Load test.exe; it stops here:
77F813B1 > CC INT3 ; anyone who has used the Debug API should recognize this
77F813B2 C3 RETN
004036D0 00 00 00 00 ; Tls index = 0 ....
7FFDE000 0C FD 12 00 00 00 13 00 00 C0 12 00 00 00 00 00
7FFDE010 00 1E 00 00 00 00 00 00 00 E0 FD 7F 00 00 00 00
7FFDE020 8C 03 00 00 C0 04 00 00 00 00 00 00 E8 24 13 00 ; 2ch
001324E8 08 25 13 00 ABABABAB ABABABAB EE FE EE FE
001324F8 00 00 00 00 00 00 00 00 04 00 04 00 00 07 18 00
00132508 54 6C 73 20 73 74 72 2E AB AB AB AB AB AB AB AB Tls str.
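You can see that this data has already been initialized; unfortunately, TlsGetValue does not read from here.
Press F4 to run to the code below; a lot of DLL processing happens in between.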
77F963FE 64:A1 18000000 MOV EAX,DWORD PTR FS:[18]
77F96404 8945 AC MOV DWORD PTR SS:[EBP-54],EAX
77F96407 6A 01 PUSH 1
77F96409 8B40 30 MOV EAX,DWORD PTR DS:[EAX+30]
77F9640C FF70 08 PUSH DWORD PTR DS:[EAX+8]
77F9640F E8 5DFCFFFF CALL ntdll.77F96071 ; F7 to step in; this handles the TLS callbacks
77F96071 55 PUSH EBP
77F96072 8BEC MOV EBP,ESP
77F96074 6A FF PUSH -1
77F96076 68 E860F977 PUSH ntdll.77F960E8
77F9607B 68 551FF877 PUSH ntdll.77F81F55
77F96080 64:A1 00000000 MOV EAX,DWORD PTR FS:[0]
77F96086 50 PUSH EAX
77F96087 64:8925 0000000> MOV DWORD PTR FS:[0],ESP
77F9608E 51 PUSH ECX
77F9608F 51 PUSH ECX
77F96090 83EC 10 SUB ESP,10
77F96093 53 PUSH EBX
77F96094 56 PUSH ESI
77F96095 57 PUSH EDI
77F96096 8965 E8 MOV DWORD PTR SS:[EBP-18],ESP
77F96099 8D45 DC LEA EAX,DWORD PTR SS:[EBP-24]
77F9609C 50 PUSH EAX ; receives Tls.Size
77F9609D 6A 09 PUSH 9 ; TLS is data directory entry 9
77F9609F 6A 01 PUSH 1
77F960A1 8B7D 08 MOV EDI,DWORD PTR SS:[EBP+8]
77F960A4 57 PUSH EDI ; 400000h ImageBase
77F960A5 E8 1FD1FEFF CALL ntdll.RtlImageDirectoryEntryToData ; returns Tls.VirtualAddress
77F960AA 8365 FC 00 AND DWORD PTR SS:[EBP-4],0 ; no error
77F960AE 85C0 TEST EAX,EAX
77F960B0 74 21 JE SHORT ntdll.77F960D3 ; Tls.VirtualAddress = 0, return
77F960B2 8B70 0C MOV ESI,DWORD PTR DS:[EAX+C] ; AddressOfCallBacks
77F960B5 8975 D8 MOV DWORD PTR SS:[EBP-28],ESI
77F960B8 85F6 TEST ESI,ESI
77F960BA 74 17 JE SHORT ntdll.77F960D3 ; AddressOfCallBacks =0 ret
77F960BC 803D AD03FD77 0> CMP BYTE PTR DS:[77FD03AD],0 ; ???
77F960C3 0F85 4B350000 JNZ ntdll.77F99614
77F960C9 8B06 MOV EAX,DWORD PTR DS:[ESI] ; first callback
77F960CB 85C0 TEST EAX,EAX
77F960CD 0F85 56350000 JNZ ntdll.77F99629 ; if NULL, fall through and return
77F960D3 834D FC FF OR DWORD PTR SS:[EBP-4],FFFFFFFF
77F960D7 8B4D F0 MOV ECX,DWORD PTR SS:[EBP-10]
77F960DA 64:890D 0000000> MOV DWORD PTR FS:[0],ECX
77F960E1 5F POP EDI
77F960E2 5E POP ESI
77F960E3 5B POP EBX
77F960E4 C9 LEAVE
77F960E5 C2 0800 RETN 8 ; ret to 77F96414
77F99629 8945 E0 MOV DWORD PTR SS:[EBP-20],EAX ; test.00401F00
77F9962C 83C6 04 ADD ESI,4 ; advance to the next entry
77F9962F 8975 D8 MOV DWORD PTR SS:[EBP-28],ESI
77F99632 803D AD03FD77 0>CMP BYTE PTR DS:[77FD03AD],0
77F99639 /74 0F JE SHORT ntdll.77F9964A
77F9963B |50 PUSH EAX
77F9963C |57 PUSH EDI
77F9963D |68 DE95F977 PUSH ntdll.77F995DE ; "LDR: Calling Tls Callback Imagebase %lx Function %lx"
77F99642 |E8 D873FFFF CALL ntdll.DbgPrint
77F99647 |83C4 0C ADD ESP,0C
77F9964A \6A 00 PUSH 0 ; the callback's three arguments
77F9964C FF75 0C PUSH DWORD PTR SS:[EBP+C] ; 1 means the main thread is being created
77F9964F 57 PUSH EDI ; pHandle
77F99650 FF75 E0 PUSH DWORD PTR SS:[EBP-20] ; pCallBack
77F99653 E8 7B9AFEFF CALL ntdll.77F830D3
77F99658 ^ E9 6CCAFFFF JMP ntdll.77F960C9 ; next callback
77F830D3 55 PUSH EBP
77F830D4 8BEC MOV EBP,ESP
77F830D6 56 PUSH ESI
77F830D7 57 PUSH EDI
77F830D8 53 PUSH EBX
77F830D9 8BF4 MOV ESI,ESP
77F830DB FF75 14 PUSH DWORD PTR SS:[EBP+14]
77F830DE FF75 10 PUSH DWORD PTR SS:[EBP+10]
77F830E1 FF75 0C PUSH DWORD PTR SS:[EBP+C]
77F830E4 FF55 08 CALL DWORD PTR SS:[EBP+8] ; calls each of our two callbacks
77F830E7 8BE6 MOV ESP,ESI
77F830E9 5B POP EBX
77F830EA 5F POP EDI
77F830EB 5E POP ESI
77F830EC 5D POP EBP
77F830ED C2 1000 RETN 10
77F96414 ^\E9 7375FFFF JMP ntdll.77F8D98C
Press F4 to run to the code below:
77F9FF3B 6A 01 PUSH 1
77F9FF3D 57 PUSH EDI
77F9FF3E E8 3E29FEFF CALL ntdll.ZwContinue ; F8 to reach the OEP
00401E74 > $ 55 PUSH EBP
00401E75 . 8BEC MOV EBP,ESP
00401E77 . 83C4 F0 ADD ESP,-10
00401E7A . A1 90204000 MOV EAX,DWORD PTR DS:[402090]
00401E7F . C600 01 MOV BYTE PTR DS:[EAX],1
00401E82 . B8 541E4000 MOV EAX,test.00401E54
00401E87 . E8 24FFFFFF CALL test.00401DB0
00401E8C . E8 6BFBFFFF CALL test.004019FC
When the program exits, the TLS callbacks are called once more.
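V. As shown above, both the TLS data initialization and the TLS callbacks happen before the OEP is reached. If the program is packed, the packer's loader has not run yet at that point, so the code and the IAT are still encrypted, and executing a callback is bound to fail.
Also note that the address fields of IMAGE_TLS_DIRECTORY32 are VAs, not RVAs, so relocation comes into play as well.
That said, few programs have TLS callbacks, so for most targets it is enough for the packer to simply copy IMAGE_TLS_DIRECTORY32.
As for the problem I ran into: the DWORD at 406010 was originally 0, but after compression with aPLib it became 60h, which the loader took for an extra callback, hence the crash. My quick fix was to leave the .rdata section uncompressed.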
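The workaround can be generalized a little: instead of hard-coding .rdata, a packer could leave uncompressed every section that the loader touches for TLS before the unpacking stub runs. The Python sketch below is a conservative heuristic under stated assumptions, not what yoda does. `sections_to_keep_raw` is a hypothetical helper, and the section layout, TLS directory RVA and template address in the example are invented; only 0x4036D0 and 0x406010 come from test.exe.

```python
def sections_to_keep_raw(sections, image_base, tls_directory_rva, tls_fields):
    """Names of sections the loader reads for TLS before the stub runs.

    sections: list of (name, rva, virtual_size) tuples.
    tls_fields: dict with the VA fields of IMAGE_TLS_DIRECTORY32.
    """
    wanted_rvas = [tls_directory_rva]
    for key in ("StartAddressOfRawData", "AddressOfIndex", "AddressOfCallBacks"):
        va = tls_fields.get(key, 0)
        if va:                                   # 0 means "not present"
            wanted_rvas.append(va - image_base)  # fields hold VAs, not RVAs
    keep = []
    for name, rva, size in sections:
        if any(rva <= wanted < rva + size for wanted in wanted_rvas):
            if name not in keep:
                keep.append(name)
    return keep


# Hypothetical section layout for illustration only.
sections = [("CODE", 0x1000, 0x1000), ("DATA", 0x2000, 0x1000),
            ("BSS", 0x3000, 0x1000), (".idata", 0x4000, 0x1000),
            (".tls", 0x5000, 0x1000), (".rdata", 0x6000, 0x1000)]
fields = {"StartAddressOfRawData": 0x405000,   # made-up template address
          "AddressOfIndex": 0x4036D0,
          "AddressOfCallBacks": 0x406010}
print(sections_to_keep_raw(sections, 0x400000, 0x6000, fields))
```

This only avoids the crash for programs without real callbacks; a program whose callbacks point into compressed code would still need the callbacks deferred until after unpacking.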