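I have been studying yoda 1.2 recently and could not quite follow its TLS handling. yoda simply copies the IMAGE_TLS_DIRECTORY32 to another location, and the packed program seems to run without problems.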

IMAGE_TLS_DIRECTORY32 STRUCT
    StartAddressOfRawData   dd    ?
    EndAddressOfRawData     dd    ?
    AddressOfIndex          dd    ?
    AddressOfCallBacks      dd    ?
    SizeOfZeroFill          dd    ?
    Characteristics         dd    ?
IMAGE_TLS_DIRECTORY32 ENDS

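Later I added aPlib on top of yoda, and that is when the trouble started: the packed program would no longer run. Debugging showed that the problem was in TLS. So merely copying the IMAGE_TLS_DIRECTORY32 is not enough, and handling TLS perfectly in a packer looks fairly hard. Do any of the experienced packer writers here have a good approach? Fortunately, programs that use TLS seem to be rare.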



Below is some material on TLS that I found in MSDN.

6.7. The .tls Section

The .tls section provides direct PE/COFF support for static Thread Local Storage (TLS). TLS is a special storage class supported by Windows NT, in which a data object is not an automatic (stack) variable, yet it is local to each individual thread that runs the code. Thus, each thread can maintain a different value for a variable declared using TLS.

Note that any amount of TLS data can be supported by using the API calls TlsAlloc, TlsFree, TlsSetValue, and TlsGetValue. The PE/COFF implementation is an alternative approach to using the API, and it has the advantage of being simpler from the high-level-language programmer’s point of view. This implementation enables TLS data to be defined and initialized in a manner similar to ordinary static variables in a program. For example, in Microsoft Visual C++, a static TLS variable can be defined as follows, without using the Windows API:

__declspec (thread) int tlsFlag = 1;

 

To support this programming construct, the PE/COFF .tls section specifies the following information: initialization data, callback routines for per-thread initialization and termination, and the TLS index explained in the following discussion.

Note   Statically declared TLS data objects can be used only in statically loaded image files. This fact makes it unreliable to use static TLS data in a DLL unless you know that the DLL, or anything statically linked with it, will never be loaded dynamically with the LoadLibrary API function.

Executable code accesses a static TLS data object through the following steps:

1.         At link time, the linker sets the Address of Index field of the TLS Directory. This field points to a location where the program will expect to receive the TLS index.

The Microsoft run-time library facilitates this process by defining a memory image of the TLS Directory and giving it the special name “__tls_used” (Intel x86 platforms) or “_tls_used” (other platforms). The linker looks for this memory image and uses the data there to create the TLS Directory. Other compilers that support TLS and work with the Microsoft linker must use this same technique.

2.         When a thread is created, the loader communicates the address of the thread’s TLS array by placing the address of the Thread Environment Block (TEB) in the FS register. A pointer to the TLS array is at the offset of 0x2C from the beginning of TEB. This behavior is Intel x86 specific.

3.         The loader assigns the value of the TLS index to the place indicated by the Address of Index field.

4.         The executable code retrieves the TLS index and also the location of the TLS array.

5.         The code uses the TLS index and the TLS array location (multiplying the index by four and using it as an offset to the array) to get the address of the TLS data area for the given program and module. Each thread has its own TLS data area, but this is transparent to the program, which doesn’t need to know how data is allocated for individual threads.

6.         An individual TLS data object is accessed as some fixed offset into the TLS data area.

The TLS array is an array of addresses that the system maintains for each thread. Each address in this array gives the location of TLS data for a given module (.EXE or DLL) within the program. The TLS index indicates which member of the array to use. (The index is a number, meaningful only to the system that identifies the module).
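The six steps above can be pictured with a small toy model. This is only a sketch in Python, not loader code: the Thread class, loader_init_thread and the template bytes are all made up for illustration, and the list tls_array merely stands in for the array reached through FS:[2Ch].

```python
# Toy model of static TLS lookup. Nothing here touches a real TEB;
# every name below is hypothetical and exists only for illustration.
TLS_TEMPLATE = b"Tls str.\x00"   # made-up initialized template data
ZERO_FILL = 4                     # made-up SizeOfZeroFill


class Thread:
    """Stands in for a TEB: tls_array plays the role of the array at FS:[2Ch]."""

    def __init__(self, module_count):
        self.tls_array = [None] * module_count


def loader_init_thread(thread, tls_index):
    # The loader gives every new thread a private copy of the template.
    thread.tls_array[tls_index] = bytearray(TLS_TEMPLATE) + bytearray(ZERO_FILL)


def tls_data_area(thread, tls_index):
    # Steps 4 and 5: use the index to pick this module's entry in the array.
    return thread.tls_array[tls_index]


tls_index = 0                     # the value the loader stores at AddressOfIndex
t1, t2 = Thread(1), Thread(1)
for t in (t1, t2):
    loader_init_thread(t, tls_index)

tls_data_area(t1, tls_index)[0:3] = b"XYZ"    # thread 1 changes only its own copy
print(bytes(tls_data_area(t1, tls_index)))
print(bytes(tls_data_area(t2, tls_index)))
```

The index is shared by all threads, while the array it indexes is per thread, which is why the write in thread 1 leaves thread 2's copy untouched.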

6.7.1. The TLS Directory

The TLS Directory has the following format:

Offset          Size
(PE32/PE32+)    (PE32/PE32+)    Field
------------    ------------    -----------------------------------

0               4/8             Raw Data Start VA (Virtual Address)
    Starting address of the TLS template. The template is a block of data used to initialize TLS data. The system copies all this data each time a thread is created, so it must not be corrupted. Note that this address is not an RVA; it is an address for which there should be a base relocation in the .reloc section.

4/8             4/8             Raw Data End VA
    Address of the last byte of the TLS, except for the zero fill. As with the Raw Data Start VA, this is a virtual address, not an RVA.

8/16            4/8             Address of Index
    Location to receive the TLS index, which the loader assigns. This location is in an ordinary data section, so it can be given a symbolic name accessible to the program.

12/24           4/8             Address of Callbacks
    Pointer to an array of TLS callback functions. The array is null-terminated, so if there is no callback function supported, this field points to four bytes set to zero. The prototype for these functions is given below, in “TLS Callback Functions.”

16/32           4               Size of Zero Fill
    The size in bytes of the template, beyond the initialized data delimited by Raw Data Start VA and Raw Data End VA. The total template size should be the same as the total size of TLS data in the image file. The zero fill is the amount of data that comes after the initialized nonzero data.

20/36           4               Characteristics
    Reserved for possible future use by TLS flags.
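To make the PE32 layout concrete, here is a minimal Python sketch that decodes the six dwords of a TLS directory. The function name and the namedtuple are my own, and the sample values are only illustrative (two of them reuse addresses that come up later in this post).

```python
import struct
from collections import namedtuple

# Field order follows the PE32 table above: six little-endian dwords.
TlsDirectory32 = namedtuple(
    "TlsDirectory32",
    "start_va end_va index_va callbacks_va zero_fill characteristics",
)


def parse_tls_directory32(raw):
    """Decode the 24-byte PE32 TLS directory (hypothetical helper)."""
    return TlsDirectory32(*struct.unpack_from("<6I", raw))


# Illustrative values only, not read from a real file.
sample = struct.pack("<6I", 0x402110, 0x402118, 0x4036D0, 0x406010, 0, 0)
tls = parse_tls_directory32(sample)
print(hex(tls.index_va), hex(tls.callbacks_va))
```

For PE32+ the first four fields are 8 bytes each, so the format string would have to change accordingly; this sketch only covers the 32-bit case discussed here.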

 

6.7.2. TLS Callback Functions

The program can provide one or more TLS callback functions (though Microsoft compilers do not currently use this feature) to support additional initialization and termination for TLS data objects. A typical reason to use such a callback function would be to call constructors and destructors for objects.

Although there is typically no more than one callback function, a callback is implemented as an array to make it possible to add additional callback functions if desired. If there is more than one callback function, each function is called in the order its address appears in the array. A null pointer terminates the array. It is perfectly valid to have an empty list (no callback supported), in which case the callback array has exactly one member—a null pointer.

The prototype for a callback function (pointed to by a pointer of type PIMAGE_TLS_CALLBACK) has the same parameters as a DLL entry-point function:

typedef VOID   (NTAPI *PIMAGE_TLS_CALLBACK) ( PVOID DllHandle,  DWORD Reason,  PVOID Reserved  );

 

The Reserved parameter should be left set to 0. The Reason parameter can take the following values:

Setting               Value   Description
------------------    -----   -----------------------------------------------
DLL_PROCESS_ATTACH    1       New process has started, including the first thread.
DLL_THREAD_ATTACH     2       New thread has been created (this notification sent for all but the first thread).
DLL_THREAD_DETACH     3       Thread is about to be terminated (this notification sent for all but the first thread).
DLL_PROCESS_DETACH    0       Process is about to terminate, including the original thread.
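The calling convention described in this section can be modelled in a few lines of Python. This is a sketch of the behaviour only: run_tls_callbacks and the two callbacks are hypothetical, and None stands in for the null pointer that ends the array.

```python
# Reason codes from the table above.
DLL_PROCESS_DETACH = 0
DLL_PROCESS_ATTACH = 1
DLL_THREAD_ATTACH = 2
DLL_THREAD_DETACH = 3


def run_tls_callbacks(callbacks, module_handle, reason):
    """Call each callback in array order until the null terminator (None here)."""
    calls = 0
    for callback in callbacks:
        if callback is None:          # a null pointer terminates the array
            break
        callback(module_handle, reason, 0)   # Reserved is left as 0
        calls += 1
    return calls


log = []


def cb_a(handle, reason, reserved):
    log.append(("a", reason))


def cb_b(handle, reason, reserved):
    log.append(("b", reason))


count = run_tls_callbacks([cb_a, cb_b, None], 0x400000, DLL_PROCESS_ATTACH)
print(count, log)
```

An empty list is simply an array whose first entry is the terminator, so no callback runs.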



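I. Did you follow the text above? I was thoroughly confused myself, and only began to understand after trying things by hand.
I will translate it loosely alongside a worked example.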

Using Delphi 6, create a minimal file test.dpr; compiling it produces test.exe.

Program test;
{$APPTYPE CONSOLE}
Begin
end.

Let's look at its TLS directory:

  

 

1.  At link time, the linker sets the Address of Index field of the TLS Directory. This field points to a location where the program will expect to receive the TLS index. 

At link time the linker fills in AddressOfIndex; the program reads its TLS index through this pointer.
Here it is 4036D0. If TLS is used through the API instead, the index comes from int dwTlsIndex = TlsAlloc();


2.  When a thread is created, the loader communicates the address of the thread’s TLS array by placing the address of the Thread Environment Block (TEB) in the FS register. A pointer to the TLS array is at the offset of 0x2C from the beginning of TEB. This behavior is Intel x86 specific.

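FS:[2Ch] holds a pointer to the thread's TLS array, an array of pointers with one entry per module that has static TLS data. (The limit of 64 belongs to the TlsAlloc slots seen in TlsGetValue below, not to this array.)
Note that FS is different for every thread.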

typedef struct _TEB {
  NT_TIB                  Tib;
  PVOID                   EnvironmentPointer;
  CLIENT_ID               Cid;
  PVOID                   ActiveRpcInfo;
  PVOID                   ThreadLocalStoragePointer;           ; 2ch
  PPEB                    Peb;                                 ; 30h
  ULONG                   LastErrorValue;                      ; 34h
   …}


3.  The loader assigns the value of the TLS index to the place indicated by the Address of Index field.
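When the program is loaded, the loader writes the index to [4036D0]. It is usually 0, selecting the first entry of the TLS array. Note that all threads share [4036D0].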

5.  The code uses the TLS index and the TLS array location (multiplying the index by four and using it as an offset to the array) to get the address of the TLS data area for the given program and module. Each thread has its own TLS data area, but this is transparent to the program, which doesn’t need to know how data is allocated for individual threads.

mov   ecx, [4036D0]      ; TLS index
shl   ecx, 2             ; *4
mov   eax, fs:[2Ch]      ; pointer to this thread's TLS array
add   eax, ecx
mov   eax, [eax]         ; eax -> TLS data area

Because FS differs for each thread, each thread ends up at a different data area, and they do not interfere with one another.

One thing puzzled me here: I traced TlsGetValue and found that it does not match the material above. Why?

7C59C0FB >  55                             PUSH EBP
7C59C0FC    8BEC                        MOV EBP,ESP
7C59C0FE    64:A1 18000000        MOV EAX,DWORD PTR FS:[18]             ; TEB
7C59C104    8B4D 08                     MOV ECX,DWORD PTR SS:[EBP+8]      ; Tls index
7C59C107    83F9 40                      CMP ECX,40                                               ; at most 64 slots
7C59C10A    73 0D                         JNB SHORT kernel32.7C59C119

7C59C10C    8360 34 00                 AND DWORD PTR DS:[EAX+34], 0        ; LastErrorValue
7C59C110    8B8488 100E0000     MOV EAX,DWORD PTR DS:[EAX+ECX*4+E10]   ; slightly different?
7C59C117    EB 14                          JMP SHORT kernel32.7C59C12D

7C59C119    81F9 40040000           CMP ECX,440
7C59C11F    72 10                            JB SHORT kernel32.7C59C131
7C59C121    68 0D0000C0              PUSH C000000D
7C59C126    E8 0EC1FDFF             CALL kernel32.7C578239
7C59C12B    33C0                           XOR EAX,EAX
7C59C12D    5D                               POP EBP
7C59C12E    C2 0400                       RETN 4
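As far as I can tell from the listing, the difference is that TlsGetValue serves the slots handed out by TlsAlloc, which sit inside the TEB itself at offset E10h, while the static TLS array at FS:[2Ch] is a separate mechanism. Below is a rough Python model of the control flow above. The Teb class is hypothetical, and the branch at 7C59C131 is not shown in the listing, so the expansion-slot path here is an assumption.

```python
TLS_SLOT_COUNT = 0x40                  # CMP ECX,40
TLS_SLOT_LIMIT = 0x440                 # CMP ECX,440
STATUS_INVALID_PARAMETER = 0xC000000D  # PUSH C000000D


class Teb:
    """Hypothetical stand-in for the fields the listing touches."""

    def __init__(self):
        self.last_error = 0                      # TEB+34h
        self.tls_slots = [0] * TLS_SLOT_COUNT    # TEB+E10h
        self.expansion_slots = None              # assumption: allocated on demand
        self.last_status = 0                     # records the status passed at 7C59C126


def tls_get_value(teb, index):
    if index < TLS_SLOT_COUNT:
        teb.last_error = 0
        return teb.tls_slots[index]
    if index < TLS_SLOT_LIMIT:
        # Path at 7C59C131 is not in the listing; modelled as an assumption.
        teb.last_error = 0
        if teb.expansion_slots is None:
            return 0
        return teb.expansion_slots[index - TLS_SLOT_COUNT]
    # Out of range: the real code hands this status to an error helper.
    teb.last_status = STATUS_INVALID_PARAMETER
    return 0


teb = Teb()
teb.tls_slots[3] = 0x1234
print(hex(tls_get_value(teb, 3)), tls_get_value(teb, 0x440))
```

Note that nothing in this path reads FS:[2Ch], which is consistent with what the debugger shows.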


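II.
    StartAddressOfRawData   dd    ?
    EndAddressOfRawData     dd    ?
    SizeOfZeroFill          dd    ?

These three fields describe the initial TLS data: StartAddressOfRawData is where the data begins, EndAddressOfRawData is where the non-zero data ends, and SizeOfZeroFill is the number of zero bytes that follow.

When a thread is created, its data is copied from here.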
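A small Python sketch of how I understand the copy: the thread's block is the raw template followed by the zero fill. build_tls_block is a made-up helper, the image is modelled as a bytearray indexed by RVA, and treating EndAddressOfRawData as an exclusive end is my assumption.

```python
def build_tls_block(image, image_base, start_va, end_va, zero_fill):
    """Return a fresh per-thread copy: template bytes, then zero_fill zero bytes.

    Assumes end_va is exclusive and that image is the mapped image
    indexed by RVA (both are assumptions made for this sketch).
    """
    start = start_va - image_base
    end = end_va - image_base
    return bytearray(image[start:end]) + bytearray(zero_fill)


# Fake mapped image with an 8-byte template at RVA 2110h (illustrative values).
image = bytearray(0x3000)
image[0x2110:0x2118] = b"Tls str."

block = build_tls_block(image, 0x400000, 0x402110, 0x402118, 8)
print(len(block), bytes(block[:8]))
```

Because every thread gets its own copy, the template in the image must still hold the right bytes whenever a new thread starts, not only at load time.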


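III. TLS Callback Functions

These callbacks are invoked when a thread is created and when it exits, for the main thread as well as all other threads.

AddressOfCallBacks is a pointer to an array of function pointers terminated by 0.
Here [406010] = 0, which means there are no callback functions.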
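Reading the callback array out of an image can be sketched like this in Python. read_tls_callbacks is a hypothetical helper and the limit parameter is just a safety stop of my own; the demo bytes are the patched array at 402100 from the dump in part IV.

```python
import struct


def read_tls_callbacks(image, image_base, callbacks_va, limit=64):
    """Collect dword pointers from callbacks_va until a zero entry.

    image is assumed to be the mapped image indexed by RVA; limit is an
    arbitrary guard against a missing terminator.
    """
    callbacks = []
    offset = callbacks_va - image_base
    for _ in range(limit):
        (pointer,) = struct.unpack_from("<I", image, offset)
        if pointer == 0:
            break
        callbacks.append(pointer)
        offset += 4
    return callbacks


image = bytearray(0x3000)
# 00402100  00 1F 40 00 20 1F 40 00 00 00 00 00
image[0x2100:0x210C] = bytes.fromhex("001F4000201F400000000000")

found = read_tls_callbacks(image, 0x400000, 0x402100)
print([hex(p) for p in found])
```

A packer could run a check like this before and after compression: if the list changes, the loader will see callbacks that the original program never had.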


IV. Next we patch test.exe by hand (PEDiy), changing it as follows:

 

 


Then patch the program in OD:

00402100  00 1F 40 00 20 1F 40 00 00 00 00 00 00 00 00 00  ..@. .@.........      ; callback array
00402110  54 6C 73 20 73 74 72 2E 00 00 00 00 00 00 00 00  Tls str.........      ; Tls Initial Data
00401EF0  52 65 61 64 54 4C 53 00 53 65 74 20 54 4C 53 00  ReadTLS.Set TLS.

00401F00   ?  60                               PUSHAD
00401F01   ?  A1 D0364000            MOV EAX, DWORD PTR DS:[4036D0]
00401F06   ?  50                               PUSH EAX
00401F07   ?  E8 F0FDFFFF           CALL <JMP.&kernel32.TlsGetValue>
00401F0C   .  6A 00                         PUSH 0                                                      ; /Style = MB_OK|MB_APPLMODAL
00401F0E   .  68 F01E4000              PUSH test.00401EF0                               ; |Title = "ReadTLS"
00401F13   .  50                                PUSH EAX                                              ; |Text
00401F14   .  6A 00                           PUSH 0                                                   ; |hOwner = NULL
00401F16   .  E8 15F1FFFF              CALL <JMP.&user32.MessageBoxA>   ; \MessageBoxA
00401F1B   .  61                               POPAD
00401F1C   >  33C0                         XOR EAX, EAX
00401F1E   .  C3                               RETN
00401F1F      00                                DB 00
00401F20   .  60                                PUSHAD
00401F21   .  68 F81E4000               PUSH test.00401EF8                                      ; /pValue = test.00401EF8
00401F26   .  A1 D0364000               MOV EAX,DWORD PTR DS:[4036D0]     ; |
00401F2B   .  50                                 PUSH EAX                                                   ; |TlsIndex => 0
00401F2C   .  E8 D3FDFFFF             CALL <JMP.&kernel32.TlsSetValue>         ; \TlsSetValue
00401F31   .  61                                 POPAD
00401F32   .  33C0                            XOR EAX,EAX
00401F34   .  C3                                 RETN

In OD, set Debugging options / Events / System breakpoint.

Load test.exe; it stops here:

77F813B1 >  CC             INT3            ; anyone who has used the debug API will recognize this
77F813B2    C3              RETN

004036D0  00 00 00 00                       ; Tls index = 0                                   ....

7FFDE000  0C FD 12 00 00 00 13 00 00 C0 12 00 00 00 00 00  
7FFDE010  00 1E 00 00 00 00 00 00 00 E0 FD 7F  00 00 00 00
7FFDE020  8C 03 00 00 C0 04 00 00 00 00 00 00   E8 24 13 00      ; 2ch


001324E8  08 25 13 00  ABABABAB  ABABABAB  EE FE EE FE  
001324F8  00 00 00 00  00 00 00 00     04  00  04  00   00 07 18 00  
00132508  54 6C 73 20 73 74 72 2E    AB AB AB AB AB AB AB AB   Tls str.     

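As you can see, this data has already been initialized. Unfortunately, TlsGetValue does not read from here.

Press F4 to run to the code below; a lot of DLL processing happens in between.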

77F963FE    64:A1 18000000        MOV EAX,DWORD PTR FS:[18]
77F96404    8945 AC                     MOV DWORD PTR SS:[EBP-54],EAX
77F96407    6A 01                         PUSH 1
77F96409    8B40 30                     MOV EAX,DWORD PTR DS:[EAX+30]
77F9640C    FF70 08                     PUSH DWORD PTR DS:[EAX+8]
77F9640F    E8 5DFCFFFF          CALL ntdll.77F96071                                     ; F7 to step in: this processes the TLS callbacks


77F96071    55                                PUSH EBP
77F96072    8BEC                          MOV EBP,ESP
77F96074    6A FF                           PUSH -1
77F96076    68 E860F977               PUSH ntdll.77F960E8
77F9607B    68 551FF877               PUSH ntdll.77F81F55
77F96080    64:A1 00000000          MOV EAX,DWORD PTR FS:[0]
77F96086    50                                 PUSH EAX
77F96087    64:8925 0000000>        MOV DWORD PTR FS:[0],ESP
77F9608E    51                                 PUSH ECX
77F9608F    51                                  PUSH ECX
77F96090    83EC 10                       SUB ESP,10
77F96093    53                                  PUSH EBX
77F96094    56                                  PUSH ESI
77F96095    57                                  PUSH EDI
77F96096    8965 E8                         MOV DWORD PTR SS:[EBP-18],ESP
77F96099    8D45 DC                       LEA EAX,DWORD PTR SS:[EBP-24]
77F9609C    50                                   PUSH EAX                                                          ; receives the TLS directory size
77F9609D    6A 09                            PUSH 9                                                                 ; 9 = IMAGE_DIRECTORY_ENTRY_TLS
77F9609F    6A 01                               PUSH 1
77F960A1    8B7D 08                          MOV EDI,DWORD PTR SS:[EBP+8]
77F960A4    57                                     PUSH EDI                                                         ; 400000h  ImageBase
77F960A5    E8 1FD1FEFF                 CALL ntdll.RtlImageDirectoryEntryToData    ; returns the address of the TLS directory
77F960AA    8365 FC 00                    AND DWORD PTR SS:[EBP-4],0                   ; no error
77F960AE    85C0                              TEST EAX,EAX
77F960B0    74 21                             JE SHORT ntdll.77F960D3                                ; no TLS directory: return
77F960B2    8B70 0C                          MOV ESI,DWORD PTR DS:[EAX+C]            ; AddressOfCallBacks
77F960B5    8975 D8                         MOV DWORD PTR SS:[EBP-28],ESI
77F960B8    85F6                                TEST ESI,ESI
77F960BA    74 17                               JE SHORT ntdll.77F960D3                              ; AddressOfCallBacks = 0: return
77F960BC    803D AD03FD77 0>      CMP BYTE PTR DS:[77FD03AD],0               ; ??? (seems to gate the DbgPrint trace at 77F99632)
77F960C3    0F85 4B350000                JNZ ntdll.77F99614
77F960C9    8B06                                 MOV EAX,DWORD PTR DS:[ESI]               ; first callback
77F960CB    85C0                                TEST EAX,EAX
77F960CD    0F85 56350000                JNZ ntdll.77F99629                                         ; non-NULL: call it; NULL: fall through and return

77F960D3    834D FC FF                     OR DWORD PTR SS:[EBP-4],FFFFFFFF
77F960D7    8B4D F0                            MOV ECX,DWORD PTR SS:[EBP-10]
77F960DA    64:890D 0000000>           MOV DWORD PTR FS:[0],ECX
77F960E1    5F                                      POP EDI
77F960E2    5E                                     POP ESI
77F960E3    5B                                     POP EBX
77F960E4    C9                                     LEAVE
77F960E5    C2 0800                              RETN 8                                                     ; ret to 77F96414



77F99629    8945 E0                   MOV DWORD PTR SS:[EBP-20],EAX                    ; test.00401F00
77F9962C    83C6 04                   ADD ESI,4                                                                 ; advance to the next entry
77F9962F    8975 D8                    MOV DWORD PTR SS:[EBP-28],ESI
77F99632    803D AD03FD77 0>CMP BYTE PTR DS:[77FD03AD],0
77F99639   /74 0F                       JE SHORT ntdll.77F9964A
77F9963B   |50                            PUSH EAX
77F9963C   |57                            PUSH EDI
77F9963D   |68 DE95F977         PUSH ntdll.77F995DE ;         "LDR: Calling Tls Callback Imagebase %lx Function %lx"
77F99642   |E8 D873FFFF         CALL ntdll.DbgPrint
77F99647   |83C4 0C                   ADD ESP,0C
77F9964A   \6A 00                      PUSH 0                                                                     ; the callback's three arguments: Reserved = 0
77F9964C    FF75 0C                   PUSH DWORD PTR SS:[EBP+C]                         ; Reason: 1 = process start (main thread created)
77F9964F    57                             PUSH EDI                                                                ; pHandle
77F99650    FF75 E0                   PUSH DWORD PTR SS:[EBP-20]                         ; pCallBack
77F99653    E8 7B9AFEFF          CALL ntdll.77F830D3
77F99658  ^ E9 6CCAFFFF         JMP ntdll.77F960C9                                               ; next callback


77F830D3    55                              PUSH EBP
77F830D4    8BEC                         MOV EBP,ESP
77F830D6    56                               PUSH ESI
77F830D7    57                               PUSH EDI
77F830D8    53                              PUSH EBX
77F830D9    8BF4                         MOV ESI,ESP
77F830DB    FF75 14                     PUSH DWORD PTR SS:[EBP+14]
77F830DE    FF75 10                     PUSH DWORD PTR SS:[EBP+10]
77F830E1    FF75 0C                     PUSH DWORD PTR SS:[EBP+C]
77F830E4    FF55 08                     CALL DWORD PTR SS:[EBP+8]                       ;  calls our callback (once for each of the two)
77F830E7    8BE6                         MOV ESP,ESI
77F830E9    5B                             POP EBX
77F830EA    5F                            POP EDI
77F830EB    5E                            POP ESI
77F830EC    5D                            POP EBP
77F830ED    C2 1000                   RETN 10





77F96414  ^\E9 7375FFFF     JMP ntdll.77F8D98C
Press F4 to run to the code below:

77F9FF3B    6A 01                 PUSH 1
77F9FF3D    57                       PUSH EDI
77F9FF3E    E8 3E29FEFF     CALL ntdll.ZwContinue                                        ; F8 takes us to the OEP


00401E74 > $  55                       PUSH EBP
00401E75   .  8BEC                   MOV EBP,ESP
00401E77   .  83C4 F0               ADD ESP,-10
00401E7A   .  A1 90204000      MOV EAX,DWORD PTR DS:[402090]
00401E7F   .  C600 01               MOV BYTE PTR DS:[EAX],1
00401E82   .  B8 541E4000      MOV EAX,test.00401E54
00401E87   .  E8 24FFFFFF      CALL test.00401DB0
00401E8C   .  E8 6BFBFFFF   CALL test.004019FC

When the program exits, the TLS callbacks are called once more.


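V. As shown above, both TLS data initialization and the TLS callbacks run before the OEP. If the program is packed, the packer's loader has not run yet at that point, so the code and the IAT are still encrypted, and executing a callback is bound to fail.

Also note that the addresses in IMAGE_TLS_DIRECTORY32 are VAs, not RVAs, so relocation comes into play as well.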
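To illustrate the VA point: if the image ends up at a different base, the four address fields need the same delta that a base relocation would apply, while the two trailing fields are not addresses. A minimal Python sketch, with rebase_tls_directory32 as a made-up name and illustrative sample values:

```python
import struct


def rebase_tls_directory32(raw, old_base, new_base):
    """Shift the four VA fields by (new_base - old_base); leave the rest alone."""
    fields = list(struct.unpack_from("<6I", raw))
    delta = new_base - old_base
    for i in range(4):                 # Start, End, Index, CallBacks are VAs
        if fields[i]:                  # a zero pointer stays zero
            fields[i] = (fields[i] + delta) & 0xFFFFFFFF
    return struct.pack("<6I", *fields)


# Illustrative directory, assumed to be linked for base 400000h.
original = struct.pack("<6I", 0x402110, 0x402118, 0x4036D0, 0x402100, 0, 0)
moved = struct.unpack("<6I", rebase_tls_directory32(original, 0x400000, 0x10000000))
print([hex(v) for v in moved])
```

For an EXE loaded at its preferred base the delta is zero and nothing changes, which may be why plain copying usually works.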


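Still, few programs have TLS callbacks, so for most of them a packer only needs to copy the IMAGE_TLS_DIRECTORY32.

As for my own problem: the dword at 406010 was originally 0, but after compressing with aPLib it became 60h, which gave the loader a bogus callback and caused the crash. My quick fix was to leave the .rdata section uncompressed.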
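A rough way to decide which sections to leave alone is to look up every address the TLS directory points to. The Python sketch below does that; sections_used_by_tls is a hypothetical helper, and the section layout is invented for the demo (only 4036D0 and 406010 come from test.exe above).

```python
def sections_used_by_tls(sections, image_base, tls_vas):
    """Return names of sections containing any of the given VAs.

    sections is a list of (name, rva, virtual_size) tuples.
    """
    used = []
    for name, rva, size in sections:
        for va in tls_vas:
            if va and rva <= va - image_base < rva + size:
                if name not in used:
                    used.append(name)
    return used


# Made-up layout for illustration; a real packer would read the section table.
sections = [
    ("CODE", 0x1000, 0x1000),
    ("DATA", 0x2000, 0x1000),
    ("BSS", 0x3000, 0x1000),
    (".idata", 0x4000, 0x1000),
    (".tls", 0x5000, 0x1000),
    (".rdata", 0x6000, 0x1000),
]
# Start/End VAs here are hypothetical; Index and CallBacks are from the post.
tls_vas = [0x405000, 0x405008, 0x4036D0, 0x406010]

keep_raw = sections_used_by_tls(sections, 0x400000, tls_vas)
print(keep_raw)
```

Skipping whole sections is the blunt version of the fix; a finer-grained packer could instead keep only the directory, the template and the callback array intact and relocate the pointers to them.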