Crash Dump Analysis Notes

Title: Crash Dump Analysis Notes
Author: softworm
Date: 2008-08-29 20:19
Link: http://bbs.pediy.com/showthread.php?t=71643

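My own OD-hiding plugin, which ships with a driver, occasionally causes a BSOD when running
ExeCryptor. The driver hooks two interrupts: Int0D, used to counter rdtsc and adapted from
deroko's fakerRdtsc, and Int0E, copied from OllyBonE.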

1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
        use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
        use .trap on that value
Else
        .trap on the appropriate frame will show where the trap was taken
        (on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
Arg2: f7737d70
Arg3: 00000000
Arg4: 00000000

Debugging Details:
------------------

PEB is paged out (Peb.Ldr = 7ffd700c).  Type ".hh dbgerr001" for details

PEB is paged out (Peb.Ldr = 7ffd700c).  Type ".hh dbgerr001" for details

BUGCHECK_STR:  0x7f_8

TSS:  00000028 -- (.tss 0x28)
eax=00000000 ebx=7ffd7000 ecx=0012ffb0 edx=0076a492 esi=00560000 edi=0012d028
eip=b9a4bed4 esp=f48be000 ebp=0012ffbc iopl=0         nv up di pl nz na po nc
cs=0008  ss=0010  ds=0000  es=0023  fs=003b  gs=0000             efl=00010002
b9a4bed4 60              pushad
Resetting default scope

DEFAULT_BUCKET_ID:  DRIVER_FAULT

PROCESS_NAME:  EXECryptor.exe

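Why would it blue-screen while executing pushad? I remember reading on an expert's blog that
this kind of double fault is often a kernel stack overflow.
Let's look at the current stack: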

1: kd> dd f48be000-10
f48bdff0  ???????? ???????? ???????? ???????? <- stack overflow
f48be000  00000002 b9a4be3c 00000008 00010002
f48be010  00000000 b9a4bece 00000008 00010002

It did overflow. That pushad is the first instruction of NewInt0E. I had set the Symbol Path
and the Image Path, yet dds shows no symbol information.

1: kd> dds f48be000
f48be000  00000002 <- ErrorCode=2: W=1 (write access), P=0 (page not present)
f48be004  b9a4be3c <- NewInt0D entry (pushad), which raised the PageFault
f48be008  00000008
f48be00c  00010002
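
The error code annotated in the dump above can be decoded bit by bit. The following is a minimal sketch, assuming the standard x86 page-fault error code layout (bit 0 = P, bit 1 = W/R, bit 2 = U/S); the function name `decode_pf_error_code` is made up for illustration.

```python
def decode_pf_error_code(code):
    """Decode the three low bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x1),  # P: 0 = page not present, 1 = protection violation
        "write": bool(code & 0x2),    # W/R: 1 = write access, 0 = read access
        "user": bool(code & 0x4),     # U/S: 1 = fault raised while in user mode
    }

# Error code taken from the top of the dumped stack (f48be000).
info = decode_pf_error_code(0x2)
print(info)
```

For the value 2 this gives a write to a not-present page from kernel mode, which fits a pushad running off the end of the kernel stack.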

This is the stack on entry to the page fault, with error code 2. Eip=b9a4be3c, which is code in NewInt0D:

1: kd> u b9a4be3c
b9a4be3c 60              pushad
b9a4be3d 8b442424        mov     eax,dword ptr [esp+24h]
b9a4be41 3d00000080      cmp     eax,80000000h
b9a4be46 0f8781000000    ja      b9a4becd
b9a4be4c 50              push    eax
b9a4be4d e806f2ffff      call    b9a4b058
b9a4be52 8b742424        mov     esi,dword ptr [esp+24h]
b9a4be56 03f0            add     esi,eax
b9a4be58 66813e0f31      cmp     word ptr [esi],310Fh
b9a4be5d 756e            jne     b9a4becd

The page was not present when pushad ran, and entering NewInt0E again then causes the DoubleFault.
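
The NewInt0D listing above adds a prefix count to the faulting address and then compares the word there with 310Fh. Below is a rough sketch of that check in user-mode Python, not the driver's actual code. The prefix set is an assumption (the post only shows a test for 26h), and `count_prefixes` and `is_rdtsc` are hypothetical names.

```python
# Legacy x86 prefix bytes. ASSUMPTION: the driver's real list is not shown in the post.
LEGACY_PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67, 0xF0, 0xF2, 0xF3}

def count_prefixes(code):
    """Return how many leading bytes of `code` are legacy prefixes."""
    n = 0
    while n < len(code) and code[n] in LEGACY_PREFIXES:
        n += 1
    return n

def is_rdtsc(code):
    """True if `code` starts with rdtsc (0F 31), optionally preceded by prefixes."""
    n = count_prefixes(code)
    # Read two bytes as a little-endian word, like `cmp word ptr [esi],310Fh`.
    return int.from_bytes(code[n:n + 2], "little") == 0x310F

print(is_rdtsc(bytes([0x0F, 0x31])))        # plain rdtsc
print(is_rdtsc(bytes([0x26, 0x0F, 0x31])))  # rdtsc with an ES override prefix
```

The bytes 0F 31 read as a little-endian word give 310Fh, which is why the constant in the listing looks reversed.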

Let's look at the full stack in a different format:

1: kd> dd f48be000  f48be000+4000
f48be000  00000002 b9a4be3c 00000008 00010002
f48be010  00000000 b9a4bece 00000008 00010002
f48be020  00000000 b9a4bece 00000008 00010002
f48be030  00000000 b9a4bece 00000008 00010002
f48be040  00000000 b9a4bece 00000008 00010002
f48be050  00000000 b9a4bece 00000008 00010002
f48be060  00000000 b9a4bece 00000008 00010002
f48be070  00000000 b9a4bece 00000008 00010002
f48be080  00000000 b9a4bece 00000008 00010002
f48be090  00000000 b9a4bece 00000008 00010002
f48be0a0  00000000 b9a4bece 00000008 00010002
f48be0b0  00000000 b9a4bece 00000008 00010002
... the same frame repeats all the way through the middle
f48c1d50  00000000 b9a4bece 00000008 00010002
f48c1d60  00000000 b9a4bece 00000008 00010002
f48c1d70  00000000 b9a4bece 00000008 00010002
f48c1d80  00000000 b9a4bece 00000008 00010002
f48c1d90  00000000 b9a4b05e 00000008 00010046 <- starts here
f48c1da0  b9a4be52 0076a492 0012d028 00560000
f48c1db0  0012ffbc f48c1dc8 7ffd7000 0012ffb8
....
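
To get a feel for how much stack the repeated frame eats, the rows in the dump above can simply be counted. This is a back-of-the-envelope sketch that assumes every elided row between f48be010 and f48c1d80 holds the same 16-byte frame, as the dump suggests.

```python
FRAME_SIZE = 16      # one dump row: error code, EIP, CS, EFLAGS (four dwords)
first = 0xF48BE010   # lowest row showing EIP b9a4bece
last = 0xF48C1D80    # highest row showing EIP b9a4bece

frames = (last - first) // FRAME_SIZE + 1
used = frames * FRAME_SIZE
print(frames, hex(used))  # 984 0x3d80
```

Under that assumption the same exception frame was pushed 984 times, about 15 KB of kernel stack.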

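The rough order of execution is:

1. Ring3 code executes rdtsc and enters NewInt0D.

2. NewInt0D calls PrefixedRdtsc to check for instruction prefixes; the access to user space raises a PageFault.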

1: kd> u b9a4b05e
b9a4b05e 8a0a            mov     cl,byte ptr [edx]
b9a4b060 80f926          cmp     cl,26h
b9a4b063 7432            je      b9a4b097

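A note here: when deroko's code inspects the instruction, it calls MmIsAddressValid first. I access
the address directly. For one Unpackme packed with ExeCryptor 2.1.x, MmIsAddressValid returns False,
yet reading the instruction directly without MmIsAddressValid works fine (this Unpackme cannot run
under fakerRdtsc). I don't know why.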

3. NewInt0E is entered. This time the fault was probably handled normally, and NewInt0D then continues.

4. At the exit of NewInt0D, another PageFault occurs:

1: kd> u b9a4bece
b9a4bece ff2530c3a4b9    jmp     dword ptr ds:[0B9A4C330h]
This instruction jumps to oldInt0D. I don't know why a page fault occurs here; the pte is valid.

1: kd> !pte 0B9A4C330h
               VA b9a4c330
PDE at   C0300B98        PTE at C02E6930
contains 064E2963      contains 223D8963
pfn 64e2 -G-DA--KWEV    pfn 223d8 -G-DA--KWEV

1: kd> dd 0B9A4C330
b9a4c330  804e172d 863c601d 863c60d7 00000001
b9a4c340  00000001 bf87dc40 f4813d7e bf81bdf9

1: kd> !pte 804e172d
               VA 804e172d
PDE at   C0300804        PTE at C0201384
contains 004009E3      contains 00000000
pfn 400 -GLDA--KWEV    LARGE PAGE 4e1

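5. From here on, PageFaults are raised over and over at the same code location, consuming a large amount of kernel stack.

6. NewInt0D is entered again. Did the Ring3 code execute rdtsc once more?

7. NewInt0D executes pushad; the stack overflows onto an invalid page, and NewInt0E is entered.

8. NewInt0E executes pushad: Double Fault -> BSOD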
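
The `!pte` output in step 4 can be reproduced by hand. The sketch below assumes a non-PAE x86 system, where Windows self-maps the page tables at C0000000 and the page directory at C0300000 with 4-byte entries; this matches the addresses in the dump. The helper names are made up for illustration.

```python
PDE_BASE = 0xC0300000  # page directory self-map (non-PAE x86)
PTE_BASE = 0xC0000000  # page table self-map (non-PAE x86)

def pde_address(va):
    """Virtual address of the PDE that maps `va`."""
    return PDE_BASE + (va >> 22) * 4

def pte_address(va):
    """Virtual address of the PTE that maps `va`."""
    return PTE_BASE + (va >> 12) * 4

def is_valid(entry):
    """Bit 0 is the Valid (Present) bit."""
    return bool(entry & 0x1)

def is_large_page(pde):
    """Bit 7 (PS) of a PDE marks a 4 MB large page."""
    return bool(pde & 0x80)

def large_page_pfn(pde, va):
    """Page frame number of `va` inside a 4 MB large page."""
    phys = (pde & 0xFFC00000) | (va & 0x003FFFFF)
    return phys >> 12

print(hex(pde_address(0xB9A4C330)), hex(pte_address(0xB9A4C330)))
print(is_large_page(0x004009E3), hex(large_page_pfn(0x004009E3, 0x804E172D)))
```

For 804e172d the PDE has the large-page bit set, so the zero PTE shown by `!pte` does not mean the page is invalid: the translation stops at the PDE, and the computed frame matches the `LARGE PAGE 4e1` line.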

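The question is why a PageFault would recur at the same address and eat so much stack space.
Maybe the code I copied from OllyBonE is at fault? I simply disabled the Int0E hook and tried
again: the same BSOD. But this time something was different: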

TSS:  00000028 -- (.tss 0x28)
eax=00000000 ebx=7ffd3000 ecx=0012ffb0 edx=0076a492 esi=00560000 edi=0012d028
eip=863c60d7 esp=b9c8c000 ebp=0012ffbc iopl=0         nv up di pl nz na po nc
cs=0008  ss=0010  ds=0000  es=0023  fs=003b  gs=0000             efl=00010002
863c60d7 68371e4e80      push    offset nt!KiTrap0E (804e1e37)
Resetting default scope

The exception happened here.

0: kd> u 863c60d7
863c60d7 68371e4e80      push    offset nt!KiTrap0E (804e1e37)
*** ERROR: Module load completed but symbols could not be loaded for cpthook.sys
863c60dc e954173571      jmp     cpthook+0x835 (f7717835)
863c60e1 0e              push    cs
863c60e2 60              pushad
863c60e3 b573            mov     ch,73h
863c60e5 f737            div     eax,dword ptr [edi]
863c60e7 1e              push    ds
863c60e8 0800            or      byte ptr [eax],al

This time another driver turned up: CptHook.sys.

Debug the live kernel directly and look at the IDT:

lkd> !idt -a

  Dumping IDT:

  00:  804dfabd nt!KiTrap00
  01:  863c601d
  02:  Task Selector = 0x0058
  03:  863c605b
  04:  804e01e6 nt!KiTrap04
  05:  804e034b nt!KiTrap05
  06:  804e04c9 nt!KiTrap06
  07:  804e0b4d nt!KiTrap07
  08:  Task Selector = 0x0050
  09:  804e0f5a nt!KiTrap09
  0a:  804e107f nt!KiTrap0A
  0b:  804e11c4 nt!KiTrap0B
  0c:  804e142e nt!KiTrap0C
  0d:  804e172d nt!KiTrap0D
  0e:  863c60d7 <---------------- Int0E has already been hooked by someone else!
  0f:  804e2175 nt!KiTrap0F
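
Spotting such entries by eye gets tedious, so the dump can be filtered mechanically. The following is a small sketch that scans saved `!idt -a` text for handlers the debugger could not resolve to a symbol. The sample is an excerpt of the dump above with the annotation removed, and `unsymbolized_vectors` is a hypothetical helper. A missing symbol is only a hint, not proof of a hook.

```python
import re

IDT_DUMP = """\
  00:  804dfabd nt!KiTrap00
  01:  863c601d
  02:  Task Selector = 0x0058
  03:  863c605b
  0d:  804e172d nt!KiTrap0D
  0e:  863c60d7
  0f:  804e2175 nt!KiTrap0F
"""

# vector, handler address, optional symbol; task gate lines do not match
ENTRY = re.compile(r"^\s*([0-9a-f]{2}):\s+([0-9a-f]{8})(?:\s+(\S+))?\s*$")

def unsymbolized_vectors(dump):
    """Return (vector, handler) pairs whose handler has no symbol."""
    hits = []
    for line in dump.splitlines():
        m = ENTRY.match(line)
        if m and m.group(3) is None:
            hits.append((int(m.group(1), 16), int(m.group(2), 16)))
    return hits

for vector, handler in unsymbolized_vectors(IDT_DUMP):
    print(f"{vector:02x}: {handler:08x}")
```

On this excerpt it reports vectors 01, 03 and 0e, all pointing into the 863c6000 region.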

lkd> u 863c60d7
  863c60d7 68371e4e80      push    offset nt!KiTrap0E (804e1e37)
  863c60dc e954173571      jmp     cpthook+0x835 (f7717835)

lkd> u f7717835
  cpthook+0x835:
  f7717835 817c2408307a71f7 cmp     dword ptr [esp+8],offset cpthook+0xa30 (f7717a30)
  f771783d 7216            jb      cpthook+0x855 (f7717855)
  f771783f 817c2408797a71f7 cmp     dword ptr [esp+8],offset cpthook+0xa79 (f7717a79)
  f7717847 730c            jae     cpthook+0x855 (f7717855)
  f7717849 c7442408767a71f7 mov     dword ptr [esp+8],offset cpthook+0xa76 (f7717a76)
  f7717851 83c408          add     esp,8
  f7717854 cf              iretd
  f7717855 c3              ret

lkd> !pool 863c60d7
  Pool page 863c60d7 region is Nonpaged pool
  *863c6000 size:  f88 previous size:    0  (Allocated) *CptH
    Owning component : Unknown (update pooltag.txt)
    863c6f88 size:    8 previous size:  f88  (Free)       .=.a
    863c6f90 size:   10 previous size:    8  (Allocated)  NV
    863c6fa0 size:   60 previous size:   10  (Allocated)  Vpb

  The address stored in the IDT lies inside NonPagedPool.

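The cause is visible here: this DriverStudio driver hooks Int0E and contains a check. If the
faulting EIP falls within a certain address range, it returns directly with iretd and does not
call the original KiTrap0E. It seems my driver unluckily met that condition, and the PageFault
was never really handled, which leads to the repeated page faults. I don't know whether the test
machine being dual-core has anything to do with it.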
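
The decision made by the handler at cpthook+0x835 can be restated outside assembly. This is a sketch of my reading of the disassembly above, not CptHook's source. The constants are the addresses from the listing, and `cpthook_int0e` is a made-up name.

```python
KITRAP0E = 0x804E1E37   # nt!KiTrap0E, pushed by the stub at 863c60d7
RANGE_LO = 0xF7717A30   # cpthook+0xa30, inclusive lower bound
RANGE_HI = 0xF7717A79   # cpthook+0xa79, exclusive upper bound
RESUME_AT = 0xF7717A76  # cpthook+0xa76, EIP written back before iretd

def cpthook_int0e(fault_eip):
    """Return (exit instruction, next EIP) for a page fault at `fault_eip`."""
    if fault_eip < RANGE_LO:    # jb  cpthook+0x855
        return ("ret", KITRAP0E)
    if fault_eip >= RANGE_HI:   # jae cpthook+0x855
        return ("ret", KITRAP0E)
    # Saved EIP is overwritten, the pushed KiTrap0E and the error code are
    # dropped (add esp,8), and the handler leaves without calling KiTrap0E.
    return ("iretd", RESUME_AT)

print(cpthook_int0e(0xF7717A30))
print(cpthook_int0e(0xF7717A79))
```

Both bounds in the listing are cpthook offsets (cpthook+0xa30 and cpthook+0xa79); the sketch simply reuses them.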

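As a test, I stopped loading CptHook, and the BSOD no longer occurs. That also suggests a fix:
before hooking Int0E, check the current value; if it falls outside the kernel image's mapped
address range, use the original KiTrap0E when jumping to oldInt0E. That address can only be
obtained by analyzing the file on disk, because the loaded kernel has already discarded its
INIT section.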
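
The selection rule described above might look like the following. This is a sketch of the decision only, written in Python rather than driver code. `KERNEL_BASE` and `KERNEL_SIZE` are placeholder values I made up for illustration; a real driver would query the loaded kernel image, and the on-disk KiTrap0E address would come from parsing the kernel file.

```python
KERNEL_BASE = 0x804D7000  # ASSUMPTION: hypothetical kernel image base
KERNEL_SIZE = 0x00200000  # ASSUMPTION: hypothetical kernel image size

def pick_old_int0e(current_handler, kitrap0e_from_file,
                   base=KERNEL_BASE, size=KERNEL_SIZE):
    """Choose the address to chain to after our own Int0E hook."""
    inside_kernel = base <= current_handler < base + size
    # An IDT entry outside the kernel image means someone hooked it first;
    # fall back to the KiTrap0E address recovered from the file.
    return current_handler if inside_kernel else kitrap0e_from_file

# IDT entry 0e as found with CptHook loaded, and nt!KiTrap0E from the dumps.
print(hex(pick_old_int0e(0x863C60D7, 0x804E1E37)))
```

With the values from the dumps, the pool address 863c60d7 is rejected and 804e1e37 is used instead.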

I don't know drivers well and much of this is guesswork, so please bear with me.