This project is read-only.

Table of contents

1. Introduction

1.1 General information

The Pykd project started in 2010. The main reason was the inconvenience of scripting debugging with the built-in tools for WinDbg. Python language was chosen as an alternative scripting engine for many reasons: ease of learning of the language itself, the presence of a huge standard library and the presence of the powerful framework for creating extensions. Pykd is a module for the CPython interpreter. Pykd itself is written in C++ and uses Boost.Python to export functions and classes to Python. Pykd controls debugging on the Windows platform through he Debug Engine library and receives symbolic information through the MS DIA library. Note that pykd does not give direct access to the COM interfaces of Debug Engine and MS DIA. Instead, it implements its own interface which makes the development process faster and more convenient (we hope).

Pykd can operate in two modes:
  • as a plugin for WinDbg, in which case it provides commands to run scripts in the context of debugging sessions
  • as a separate module for the Python interpreter. This mode can be useful for creating automatic tools that parse crash dumps, for example.
←Table of contents

1.2 Quick start

For a quick start, best download the automatic installer. It will install all necessary components (including Python if not already installed). To verify that the installation was successful, run WinDbg, start debugging an application or dump file, then load pykd:
.load pykd.pyd

If there was no error message, everything is fine. But anyway, let's make sure that everything really works:
0:000> !pycmd
Python 2.6.5 (r265: 79096 , Mar 19 2010 , 18:02:59) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> print "Hello world!"
Hello world!
>>> quit ()
0:000>

Try to run some example scripts:
0:000> !py help
0:000> !py samples

If everything worked, you can start writing your own scripts.
←Table of contents

1.3 Building from source

  1. Take the source code from the repository.
  2. Install Python.
  3. Install and configure Boost. There is also a manual installation and assembly.
  4. Set the environment variables:
$(DIA_SDK_ROOT) - path to the MS DIA library. It is installed with Visual Studio and the path should look similar to C:\Program Files (x86)\Microsoft Visual Studio 9.0\DIA SDK.
$(DBG_SDK_ROOT) - path to the Debug Engine SDK. It is installed with the Debugging Tools for Windows and the path should look like C:\Program Files (x86)\Debugging Tools for Windows (x86)\SDK.
$(BOOST_ROOT) - path to the directory where you installed Boost.
$(PYTHON_ROOT) - path to the installation directory of Python. It is assumed that the system has both, x86 and x64, versions of Python in a directory structure like this: C:\Python26\x86\... and C:\Python26\x64\... in this case, $(PYTHON_ROOT) should be equal to C:\Python26. If the installation path is missing Python and does not indicate the platform, it is necessary to tweak the project file.
  1. Build the Boost.Python library
To assemble the required static Boost.Python libraries, the following paths point to the library:
$(BOOST_ROOT)\stage - for x86 assembly
$(BOOST_ROOT)\stage64 - for x64 assembly
You can collect them with the following commands:
bjam --stagedir=stage --with-python stage
bjam address-model=64 --stagedir=stage64 --with-python stage

If you have not installed yet, download bjam.
←Table of contents

1.4 Manual installation

To manually install pykd.pyd, you have to install the C++ runtime from Visual Studio (vcredist). If you compile it yourself, this shouldn't be a problem. If you downloaded pykd from the website, it also shouldn't be a problem since the ZIP file contains the desired redistributable (as long as we didn't mess with the release :-)).

Where to copy pykd.pyd?
It depends on the scenario. If pykd shall be used as a plugin to WinDbg, it makes sense to copy it into the winext subdirectory of your WinDbg installation. In this case you can rename it to pykd.dll so that you can omit the file extension when loading the extension:
0:000> .load pykd

If pykd shall be used to write a Python program,it must be put in a place where it can find a Python interpreter. These are the three options:
  • in the lib subdirectory of the Python installation.
  • any directory. In this case, the $(PYTHONPATH) environment variable must be set.
  • any directory, if you start Python from the directory where pykd.pyd is located.

Installing Visual C++ redistributable
Of course you need to install VCRedist. Otherwise, why would we ask you to download it?

Registering MS DIA
The MS DIA library will be installed during the installation of VCRedist. In order to work properly, it must also be registered. To do this, find the directory where msdia90.dll was installed an run the command
regsvr32 msdia90.dll


If you compiled pykd yourself using Visual Studio, no action needs to be considered regarding VCRedist. It's already installed on your machine and MS DIA is also in place.

←Table of contents

1.5 API changes

loadModule
The function loadModule was removed. Use the module class instead.
# mod = loadModule("mymodule")
mod = module("mymodule")

←Table of contents

2. WinDbg Commands

2.1 Loading the extension

To load the extension in WinDbg, type:
0:000> .load pykd_path\pykd.pyd

If pykd is in the winext subdirectory of Debugging Tools for Windows, the path can be omitted:
0:000> .load pykd.pyd

If pykd.pyd was renamed to pykd.dll, the extension may be omitted as well:
0:000> .load pykd

To see all WinDbg extensions which have been loaded successfully, run
0:000> .chain

To unload the extension use the same path as for loading:
0:000> .unload pykd_path\pykd.pyd
0:000> .unload pykd.pyd
0:000> .unload pykd

If you don't want to load pykd manually every time, load it and choose "Save workspace". Next time, pykd will be loaded as part of the workspace automatically.
←Table of contents

2.2 Running scripts

Run scripts using the !py command:
0:000> !py script_path\script_name.py [param1 [param2] [...]]]

The extension .py can be omitted. If you don't want to specify the full path, register the environment variable $(PYTHONPATH) or - and this is the preferred way - add pykd to the Registry
HKEY_LOCAL_MACHINE\SOFTWARE\Python\PythonCore\2.6\PythonPath

In this case, the path specified in the Default value will be used for searching scripts.

The parameters passed to the script can be accessed through the sys.argv list:
import sys
print "script path: " + sys.argv[0]
print "param1: " + sys.argv[1]
print "param2: " + sys.argv[2]

←Table of contents

2.3 Console mode

The console is run with the pykd command !pycmd:
0:000> !pycmd
Python 2.6.5 (r265: 79096 , Mar 19 2010 , 18:02:59) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>

Before it starts, it automatically imports the pykd module, so that all functions of pykd can immediately be called. Remember that the console mode can be exited using the quit() function. This will persist the Python session:
>>> a = 10
>>> quit()
0:000> !pycmd
Python 2.6.5 (r265: 79096 , Mar 19 2010 , 18:02:59) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> print a
10
>>>

←Table of contents

3. Manage debugging

3.1 Stopping and restarting debugged processed

WinDbg provides keyboard shortcuts for go (F5) and break (Ctrl+Break).
Their counterparts in pykd are
go()
breakin()

go() resumes the debugged process and returns control to the debugger only when the debugger is stopped again - triggered by a breakpoint or when debugging is stopped manually via Ctrl+Break. This behavior should be considered when writing scripts. The function can also result in an exception. This usually happens if the debugged process terminates.
try:
    while true:
        go()
        print "break"
except:
    print "process terminated"

Above script will handle any debugger stops automatically and resume execution.

Using breakin() is hardly needed during normal operation due to the fact that the script is usually run while the debugger is already breaking. And at this time, the breakin() method does not make sense. In order to be able to stop a running process, the script has to create a separate thread and then call the function.

Attention! Do not attempt to use breakin() or go() inside debug events such as conditional breakpoints.
←Table of contents

3.2 Stepping

Stepping (tracing) is served by the two functions
step()
trace()
These actions are similar to the debugger commands trace into and trace over. Both functions can result in a DbgException if the debugged process has already ended.
←Table of contents

3.3 Working with Python debugging applications

If you want to run scripts outside of WinDbg, the first step is to create a debug session. More detailed session management will be discussed in the relevant section. As long as your application does not use multiple debugging sessions, it's not needed to take care of them - the first session will be created automatically with the following calls:
loadDump(dumpName) - loads the crash dump
id=startProcess(imageName) - runs a new process in debug mode
id=attachProcess(processId) - attaches the debugger to an existing process
attachKernel(parameterStr) - attaches the debugger to the kernel debugging system

To detach the debugger from the debugged process call detachProcess(id).
To stop debugging and terminate the debugged process call killProcess(id).

To find out what you are debugging, use
isDumpAnalyzing()
isKernelDebugging()
The first function determines whether the debugger is doing live debugging or analysing a memory dump. The second function distinguishes between user mode debugging and kernel mode debugging. If the script is specific to one of those debugging methods, it will be useful to insert such a test at the beginning of the script. Please inform the user that he is trying to run the script in the wrong context.
←Table of contents

3.4 Printing debug information

To display information on the screen you can use the default Python print method, but it is recommended to use the special functions
dprint(message, dml = False)
dprintln(message, dml = False)
The second function differs from the first in that it automatically adds a newline. The optional parameter dml includes output of DML. DML is specific to WinDbg and can be considered as very simple HTML. You can turn DML support on or off in WinDbg using .prefer_dml. Text formatting can be done using the following tags:
  • <b>...</b> - emphasize
  • <i>...</i> - italics
  • <u>...</u> - underline
  • <link cmd="command">...</link> - execute a command (similar to <a> in HTML)
Example:
dprintln("<b><u>The following command reloads all symbols</b></u>", True)
dprintln("<link cmd=\".reload /f\">reload</link>", True)

←Table of contents

3.5 Executing debugger commands

The method to execute debugger commands is
commandOutput = dbgCommand(commandStr)
s = dbgCommand("!analyze -v")
dprint(s)

To evaluate an expression the method is
expr(expressionStr)
expr("@rax+10")

Within a Python application you may want to use WinDbg extensions. Those extensions have to be loaded manually, which is done by
extHandle = loadExt(extensionPath)
The return value is a handle to the extension which is needed to call an extension function
commandOutput = callExt(extHandle, command, params)
(note that command does not include the exclamation mark)
and if necessary, dispose the extension
removeExt(exthandle)
Attention: working with extensions in pykd 0.2 differs from version 0.1. In version 0.2, the ext class has been removed and cannot be used to load extensions.
←Table of contents

3.6 Creating crash dumps

Saving the state of the application or system in the form of a crash dump can be done using
writeDump(fileName, dumpType)
The function is available in kernel mode and user mode. The second parameter specifies the type of the dump (True: minidump, False: full dump).
writeDump(r"c:\dump\fulldump.dmp", False)
writeDump(r"c:\dump\minidump.dmp", True)

←Table of contents

4. Working with memory and registers

4.1 Access to the general purpose registers

Access the general purpose registers (GPR) using
cpuReg=reg(regName)
cpuReg=reg(regIndex)
The first variant takes the symbolic register name, the second takes a register index. The second form can be used to transfer registers, e.g.
import pykd

try:
    i = 0
    while True:
        r = pykd.reg(i)
        pykd.dprintln("%s %x (%d)" % (r.name(), r, r))
        i += 1
except pykd.BaseException:
    pass

Both versions return an instance of the cpuReg class. If the information on the register cannot be obtained, an exception of type BaseException will be thrown.
The cpuReg class has two methods:
name()
index()
The class cpuReg can be used in integer calculations without additional considerations of its type:
r = reg("eax")
print r/10*234

Note: the current implementation of pykd supports only integer registers. Working with FPU, MMX or SSE registers is not supported.
←Table of contents

4.2 Access to model-specific registers (MSR)

Model-specific registers are accessed through the function rdmsr(msrNumber):
>>> print findSymbol(rdmsr(0x176))
nt!KiFastCallEntry

←Table of contents

4.3 Normalization of virtual addresses

All functions return virtual addresses in a so-called normalized form which is a 64 bit integer. For 32 bit platforms the address will be extended to 64 bit. The operation in C is
ULONG64 addr64 = (ULONG64)(LONG)addr;

Thus addresses will be converted as follows:
0x00100000 -> 0x00000000 00100000
0x80100000 -> 0xFFFFFFFF 80100000
This should be considered when doing arithmetic operations on addresses returned by pykd. To avoid possible errors in comparisons, it's recommended to use the function addr64():
import pykd
nt = pykd.module("nt")
if nt > addr64( 0x80000000 ):
    print "nt module is in highest address space"

←Table of contents

4.4 Direct memory access

Accessing memory of the debugged system is a great feature of pykd.
To read unsigned values, the following functions are available:
  • ptrByte( va )
  • ptrWord( va )
  • ptrDWord( va )
  • ptrQWord( va )
These functions serve a similar purpose for signed values:
  • ptrSignByte( va )
  • ptrSignWord( va )
  • ptrSignDWord( va )
  • ptrSignQWord( va )
For convenience, the cross-platform functions are:
  • ptrMWord(va)
  • ptrSignMWord(va)
  • ptrPtr(va)
They return the result depending on the bitness of the target platform (32 or 64 bit).
It's also often required to read a block of memory. These functions are designed for that:
  • loadBytes( va, count )
  • loadWords( va, count )
  • loadDWords( va, count )
  • loadQWords( va, count )
  • loadSignBytes( va, count )
  • loadSignWords( va, count )
  • loadSignDWords( va, count )
  • loadSignQWords( va, count )
  • loadPtrs( va, count )
All those functions return a list of objects.
←Table of contents

4.5 Memory access errors

All memory related functions result in a MemoryException if the memory at the given address cannot be read.
try:
    a = ptrByte( 0 )
except MemoryException:
    print "memory exception occurred"

To check the validity of a virtual address, you can use the isValid(va) function.
←Table of contents

4.6 Reading strings from memory

It is often necessary to read string data from memory. Of course this could be done with loadBytes(), but it is not always convenient. Therefore pykd added a set of functions that return data as a string. There are:
  • loadChars( va, count )
  • loadWChars( va, count )
They work similar to loadBytes() and loadWords() but they return the value as a string rather a list. This allows you to use them in conjunction with the struct module:
from struct import unpack
shortField1, shortField2, longField = unpack('hhl', loadChars( addr, 8 ) )

To read NUL-terminated strings, the functions are
  • loadСStr( va )
  • loadWStr( va )
Both return a string (loadWStr in unicode format). Note: These functions are not safe. The presence of a terminating NUL character is not guaranteed. Attention: the maximum length of a string returned by these functions is 64k. Reading a longer string results in a MemoryException.
The Windows kernel uses the structures UNICODE_STRING and ANSI_STRING to represent strings. To work with those, the commands are
  • loadAnsiString
  • loadUnicodeString
←Table of contents

5. Modules

5.1 Module class

A module is an executable file which is mapped to memory. A regular program consists of a main module (usually with .exe extension) and a set of libraries. Working with modules is done with the module class.
←Table of contents

5.1.1 Creating an instance of the module class

The module class constructor has two forms:
  • module( moduleName )
  • module( va )
The first form creates a module by its name, the second uses a virtual address which belongs to the module. If the module is not found, the constructor will raise a BaseException.
Example:
from pykd import *
try
    ntdll = module( "ntdll" )
    print ntdll.name(), hex(ntdll.begin()), hex(ntdll.size()) 
except BaseException:
    print "module not found"

←Table of contents

5.1.2 Obtaining information about the module

The following functions of the module class are designed to get more information:
  • name() - Returns the name of the module
  • image() - Returns the name of the executable file
  • pdb() - returns the name and full path to the file with symbolic information
  • begin() - returns the virtual address at which the module is loaded
  • end() - returns the virtual address of the end of the module
  • checksum() - returns the checksum
  • timestamp() - returns the time stamp
  • getVersion() - returns a tuple for the version of the module . For example: (1 , 0, 6452 , 0)
  • queryVersion(valueName) - returns the value of the resources module
←Table of contents

5.1.3 Loading and accessing symbols

To download symbol information use reload().
To find the virtual address which corresponds to a desired symbol, the method is offset(symName). If it is indicated that the symbol is not found, an exception of type BaseException will be thrown. Instead of an explicit call to offset(), the address of a symbol can be obtained using the method at the module class:
>>> nt = module("nt")
>>> print hex( nt.offset("PsLoadedModuleList") )
0xfffff801acb5ae80L
>>> print hex( nt.__getattr__("PsLoadedModuleList") )
0xfffff801acb5ae80L
>>> print hex( nt.PsLoadedModuleList )
0xfffff801acb5ae80L

Sometimes you may need the RVA symbols which you can get via rva(symName).
←Table of contents

5.1.4 Casting to other types

An instance of the module class has operators to cast to string and integer.
>>> nt = module("nt")
>>> print nt
Module: nt
Start: fffff801ac882000 End: fffff801acfc8000 Size: 746000
Image: ntkrnlmp.exe
Pdb: c:\sym\ntkrnlmp.pdb\569F266AE67D457D969D92298F8F98082\ntkrnlmp.pdb
Timestamp: 4f7118bb
Check Sum: 6b3b15

>>> print hex(nt)
fffff801ac882000

The start address will be used when the module class is involved in arithmetic operations:
>>> print hex( nt + 10 )
0xfffff801ac88200aL

←Table of contents

5.1.5 Obtaining information about a contained type

Besides describing variables and functions (entities that have an RVA), symbols can also describe types. Types of course don't have an RVA.
If a module has information about types, it can be accessed through the function type(typeName). The function returns an instance of class typeInfo, which will be discussed in detail later.
>>> nt = module("nt")
>>> print nt.type("_MDL")
struct/class: _MDL Size: 0x1c (28)
   +0000 Next                : _MDL*
   +0004 Size                : Int2B
   +0006 MdlFlags            : Int2B
   +0008 Process             : _EPROCESS*
   +000c MappedSystemVa      : Void*
   +0010 StartVa             : Void*
   +0014 ByteCount           : ULong
   +0018 ByteOffset          : ULong

←Table of contents

5.1.6 Typed variables

Pykd allows to simplify work with complex types such as classes and structures. The function responsible for that is typedVar(). To get an instance of class or structure, call typedVar() on the module:
  • typedVar( va )
  • typedVar( symbolName )
  • typedVar( typeName, va )
>>> nt = module("nt")
>>> print nt.typedVar( "_LIST_ENTRY", nt.PsLoadedModuleList )
struct/class: _LIST_ENTRY at 0xfffff8000369c650
   +0000 Flink               : _LIST_ENTRY*   0xfffffa8003c64890
   +0008 Blink               : _LIST_ENTRY*   0xfffffa80092f8f30

←Table of contents

5.2 Processing module load and unload events

For processing module load and unload events, you must inherit from the eventHandler class.
Event processing is carried out by overriding the onLoadModule event. For unloading, this is onUnloadModule.
←Table of contents

Last edited Feb 24, 2014 at 2:55 PM by strangedev, version 13

Comments

lovesuae Aug 31, 2015 at 7:57 AM 
in 3.1 should change true to True:

try:
while true:
go()
print "break"
except:
print "process terminated"