dirtbox, a highly scalable x86/Windows Emulator
Abstract
The growing volume of new malware each day pushes anti-virus companies to
their limits in processing samples and creating detection signatures.
Network security providers and administrators face a related and growing
problem: learning how samples affect the networks they try to protect.
Dynamic analysis of malware by execution in sandboxes has been applied
successfully in both scenarios, but classic sandbox approaches suffer from
severe scalability problems. Most of them rely on setting up a real target
system, such as the Windows XP operating system, as a virtual machine with
additional software that logs the actions performed. While these setups are
easy to develop and deploy, they require a separate virtual machine instance
for each malware sample to be analyzed and therefore do not scale with
today's malware growth.
Anti-virus vendors have tried to circumvent these performance issues for
file analysis by developing custom emulators that can be deployed on a
customer end host for detection and do not require a whole operating system
inside a virtual machine. These emulators, however, are often software
interpreters for the x86 instruction set and therefore run into execution
speed limitations of their own. Additionally, they are detectable because
they try to emulate every single Windows API and do not do so accurately.
dirtbox is an attempt to implement a highly scalable x86/Windows emulator
that can be used both for simple malware detection and for detailed behavior
analysis reports. Instead of emulating every single x86 instruction in
software, malware instructions are executed directly on the host CPU, one
basic block at a time. A disassembly pass over each basic block ensures that
no privileged or control-flow-subverting instructions are executed. Virtual
memory that is separated from the emulator's memory is implemented with
special LDT segments and by switching segment selectors before executing
guest instructions.
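The per-basic-block check can be illustrated with a deliberately tiny sketch.
It is not dirtbox's disassembler: it knows only a hand-picked subset of
one-byte 32-bit x86 opcodes, and the names scan_basic_block, PRIVILEGED and
TERMINATORS are hypothetical. The scanner walks instructions from a start
offset, refuses privileged ones, and stops at the first control-flow
instruction so that an emulator could handle that instruction itself.

```python
# Toy sketch only: a real scanner needs a complete x86 disassembler.
# All names here are illustrative assumptions, not dirtbox's actual code.

# Instructions that fault or are unsafe to run from unprivileged guest code.
PRIVILEGED = {0xF4: "hlt", 0xFA: "cli", 0xFB: "sti"}

# Control-flow instructions end a basic block: opcode -> (mnemonic, length).
TERMINATORS = {
    0xC3: ("ret", 1),
    0xEB: ("jmp rel8", 2),
    0xE9: ("jmp rel32", 5),
    0xE8: ("call rel32", 5),
    0xCD: ("int imm8", 2),
}


class UnsafeInstruction(Exception):
    """Raised when a block contains an instruction that must not run natively."""


def plain_length(opcode):
    """Length of a harmless instruction from the toy subset, or None if unknown."""
    if opcode == 0x90:              # nop
        return 1
    if 0x40 <= opcode <= 0x5F:      # inc/dec/push/pop r32
        return 1
    if 0xB8 <= opcode <= 0xBF:      # mov r32, imm32
        return 5
    return None


def scan_basic_block(code, start):
    """Return (terminator_offset, mnemonic, next_offset) for the block at start.

    Bytes in code[start:terminator_offset] are the straight-line part that
    could be executed natively; the terminator is left to the emulator.
    """
    pos = start
    while pos < len(code):
        opcode = code[pos]
        if opcode in PRIVILEGED:
            raise UnsafeInstruction(f"{PRIVILEGED[opcode]} at offset {pos}")
        if opcode in TERMINATORS:
            mnemonic, length = TERMINATORS[opcode]
            if pos + length > len(code):
                raise ValueError(f"truncated {mnemonic} at offset {pos}")
            return pos, mnemonic, pos + length
        length = plain_length(opcode)
        if length is None:
            raise ValueError(f"opcode {opcode:#04x} is outside the toy subset")
        pos += length               # skip operand bytes, they are not opcodes
    raise ValueError("ran off the end of the buffer without a terminator")


# nop; push eax; mov eax, 1; pop eax; ret
block = bytes.fromhex("9050b80100000058c3")
print(scan_basic_block(block, 0))   # (8, 'ret', 9)
```

Decoding instruction lengths, rather than searching for raw byte values,
matters here: an immediate operand may contain a byte such as 0xF4 without
being a hlt instruction.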
High performance is achieved for three reasons: no instrumentation such as
instruction rewriting is performed, disassembler results can be cached per
basic block, and all execution happens in the same process without context
switches.
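Caching disassembler results per basic block amounts to memoizing the scan by
block start address. The sketch below assumes this keying and an explicit
invalidation hook; the abstract does not say how dirtbox keys its cache or
how it deals with guest code that modifies itself, so BlockCache, lookup and
invalidate are hypothetical names, and find_ret is only a placeholder for a
real scanner.

```python
class BlockCache:
    """Memoize per-block scan results, keyed by the block's start address."""

    def __init__(self, scanner):
        self._scanner = scanner     # callable(code, start) -> scan result
        self._results = {}
        self.misses = 0             # how often the scanner actually ran

    def lookup(self, code, start):
        """Return the cached result for start, scanning only on a miss."""
        try:
            return self._results[start]
        except KeyError:
            self.misses += 1
            result = self._scanner(code, start)
            self._results[start] = result
            return result

    def invalidate(self):
        """Drop all results, e.g. after guest code has been overwritten."""
        self._results.clear()


def find_ret(code, start):
    """Placeholder scanner: offset of the first 0xC3 byte at or after start."""
    return code.index(0xC3, start)


cache = BlockCache(find_ret)
code = bytes([0x90, 0x90, 0xC3])
cache.lookup(code, 0)   # scans the block
cache.lookup(code, 0)   # served from the cache, no second scan
```

A loop body that is entered many times is scanned once under this scheme,
which is where the saving comes from.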
The operating system is emulated at the syscall layer. While this layer is
mostly undocumented and implementing it accurately is a challenging task in
its own right, the fact that no register changes are leaked from Ring 0
thwarts many detection techniques. To support the high-level APIs, the
corresponding libraries are mapped directly into the virtual memory as well.
Detection mechanisms such as
- Examination of the ecx register after an SEH-protected API call
- Stolen bytes from an API library implementation
- Direct reads from and writes to the PEB or other static locations or libraries
are supported automatically. Furthermore, the process and heap layout
resemble those of a genuine process, since the original ntdll PE loading and
heap management code can be executed and used.
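Emulation at the syscall layer can be pictured as a dispatch table that is
consulted whenever the guest enters the kernel. The sketch below is an
assumption-laden illustration, not dirtbox's implementation: it assumes the
32-bit Windows convention of the service number in eax and a pointer to the
arguments in edx, uses a made-up service number (0x1001) for NtClose, and
introduces the hypothetical names GuestState, SYSCALL_TABLE and dispatch.
Only the NTSTATUS constants are real values.

```python
import struct

STATUS_SUCCESS = 0x00000000
STATUS_NOT_IMPLEMENTED = 0xC0000002
STATUS_INVALID_HANDLE = 0xC0000008


class GuestState:
    """Minimal guest model: registers, flat memory, handle table, call log."""

    def __init__(self, memory_size=64):
        self.regs = {"eax": 0, "ecx": 0, "edx": 0}
        self.memory = bytearray(memory_size)
        self.handles = {}
        self.log = []

    def read_args(self, address, count):
        """Read count little-endian 32-bit arguments from guest memory."""
        return struct.unpack_from(f"<{count}I", self.memory, address)


def nt_close(state, args):
    """Emulated NtClose: release a handle or report that it is invalid."""
    (handle,) = args
    if state.handles.pop(handle, None) is None:
        return STATUS_INVALID_HANDLE
    return STATUS_SUCCESS


# Service number -> (name, argument count, handler).
# 0x1001 is a made-up number; real numbers differ between Windows versions.
SYSCALL_TABLE = {0x1001: ("NtClose", 1, nt_close)}


def dispatch(state):
    """Handle one guest syscall; only eax is written back to the guest."""
    number = state.regs["eax"]
    entry = SYSCALL_TABLE.get(number)
    if entry is None:
        state.log.append((hex(number), (), STATUS_NOT_IMPLEMENTED))
        state.regs["eax"] = STATUS_NOT_IMPLEMENTED
        return
    name, argc, handler = entry
    args = state.read_args(state.regs["edx"], argc)
    status = handler(state, args)
    state.log.append((name, args, status))   # raw material for a report
    state.regs["eax"] = status


state = GuestState()
state.handles[4] = "file object"
struct.pack_into("<I", state.memory, 8, 4)   # argument block: handle 4
state.regs.update(eax=0x1001, edx=8)
dispatch(state)                              # closes handle 4
```

Because the high-level libraries run as genuine code inside the guest, only
this narrow interface has to be written by hand; the log collected in
dispatch is the kind of record a behavior report could be built from.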
Georg Wicherski