Youssix

VMCS by Practice: Notes from Writing a Hypervisor

2026-05-20T00:00:00+00:00

If you start reading Intel VMX documentation seriously, you quickly notice something frustrating: the Intel SDM is extremely precise, but not pedagogical. It’s a reference manual, not a learning path.

Most beginner questions are not about individual fields. They’re about relationships between concepts:

How many VMCS structures actually exist?
What does “current VMCS” really mean?
Why does VMPTRLD fail right after allocation?
What’s the difference between VM_EXIT_REASON and VM_INSTRUCTION_ERROR?
Why do some VM exits feel impossible to debug?

This article is not a complete VMX reference. It’s a collection of practical notes and mental models I wish I had earlier while working on a custom hypervisor for learning purposes.

The target reader has already opened Intel SDM Vol. 3C, read the first VMX chapters, maybe looked at projects like HyperPlatform or KVM, and now has concrete debugging questions.

We will focus on:

VMCS lifecycle
VMCS activation rules
common beginner mistakes
VMX debugging flow
VM exit diagnostics

We will not cover: EPT internals, posted interrupts, nested virtualization, APIC virtualization, advanced scheduling, or production-grade VMM design. Those deserve separate articles.

1. How Many VMCS Exist, and How Many Are Current?

A common beginner misunderstanding is assuming there is “one VMCS per VM”. That is incorrect.

The VMCS is tied to a vCPU, not to the VM itself.

Consider this scenario:

4 VMs
2 vCPUs per VM
8 logical processors on the host

That means 4 × 2 = 8 vCPUs, and because each vCPU needs its own execution context, 8 VMCS regions are allocated.

A mental model:

VM1:  vCPU0 → VMCS A    VM2:  vCPU0 → VMCS C
      vCPU1 → VMCS B          vCPU1 → VMCS D

VM3:  vCPU0 → VMCS E    VM4:  vCPU0 → VMCS G
      vCPU1 → VMCS F          vCPU1 → VMCS H

That tells us how many VMCS structures exist in memory. But Intel VMX introduces another concept: the current VMCS.

A VMCS becomes current on a logical processor after VMPTRLD. This loads the VMCS pointer into the processor’s VMX state. The important rule:

A logical processor has at most one current VMCS at a time. A VMCS must not be current on multiple logical processors simultaneously.

That second clause is what makes SMP hypervisor work non-trivial.

Allocated vs Current vs Launched

These three terms are often confused, and the distinction is the single most useful mental model you can build early.

Allocated — a 4 KB VMCS region exists in memory. Nothing more. The CPU doesn’t know about it.

Current — VMPTRLD has been executed for this VMCS on a logical processor. The CPU now associates VMX execution state with that structure. VMREAD / VMWRITE operate on the current VMCS.

Launched — VMLAUNCH has succeeded at least once on this VMCS. From this point, VMLAUNCH can no longer be used on it; only VMRESUME. The CPU internally tracks this launch-state bit.

A VMCS therefore lives in one of several states: clear (initialized with VMCLEAR before its first use, or VMCLEAR'd again later), current but not launched, or current and launched. VMCLEAR transitions a VMCS back to the clear state, which means after a clean migration to another LP you use VMLAUNCH again, not VMRESUME.
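The state rules above can be captured in a small software model, which is handy for asserting lifecycle invariants in a hypervisor's own bookkeeping. This is only a sketch: the type and function names are invented for illustration, and the model assumes every VM entry passes its checks.

```c
#include <stdbool.h>

/* Illustrative software mirror of the CPU's view of one VMCS. */
typedef struct {
    bool launched;       /* launch state: false = clear, true = launched */
    int  current_on_lp;  /* logical processor it is current on, -1 if none */
} vmcs_model;

/* VMCLEAR: launch state becomes clear and the VMCS is no longer current. */
static void model_vmclear(vmcs_model *v) {
    v->launched = false;
    v->current_on_lp = -1;
}

/* VMPTRLD: makes the VMCS current on `lp`; launch state is untouched. */
static void model_vmptrld(vmcs_model *v, int lp) {
    v->current_on_lp = lp;
}

/* VMLAUNCH is only legal on a current VMCS whose launch state is clear. */
static bool model_vmlaunch(vmcs_model *v, int lp) {
    if (v->current_on_lp != lp || v->launched)
        return false;
    v->launched = true;  /* assumes the VM entry itself succeeded */
    return true;
}

/* VMRESUME is only legal on a current VMCS that has been launched. */
static bool model_vmresume(const vmcs_model *v, int lp) {
    return v->current_on_lp == lp && v->launched;
}
```

Walking a migration through the model (VMCLEAR, then VMPTRLD on the new LP) shows `model_vmresume` refusing and `model_vmlaunch` being accepted, which matches the rule stated above.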

Maximum Current VMCS at Once

In our example:

8 vCPUs
8 logical processors

All 8 VMCS structures could be current simultaneously, one per LP:

LP0 → VMCS A    LP4 → VMCS E
LP1 → VMCS B    LP5 → VMCS F
LP2 → VMCS C    LP6 → VMCS G
LP3 → VMCS D    LP7 → VMCS H

But if only 2 vCPUs are scheduled at this instant: 8 VMCS allocated, 2 current, 6 sitting inactive in RAM.

VMCS Migration Trap

One of the easiest mistakes in multi-core hypervisor development is reusing a VMCS on another CPU without properly clearing it first.

Before a VMCS can migrate cleanly between logical processors, you must issue VMCLEAR on the source LP. Otherwise:

the CPU on the source LP may still consider it current
VMPTRLD on the destination LP can fail
internal cached state may not be flushed to the in-memory VMCS

This is one of the reasons VMCS lifecycle management gets complicated once SMP enters the picture.

2. Why Does `VMPTRLD` Fail Right After Allocation?

This is one of the most classic VMX beginner failures. You allocate a 4 KB page:

void* vmcs = MmAllocateContiguousMemory(0x1000, ...);

Then VMPTRLD fails immediately with VMfailInvalid. The usual reason: you forgot to initialize the VMCS revision identifier.

The VMCS Revision Identifier

A VMCS region is not just “any aligned page”. Intel requires the first 32 bits of the page to contain a specific value, the VMCS revision identifier:

offset 0x000:
  bits  0-30  → VMCS revision ID
  bit   31    → shadow VMCS indicator (keep at 0 unless using shadow VMCS)

Before VMPTRLD, you must write the revision ID at offset 0. It comes from MSR IA32_VMX_BASIC (0x480):

uint64_t vmx_basic = __readmsr(0x480);
uint32_t revision_id = (uint32_t)(vmx_basic & 0x7FFFFFFF);

memset(vmcs, 0, 0x1000);
*(uint32_t*)vmcs = revision_id;

After that, VMPTRLD can succeed.

Aside: VMXON Region

The VMXON region is a different 4 KB structure used by the VMXON instruction itself. It is not a VMCS, but confusingly it also requires the revision ID written at offset 0 (same MSR, same format). Beginners often allocate one and reuse it incorrectly. Keep them as separate allocations.

Physical vs Virtual Address Trap

VMPTRLD expects a physical address. Not a virtual address.

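// The intrinsic takes a pointer to a 64-bit variable holding the address.

// Wrong: the variable holds the virtual address of the region
__vmx_vmptrld(&vmcs_virtual);

// Correct: the variable holds the physical address of the region
__vmx_vmptrld(&vmcs_physical);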

The VMCS region lives in physical memory from the CPU’s perspective.

Alignment Requirements

The VMCS region must be 4 KB aligned and 4 KB in size. This is why contiguous physical allocation is commonly used. Misalignment alone produces immediate VMX failures.
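The revision ID and alignment rules can be checked in software before issuing VMPTRLD, which tends to be easier than decoding a failure afterwards. The helper below is a minimal sketch; the function names are hypothetical, and the caller is assumed to pass in the raw IA32_VMX_BASIC value and the region's physical address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bits 30:0 of IA32_VMX_BASIC hold the VMCS revision identifier. */
static uint32_t vmcs_revision_from_basic(uint64_t vmx_basic) {
    return (uint32_t)(vmx_basic & 0x7FFFFFFFu);
}

/*
 * Pre-flight check for a VMCS region (hypothetical helper).
 *   phys_addr  physical address that will be given to VMPTRLD
 *   region_va  virtual mapping of the same 4 KB page
 *   vmx_basic  raw value read from IA32_VMX_BASIC (MSR 0x480)
 */
static bool vmcs_region_looks_valid(uint64_t phys_addr,
                                    const void *region_va,
                                    uint64_t vmx_basic) {
    uint32_t first_dword;
    memcpy(&first_dword, region_va, sizeof first_dword);

    if (phys_addr & 0xFFFu)
        return false;                        /* not 4 KB aligned */
    if (first_dword >> 31)
        return false;                        /* shadow-VMCS bit set */
    if ((first_dword & 0x7FFFFFFFu) != vmcs_revision_from_basic(vmx_basic))
        return false;                        /* wrong or missing revision ID */
    return true;
}
```

A debug build could assert on this right before every VMPTRLD; it costs one memory read and catches the two most common setup mistakes described in this section.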

Why Intel Designed It This Way

Unlike AMD’s VMCB, the VMCS format is intentionally opaque. Intel does not expose an official C structure layout. The CPU owns the real internal format, and software interacts through VMREAD / VMWRITE.

The revision ID lets Intel evolve internal VMCS layouts across processors while keeping compatibility rules clear. The VMCS page is therefore partly a memory structure, and partly a CPU-managed object. That matters when debugging VMX state issues.

3. The 6 Regions of a VMCS

The VMCS is easier to understand once you stop viewing it as a giant opaque structure and instead split it into functional areas. Intel conceptually divides it into six regions.

1. Guest-State Area. Guest CPU state: RIP, RSP, CR0/CR3/CR4, segment state, MSRs, etc. On VM entry, the CPU loads guest execution state from here. On VM exit, the CPU saves it back here. This is a transition snapshot, not a live view of the guest CPU.

2. Host-State Area. Defines the state restored during VM exit: host RIP, host RSP, host CR3, segment selectors. Again, transition state — not a live host CPU mirror.

3. VM-Execution Control Fields. What should cause VM exits? CPUID intercept, MOV CR3 intercept, EPT enable, MSR bitmaps, exception bitmap, etc.

4. VM-Exit Control Fields. What happens during the exit transition: host address-space size, save/load debug controls, MSR switching behavior.

5. VM-Entry Control Fields. What happens during the entry transition: IA-32e guest mode, event injection, MSR loading.

6. VM-Exit Information Fields. Read-only diagnostic fields filled by the CPU during VM exits: VM_EXIT_REASON, EXIT_QUALIFICATION, VM_EXIT_INTR_INFO. These are critical for debugging — they answer why the VM exited.

4. `VM_EXIT_REASON` vs `VM_INSTRUCTION_ERROR`

This is probably the single most common conceptual confusion in beginner VMX development. These two fields answer completely different questions.

`VM_EXIT_REASON`

Why did the guest stop executing?

The important implication: the guest was successfully running, and a VM exit happened afterward.

VMLAUNCH
  → guest runs
  → VMEXIT occurs
  → hypervisor resumes
  → VM_EXIT_REASON available

Typical reasons: CPUID intercept, EPT violation, exception, HLT, I/O instruction, MOV CR3 intercept. You read it after a successful VM exit via VMREAD(VM_EXIT_REASON).

`VM_INSTRUCTION_ERROR`

Why did a VMX instruction fail?

Completely different concept. Examples: VMLAUNCH, VMRESUME, VMPTRLD, VMWRITE. The guest may never have executed at all.

VMLAUNCH
  → VMfailValid
  → VM_INSTRUCTION_ERROR explains why

This is not a VM exit. This is a VMX API failure.

The Car Analogy

VM_EXIT_REASON — the car was driving; why did it stop? Red light, crash, police stop, engine failure.

VM_INSTRUCTION_ERROR — the car never started; why? Invalid key, dead battery, broken engine.

The Beginner Trap

A very common mistake:

if (VMLAUNCH failed)
    read VM_EXIT_REASON;

Conceptually wrong. No VM exit occurred. The guest never ran. The correct field is VM_INSTRUCTION_ERROR.

`VMfailInvalid` vs `VMfailValid`

Another distinction worth knowing.

VMfailInvalid (CF = 1): the instruction failed while there was no current VMCS, so the CPU has nowhere to store an error number. This is what you see when the very first VMPTRLD is given a bad pointer or an uninitialized region. There is no VM_INSTRUCTION_ERROR to read.

VMfailValid (ZF = 1): the instruction failed while a current VMCS exists. VM_INSTRUCTION_ERROR contains a precise reason.
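At the instruction level the two failure kinds are reported through RFLAGS: CF set means VMfailInvalid, ZF set means VMfailValid, and both clear means success. The sketch below classifies a captured RFLAGS value; the enum and function names are my own, not an official API.

```c
#include <stdbool.h>
#include <stdint.h>

#define RFLAGS_CF (1ull << 0)
#define RFLAGS_ZF (1ull << 6)

typedef enum { VMX_SUCCEED, VMX_FAIL_VALID, VMX_FAIL_INVALID } vmx_result;

/* Classify the outcome of a VMX instruction from the RFLAGS it left behind. */
static vmx_result vmx_classify(uint64_t rflags) {
    if (rflags & RFLAGS_CF)
        return VMX_FAIL_INVALID;   /* no current VMCS, no error number */
    if (rflags & RFLAGS_ZF)
        return VMX_FAIL_VALID;     /* error number stored in the current VMCS */
    return VMX_SUCCEED;
}

/* Only after VMfailValid is VM_INSTRUCTION_ERROR worth reading. */
static bool vmx_error_field_is_meaningful(vmx_result r) {
    return r == VMX_FAIL_VALID;
}
```

If you use the MSVC `__vmx_*` intrinsics rather than inline assembly, my understanding is that they fold the same information into their return value (0 for success, 1 for failure with status available, 2 for failure without status), so the same three-way split applies.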

Common `VM_INSTRUCTION_ERROR` Codes

Worth keeping a printed table near your debugger. The ones that come up most often:

Code	Meaning
4	`VMLAUNCH` with non-clear VMCS
5	`VMRESUME` with non-launched VMCS
7	VM entry with invalid control fields
8	VM entry with invalid host-state field
9	`VMPTRLD` with invalid physical address
10	`VMPTRLD` with `VMXON` pointer
11	`VMPTRLD` with incorrect VMCS revision identifier
12	`VMREAD`/`VMWRITE` to unsupported VMCS component
13	`VMWRITE` to read-only VMCS component
26	VM entry with events blocked by MOV SS

Codes 7 and 8 are by far the most common when wiring up a new hypervisor.
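If a printed table is not at hand, the same codes can live in the hypervisor's debug output. This is a small sketch covering only the codes listed above; the function name is made up, and anything not in the table falls through to a generic string.

```c
#include <stdint.h>

/* Map a VM_INSTRUCTION_ERROR value to the text from the table above. */
static const char *vm_instruction_error_name(uint32_t code) {
    switch (code) {
    case 4:  return "VMLAUNCH with non-clear VMCS";
    case 5:  return "VMRESUME with non-launched VMCS";
    case 7:  return "VM entry with invalid control fields";
    case 8:  return "VM entry with invalid host-state field";
    case 9:  return "VMPTRLD with invalid physical address";
    case 10: return "VMPTRLD with VMXON pointer";
    case 11: return "VMPTRLD with incorrect VMCS revision identifier";
    case 12: return "VMREAD/VMWRITE to unsupported VMCS component";
    case 13: return "VMWRITE to read-only VMCS component";
    case 26: return "VM entry with events blocked by MOV SS";
    default: return "not in this table, see the SDM error-number list";
    }
}
```

Logging both the number and the string keeps the output searchable against the SDM when a code outside this subset shows up.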

Practical Rule

If the guest executed and then exited → read VM_EXIT_REASON.
If a VMX instruction itself failed → read VM_INSTRUCTION_ERROR.

This distinction alone removes a massive amount of VMX debugging confusion.

5. Debugging Exit Reason 0

One of the most confusing VM exits for beginners:

EXIT_REASON = 0
EXIT_QUALIFICATION = 0

This often looks meaningless. It is not.

Exit Reason 0 = Exception or NMI

A guest exception occurred (or an NMI), and VMX controls caused a VM exit. The key point: EXIT_QUALIFICATION is usually not the important field here. The most important field is VM_EXIT_INTR_INFO.

`VM_EXIT_INTR_INFO`

This field tells you:

exception vector
interruption type
whether an error code exists
whether the field is valid (bit 31 — check this first)

If bit 31 is 0, the rest of the field is meaningless. Beginners often skip this check and chase a phantom vector. Always validate first, then decode.

Common vectors:

Vector	Exception
3	`#BP` (breakpoint)
6	`#UD` (invalid opcode)
13	`#GP` (general protection)
14	`#PF` (page fault)
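Decoding the field is a handful of shifts and masks. The sketch below follows the layout described above (vector in bits 7:0, interruption type in bits 10:8, error-code flag in bit 11, valid flag in bit 31); the struct and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    valid;           /* bit 31: the rest is only meaningful if set */
    uint8_t vector;          /* bits 7:0 */
    uint8_t type;            /* bits 10:8, e.g. 2 = NMI, 3 = hardware exception */
    bool    has_error_code;  /* bit 11 */
} intr_info;

/* Decode VM_EXIT_INTR_INFO; returns an all-zero struct if bit 31 is clear. */
static intr_info decode_intr_info(uint32_t raw) {
    intr_info info = { false, 0, 0, false };
    if (!((raw >> 31) & 1u))
        return info;         /* validate first, then decode */
    info.valid = true;
    info.vector = (uint8_t)(raw & 0xFFu);
    info.type = (uint8_t)((raw >> 8) & 0x7u);
    info.has_error_code = ((raw >> 11) & 1u) != 0;
    return info;
}
```

For example, a raw value of 0x80000B0E decodes to a valid hardware exception, vector 14 (`#PF`), with an error code available.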

Page Faults (`#PF`, vector 14)

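EXIT_QUALIFICATION becomes useful: for a page-fault exit it holds the faulting linear address, the value the guest would otherwise find in CR2 (the VM exit itself does not update CR2). Also inspect:

VM_EXIT_INTR_ERROR_CODE (the page-fault error code)
guest RIP
guest CR3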

Important: a normal guest page fault is exit reason 0. An EPT violation is exit reason 48. Do not mix them up — they are routed by completely different logic in the CPU.

General Protection Faults (`#GP`, vector 13)

Common causes when bringing up a hypervisor: invalid segment state, bad CR4 setup, invalid MSRs, non-canonical addresses, broken host-state fields. Priority checks: guest RIP, guest CS, guest CR3, host-state validity.

Invalid Opcode (`#UD`, vector 6)

Likely causes: unsupported instruction, broken RIP, invalid execution mode, VMX instruction leaking into the guest, XSAVE/XRSTOR issues.

Useful Debug Checklist for Exit Reason 0

Read VM_EXIT_INTR_INFO. Check bit 31. Identify the vector.
Read GUEST_RIP. Locate the crashing instruction.
Read VM_EXIT_INSTRUCTION_LEN. Decode instruction bytes correctly.
Read guest CR3 / CS / RSP. Understand execution context.
For #PF, read EXIT_QUALIFICATION for the faulting linear address. GUEST_LINEAR_ADDRESS is the field to use for EPT violations.
Distinguish guest fault from EPT fault. (Big source of confusion.)

6. Suggested Reading Order for SDM Vol. 3C

If I restarted VMX learning from zero, I would read the Intel SDM in a very different order from how it’s printed. The SDM is structured as a reference manual, not a tutorial.

Start with VMX operation basics. VMX root vs non-root, VM entry, VM exit, VMCS lifecycle. Without this, the rest feels random.

Then read the VMCS layout chapter. Focus on guest state, host state, control fields, exit information fields. Do not try to memorize encodings yet — build a mental structure first.

Then read the VM-entry and VM-exit chapters. These explain what the CPU actually does, in which order, what gets validated, and what can fail. This section alone explains many mysterious crashes.

Keep these tables open while coding:

VM instruction error codes
VM exit reason codes
Control-field allowed settings (IA32_VMX_PINBASED_CTLS, IA32_VMX_PROCBASED_CTLS, etc.)

You will reference them constantly.

External References

HyperPlatform — very readable educational VMX codebase.
KVM (arch/x86/kvm/vmx/) — production-quality reference. Dense, but valuable.
Daax and Sina Karvandi (Hypervisor From Scratch), plus the hvpp project: some of the best practical VMX writing and code publicly available.

Conclusion

VMX becomes much easier once you stop thinking of it as “magic CPU behavior” and instead treat it as state transitions, execution contracts, and strict validation rules.

Most beginner VMX debugging problems are not caused by advanced concepts. They come from:

misunderstanding VMCS lifecycle
confusing VM exits with VMX instruction failures
reading the wrong diagnostic field
not knowing which state is transition state versus live execution state

This article focused on those foundations. It did not cover EPT internals, VPID, APIC virtualization, MSR bitmaps, posted interrupts, or nested virtualization — each deserves its own write-up, and EPT will be next.

If you understand VMCS lifecycle, current vs launched VMCS, VM exit diagnostics, and the difference between VM exits and VMX failures, you’ve already eliminated a surprisingly large percentage of early hypervisor debugging pain.

EPT Internals: Understanding Intel’s Second Layer of Paging

2026-05-18T00:00:00+00:00

This is the follow-up to VMCS by Practice. If that article focused on getting a guest to run, this one focuses on controlling what it sees in memory.

Extended Page Tables (EPT) is Intel’s hardware-assisted memory virtualization. Before EPT, hypervisors had to use shadow page tables - maintaining a parallel set of page tables and trapping every guest page table modification. It worked, but it was slow and complex.

EPT adds a second translation layer directly in hardware. The guest manages its own page tables normally (GVA → GPA), and the CPU automatically translates guest physical addresses to host physical addresses (GPA → HPA) through EPT. No traps needed for normal memory access.

I’ll go through the structures, the walk, the common mistakes, and how to debug all of it.

1. The Two-Layer Translation Model

Without virtualization, address translation is:

Virtual Address (VA) → Physical Address (PA)
          via CR3 → page tables

With EPT enabled, the guest still thinks it controls physical memory, but every “physical” address the guest produces is actually a guest physical address (GPA). The CPU then walks EPT to find the real host physical address:

Guest Virtual Address (GVA)
    → Guest Page Tables (controlled by guest CR3)
    → Guest Physical Address (GPA)
        → EPT Page Tables (controlled by EPTP in VMCS)
        → Host Physical Address (HPA)

The critical insight: the guest page table walk itself goes through EPT. When the CPU reads a guest PTE, that PTE lives at a GPA, which must be translated through EPT to find the actual memory. This means a single guest virtual address translation can trigger multiple EPT walks.

The Cost of Nested Translation

A full 4-level guest page walk with 4-level EPT means up to 24 memory accesses in the worst case (no TLB or paging-structure cache hits):

4 guest page table levels
Each guest table read needs an EPT walk first (4 EPT accesses) plus the read itself: 4 × (4 + 1) = 20
The final GPA needs one more EPT walk: 4 more accesses, 24 total

In practice, TLB caching (especially with VPIDs and EPT-tagged TLB entries) reduces this dramatically. But knowing the worst case explains why EPT misconfigurations hurt so much.
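The worst-case count generalizes to any combination of guest and EPT levels. Each guest table read costs one EPT walk plus the read itself, and the final GPA costs one more EPT walk, which simplifies to (g + 1)(e + 1) - 1. The helper below is a sketch with a name of my choosing.

```c
/*
 * Worst-case memory accesses for one nested translation with no caching.
 *   guest_levels * (ept_levels + 1)  guest table reads, each behind an EPT walk
 *   + ept_levels                     EPT walk for the final guest physical address
 */
static unsigned nested_walk_accesses(unsigned guest_levels, unsigned ept_levels) {
    return (guest_levels + 1) * (ept_levels + 1) - 1;
}
```

With 4 guest levels and 4 EPT levels this gives 24. The same formula gives 35 when both sides use 5 levels.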

2. EPT Structure Layout

EPT uses the same hierarchical structure as regular x86-64 paging: 4 levels, 512 entries per table, 8 bytes per entry.

EPTP (in VMCS)
  → PML4 (Page Map Level 4)         512 entries, each covers 512 GB
    → PDPT (Page Directory Pointer)  512 entries, each covers 1 GB
      → PD (Page Directory)          512 entries, each covers 2 MB
        → PT (Page Table)            512 entries, each covers 4 KB

Each level uses 9 bits of the guest physical address:

GPA bits:
  [47:39] → PML4 index
  [38:30] → PDPT index
  [29:21] → PD index
  [20:12] → PT index
  [11:0]  → page offset
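The bit ranges above translate directly into shifts and masks. The following sketch splits a GPA into its four table indices and the page offset; the struct and function names are illustrative.

```c
#include <stdint.h>

typedef struct {
    unsigned pml4;    /* GPA bits 47:39 */
    unsigned pdpt;    /* GPA bits 38:30 */
    unsigned pd;      /* GPA bits 29:21 */
    unsigned pt;      /* GPA bits 20:12 */
    unsigned offset;  /* GPA bits 11:0  */
} ept_indices;

/* Split a guest physical address into 4-level EPT table indices. */
static ept_indices ept_split_gpa(uint64_t gpa) {
    ept_indices ix;
    ix.pml4   = (unsigned)((gpa >> 39) & 0x1FF);
    ix.pdpt   = (unsigned)((gpa >> 30) & 0x1FF);
    ix.pd     = (unsigned)((gpa >> 21) & 0x1FF);
    ix.pt     = (unsigned)((gpa >> 12) & 0x1FF);
    ix.offset = (unsigned)(gpa & 0xFFF);
    return ix;
}
```

For the local APIC page at 0xFEE00000 this yields PML4 index 0, PDPT index 3, PD index 0x1F7, and PT index 0.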

EPT Entry Format

An EPT entry at any level has this basic structure:

Bits 2:0   → Read / Write / Execute permissions
Bits 5:3   → Memory type (leaf entries only, i.e. entries that map a page)
Bit  7     → Large page (1 GB at PDPT level, 2 MB at PD level)
Bits 51:12 → Physical address of next level (or final page frame)

The permission bits are the most important part for security research:

Bit	Permission	Meaning
0	Read	Guest can read this memory
1	Write	Guest can write this memory
2	Execute	Guest can execute from this memory

Setting all three to 0 on a valid mapping means any access causes an EPT violation (VM exit reason 48). This is the foundation of EPT-based memory monitoring.

The EPTP (EPT Pointer)

The EPT pointer is stored in the VMCS and tells the CPU where the PML4 table lives:

Bits 2:0   → Memory type for EPT structures (typically 6 = write-back)
Bits 5:3   → EPT page walk length minus 1 (set to 3 for 4-level walk)
Bits 51:12 → Physical address of PML4 table (4 KB aligned)

A common beginner mistake: setting the memory type wrong or forgetting the walk length field. Both make VM entry fail with VM-instruction error 7 (invalid control fields), which does not tell you which field was wrong.
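Building the EPTP from those three fields takes one line, and wrapping it in a helper avoids forgetting the walk length. This is a minimal sketch: `make_eptp` is a hypothetical name, and it assumes the CPU reports write-back and 4-level walks as supported in IA32_VMX_EPT_VPID_CAP.

```c
#include <stdint.h>

#define EPTP_MEMTYPE_WB   6ull         /* bits 2:0: write-back */
#define EPTP_WALK_LEN_4   (3ull << 3)  /* bits 5:3: walk length minus 1 */
#define EPTP_ADDR_MASK    0x000FFFFFFFFFF000ull

/* Compose an EPTP value for a 4-level EPT rooted at pml4_phys. */
static uint64_t make_eptp(uint64_t pml4_phys) {
    return (pml4_phys & EPTP_ADDR_MASK) | EPTP_WALK_LEN_4 | EPTP_MEMTYPE_WB;
}
```

A PML4 table at physical address 0x123000 produces 0x12301E, which is easy to recognize in a VMCS dump: the low byte 0x1E is walk length 3 plus memory type 6.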

3. Building an Identity Map

The simplest EPT configuration maps all guest physical memory 1:1 to host physical memory. GPA 0x1000 maps to HPA 0x1000. The guest sees exactly the real physical layout.

This is not useful for isolation, but it is the correct first step when bringing up a hypervisor. Get identity mapping working before attempting anything more complex.

Allocation Strategy

For a system with 4 GB of physical memory:

PML4:  1 table   (covers up to 256 TB)
PDPT:  1 table   (covers up to 512 GB, we need ~4 GB)
PD:    4 tables  (each covers 1 GB, 4 × 1 GB = 4 GB)
PT:    0 tables  (use 2 MB large pages to avoid this level)

Using 2 MB large pages simplifies the initial setup significantly. Set bit 7 in the PD entries to enable large pages:

for (uint64_t i = 0; i < 2048; i++) {  // 2048 × 2 MB = 4 GB
    uint64_t pd_index = i % 512;
    uint64_t pdpt_index = i / 512;     // which of the 4 PD tables

    pd_tables[pdpt_index][pd_index] =
        (i * 0x200000ULL)  // physical address (2 MB aligned)
        | (6ULL << 3)      // memory type: write-back (bits 5:3)
        | (1ULL << 7)      // large page
        | 0x7ULL;          // read + write + execute
}
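The loop above only fills the PD tables; the PML4 and PDPT entries that point at them still need to be written. The sketch below wires up all three levels for a 4 GB identity map. Everything named here is illustrative: `ept_identity_4g`, `ept_build_identity_4g`, and the `virt_to_phys_fn` callback are assumptions, and the cast-based translator exists only so the code can be exercised in user mode (a driver would pass its OS's virtual-to-physical routine instead). For brevity it marks the whole range write-back, which is not right for MMIO.

```c
#include <stdint.h>
#include <string.h>

#define EPT_RWX    0x7ull
#define EPT_MT_WB  (6ull << 3)
#define EPT_LARGE  (1ull << 7)

/* Caller-supplied translation from a table's virtual to physical address. */
typedef uint64_t (*virt_to_phys_fn)(const void *va);

typedef struct {
    _Alignas(4096) uint64_t pml4[512];
    _Alignas(4096) uint64_t pdpt[512];
    _Alignas(4096) uint64_t pd[4][512];
} ept_identity_4g;

/* Test-only stand-in: pretends virtual and physical addresses are equal. */
static uint64_t cast_virt_to_phys(const void *va) {
    return (uint64_t)(uintptr_t)va;
}

static void ept_build_identity_4g(ept_identity_4g *t, virt_to_phys_fn v2p) {
    memset(t, 0, sizeof *t);
    t->pml4[0] = v2p(t->pdpt) | EPT_RWX;            /* one PDPT covers 512 GB */
    for (unsigned g = 0; g < 4; g++) {
        t->pdpt[g] = v2p(t->pd[g]) | EPT_RWX;       /* one PD per GB */
        for (unsigned i = 0; i < 512; i++) {
            uint64_t hpa = ((uint64_t)g << 30) | ((uint64_t)i << 21);
            t->pd[g][i] = hpa | EPT_MT_WB | EPT_LARGE | EPT_RWX;
        }
    }
}
```

Non-leaf entries carry only permissions and an address; the memory type and large-page bit belong on the 2 MB leaf entries.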

Common Identity Map Mistakes

MTRR interaction will get you. The memory type in EPT entries interacts with MTRR (Memory Type Range Registers). For identity mapping, write-back (6) works for RAM regions, but MMIO regions need uncacheable (0). I spent way too long debugging subtle corruption before realizing this.

MMIO ranges need to be covered too. Device MMIO regions (like the local APIC at 0xFEE00000) must also be mapped. Missing MMIO mappings cause EPT violations when the guest accesses hardware.

Also watch out for physical address width. Not all CPUs support 48-bit physical addresses. Check CPUID.80000008H:EAX[7:0] for the actual width. Setting address bits beyond the supported width in an EPT entry counts as setting reserved bits, which produces an EPT misconfiguration (exit reason 49).

4. EPT Violations (VM Exit Reason 48)

When the guest accesses memory in a way that violates EPT permissions, the CPU generates a VM exit with reason 48. This is the most important EPT-related exit for security research.

Exit Qualification

For EPT violations, EXIT_QUALIFICATION contains detailed information about what happened:

Bit 0  → caused by data read
Bit 1  → caused by data write
Bit 2  → caused by instruction fetch
Bit 3  → GPA was readable (AND of the read bits across the EPT entries used)
Bit 4  → GPA was writable (same, for the write bits)
Bit 5  → GPA was executable (same, for the execute bits)
Bit 7  → GUEST_LINEAR_ADDRESS field is valid
Bit 8  → (only if bit 7 is set) access was to the final translated GPA, not to a guest paging structure during the walk

Relevant VMCS Fields

After an EPT violation, read these fields:

GUEST_PHYSICAL_ADDRESS - the GPA that caused the violation
GUEST_LINEAR_ADDRESS - the GVA the guest was accessing (if bit 7 of qualification is set)
GUEST_RIP - where the guest was executing
VM_EXIT_INSTRUCTION_LEN - length of the faulting instruction
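An EPT violation handler usually starts by unpacking the qualification into named flags. The sketch below assumes the bit layout from the Intel SDM, in which bit 7 reports whether GUEST_LINEAR_ADDRESS is valid and bit 8 is only meaningful when bit 7 is set; the struct and function names are my own.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool read, write, fetch;                    /* bits 0..2: access type */
    bool readable, writable, executable;        /* bits 3..5: effective EPT permissions */
    bool linear_valid;                          /* bit 7: GUEST_LINEAR_ADDRESS is valid */
    bool final_translation;                     /* bit 8, only if bit 7 is set */
} ept_violation;

/* Decode EXIT_QUALIFICATION for exit reason 48. */
static ept_violation decode_ept_violation(uint64_t q) {
    ept_violation v;
    v.read         = (q >> 0) & 1;
    v.write        = (q >> 1) & 1;
    v.fetch        = (q >> 2) & 1;
    v.readable     = (q >> 3) & 1;
    v.writable     = (q >> 4) & 1;
    v.executable   = (q >> 5) & 1;
    v.linear_valid = (q >> 7) & 1;
    v.final_translation = v.linear_valid && ((q >> 8) & 1);
    return v;
}
```

A qualification of 0x182, for instance, describes a write to a page that has no EPT permissions at all, reached through a normal linear-address translation.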

EPT Violation vs Page Fault

This is a critical distinction that confuses beginners:

Page fault (exit reason 0, vector 14): The guest’s own page tables rejected the access. The guest OS would normally handle this via its page fault handler. The hypervisor sees it only if exception bitmap bit 14 is set.

EPT violation (exit reason 48): The guest’s page tables were fine, but EPT rejected the GPA → HPA translation. The guest OS has no idea this happened. Only the hypervisor sees it.

GVA → guest page tables → GPA → EPT → HPA
         page fault ↑              ↑ EPT violation

EPT Misconfiguration (Exit Reason 49)

Different from EPT violations. A misconfiguration means the EPT entry itself is structurally invalid - for example, a write-only page (write=1, read=0) which Intel does not allow. The CPU cannot meaningfully process the entry.

EPT misconfigurations usually indicate a hypervisor bug, not a guest behavior issue. Check your EPT construction logic when you see exit reason 49.
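Some of these structural rules are cheap to check in software before an entry is ever installed. The function below is a partial sketch only: it covers write-without-read, execute-only on hardware that lacks support, and reserved memory types in leaf entries, and it deliberately skips the reserved-bit and address-width checks. The name and the `exec_only_supported` parameter are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Partial misconfiguration check for one EPT entry.
 *   is_leaf              entry maps a page (PT entry, or large-page PD/PDPT entry)
 *   exec_only_supported  whether the CPU allows execute-only mappings
 */
static bool ept_entry_misconfigured(uint64_t e, bool is_leaf,
                                    bool exec_only_supported) {
    bool r = (e & 1) != 0, w = (e & 2) != 0, x = (e & 4) != 0;

    if (!r && !w && !x)
        return false;   /* not present: causes a violation, not a misconfiguration */
    if (w && !r)
        return true;    /* writable but not readable is never allowed */
    if (x && !r && !exec_only_supported)
        return true;    /* execute-only needs hardware support */
    if (is_leaf) {
        unsigned mt = (unsigned)((e >> 3) & 7);
        if (mt == 2 || mt == 3 || mt == 7)
            return true;  /* reserved memory types */
    }
    return false;
}
```

Running a check like this over freshly built tables in a debug build can catch construction bugs before the first VM entry.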

5. EPT-Based Memory Monitoring

The reason EPT matters for security research: it provides transparent memory access control below the operating system.

Read/Write Monitoring

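Set an EPT page to execute-only (read=0, write=0, execute=1). Any data read or write by the guest triggers an EPT violation, but code execution continues normally. Execute-only mappings are optional: check bit 0 of IA32_VMX_EPT_VPID_CAP (MSR 0x48C) first, because on CPUs without support this combination is an EPT misconfiguration.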

Use case: monitoring access to sensitive structures without modifying the guest. The guest kernel cannot detect this monitoring because EPT operates below its privilege level.

Execute Monitoring

Set an EPT page to read/write but not execute (read=1, write=1, execute=0). Any instruction fetch from that page triggers an EPT violation.

Use case: detecting code execution in data regions, monitoring shellcode, tracking JIT compilation.
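Both monitoring modes come down to rewriting the low three bits of a leaf entry while leaving the address, memory type, and large-page bit alone. The helper below is a sketch with a name of my choosing; it only computes the new entry value and does not touch hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPT_PERM_MASK 0x7ull

/* Return `entry` with its R/W/X bits replaced and every other bit preserved. */
static uint64_t ept_with_perms(uint64_t entry, bool r, bool w, bool x) {
    uint64_t perms = (r ? 1ull : 0) | (w ? 2ull : 0) | (x ? 4ull : 0);
    return (entry & ~EPT_PERM_MASK) | perms;
}
```

After storing the new value into the live table, cached translations may still carry the old permissions, so an INVEPT is typically needed before the change reliably takes effect.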

The Split-TLB Approach

A more advanced technique: maintain two EPT views of the same physical memory.

View A: read+write, no execute. Used for data access.
View B: execute-only, no read/write. Used for code execution.

Switch between views on EPT violations. This allows monitoring both code execution and data access to the same memory region, which is useful for detecting self-modifying code or analyzing packed executables.

The complexity cost is significant. Each EPT violation requires determining whether to switch views, and high-frequency switching destroys performance. This is a research technique, not a production approach.

6. Debugging EPT Issues

EPT bugs are hard to debug because the symptoms are indirect. The guest crashes, hangs, or behaves incorrectly, with no obvious connection to EPT.

Diagnostic Checklist

Guest triple-faults immediately after VMLAUNCH:

EPT identity map is incomplete. The guest’s first instruction fetch hits an unmapped GPA.
Check that the guest RIP’s physical page is mapped with execute permission.

Guest runs but crashes accessing devices:

MMIO regions are not mapped in EPT.
Map at minimum: local APIC (0xFEE00000), IOAPIC, any device the guest uses.

Random guest corruption:

Memory type mismatch. EPT entry memory type conflicts with MTRR settings.
For RAM, use write-back. For MMIO, use uncacheable.

Performance is unexpectedly terrible:

Using 4 KB pages where 2 MB pages would work. Every TLB miss is more expensive with smaller pages.
VPID not enabled. Without VPID, every VM exit/entry flushes TLB entries.

EPT Walk Validation

When debugging, manually walk the EPT for a suspect GPA:

Given GPA = 0x00000000_001F5000:

PML4 index   = (GPA >> 39) & 0x1FF = 0
PDPT index   = (GPA >> 30) & 0x1FF = 0
PD index     = (GPA >> 21) & 0x1FF = 0
PT index     = (GPA >> 12) & 0x1FF = 0x1F5

Walk: EPTP → PML4[0] → PDPT[0] → PD[0] → PT[0x1F5]

At each level, verify:

The entry is present (at least one permission bit set, or the entry is valid)
The physical address in the entry points to a real allocated table
Permission bits match your intent

7. EPT in Defensive Security

This is where EPT gets interesting from a defensive perspective. A bunch of modern security products rely on EPT, and knowing how they use it helps you evaluate what they actually protect.

EPT can make kernel code pages non-writable at the hardware level. Any attempt to patch kernel code - a common rootkit technique - triggers an EPT violation. Microsoft’s HVCI relies on this principle.

During incident response, a hypervisor can read guest physical memory through EPT without being subverted by kernel-level rootkits that manipulate the OS’s own page tables. Forensic analysts get a trustworthy view of memory that the OS itself can’t tamper with.

EPT execute-monitoring can also detect when data regions start executing code - a strong indicator of exploitation. EDR products increasingly use hypervisor-based telemetry for exactly this, because it operates below the level where malware can interfere.

Windows Credential Guard uses a separate VTL (Virtual Trust Level) with its own EPT mapping to isolate LSASS secrets. Even if an attacker gains kernel access in VTL0, EPT prevents reading the isolated memory.

All of these rely on the same thing: EPT gives you a monitoring layer that the OS and anything running inside it cannot see or bypass.

Conclusion

EPT transforms a hypervisor from “a thing that runs a guest” into “a thing that controls what the guest sees.” The identity map gets things working. Permission manipulation makes things interesting.

Quick recap of what matters:

Two-layer translation: GVA → GPA → HPA, each with its own page tables
EPT violations (reason 48) are your primary tool for memory monitoring
EPT violations are not page faults - different translation layer entirely
Start with identity mapping using 2 MB large pages
Memory type and MTRR interaction causes the most subtle bugs
EPT-based monitoring sits below the OS, which is what makes it powerful

Next up: the PEB (Process Environment Block) on Windows. Completely different domain, but the same theme of understanding internal structures to do useful security work.

PEB Internals: What the Process Environment Block Reveals and Why Defenders Care

2026-05-15T00:00:00+00:00

Every process on Windows has a Process Environment Block (PEB). Most developers never interact with it directly - the Win32 API abstracts everything away. But if you’re doing malware analysis, EDR engineering, or any kind of defensive work, the PEB comes up constantly.

1. What Is the PEB?

The PEB is a user-mode structure that the Windows kernel creates for every process. It lives in the process’s own address space, readable from user mode without any system calls. This is by design - many common operations (checking if the debugger is attached, enumerating loaded modules, reading environment variables) need this data without the overhead of a syscall.

The PEB is accessible through the TEB (Thread Environment Block), which itself is pointed to by the GS segment register on x64:

// x64: TEB is at GS:[0x30], PEB is at TEB+0x60
PEB* peb = (PEB*)__readgsqword(0x60);

On x86:

// x86: TEB is at FS:[0x18], PEB is at TEB+0x30
PEB* peb = (PEB*)__readfsdword(0x30);

Why This Matters for Defense

Because the PEB is in user-mode memory, any code running in the process can read and modify it. This is both a feature and a security concern. Legitimate code reads the PEB for process information. Malware modifies the PEB to hide its tracks.

2. Key PEB Fields

The PEB is large and version-dependent. These are the fields that matter most for security analysis:

`BeingDebugged` (offset 0x02)

A single byte. Set to 1 when a debugger is attached via DebugActiveProcess or when the process is started under a debugger.

if (peb->BeingDebugged) {
    // debugger is attached
}

This is equivalent to calling IsDebuggerPresent(), which literally just reads this byte. Malware commonly checks this field and alters its behavior. Anti-anti-debug tooling typically resets this field to 0 to hide the debugger.

If your EDR sees a process where BeingDebugged is 0 but debug events are active on the process, something is manipulating the PEB.

`Ldr` (offset 0x18 on x64, 0x0C on x86) - PEB_LDR_DATA

Pointer to the loader data structure, which contains three linked lists of loaded modules:

InLoadOrderModuleList - modules in load order
InMemoryOrderModuleList - modules in memory address order
InInitializationOrderModuleList - modules in initialization order

Each entry is an LDR_DATA_TABLE_ENTRY containing:

typedef struct _LDR_DATA_TABLE_ENTRY {
    LIST_ENTRY InLoadOrderLinks;
    LIST_ENTRY InMemoryOrderLinks;
    LIST_ENTRY InInitializationOrderLinks;
    PVOID DllBase;               // base address of the module
    PVOID EntryPoint;            // entry point
    ULONG SizeOfImage;           // size in memory
    UNICODE_STRING FullDllName;  // full path
    UNICODE_STRING BaseDllName;  // just the filename
    // ... more fields
} LDR_DATA_TABLE_ENTRY;

Walking these lists is how tools like Process Explorer enumerate loaded DLLs. But malware can unlink entries from these lists to hide injected DLLs. The module is still loaded in memory, but PEB-based enumeration won’t find it.

`ProcessParameters` (offset 0x20 on x64, 0x10 on x86) - RTL_USER_PROCESS_PARAMETERS

Contains the command line, current directory, environment variables, image path, and window information:

RTL_USER_PROCESS_PARAMETERS* params = peb->ProcessParameters;
// params->CommandLine      - full command line
// params->ImagePathName    - path to the executable
// params->Environment      - environment variable block
// params->CurrentDirectory - working directory

Malware can modify ImagePathName to make the process appear to be running from a different location. Comparing PEB ImagePathName with the actual executable path (from kernel structures) reveals this tampering.

`NtGlobalFlag` (offset 0x68 on x86, 0xBC on x64)

Debug-related flags set by the OS. When a debugger creates a process, certain flags are set:

FLG_HEAP_ENABLE_TAIL_CHECK (0x10)
FLG_HEAP_ENABLE_FREE_CHECK (0x20)
FLG_HEAP_VALIDATE_PARAMETERS (0x40)

Combined value when debugging: 0x70.

Malware checks this: if NtGlobalFlag contains 0x70, a debugger likely started the process. This is a more subtle check than BeingDebugged and is missed by naive anti-anti-debug tools.
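The check itself is a single mask comparison. The sketch below takes the flag value as a parameter so it can be tested without touching a real PEB; the function name is illustrative, while the flag constants are the ones listed above.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLG_HEAP_ENABLE_TAIL_CHECK    0x10u
#define FLG_HEAP_ENABLE_FREE_CHECK    0x20u
#define FLG_HEAP_VALIDATE_PARAMETERS  0x40u

#define DEBUG_HEAP_FLAGS (FLG_HEAP_ENABLE_TAIL_CHECK | \
                          FLG_HEAP_ENABLE_FREE_CHECK | \
                          FLG_HEAP_VALIDATE_PARAMETERS)

/* True when all three debug-heap flags are set, as in a debugger-created process. */
static bool nt_global_flag_suggests_debugger(uint32_t nt_global_flag) {
    return (nt_global_flag & DEBUG_HEAP_FLAGS) == DEBUG_HEAP_FLAGS;
}
```

From a defender's side, the same predicate is useful for spotting samples that perform this comparison, since the constant 0x70 next to a PEB read is a recognizable pattern.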

3. PEB-Based Module Enumeration

Walking the PEB loader lists is the standard way to enumerate modules from user mode without calling EnumProcessModules or CreateToolhelp32Snapshot - API calls that security tools monitor.

The Walk

PEB* peb = get_peb();
PEB_LDR_DATA* ldr = peb->Ldr;
LIST_ENTRY* head = &ldr->InLoadOrderModuleList;
LIST_ENTRY* current = head->Flink;

while (current != head) {
    LDR_DATA_TABLE_ENTRY* entry =
        CONTAINING_RECORD(current, LDR_DATA_TABLE_ENTRY, InLoadOrderLinks);

    // entry->BaseDllName.Buffer - module name
    // entry->DllBase            - base address
    // entry->SizeOfImage        - size

    current = current->Flink;
}

Why Malware Does This

Calling GetModuleHandle or LoadLibrary goes through the Windows API, which EDR products hook. PEB walking achieves the same result (finding a module base address) without touching any hooked functions.

This is a common pattern in malware:

Walk PEB to find kernel32.dll base address
Parse its export table to find GetProcAddress
Use GetProcAddress to resolve everything else

The bootstrap itself (locating kernel32.dll and GetProcAddress) makes no API calls that an EDR can intercept in the traditional sense.

How to Catch It

Even if malware avoids API hooks, ETW providers at the kernel level still see module loads. Comparing ETW module load events with PEB module lists reveals unlinking.

You can also scan process memory for PE headers (MZ / PE signatures) and compare against the PEB module list. If there’s a PE header in memory that doesn’t appear in the loader lists, something was unlinked.

Periodically comparing PEB module lists against kernel-side structures (EPROCESS.VadRoot, kernel module lists) works too. The kernel still tracks the memory regions even after PEB unlinking.
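The comparison itself is simple once both views are collected. Here is a rough sketch, assuming the independent view (ETW load events, a VAD scan, or a PE header scan) has already been reduced to base addresses; `ObservedModule` and `find_unlinked` are illustrative names, not part of any Windows API.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// One module as reported by a source outside the PEB (hypothetical record type).
struct ObservedModule {
    std::uint64_t base;
    std::string path;
};

// Returns every module the independent source saw whose base address
// does not appear in the PEB loader-list walk.
std::vector<ObservedModule> find_unlinked(const std::vector<ObservedModule>& independent,
                                          const std::vector<std::uint64_t>& peb_bases) {
    const std::set<std::uint64_t> linked(peb_bases.begin(), peb_bases.end());
    std::vector<ObservedModule> hidden;
    for (const ObservedModule& m : independent) {
        assert(m.base != 0);  // a record without a base address is malformed
        if (linked.count(m.base) == 0) {
            hidden.push_back(m);
        }
    }
    return hidden;
}
```

A real implementation would also have to drop modules that were legitimately unloaded since the load event, otherwise every FreeLibrary call shows up as a false positive.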

4. PEB Manipulation Techniques and Detection

Module Unlinking

The most common PEB manipulation: removing a LDR_DATA_TABLE_ENTRY from the three loader lists. After unlinking, the module is still loaded and functional, but will not appear when enumerating modules via the PEB.

void unlink_module(LDR_DATA_TABLE_ENTRY* entry) {
    entry->InLoadOrderLinks.Blink->Flink = entry->InLoadOrderLinks.Flink;
    entry->InLoadOrderLinks.Flink->Blink = entry->InLoadOrderLinks.Blink;
    // ... same for the other two lists
}
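To see why unlinking hides a module without unloading it, here is a small portable simulation. `ListEntry` and `FakeModule` are simplified stand-ins for LIST_ENTRY and LDR_DATA_TABLE_ENTRY, not the real Windows layouts, and only one of the three lists is modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins; the real structures have many more fields.
struct ListEntry { ListEntry* Flink; ListEntry* Blink; };

struct FakeModule {
    ListEntry links;    // first member, so a ListEntry* can be cast back to FakeModule*
    const char* name;
};

void list_init(ListEntry* head) { head->Flink = head->Blink = head; }

void list_push_back(ListEntry* head, ListEntry* e) {
    e->Flink = head;
    e->Blink = head->Blink;
    head->Blink->Flink = e;
    head->Blink = e;
}

// Same pointer surgery as unlink_module above, applied to one list.
void list_unlink(ListEntry* e) {
    assert(e->Flink != nullptr && e->Blink != nullptr);  // entry must be linked
    e->Blink->Flink = e->Flink;
    e->Flink->Blink = e->Blink;
}

// Walks the list the same way the PEB walk does and collects module names.
std::vector<std::string> walk(const ListEntry* head) {
    std::vector<std::string> names;
    for (const ListEntry* cur = head->Flink; cur != head; cur = cur->Flink) {
        names.push_back(reinterpret_cast<const FakeModule*>(cur)->name);
    }
    return names;
}
```

After `list_unlink`, the walk skips the entry, but the `FakeModule` object itself is untouched. That is the whole trick: the module stays mapped and usable, it just stops being reachable from the list head.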

To catch this, compare the PEB module list against:

VAD (Virtual Address Descriptor) tree from kernel mode - memory regions are still tracked
PE header scanning in the process address space
ETW module load events (the load was logged before unlinking happened)

ImagePathName Spoofing

The attacker overwrites ProcessParameters->ImagePathName with a different path. Some security tools trust this field to identify what binary is running.

To catch it, compare with EPROCESS.ImageFileName in kernel mode (which holds only a truncated file name, not the full path), or with the QueryFullProcessImageName result, which reads from kernel structures.

BeingDebugged / NtGlobalFlag Clearing

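These fields can be zeroed so that checks relying on them no longer see a debugger. Malware may do this as part of its own anti-analysis routines, and analysis tools do it to hide the debugger from the sample.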

From a debugger, compare expected debug state with PEB values. Tools like ScyllaHide do the reverse (clear these fields to help analysts), which is useful during malware analysis.

CommandLine Modification

The attacker overwrites ProcessParameters->CommandLine after process creation, so tools that query the PEB later see the modified version.

Process creation events (Sysmon Event ID 1, ETW) capture the original command line. Comparing that record with the PEB reveals tampering.

5. PEB and the Heap

The PEB contains pointers to process heaps:

PVOID  ProcessHeap;              // default process heap
ULONG  NumberOfHeaps;            // total heap count
ULONG  MaximumNumberOfHeaps;     // maximum heap count
PVOID* ProcessHeaps;             // array of heap pointers

Heap-Based Debug Detection

The process heap (from GetProcessHeap() or PEB->ProcessHeap) has debug-specific flags when a debugger creates the process:

Heap->Flags should be HEAP_GROWABLE (0x02) normally
Under a debugger: additional flags like HEAP_TAIL_CHECKING_ENABLED, HEAP_FREE_CHECKING_ENABLED
Heap->ForceFlags should be 0 normally, non-zero under debugger

This is a more reliable anti-debug check than BeingDebugged because many anti-anti-debug tools forget to patch heap flags.

This one has bitten me during analysis. You patch BeingDebugged thinking you’re clean, and the malware still detects you because you forgot about the heap flags.
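A minimal sketch of the heap check, assuming Flags and ForceFlags have already been read from the heap header (their offsets vary by Windows version, so they are deliberately left out). The constant values are the ones winnt.h defines for HEAP_TAIL_CHECKING_ENABLED and HEAP_FREE_CHECKING_ENABLED; the struct and function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Values as defined in winnt.h, renamed to avoid clashing with the SDK macros.
constexpr std::uint32_t kHeapGrowable            = 0x00000002;
constexpr std::uint32_t kHeapTailCheckingEnabled = 0x00000020;
constexpr std::uint32_t kHeapFreeCheckingEnabled = 0x00000040;

// Hypothetical view of the two fields, already read from the heap header.
struct HeapFlagsView {
    std::uint32_t flags;
    std::uint32_t force_flags;
};

// True when either field carries debug-heap state:
// any ForceFlags bit, or a checking bit in Flags.
bool heap_suggests_debugger(const HeapFlagsView& heap) {
    const std::uint32_t debug_bits = kHeapTailCheckingEnabled | kHeapFreeCheckingEnabled;
    return heap.force_flags != 0 || (heap.flags & debug_bits) != 0;
}
```

If you are on the analysis side, this is also the list of things to patch: clearing BeingDebugged without resetting both heap fields leaves this check intact.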

6. PEB Across Windows Versions

The PEB grows with each Windows version. Microsoft adds fields but does not remove them, maintaining backward compatibility. Key additions over time:

Windows Vista: Added AppCompatFlags, AppCompatFlagsUser
Windows 8: Added AppModelPolicy
Windows 10: Added LeapSecondData, ActiveCodePage
Windows 11: Various additions for security mitigations

Practical Implication

When writing tools that read the PEB, always verify the Windows version and use the correct structure layout. Using the wrong offsets causes silent reads of wrong fields. I’ve seen tools read garbage for months because of this.

The best approach: use NtQueryInformationProcess(ProcessBasicInformation) to get the PEB address, then read fields at known offsets rather than relying on a compiled structure definition that might not match the running OS version.

7. Defensive Tooling Using PEB Analysis

Integrity Monitoring

A lightweight detection approach: periodically snapshot PEB state and compare:

Module list: any entries added or removed since last check?
ImagePathName: still matches the real binary path?
BeingDebugged / NtGlobalFlag: consistent with actual debug state?
CommandLine: matches process creation event?

Changes to any of these fields outside of expected operations are strong indicators of compromise or tampering.
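The snapshot comparison could look roughly like this. `PebSnapshot` is a made-up record holding only the fields from the list above, and how it gets filled (ReadProcessMemory, a driver, a memory dump) is outside the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical snapshot of the PEB fields worth monitoring.
struct PebSnapshot {
    std::string image_path;
    std::string command_line;
    std::uint8_t being_debugged;
    std::uint32_t nt_global_flag;
    std::vector<std::uint64_t> module_bases;  // from a loader-list walk
};

// Returns the names of the fields that differ between two snapshots.
// Only module removals are reported, since new loads are normal activity.
std::vector<std::string> diff_snapshots(const PebSnapshot& before, const PebSnapshot& after) {
    std::vector<std::string> changed;
    if (before.image_path != after.image_path) changed.push_back("ImagePathName");
    if (before.command_line != after.command_line) changed.push_back("CommandLine");
    if (before.being_debugged != after.being_debugged) changed.push_back("BeingDebugged");
    if (before.nt_global_flag != after.nt_global_flag) changed.push_back("NtGlobalFlag");
    for (std::uint64_t base : before.module_bases) {
        const auto& now = after.module_bases;
        if (std::find(now.begin(), now.end(), base) == now.end()) {
            changed.push_back("ModuleList (entry removed)");
            break;  // one report is enough to trigger a closer look
        }
    }
    return changed;
}
```

A removed entry is not proof of unlinking on its own (the module may simply have been unloaded), so the result is a prompt to cross-check against kernel-side data rather than a verdict.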

EDR Integration

Modern EDRs combine PEB inspection with other telemetry:

PEB module list + VAD scan + ETW events = comprehensive module visibility
PEB CommandLine + Sysmon creation event = tamper detection
PEB heap analysis + debug state = environment fingerprinting detection

Forensic Analysis

During incident response, dumping the PEB provides immediate context:

What modules are loaded (and what’s hiding)?
What was the real command line?
Is the process environment modified?
Are there signs of debugger evasion?

Tools like Volatility can extract PEB data from memory dumps, making this analysis possible even on dead systems.

Conclusion

The PEB is small but it comes up everywhere. Attackers read it to avoid API hooks. They modify it to hide their presence. Defenders read it to catch that modification.

The big points: the PEB is user-mode writable, so always assume it can be tampered with. Module enumeration via PEB is a malware staple because it avoids hooked APIs. Defensive detection works by comparing PEB state against kernel-side ground truth. And debug detection through PEB goes well beyond IsDebuggerPresent - heap flags and NtGlobalFlag catch a lot of analysts off guard.

Every detection rule for PEB tampering starts with knowing how the tampering actually works.

VMT Hooking: How It Works and How to Detect It

2026-05-12T00:00:00+00:00

Virtual Method Table (VMT) hooking is one of the oldest and most reliable hooking techniques on Windows. It exploits a fundamental C++ runtime mechanism - the vtable - to redirect virtual function calls without patching any code bytes. No code modification means most integrity scanners miss it entirely.

1. C++ Virtual Method Tables

Every C++ class with at least one virtual function has a vtable: a static array of function pointers, one per virtual method, in declaration order.

class IRenderer {
public:
    virtual void Initialize() = 0;  // vtable[0]
    virtual void BeginFrame() = 0;  // vtable[1]
    virtual void EndFrame() = 0;    // vtable[2]
    virtual void Present() = 0;     // vtable[3]
};

When an object is instantiated, its first 8 bytes (on x64) are a pointer to the class vtable:

Object in memory:
  +0x00: vtable pointer → [Initialize, BeginFrame, EndFrame, Present]
  +0x08: member data...

A virtual call like renderer->Present() compiles to:

mov  rax, [rcx]          ; load vtable pointer from object
call [rax + 0x18]        ; call vtable[3] (Present)

The CPU reads the vtable pointer, indexes into the table, and calls whatever address it finds. There is no validation that the address is legitimate.

2. How VMT Hooking Works

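VMT hooking replaces an entry in the vtable with a pointer to a different function. After the hook, every virtual call to that method through the modified table is redirected: on every object of the class if the shared table is patched, or on a single object if only its vtable pointer is swapped.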

Method 1: Direct Vtable Patch

Overwrite a single entry in the vtable:

void** vtable = *(void***)target_object;

// Save original
original_Present = vtable[3];

// The vtable is typically in .rdata (read-only), so change protection first
DWORD old_protect;
VirtualProtect(&vtable[3], sizeof(void*), PAGE_READWRITE, &old_protect);

// Replace entry
vtable[3] = hooked_Present;

// Restore protection
VirtualProtect(&vtable[3], sizeof(void*), old_protect, &old_protect);

Pros: Simple, affects all objects sharing this vtable.
Cons: Modifies the original vtable in .rdata, which can be detected by integrity checks.

Method 2: Vtable Replacement

Allocate a new vtable, copy all entries, modify the target entry, then point the object’s vtable pointer to the new table:

void** original_vtable = *(void***)target_object;
int vtable_size = count_vtable_entries(original_vtable);

// Allocate new vtable
void** new_vtable = (void**)malloc(vtable_size * sizeof(void*));
memcpy(new_vtable, original_vtable, vtable_size * sizeof(void*));

// Hook one entry in the copy
new_vtable[3] = hooked_Present;

// Point object to new vtable
*(void***)target_object = new_vtable;

Pros: Original vtable is untouched. Only one object is affected.
Cons: The new vtable is in heap memory, which is unusual. Only affects the specific object instance.
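The difference between the two methods is easier to see in a portable model. The sketch below replaces the compiler-generated vtable with a hand-rolled table of function pointers, so it does not depend on any ABI detail; all names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hand-rolled stand-in for a compiler-generated vtable.
using Method = int (*)();

struct Object {
    const Method* vtable;  // plays the role of the hidden pointer at +0x00
};

int original_end_frame() { return 2; }
int original_present()   { return 1; }
int hooked_present()     { return 99; }

// Shared, read-only table: the analogue of a vtable in .rdata.
const Method kClassVtable[] = { original_end_frame, original_present };

int call_virtual(const Object& obj, std::size_t slot) { return obj.vtable[slot](); }

// Method 2: copy the table, patch one slot in the copy, repoint this one object.
// The caller owns `storage` and must keep it alive while the object uses it.
void replace_vtable(Object& obj, std::vector<Method>& storage,
                    std::size_t count, std::size_t slot, Method hook) {
    assert(slot < count);
    storage.assign(obj.vtable, obj.vtable + count);
    storage[slot] = hook;
    obj.vtable = storage.data();
}
```

After `replace_vtable` runs on one object, a second object of the same "class" still dispatches through `kClassVtable`, and the shared table is byte-for-byte unchanged. The only observable difference is where the first object's table pointer now points, which is exactly what pointer validation looks at.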

3. Why VMT Hooks Are Hard to Detect

Inline hooks (patching the first bytes of a function with a JMP) are well-understood and widely detected:

Code integrity scanners compare function prologues against known-good copies
JMP or INT3 instructions at function entry are obvious indicators
ETW and kernel callbacks can detect code modification in certain scenarios

VMT hooks avoid all of this:

No code modification. The function code is untouched. Only a data pointer changes.
No executable memory changes. The modification is in a data section or on the heap.
Legitimate-looking calls. The CPU’s virtual dispatch mechanism works normally. The call instruction hasn’t changed.
No unusual instructions. There is no JMP to a trampoline, no INT3, no detour.

This is why traditional code integrity scanning doesn’t catch VMT hooks.

4. Real-World Attack Patterns

Here’s where VMT hooks actually show up in practice.

COM Object Hooking

Windows COM (Component Object Model) is built on vtables. Every COM interface is a vtable. Hooking a COM object’s vtable redirects interface method calls:

IUnknown vtable:
  [0] QueryInterface
  [1] AddRef
  [2] Release

IDXGISwapChain vtable:
  [0] QueryInterface
  [1] AddRef
  [2] Release
  ...
  [8] Present        ← commonly hooked

DXGI hooking via Present is used by game overlays (Steam, Discord), screen recorders, and performance tools. But the same technique can be used by malware to intercept graphics output or inject visual elements.

Browser COM Hooking

Browsers expose COM interfaces for automation. Hooking these interfaces allows intercepting web traffic, modifying page content, or stealing credentials - all through vtable manipulation that won’t trigger code integrity alerts.

Security Product Bypass

Some security products expose COM or C++ interfaces that can be hooked via VMT. If a security scanning function is virtual, replacing its vtable entry effectively disables that scan while the product continues to report as healthy.

5. Detection Strategies

Strategy 1: Vtable Pointer Validation

For known objects, verify that the vtable pointer points to the expected .rdata section of the correct module:

// Requires <windows.h> and <psapi.h>.
bool validate_vtable(void* object, HMODULE expected_module) {
    void** vtable = *(void***)object;

    MODULEINFO module_info;
    if (!GetModuleInformation(GetCurrentProcess(), expected_module,
                              &module_info, sizeof(module_info))) {
        return false;  // module range unknown: treat the vtable as not validated
    }

    uintptr_t vtable_addr = (uintptr_t)vtable;
    uintptr_t module_start = (uintptr_t)module_info.lpBaseOfDll;
    uintptr_t module_end = module_start + module_info.SizeOfImage;

    // Vtable should be within the module's image
    return (vtable_addr >= module_start && vtable_addr < module_end);
}

If the vtable pointer points to heap memory or an unknown module, the object’s vtable has likely been replaced.

Strategy 2: Vtable Entry Validation

Even if the vtable itself is in the correct location, individual entries might be patched. Validate that each entry points to the expected module:

// Requires <windows.h> and <psapi.h>.
bool validate_vtable_entry(void** vtable, int index, HMODULE expected_module) {
    void* func_ptr = vtable[index];

    MODULEINFO module_info;
    if (!GetModuleInformation(GetCurrentProcess(), expected_module,
                              &module_info, sizeof(module_info))) {
        return false;  // module range unknown: treat the entry as not validated
    }

    uintptr_t func_addr = (uintptr_t)func_ptr;
    uintptr_t module_start = (uintptr_t)module_info.lpBaseOfDll;
    uintptr_t module_end = module_start + module_info.SizeOfImage;

    // Entry should point into the module's image
    return (func_addr >= module_start && func_addr < module_end);
}

An entry pointing outside the expected module is a strong hook indicator.
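Strategies 1 and 2 share the same range test, so they can be folded into one pass. The sketch below is a portable version that works on plain addresses, for example when scanning another process or a memory dump; `ImageRange`, `VtableVerdict`, and `check_vtable` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Base and size of the expected module (lpBaseOfDll / SizeOfImage).
struct ImageRange {
    std::uint64_t base;
    std::uint64_t size;
};

// Written as a subtraction so that base + size cannot overflow.
bool in_image(const ImageRange& image, std::uint64_t addr) {
    assert(image.size != 0);
    return addr >= image.base && addr - image.base < image.size;
}

enum class VtableVerdict { Ok, TableOutsideModule, EntryOutsideModule };

// Checks the table address first (Strategy 1), then every entry (Strategy 2).
VtableVerdict check_vtable(const ImageRange& module, std::uint64_t vtable_addr,
                           const std::uint64_t* entries, std::size_t count) {
    if (!in_image(module, vtable_addr)) {
        return VtableVerdict::TableOutsideModule;
    }
    for (std::size_t i = 0; i < count; ++i) {
        if (!in_image(module, entries[i])) {
            return VtableVerdict::EntryOutsideModule;
        }
    }
    return VtableVerdict::Ok;
}
```

One caveat: a class can legitimately inherit methods implemented in another DLL, so an entry outside the expected module deserves a second look at which module it does land in before alerting.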

Strategy 3: .rdata Integrity Comparison

Compare the in-memory vtable against the on-disk copy of the module:

Load the PE file from disk
Find the .rdata section
Locate the vtable by its known offset or by matching RTTI data
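Compare each entry against the in-memory version as an offset from the image base (on-disk pointers assume the preferred base, while in-memory ones have been relocated)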

Any discrepancy indicates tampering. This is the most thorough approach but requires knowing which vtables to check.
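One detail to get right in that comparison is relocation: pointers stored in the file assume the preferred image base, while the loaded copy has been rebased. A sketch of the comparison in RVA terms, assuming both tables have already been extracted (the function name and parameters are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the indices of vtable slots whose RVA differs between disk and memory.
//   disk_entries   - raw pointer values read from the PE file
//   preferred_base - ImageBase from the file's optional header
//   memory_entries - pointer values read from the loaded module
//   actual_base    - address the module is actually loaded at
std::vector<std::size_t> find_patched_slots(const std::vector<std::uint64_t>& disk_entries,
                                            std::uint64_t preferred_base,
                                            const std::vector<std::uint64_t>& memory_entries,
                                            std::uint64_t actual_base) {
    assert(disk_entries.size() == memory_entries.size());
    std::vector<std::size_t> patched;
    for (std::size_t i = 0; i < disk_entries.size(); ++i) {
        const std::uint64_t disk_rva = disk_entries[i] - preferred_base;
        const std::uint64_t mem_rva = memory_entries[i] - actual_base;
        if (disk_rva != mem_rva) {
            patched.push_back(i);
        }
    }
    return patched;
}
```

Comparing raw pointer values instead would flag every entry of any module loaded away from its preferred base, which with ASLR is most of them.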

Strategy 4: RTTI (Run-Time Type Information) Validation

MSVC stores RTTI data adjacent to the vtable. The slot at index -1, just before the first function pointer, points to a CompleteObjectLocator structure, which contains class hierarchy information.

vtable[-1] → CompleteObjectLocator
               → TypeDescriptor (class name string)
               → ClassHierarchyDescriptor

If the RTTI chain is missing, corrupted, or points to unexpected locations, the vtable has likely been tampered with. Legitimate vtables in MSVC builds carry valid RTTI unless the module was compiled with /GR-, in which case this check does not apply.

Strategy 5: Memory Region Analysis

Vtables belong in .rdata (read-only initialized data). Check the memory protection of the region containing the vtable:

MEMORY_BASIC_INFORMATION mbi;
// VirtualQuery returns 0 on failure, in which case mbi is not filled in
if (VirtualQuery(vtable, &mbi, sizeof(mbi)) != 0) {
    if (mbi.Protect != PAGE_READONLY) {
        // .rdata should be PAGE_READONLY
        // PAGE_READWRITE suggests tampering
    }

    if (mbi.Type == MEM_PRIVATE) {
        // vtable is in heap/private memory, not in a module image
        // strong indicator of vtable replacement
    }
}

6. Monitoring and Alerting

Periodic Integrity Scans

For high-value targets (security-critical COM interfaces, graphics subsystem objects), periodically validate vtable integrity:

Enumerate known critical objects
Validate vtable pointers and entries against expected modules
Check RTTI integrity
Log and alert on deviations

ETW Integration

ETW providers can be configured to monitor for:

VirtualProtect calls targeting .rdata sections (needed for direct vtable patches)
Memory allocation near module image ranges (might indicate vtable replacement setup)
Writes to known vtable locations, inferred from the events above (ETW does not observe a memcpy directly)

Kernel-Level Monitoring

A kernel driver can:

Monitor page table permission changes for .rdata pages
Use hypervisor-based EPT monitoring (see EPT Internals) to detect writes to vtable memory without depending on user-mode integrity checks

This is the strongest detection approach, as it cannot be subverted from user mode.

7. VMT Hooking vs Other Hooking Techniques

| Aspect | Inline Hook | IAT Hook | VMT Hook |
| --- | --- | --- | --- |
| Modifies code | Yes | No | No |
| Modifies data | No | Yes (IAT) | Yes (vtable) |
| Scope | All callers | Import callers only | Virtual call callers |
| Code integrity detection | Easy | Medium | Hard |
| Requires C++ target | No | No | Yes |
| Detectable by ETW | Partially | Partially | Harder |

VMT hooking occupies a specific niche: it requires a C++ virtual interface, but in return it is the most difficult hook type to detect with standard code scanning tools. For defenders, this means that code integrity alone is not sufficient - data integrity must also be verified.

Conclusion

VMT hooking exploits a core C++ mechanism that was never designed with adversarial use in mind. The vtable is trust-by-convention: the runtime assumes function pointers are legitimate because nothing validates them.

The bottom line: code integrity scanning alone does not catch VMT hooks. You need data integrity checks too. Vtable pointers should point to .rdata in the expected module, not to heap or unknown regions. RTTI validation helps for MSVC-compiled binaries. COM interfaces are vtable-based and represent a broad attack surface. And if you really want the strongest detection, hypervisor-based EPT monitoring operates below anything user-mode can subvert.

If you’re only scanning for inline hooks, you’re leaving a gap that adversaries know about.

CRT vs NoCRT: How the C Runtime Helps Defenders Catch Injected DLLs

2026-05-10T00:00:00+00:00

When an attacker injects a DLL into a process, one of the first decisions they make - whether they realize it or not - is whether to link the C Runtime Library (CRT). That decision leaves distinct forensic traces that defenders can use to detect the injection.

1. What the CRT Actually Does

When you compile a DLL with Visual Studio using the default settings, the C Runtime Library is linked in. The CRT is not just printf and malloc - it’s a significant initialization framework that runs before your code.

When a CRT-linked DLL is loaded, this happens before DllMain executes:

Security cookie initialization (__security_init_cookie) - generates a random stack canary value
CRT heap initialization - sets up the CRT’s internal heap
Thread-local storage initialization - initializes TLS slots
Atexit/onexit registration - prepares cleanup handlers
Floating-point initialization - configures FPU state
Global C++ constructor calls - runs static object constructors (_initterm)

The actual entry point of a CRT-linked DLL is not DllMain - it is _DllMainCRTStartup, which does all of the above and then calls your DllMain.

The security cookie (/GS flag, enabled by default) is the most visible CRT artifact. The function __security_init_cookie generates a random value at DLL load time and stores it in __security_cookie. Every function that uses stack buffers places this value on the stack and validates it before returning.

The initialization is easy to spot in a disassembler:

_DllMainCRTStartup:
    call    __security_init_cookie    ; ← distinctive CRT artifact
    jmp     dllmain_dispatch

This single call is one of the most reliable indicators that a DLL was compiled with the CRT.

2. CRT-Linked DLLs: What Defenders See

A DLL compiled with the CRT has a recognizable fingerprint. Here’s what to look for.

Import Table

CRT-linked DLLs import from CRT libraries. The specific imports depend on the linking mode:

Dynamic CRT (/MD):

vcruntime140.dll
ucrtbase.dll (or api-ms-win-crt-*.dll on newer Windows)
Possibly msvcp140.dll for C++ standard library

Static CRT (/MT):

No CRT DLL imports (everything is compiled into the binary)
But the code patterns are still present in the .text section

Entry Point Pattern

The entry point follows a predictable pattern:

; _DllMainCRTStartup
push    rbp
mov     rbp, rsp
sub     rsp, 0x20
call    __security_init_cookie
; ... CRT initialization ...
call    dllmain_dispatch
; ... CRT cleanup ...

The call to __security_init_cookie near the entry point is a strong CRT indicator. This function calls GetSystemTimeAsFileTime, GetCurrentThreadId, GetCurrentProcessId, and QueryPerformanceCounter to gather entropy for the cookie. Those API calls or their patterns are detectable.

.rdata and .data Sections

CRT-linked DLLs contain specific global variables:

__security_cookie - the canary value (in .data, because it is written at load time)
_onexit_table - atexit cleanup handlers
__acrt_iob_func references for stdio
CRT error messages as strings (“runtime error”, “assertion failed”)

Section Layout

A typical CRT DLL has well-structured sections:

.text    - code (substantial, includes CRT runtime code)
.rdata   - read-only data, vtables, CRT strings
.data    - writable data, security cookie, global state
.pdata   - exception handling unwind data
.rsrc    - resources (optional)
.reloc   - relocation table

The .pdata section (exception unwind information) is almost always present in CRT DLLs because the CRT uses structured exception handling.

3. Why Attackers Avoid the CRT

Sophisticated attackers compile DLLs without the CRT for several reasons.

A minimal NoCRT DLL can be 4-8 KB. A DLL with the CRT linked in statically typically starts at 50-100 KB. Smaller files are easier to inject, less likely to trigger size-based heuristics, and faster to write into remote process memory.

The CRT also pulls in dozens of API imports, and each import is a potential detection point. A NoCRT DLL can operate with just a handful of functions from ntdll.dll or kernel32.dll.

CRT initialization calls multiple API functions that EDR products monitor. Skipping it means the DLL’s entry point runs directly - less telemetry generated. The CRT also brings code the attacker doesn’t need, with its own behavior (heap allocations, TLS operations, exception handlers) that creates noise.

And many detection rules are tuned to CRT-compiled binaries because that’s what most software produces. A NoCRT binary doesn’t match those patterns - which is itself a signal, as we’ll see.

How NoCRT DLLs Are Built

// NoCRT entry point - this function replaces the CRT's _DllMainCRTStartup wrapper
BOOL WINAPI _DllMainCRTStartup(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved) {
    if (fdwReason == DLL_PROCESS_ATTACH) {
        // attacker code runs directly here
    }
    return TRUE;
}

Compiled with:

/NODEFAULTLIB - no CRT libraries
/ENTRY:_DllMainCRTStartup - custom entry point
/GS- - no security cookie (requires CRT)
No printf, malloc, new - only direct Win32 or NT API calls

4. NoCRT DLLs: What Defenders See

The absence of CRT artifacts is just as distinctive as their presence.

No __security_init_cookie call at the entry point. No __security_cookie global. No __GSHandlerCheck exception handlers. For a DLL that does anything non-trivial, this is unusual.

A DLL with a .text section larger than 4 KB but no security cookie initialization is suspicious. Legitimate developers almost never disable /GS because it’s the default and has negligible performance cost.

Minimal Import Table

A NoCRT DLL often imports only from kernel32.dll or ntdll.dll, with a handful of functions:

kernel32.dll:
    VirtualAlloc
    VirtualProtect
    CreateThread
    LoadLibraryA
    GetProcAddress

Or even more minimal, using only ntdll.dll native API:

ntdll.dll:
    NtAllocateVirtualMemory
    NtProtectVirtualMemory
    NtCreateThreadEx
    LdrLoadDll

A DLL that imports exclusively from ntdll.dll with no CRT imports is highly unusual for legitimate software. Most legitimate DLLs use the Win32 API layer.

Tiny File Size

A NoCRT DLL doing real work can be 4-15 KB. Legitimate DLLs with business logic are almost always larger. The distribution of DLL sizes in a normal process is skewed toward larger files.

Flag unsigned DLLs under 20 KB loaded into processes where the typical module size is much larger.

Flat Entry Point

NoCRT DLLs have a simple entry point that goes directly to attacker logic:

_DllMainCRTStartup:
    cmp     edx, 1          ; DLL_PROCESS_ATTACH
    jne     short return_true
    ; ... immediately does attacker work ...
return_true:
    mov     eax, 1
    ret

Compare this with a CRT entry point that has initialization calls, exception handling setup, and a structured dispatch to DllMain. The difference is visible in static analysis.

Missing .pdata Section

NoCRT DLLs compiled without exception handling often lack a .pdata section entirely. On x64 Windows, the .pdata section contains unwind information for structured exception handling. Its absence means the DLL has no SEH support.

An x64 DLL without .pdata is unusual. Not definitive on its own, but combined with other NoCRT indicators it strengthens the signal.

5. The Signature Gap

This is the most straightforward detection opportunity.

Legitimate CRT-linked DLLs are almost always signed. Microsoft, Adobe, Google, game studios - every major software vendor signs their DLLs. The CRT itself (vcruntime140.dll, ucrtbase.dll) is Microsoft-signed.

An unsigned DLL that uses the CRT is suspicious because:

If the developer was professional enough to use the CRT (standard build process), they would typically also sign their binaries
Legitimate unsigned DLLs exist (open-source plugins, internal tools) but they are a small population
An injected DLL is, by definition, not part of the original application - it will not be signed by the application vendor

An unsigned DLL loaded into a process where all other DLLs are signed is a strong anomaly signal. If it also uses the CRT, it was compiled with standard tooling but not through a standard release process.

Why CRT Makes Unsigned DLLs Easier to Catch

Here’s the thing: __security_init_cookie calls 4 external functions to gather entropy:

GetSystemTimeAsFileTime
QueryPerformanceCounter
GetCurrentProcessId
GetCurrentThreadId

Every one of these can be hooked by a security product. When the hook fires, the defender inspects the return address on the call stack. That return address points back into the calling module - the injected DLL. Walking the call stack reveals the full chain:

GetSystemTimeAsFileTime          ← hooked, EDR gets control
  ← __security_init_cookie       ← return address is inside injected DLL
    ← _DllMainCRTStartup         ← CRT entry point
      ← LdrpCallInitRoutine      ← ntdll loader

The return address lands in a memory region. The defender checks: does this region belong to a signed, known module? If the return address resolves to an unsigned image, or to memory that is not backed by any loaded image at all, it is a strong injection indicator.
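The return address check boils down to a range lookup plus a signature flag. A minimal sketch, assuming the module list and signing status have already been collected by other means; `LoadedImage` and `classify_return_address` are hypothetical names, not an EDR API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one image mapped in the process.
struct LoadedImage {
    std::uint64_t base;
    std::uint64_t size;
    bool is_signed;
    std::string name;
};

enum class ReturnAddressVerdict { SignedImage, UnsignedImage, NotImageBacked };

// Resolves a return address captured in a hook to the image that contains it.
ReturnAddressVerdict classify_return_address(const std::vector<LoadedImage>& images,
                                             std::uint64_t return_address) {
    for (const LoadedImage& image : images) {
        assert(image.size != 0);
        if (return_address >= image.base && return_address - image.base < image.size) {
            return image.is_signed ? ReturnAddressVerdict::SignedImage
                                   : ReturnAddressVerdict::UnsignedImage;
        }
    }
    // No image contains the address: private or otherwise unbacked memory.
    return ReturnAddressVerdict::NotImageBacked;
}
```

Both non-signed outcomes are worth surfacing separately. An unsigned image points at a conventionally loaded but untrusted DLL, while an unbacked address suggests manual mapping or shellcode.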

This is not a problem for legitimate DLLs. A signed module from a known vendor calling these same functions during normal CRT initialization will pass the return address check - the address resolves cleanly to a signed image with a valid certificate chain. The detection specifically targets the gap between “uses standard CRT tooling” and “did not go through a standard signing and distribution process.”

Beyond the security cookie, the CRT generates additional telemetry:

CRT heap initialization calls HeapCreate or RtlCreateHeap - same return address analysis applies
TLS callbacks are registered and executed - monitored by ETW
If dynamic CRT (/MD), loading the DLL triggers loads of vcruntime140.dll and ucrtbase.dll - module load events that EDR monitors

An attacker using a NoCRT DLL avoids all of these hook trigger points - but as covered in section 4, the absence of these patterns is also detectable through structural analysis.

6. Building a Detection Matrix

Combining these signals into a scoring system:

| Signal | CRT DLL (suspicious) | NoCRT DLL (suspicious) | Weight |
| --- | --- | --- | --- |
| Unsigned | Strong indicator | Strong indicator | High |
| No security cookie | N/A | Present | Medium |
| Minimal imports | Unlikely | Likely | Medium |
| Small file size (<20 KB) | Unlikely | Likely | Medium |
| No `.pdata` section | Unlikely | Likely | Low |
| ntdll-only imports | Very unlikely | Possible | High |
| Not in application manifest | Strong indicator | Strong indicator | High |
| Loaded after process init | Moderate indicator | Moderate indicator | Medium |
| No version info resource | Moderate indicator | Likely | Low |

Detection Algorithm

score = 0

if dll.is_unsigned:
    score += 30

if dll.loaded_after_process_init:
    score += 15

if not dll.has_security_cookie and dll.text_size > 0x1000:
    score += 20  # NoCRT indicator

if dll.import_count < 5:
    score += 15

if dll.file_size < 0x5000:  # 20 KB
    score += 10

if dll.imports_only_ntdll:
    score += 25

if not dll.has_pdata_section:
    score += 5

if not dll.has_version_info:
    score += 5

if score >= 50:
    alert("suspicious DLL injection detected")

This is a simplified example. Production EDR systems use more sophisticated scoring with machine learning and behavioral context. But the core signals are the same.

7. ETW and Kernel-Level Detection

Module Load Events

ETW provides IMAGE_LOAD events whenever a DLL is loaded. Each event includes:

Image file path
Image base address
Image size
Process ID
Signing level and signature status

Monitoring these events for unsigned images loaded after process initialization is the foundation of DLL injection detection.

Thread Creation Events

DLL injection typically involves creating a remote thread (via CreateRemoteThread or NtCreateThreadEx) or queuing an APC to an existing thread. For the thread-creation case, ETW THREAD_START events capture:

Start address - does it point into a known module?
Thread creation time relative to process creation
Calling process (for remote thread creation)

A thread starting at an address that does not belong to any known signed module is a strong injection indicator.

Combining Telemetry

The strongest detection comes from correlating events:

IMAGE_LOAD for an unsigned DLL → timestamp T1
THREAD_START with start address in that DLL → timestamp T2
T2 shortly after T1 → high confidence injection

If the DLL also matches NoCRT patterns (small, minimal imports, no security cookie), the confidence increases further.
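As a rough sketch of that correlation, the function below pairs one image load with one thread start. The event structs are simplified, hypothetical shapes (real ETW payloads carry more fields and different names), and the time window is a tunable assumption.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical event shapes.
struct ImageLoadEvent {
    std::uint64_t timestamp_ms;  // T1
    std::uint64_t base;
    std::uint64_t size;
    bool is_signed;
};

struct ThreadStartEvent {
    std::uint64_t timestamp_ms;  // T2
    std::uint64_t start_address;
};

// True when a thread starts inside an unsigned image no more than
// window_ms after that image was loaded.
bool looks_like_injection(const ImageLoadEvent& load, const ThreadStartEvent& thread,
                          std::uint64_t window_ms) {
    assert(load.size != 0);
    const bool inside = thread.start_address >= load.base &&
                        thread.start_address - load.base < load.size;
    const bool after_load = thread.timestamp_ms >= load.timestamp_ms;
    return !load.is_signed && inside && after_load &&
           thread.timestamp_ms - load.timestamp_ms <= window_ms;
}
```

In practice this would run over a sliding window of recent loads per process rather than a single pair, and the result would feed the score from section 6 instead of alerting on its own.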

8. Practical Recommendations for Defenders

For EDR Engineers

Don’t just check signatures - also look for DLLs signed with revoked or untrusted certificates. Build a baseline of expected DLLs per application; any new DLL that appears in a stable application is worth investigating. Detect CRT absence, not just CRT presence - a DLL with no CRT artifacts doing complex work is more suspicious than one with the CRT. And watch for unexpected vcruntime140.dll or ucrtbase.dll loads, which signal something new was injected with CRT linkage.

For Malware Analysts

Check the entry point first. CRT vs NoCRT is immediately visible from the entry point structure. Examine import table density - NoCRT malware often resolves APIs dynamically after load, so look for GetProcAddress chains or manual export table walking. And don’t forget about statically linked CRT (/MT): no CRT imports show up, but the code is still there in the binary.

For Blue Teams

Sysmon Event ID 7 (Image Loaded) with signature status filtering catches unsigned DLLs immediately. WDAC or AppLocker can block unsigned DLLs from loading entirely in high-security environments. Module load auditing with baseline comparison detects any new DLL in monitored processes.

Conclusion

The CRT is not a security feature - it is a development convenience. But its presence or absence creates a distinctive forensic fingerprint that defenders can use.

An unsigned CRT-linked DLL is easy to catch because the CRT generates initialization telemetry, imports from known CRT libraries, and follows a recognizable structure. Attackers who avoid the CRT to reduce this footprint create a different but equally detectable pattern: minimal imports, no security cookie, tiny file size, and missing standard sections.

For defenders, the lesson is to detect in both directions:

CRT present + unsigned = amateur or careless injection, catch on telemetry and signature
CRT absent + unusual characteristics = deliberate evasion, catch on structural anomalies

Neither choice is invisible to a well-instrumented environment.
