the kvm that wasn't there

2026-05-18

I couldn’t use /dev/kvm on my Linux box. Nested virtualization isn’t available on my setup. So I built a fake one.

brood-box, which I forked from Stacklok, uses libkrun under the hood. libkrun talks to KVM. No KVM, no guests. I needed a way to develop without it.

the seam

libkrun calls open("/dev/kvm"), then a handful of ioctls: KVM_CREATE_VM, KVM_CREATE_VCPU, eventually KVM_RUN. All through libc.

libc calls are interceptable. LD_PRELOAD lets you load a library before any others. Export a function named open and yours gets called instead of libc’s. Inspect the path, decide whether to handle it, fall through for everything else.
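A minimal sketch of the "inspect the path, decide" step, in Rust. The names Route and route_open are made up for illustration and are not taken from kvm-emu. It shows only the decision. The exported open symbol and the fallthrough to the real libc function (typically resolved with dlsym and RTLD_NEXT) are left out.

```rust
use std::ffi::CStr;

/// What an interposed `open` should do with a given path.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Answer the call ourselves: this is the device being impersonated.
    Emulate,
    /// Hand the call to the real libc `open` untouched.
    PassThrough,
}

/// Decide how to handle a path passed to `open`.
/// Only an exact match on /dev/kvm is claimed; everything else falls through.
pub fn route_open(path: &CStr) -> Route {
    if path.to_bytes() == b"/dev/kvm" {
        Route::Emulate
    } else {
        Route::PassThrough
    }
}
```

An exact byte comparison is deliberately strict: a shim that claims too many paths breaks the host process in ways that are hard to debug.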

This is normally for memory debuggers. Not for impersonating a kernel subsystem. But every KVM interaction goes through about a dozen libc functions. The seam is narrow. The question is whether you can answer them plausibly enough.

step 0: is it even fast enough

Before writing a line of emulation, I needed to know if QEMU’s TCG could boot an ARM64 kernel fast enough for dev use.

qemu-smoke — 150 lines of Go. Downloads Alpine’s ARM64 netboot kernel, launches qemu-system-aarch64 -M virt -accel tcg, times it to userspace.

About 3-8 seconds on same-arch. 10-40 seconds cross-arch. Fine for starting a VM and SSHing in for an hour. Not production. Wasn’t building production.
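qemu-smoke itself is Go. The timing idea can be sketched in Rust like this, assuming "userspace" is detected by watching the serial console for a marker string. The function name and the marker are my assumptions for the sketch, not what qemu-smoke necessarily does.

```rust
use std::io::{self, BufRead};
use std::time::{Duration, Instant};

/// Read console output line by line and return how long it took for a line
/// containing `marker` to appear, or `None` if the stream ended first.
/// (Hypothetical helper; the marker is whatever signals "userspace is up".)
pub fn time_until_marker<R: BufRead>(
    mut console: R,
    marker: &str,
) -> io::Result<Option<Duration>> {
    let start = Instant::now();
    let mut line = Vec::new();
    loop {
        line.clear();
        // read_until returns Ok(0) once the stream is exhausted.
        if console.read_until(b'\n', &mut line)? == 0 {
            return Ok(None);
        }
        // Console output is not guaranteed to be valid UTF-8, so decode lossily.
        if String::from_utf8_lossy(&line).contains(marker) {
            return Ok(Some(start.elapsed()));
        }
    }
}
```

In practice the reader would be a BufReader wrapped around the QEMU child's stdout, with the clock started just before spawning.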

step 1: what does libkrun actually call

KVM has dozens of ioctls. I didn’t know which ones libkrun used. I could read the source — and did — but I wanted ground truth.

kvm-tracer — an LD_PRELOAD library that intercepts open, close, ioctl, mmap. When it sees a fd associated with /dev/kvm, it logs to stderr as JSON:

{"t":0.123,"fn":"open","fd":3,"flags":524290}
{"t":0.124,"fn":"ioctl","fd":3,"ioctl":"KVM_GET_API_VERSION","ret":-1,"errno":25}
{"t":0.126,"fn":"ioctl","fd":3,"ioctl":"KVM_CREATE_VM","ret":-1,"errno":25}

Every ioctl returns -1 since /dev/kvm is missing, but the call sequence is preserved. Ran brood-box under it once, captured the trace. The surface was about a dozen ioctls. That’s the spec.
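To print names like KVM_CREATE_VM instead of raw numbers, a tracer has to decode the ioctl request. A sketch of that decoding, assuming the generic Linux _IOC bit layout (the one used on x86 and arm64) and covering only a few requests. The function names are illustrative; the request numbers come from linux/kvm.h.

```rust
/// The ioctl "type" byte shared by all KVM requests (KVMIO in <linux/kvm.h>).
const KVMIO: u8 = 0xAE;

/// Split a Linux ioctl request number into (direction, size, type, nr),
/// following the generic _IOC layout: 2 bits dir, 14 bits size, 8 type, 8 nr.
pub fn split_ioctl(req: u64) -> (u8, u16, u8, u8) {
    let nr = (req & 0xff) as u8;
    let ty = ((req >> 8) & 0xff) as u8;
    let size = ((req >> 16) & 0x3fff) as u16;
    let dir = ((req >> 30) & 0x3) as u8;
    (dir, size, ty, nr)
}

/// Map a request number to a KVM ioctl name for the trace log.
/// Illustrative and incomplete: it matches on the nr byte only and ignores
/// direction and size, which is good enough for logging.
pub fn kvm_ioctl_name(req: u64) -> Option<&'static str> {
    let (_, _, ty, nr) = split_ioctl(req);
    if ty != KVMIO {
        return None;
    }
    match nr {
        0x00 => Some("KVM_GET_API_VERSION"),
        0x01 => Some("KVM_CREATE_VM"),
        0x04 => Some("KVM_GET_VCPU_MMAP_SIZE"),
        0x41 => Some("KVM_CREATE_VCPU"),
        0x46 => Some("KVM_SET_USER_MEMORY_REGION"),
        0x80 => Some("KVM_RUN"),
        _ => None,
    }
}
```

Anything that decodes to None can still be logged as a raw hex number, so unknown requests show up in the trace instead of vanishing.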

step 2: the emulator

kvm-emu doesn’t just log calls. It answers them.

It tracks VM state — memory regions, vCPU counts, run sizes — in Mutex<HashMap>. Allocates fake fds from a counter starting at 100. No real file descriptors needed. KVM_CREATE_VM → fake fd, recorded. KVM_SET_USER_MEMORY_REGION → recorded. KVM_ARM_PREFERRED_TARGET → GENERIC_V8. KVM_GET_VCPU_MMAP_SIZE → 12288.
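Roughly what that bookkeeping could look like. This is a sketch under my own naming (Emulator, VmState, MemRegion), not the actual kvm-emu source; the only details taken from the text are the Mutex<HashMap> and the fd counter starting at 100.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Mutex;

/// One guest memory slot, as recorded from KVM_SET_USER_MEMORY_REGION.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MemRegion {
    pub slot: u32,
    pub guest_phys_addr: u64,
    pub size: u64,
}

/// Everything remembered about one fake VM.
#[derive(Debug, Default)]
pub struct VmState {
    pub regions: Vec<MemRegion>,
    pub vcpus: u32,
}

/// Hypothetical emulator state: a fake-fd counter plus per-VM records.
pub struct Emulator {
    next_fd: AtomicI32,
    vms: Mutex<HashMap<i32, VmState>>,
}

impl Emulator {
    pub fn new() -> Self {
        // Fake fds start at 100 to stay clear of the low real descriptors.
        Emulator { next_fd: AtomicI32::new(100), vms: Mutex::new(HashMap::new()) }
    }

    fn alloc_fd(&self) -> i32 {
        self.next_fd.fetch_add(1, Ordering::Relaxed)
    }

    /// KVM_CREATE_VM: hand out a fake fd and start tracking the VM.
    pub fn create_vm(&self) -> i32 {
        let fd = self.alloc_fd();
        self.vms.lock().unwrap().insert(fd, VmState::default());
        fd
    }

    /// KVM_SET_USER_MEMORY_REGION: record the region; false if the fd is unknown.
    pub fn set_memory_region(&self, vm_fd: i32, region: MemRegion) -> bool {
        let mut vms = self.vms.lock().unwrap();
        if let Some(vm) = vms.get_mut(&vm_fd) {
            vm.regions.push(region);
            true
        } else {
            false
        }
    }

    /// KVM_CREATE_VCPU: bump the count and return a fake vCPU fd.
    pub fn create_vcpu(&self, vm_fd: i32) -> Option<i32> {
        let mut vms = self.vms.lock().unwrap();
        let vm = vms.get_mut(&vm_fd)?;
        vm.vcpus += 1;
        Some(self.alloc_fd())
    }

    /// Number of vCPUs created so far for a VM, if the fd is known.
    pub fn vcpu_count(&self, vm_fd: i32) -> Option<u32> {
        self.vms.lock().unwrap().get(&vm_fd).map(|vm| vm.vcpus)
    }
}
```

One caveat with a bare counter: nothing stops the host process from holding a real descriptor with the same number, so a shim built this way has to check its own table before passing a close or ioctl through.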

Most of these are one-liners. KVM_RUN is where it gets interesting.

Real KVM’s KVM_RUN is a blocking ioctl that executes guest code until something interesting happens (MMIO, interrupt), then returns to userspace. libkrun handles device emulation between runs. QEMU TCG can’t do this — QEMU does its own device emulation internally.

So I didn’t try. KVM_RUN launches qemu-system-aarch64 as a child:

qemu-system-aarch64 \
  -M virt -accel tcg -cpu cortex-a57 \
  -kernel vmlinuz-lts -initrd initramfs \
  -netdev user,id=net0,hostfwd=tcp::2222-:22 \
  -device virtio-net-pci,netdev=net0 -nographic

Kernel boots in a few seconds, sshd starts, and QEMU forwards host port 2222 to guest port 22. libkrun connects via SSH. From brood-box’s perspective, nothing is unusual. It opened /dev/kvm, created a VM, called KVM_RUN, and a guest booted.
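The same invocation can be assembled with std::process::Command. A sketch only: qemu_command is a hypothetical helper, and the flags are copied from the command line above with nothing added.

```rust
use std::process::Command;

/// Build the QEMU invocation for one guest.
/// (Hypothetical helper; flags mirror the command line shown in the post.)
pub fn qemu_command(kernel: &str, initrd: &str, ssh_port: u16) -> Command {
    let mut cmd = Command::new("qemu-system-aarch64");
    cmd.args(["-M", "virt", "-accel", "tcg", "-cpu", "cortex-a57"])
        .args(["-kernel", kernel, "-initrd", initrd])
        .arg("-netdev")
        // Host port `ssh_port` is forwarded to port 22 inside the guest.
        .arg(format!("user,id=net0,hostfwd=tcp::{}-:22", ssh_port))
        .args(["-device", "virtio-net-pci,netdev=net0", "-nographic"]);
    cmd
}
```

Calling .status() on the returned Command starts QEMU and blocks until it exits, which lines up with a KVM_RUN that only returns when the guest shuts down.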

KVM_RUN blocks until the guest shuts down instead of returning per-exit. For coding agent sessions where you SSH in, work, and exit — this works. The whole illusion is a few hundred lines of Rust and a QEMU process.

the numbers

kvm-tracer:  ~400 lines Rust — the diagnostic
kvm-emu:     ~500 lines Rust — the emulator (lib.rs + kvm_types.rs + qemu.rs)
qemu-smoke:  ~150 lines Go   — the validation
e2e tests:   ~150 lines Rust — verifies ioctl responses

About 1,200 lines total. A weekend.

what it feels like

$ QEMU_KERNEL=~/vmlinuz-lts \
  LD_PRELOAD=./target/release/libkvm_emu.so \
  bbox gemini

kvm-emu: launching QEMU (512 MiB, 2 vCPUs, ssh port 2222)
[  ...  ] Booting Linux
Starting sshd... done.

=== bbox session ===
$

Same tool, same workflow. Just a different path under the hood.

limitations

This is Phase 1b. Works for my use case. Has gaps:

KVM_RUN blocks until shutdown. Real KVM returns per-exit so libkrun can handle device emulation between runs. My emulator launches QEMU once. Works because SSH is the only interface. If libkrun starts using per-exit features, the emulator would need to grow.
Single QEMU instance. Multiple vCPUs share one process. Fine for single-vCPU guests.
ARM64 only. Hardcoded qemu-system-aarch64 and cortex-a57. x86 would need switching the QEMU binary and handling the x86 ioctl surface (larger, more tedious, same idea).
No live migration. Guest state stays in QEMU. Not captured back to kvm_run.

These aren’t bugs. The emulator does what I need and nothing more. That’s the point.

why

There’s a pattern here I keep finding myself in: locate the seam, sit on it, make it work, move on.

The KVM ioctl interface is a seam. It’s stable, documented, and narrow. libkrun talks to the kernel through it. If you can answer the calls plausibly, you don’t need a kernel. You just need something that quacks like one.

Same instinct as every mock, stub, shim, and adapter. Find the interface. Intercept it. Get unstuck.

I needed to develop a tool that requires KVM on a machine without it. This lets me. It’s not production-grade. Doesn’t need to be.