the kvm that wasn't there
I couldn’t use /dev/kvm on my Linux box. Nested
virtualization isn’t available on my setup. So I built a fake one.
brood-box, which I forked from Stacklok, uses libkrun under the hood. libkrun talks to KVM. No KVM, no guests. I needed a way to develop without it.
the seam
libkrun calls open("/dev/kvm"), then a handful of
ioctls: KVM_CREATE_VM, KVM_CREATE_VCPU,
eventually KVM_RUN. All through libc.
libc calls are interceptable. LD_PRELOAD lets you load a
library before any others. Export a function named open and
yours gets called instead of libc’s. Inspect the path, decide whether to
handle it, fall through for everything else.
This is normally for memory debuggers. Not for impersonating a kernel subsystem. But every KVM interaction goes through about a dozen libc functions. The seam is narrow. The question is whether you can answer them plausibly enough.
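A minimal sketch of the routing decision behind that seam, in plain Rust so it runs without a preload setup. The `Shim` and `Route` names are hypothetical, not taken from kvm-emu. A real shim would export C-ABI `open`, `ioctl`, and `close` symbols and forward the pass-through cases to libc via `dlsym(RTLD_NEXT, ...)`, which is left out here.

```rust
use std::collections::HashSet;

/// What an interposed libc call decides to do.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
enum Route {
    /// Answer the call ourselves.
    Emulate,
    /// Forward untouched to the real libc function.
    PassThrough,
}

/// Tracks which descriptors belong to the fake /dev/kvm.
#[derive(Default)]
struct Shim {
    owned_fds: HashSet<i32>,
}

impl Shim {
    /// `open`: only the KVM device node is interesting.
    fn route_open(&self, path: &str) -> Route {
        if path == "/dev/kvm" {
            Route::Emulate
        } else {
            Route::PassThrough
        }
    }

    /// Remember a descriptor we handed out so later calls can be routed.
    fn claim(&mut self, fd: i32) {
        self.owned_fds.insert(fd);
    }

    /// `ioctl` / `mmap` / `close`: route by descriptor ownership.
    fn route_fd(&self, fd: i32) -> Route {
        if self.owned_fds.contains(&fd) {
            Route::Emulate
        } else {
            Route::PassThrough
        }
    }

    /// `close`: forget the descriptor. Returns true if it was ours.
    fn release(&mut self, fd: i32) -> bool {
        self.owned_fds.remove(&fd)
    }
}
```

Everything that is not `/dev/kvm` or a descriptor derived from it takes the `PassThrough` branch, which is what keeps the rest of the process unaware of the shim.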
step 0: is it even fast enough
Before writing a line of emulation, I needed to know if QEMU’s TCG could boot an ARM64 kernel fast enough for dev use.
qemu-smoke is 150 lines of Go. It downloads Alpine’s ARM64 netboot kernel, launches qemu-system-aarch64 -M virt -accel tcg, and times the boot to userspace.
About 3-8 seconds on same-arch. 10-40 seconds cross-arch. Fine for starting a VM and SSHing in for an hour. Not production. Wasn’t building production.
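The timing part can be sketched roughly as follows. qemu-smoke itself is Go; this uses Rust to match the rest of the project, and it is not the original code. The function name and the idea of watching the console for a marker string are assumptions about how "time to userspace" might be measured.

```rust
use std::io::BufRead;
use std::time::{Duration, Instant};

/// Reads console output line by line and returns how long it took for
/// `marker` to show up, or None if the stream ended first.
/// A read error (including non-UTF-8 bytes on the console) also yields None.
fn time_until_marker<R: BufRead>(console: R, marker: &str) -> Option<Duration> {
    let start = Instant::now();
    for line in console.lines() {
        let line = line.ok()?;
        if line.contains(marker) {
            return Some(start.elapsed());
        }
    }
    None
}
```

In a real run the reader would be a `BufReader` around the piped stdout of a `qemu-system-aarch64` child process, and the marker would be whatever the guest prints once userspace is up.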
step 1: what does libkrun actually call
KVM has dozens of ioctls. I didn’t know which ones libkrun used. I could read the source — and did — but I wanted ground truth.
kvm-tracer is an LD_PRELOAD library that intercepts open, close, ioctl, and mmap. When it sees a fd associated with /dev/kvm, it logs to stderr as JSON:
{"t":0.123,"fn":"open","fd":3,"flags":524290}
{"t":0.124,"fn":"ioctl","fd":3,"ioctl":"KVM_GET_API_VERSION","ret":-1,"errno":25}
{"t":0.126,"fn":"ioctl","fd":3,"ioctl":"KVM_CREATE_VM","ret":-1,"errno":25}
Everything returns -1 since /dev/kvm is missing, but the call sequence is preserved. I ran brood-box under it once and captured the trace. The surface was about a dozen ioctls. That’s the spec.
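A sketch of how one of those trace lines could be produced. The function names are hypothetical. The field order copies the sample output above, and the request numbers are the argument-free `_IO(0xAE, nr)` values from `<linux/kvm.h>`. Only a few requests are named here; the rest fall back to hex.

```rust
/// Maps a handful of KVM ioctl request numbers to their names.
/// Unknown requests are reported in hex so nothing is silently dropped.
fn ioctl_name(request: u64) -> String {
    match request {
        0xAE00 => "KVM_GET_API_VERSION".to_string(),
        0xAE01 => "KVM_CREATE_VM".to_string(),
        0xAE04 => "KVM_GET_VCPU_MMAP_SIZE".to_string(),
        0xAE41 => "KVM_CREATE_VCPU".to_string(),
        0xAE80 => "KVM_RUN".to_string(),
        other => format!("0x{:X}", other),
    }
}

/// Formats one ioctl event as a single JSON line.
/// The names contain no characters that need JSON escaping.
fn trace_ioctl(t: f64, fd: i32, request: u64, ret: i32, errno: i32) -> String {
    format!(
        r#"{{"t":{:.3},"fn":"ioctl","fd":{},"ioctl":"{}","ret":{},"errno":{}}}"#,
        t,
        fd,
        ioctl_name(request),
        ret,
        errno
    )
}
```

Writing the result with `eprintln!` keeps the trace on stderr, away from whatever the traced program prints on stdout.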
step 2: the emulator
kvm-emu doesn’t just log calls. It answers them.
It tracks VM state — memory regions, vCPU counts, run sizes — in
Mutex<HashMap>. Allocates fake fds from a counter
starting at 100. No real file descriptors needed.
KVM_CREATE_VM → fake fd, recorded.
KVM_SET_USER_MEMORY_REGION → recorded.
KVM_ARM_PREFERRED_TARGET → GENERIC_V8.
KVM_GET_VCPU_MMAP_SIZE → 12288.
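The bookkeeping behind those answers can be sketched like this. The `Mutex<HashMap>` and the fd counter starting at 100 come from the description above. The struct layout, method names, and the convention of returning `None` for anything the emulator does not answer are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Mutex;

// Argument-free `_IO(0xAE, nr)` request numbers from <linux/kvm.h>.
const KVM_CREATE_VM: u64 = 0xAE01;
const KVM_GET_VCPU_MMAP_SIZE: u64 = 0xAE04;
const KVM_CREATE_VCPU: u64 = 0xAE41;

/// Per-VM bookkeeping (hypothetical; only the vCPU count is modelled).
#[derive(Default)]
struct VmState {
    vcpus: u32,
}

struct Emu {
    next_fd: AtomicI32,
    vms: Mutex<HashMap<i32, VmState>>,
}

impl Emu {
    fn new() -> Self {
        Emu {
            // Fake descriptors start at 100, well clear of stdin/stdout/stderr.
            next_fd: AtomicI32::new(100),
            vms: Mutex::new(HashMap::new()),
        }
    }

    fn alloc_fd(&self) -> i32 {
        self.next_fd.fetch_add(1, Ordering::SeqCst)
    }

    /// Answers an ioctl. None means "not handled here": either the
    /// request is not modelled or the fd is not a VM we created.
    fn ioctl(&self, fd: i32, request: u64) -> Option<i64> {
        match request {
            KVM_CREATE_VM => {
                let vm_fd = self.alloc_fd();
                self.vms.lock().unwrap().insert(vm_fd, VmState::default());
                Some(vm_fd as i64)
            }
            KVM_CREATE_VCPU => {
                let mut vms = self.vms.lock().unwrap();
                let vm = vms.get_mut(&fd)?;
                vm.vcpus += 1;
                Some(self.alloc_fd() as i64)
            }
            KVM_GET_VCPU_MMAP_SIZE => Some(12288),
            _ => None,
        }
    }

    fn vcpu_count(&self, vm_fd: i32) -> Option<u32> {
        self.vms.lock().unwrap().get(&vm_fd).map(|vm| vm.vcpus)
    }
}
```

Requests that carry a struct argument, such as KVM_SET_USER_MEMORY_REGION, would need the pointer decoded before recording anything, so they are left out of this sketch.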
Most of these are one-liners. KVM_RUN is where it gets
interesting.
Real KVM’s KVM_RUN is a blocking ioctl that executes
guest code until something interesting happens (MMIO, interrupt), then
returns to userspace. libkrun handles device emulation between runs.
QEMU TCG can’t do this — QEMU does its own device emulation
internally.
So I didn’t try. KVM_RUN launches
qemu-system-aarch64 as a child:
qemu-system-aarch64 \
-M virt -accel tcg -cpu cortex-a57 \
-kernel vmlinuz-lts -initrd initramfs \
-netdev user,id=net0,hostfwd=tcp::2222-:22 \
-device virtio-net-pci,netdev=net0 -nographic
Kernel boots in a few seconds, sshd starts, QEMU forwards port 22 to
host 2222. libkrun connects via SSH. From brood-box’s perspective,
nothing is unusual. It opened /dev/kvm, created a VM,
called KVM_RUN, and a guest booted.
KVM_RUN blocks until the guest shuts down instead of
returning per-exit. For coding agent sessions where you SSH in, work,
and exit — this works. The whole illusion is a few hundred lines of Rust
and a QEMU process.
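A sketch of the launch step, assuming the same flags as the command line above. `GuestConfig`, `qemu_args`, and `run_guest` are hypothetical names, not the contents of qemu.rs.

```rust
use std::process::{Command, ExitStatus};

/// Hypothetical knobs for the guest; field names are illustrative.
struct GuestConfig {
    kernel: String,
    initrd: String,
    ssh_port: u16,
}

/// Builds the QEMU argument list shown in the post.
fn qemu_args(cfg: &GuestConfig) -> Vec<String> {
    vec![
        "-M".into(),
        "virt".into(),
        "-accel".into(),
        "tcg".into(),
        "-cpu".into(),
        "cortex-a57".into(),
        "-kernel".into(),
        cfg.kernel.clone(),
        "-initrd".into(),
        cfg.initrd.clone(),
        "-netdev".into(),
        // Forward the chosen host port to the guest's sshd on port 22.
        format!("user,id=net0,hostfwd=tcp::{}-:22", cfg.ssh_port),
        "-device".into(),
        "virtio-net-pci,netdev=net0".into(),
        "-nographic".into(),
    ]
}

/// Spawns QEMU and blocks until it exits, which is the
/// "KVM_RUN blocks until the guest shuts down" behaviour.
fn run_guest(cfg: &GuestConfig) -> std::io::Result<ExitStatus> {
    Command::new("qemu-system-aarch64")
        .args(qemu_args(cfg))
        .status()
}
```

Keeping the argument list in its own function makes it checkable without QEMU installed. `status()` waits for the child, so the caller stays blocked for the whole session.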
the numbers
kvm-tracer: ~400 lines Rust — the diagnostic
kvm-emu: ~500 lines Rust — the emulator (lib.rs + kvm_types.rs + qemu.rs)
qemu-smoke: ~150 lines Go — the validation
e2e tests: ~150 lines Rust — verifies ioctl responses
Roughly 1,200 lines total by the counts above. A weekend.
what it feels like
$ QEMU_KERNEL=~/vmlinuz-lts \
LD_PRELOAD=./target/release/libkvm_emu.so \
bbox gemini
kvm-emu: launching QEMU (512 MiB, 2 vCPUs, ssh port 2222)
[ ... ] Booting Linux
Starting sshd... done.
=== bbox session ===
$
Same tool, same workflow. Just a different path under the hood.
limitations
This is Phase 1b. Works for my use case. Has gaps:
KVM_RUN blocks until shutdown. Real KVM returns per-exit so libkrun can handle device emulation between runs. My emulator launches QEMU once. Works because SSH is the only interface. If libkrun starts using per-exit features, the emulator would need to grow.
Single QEMU instance. Multiple vCPUs share one process. Fine for single-vCPU guests.
ARM64 only. Hardcoded qemu-system-aarch64 and cortex-a57. x86 would need switching the QEMU binary and handling the x86 ioctl surface (larger, more tedious, same idea).
No live migration. Guest state stays in QEMU. Not captured back to kvm_run.
These aren’t bugs. The emulator does what I need and nothing more. That’s the point.
why
There’s a pattern here I keep finding myself in: locate the seam, sit on it, make it work, move on.
The KVM ioctl interface is a seam. It’s stable, documented, and narrow. libkrun talks to the kernel through it. If you can answer the calls plausibly, you don’t need a kernel. You just need something that quacks like one.
Same instinct as every mock, stub, shim, and adapter. Find the interface. Intercept it. Get unstuck.
I needed to develop a tool that requires KVM on a machine without it. This lets me. It’s not production-grade. Doesn’t need to be.