gb-emu: Building a Game Boy CPU Emulator

gb-emu is an emulator for the original Game Boy that implements the LR35902 CPU and the memory bus but stops short of the PPU. The PPU is the hardest component in the system and the place where most people get stuck, so rather than wait until the whole thing is finished, I wanted to write up the parts that do work. A working CPU with a cycle-accurate memory bus is already enough to load, decode, and step through commercial ROMs up to the point where they start relying on video memory.

The LR35902 Instruction Set

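The LR35902 is a hybrid of the Intel 8080 and the Z80, with a slightly different register file and its own instruction encoding. The key differences from a Z80:

- no IX/IY index registers and no alternate (shadow) register set
- no IN/OUT port instructions; I/O registers are memory-mapped at 0xFF00–0xFF7F and reached with LDH
- the flag register keeps only Z, N, H, and C, with no sign or parity/overflow flag
- new instructions such as SWAP and the auto-incrementing/decrementing LD (HL+) and LD (HL-) loads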
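The flag register is the part of the register file worth pinning down first. Below is a minimal sketch of a register file, assuming a plain struct of 8-bit registers with helpers that compose the 16-bit pairs on demand; `regs_t`, `get_af`, `set_af` and the other helper names are hypothetical, not gb-emu's own. The detail worth copying is that the low nibble of F always reads as zero on hardware, so `set_af` masks it off, which matters for `POP AF`.

```c
#include <stdint.h>

/* Flag bits in F: only the upper nibble exists on the LR35902. */
enum {
    FLAG_Z = 0x80, /* zero */
    FLAG_N = 0x40, /* last op was a subtraction */
    FLAG_H = 0x20, /* half carry: carry out of bit 3 */
    FLAG_C = 0x10  /* carry: carry out of bit 7 */
};

/* Hypothetical register file: eight 8-bit registers plus SP and PC. */
typedef struct {
    uint8_t a, f, b, c, d, e, h, l;
    uint16_t sp, pc;
} regs_t;

/* 16-bit pairs are built on demand; the first-named register is the high byte. */
static uint16_t get_af(const regs_t *r) { return (uint16_t)((r->a << 8) | r->f); }
static uint16_t get_bc(const regs_t *r) { return (uint16_t)((r->b << 8) | r->c); }
static uint16_t get_hl(const regs_t *r) { return (uint16_t)((r->h << 8) | r->l); }

static void set_af(regs_t *r, uint16_t v) {
    r->a = (uint8_t)(v >> 8);
    r->f = (uint8_t)(v & 0xF0); /* low nibble of F is hard-wired to zero */
}
static void set_bc(regs_t *r, uint16_t v) { r->b = (uint8_t)(v >> 8); r->c = (uint8_t)v; }
static void set_hl(regs_t *r, uint16_t v) { r->h = (uint8_t)(v >> 8); r->l = (uint8_t)v; }
```

Composing pairs in accessors, rather than overlaying a union on the 8-bit fields, keeps the code independent of the host's byte order.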

Conceptually, the instruction decoder is a giant switch over the first opcode byte, with a second switch for 0xCB-prefixed instructions, where each case knows its opcode's mnemonic, operand width, and cycle count. The implementation instead maps each opcode to a function pointer at init time, so dispatch is a single indirect call rather than a nested switch. The table holds 512 pointers (256 base opcodes plus 256 0xCB-prefixed ones), about 4kB on a 64-bit host, and the CPU only touches the entries it needs.

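#include <stdint.h>
typedef void (*opcode_fn)(lr35902_t *cpu);
static opcode_fn dispatch[512]; // [0..255] base opcodes, [256..511] 0xCB-prefixed

void lr35902_step(lr35902_t *cpu) {
    uint8_t op = bus_read(cpu->bus, cpu->pc++);
    if (op == 0xCB) {
        // 0xCB is a prefix: the next byte selects an entry in the upper half
        op = bus_read(cpu->bus, cpu->pc++);
        dispatch[256 + op](cpu);
    } else {
        dispatch[op](cpu);
    }
}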
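To make the init-time table concrete, here is a small self-contained sketch that fills a 512-entry table and runs a few instructions. It assumes a hypothetical `cpu_t` whose flat 64kB array stands in for the real bus, and implements only NOP, INC B, LD B,d8, LD A,d8 and CB-prefixed SWAP A; every other slot points at a placeholder. Cycle counts are T-cycles of the 4.194304 MHz clock.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU: a flat 64kB array stands in for the memory bus. */
typedef struct cpu {
    uint8_t a, f, b;
    uint16_t pc;
    uint64_t cycles;          /* T-cycles elapsed */
    uint8_t mem[0x10000];
} cpu_t;

typedef void (*opcode_fn)(cpu_t *cpu);
static opcode_fn dispatch[512]; /* [0..255] base opcodes, [256..511] 0xCB page */

static uint8_t fetch(cpu_t *cpu) { return cpu->mem[cpu->pc++]; }

/* Placeholder for every opcode this sketch does not implement. */
static void op_unimplemented(cpu_t *cpu) { cpu->cycles += 4; }

static void op_nop(cpu_t *cpu)     { cpu->cycles += 4; }
static void op_ld_b_d8(cpu_t *cpu) { cpu->b = fetch(cpu); cpu->cycles += 8; }
static void op_ld_a_d8(cpu_t *cpu) { cpu->a = fetch(cpu); cpu->cycles += 8; }

/* INC B: Z and H updated, N cleared, C left untouched. */
static void op_inc_b(cpu_t *cpu) {
    uint8_t old = cpu->b++;
    cpu->f = (uint8_t)((cpu->f & 0x10)                       /* keep C */
                       | (cpu->b == 0 ? 0x80 : 0)             /* Z */
                       | ((old & 0x0F) == 0x0F ? 0x20 : 0));  /* H */
    cpu->cycles += 4;
}

/* CB 37, SWAP A: exchange nibbles; Z from result, N/H/C cleared.
   8 cycles including the prefix fetch. */
static void cb_swap_a(cpu_t *cpu) {
    cpu->a = (uint8_t)((cpu->a << 4) | (cpu->a >> 4));
    cpu->f = cpu->a == 0 ? 0x80 : 0;
    cpu->cycles += 8;
}

static void dispatch_init(void) {
    for (size_t i = 0; i < 512; i++) dispatch[i] = op_unimplemented;
    dispatch[0x00] = op_nop;
    dispatch[0x04] = op_inc_b;
    dispatch[0x06] = op_ld_b_d8;
    dispatch[0x3E] = op_ld_a_d8;
    dispatch[256 + 0x37] = cb_swap_a;
}

static void step(cpu_t *cpu) {
    uint8_t op = fetch(cpu);
    if (op == 0xCB) {
        op = fetch(cpu);          /* the real opcode follows the prefix */
        dispatch[256 + op](cpu);
    } else {
        dispatch[op](cpu);
    }
}
```

Charging cycles inside each handler, rather than from a separate lookup table, lets conditional instructions such as JR NZ charge their taken (12) or not-taken (8) cost themselves.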

Memory Bus

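The Game Boy addresses 64kB of memory split into regions:

- 0x0000–0x3FFF: cartridge ROM bank 0
- 0x4000–0x7FFF: switchable cartridge ROM bank
- 0x8000–0x9FFF: video RAM (VRAM)
- 0xA000–0xBFFF: switchable external cartridge RAM
- 0xC000–0xDFFF: work RAM (WRAM)
- 0xE000–0xFDFF: echo RAM, a mirror of 0xC000–0xDDFF
- 0xFE00–0xFE9F: sprite attribute table (OAM)
- 0xFEA0–0xFEFF: unusable
- 0xFF00–0xFF7F: I/O registers
- 0xFF80–0xFFFE: high RAM (HRAM)
- 0xFFFF: interrupt enable register (IE)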

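The bus struct contains pointers to each region and the MBC state machine. Reads and writes route through a single bus_read/bus_write pair that maps the address to the correct backing store. The MBC1, MBC3, and MBC5 cartridge controllers are handled by intercepting CPU writes to the ROM address range, which the cartridge treats as control registers rather than memory: 0x0000–0x1FFF enables external RAM, 0x2000–0x3FFF selects the ROM bank mapped at 0x4000–0x7FFF, and 0x4000–0x5FFF selects the RAM bank (or, on MBC1, the upper ROM bank bits). On each such write the bus swaps its ROM and RAM bank pointers.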
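A minimal sketch of that routing for an MBC1 cartridge, assuming the whole ROM image sits in memory and its size is a power-of-two multiple of 16kB. The `bus_t` layout, `bus_init`, and the field names are hypothetical; VRAM, OAM, and the I/O registers are stubbed out so the banking logic stays visible. MBC1's 2-bit register either extends the ROM bank number or, in banking mode 1, selects the RAM bank and remaps 0x0000–0x3FFF.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bus for an MBC1 cartridge. VRAM, OAM and I/O are stubbed out. */
typedef struct {
    const uint8_t *rom;    /* full cartridge image */
    size_t rom_size;       /* bytes; assumed a power-of-two multiple of 16kB */
    uint8_t eram[0x8000];  /* up to four 8kB external RAM banks */
    uint8_t wram[0x2000];
    uint8_t hram[0x7F];
    int ram_enabled;
    uint8_t bank1;         /* 5-bit ROM bank register, 0x2000-0x3FFF */
    uint8_t bank2;         /* 2-bit register, 0x4000-0x5FFF */
    uint8_t mode;          /* banking mode, 0x6000-0x7FFF */
} bus_t;

static void bus_init(bus_t *bus, const uint8_t *rom, size_t rom_size) {
    memset(bus, 0, sizeof *bus);
    bus->rom = rom;
    bus->rom_size = rom_size;
    bus->bank1 = 1;        /* power-on value */
}

static size_t rom_bank_mask(const bus_t *bus) { return bus->rom_size / 0x4000 - 1; }

static size_t ram_offset(const bus_t *bus, uint16_t addr) {
    size_t bank = bus->mode ? bus->bank2 : 0;   /* mode 0 pins RAM bank 0 */
    return bank * 0x2000 + (size_t)(addr - 0xA000);
}

static uint8_t bus_read(const bus_t *bus, uint16_t addr) {
    if (addr < 0x4000) {   /* mode 1 lets bank2 remap this window too */
        size_t bank = (bus->mode ? (size_t)bus->bank2 << 5 : 0) & rom_bank_mask(bus);
        return bus->rom[bank * 0x4000 + addr];
    }
    if (addr < 0x8000) {
        size_t bank = (((size_t)bus->bank2 << 5) | bus->bank1) & rom_bank_mask(bus);
        return bus->rom[bank * 0x4000 + (addr - 0x4000)];
    }
    if (addr < 0xA000) return 0xFF;                      /* VRAM: PPU not modelled */
    if (addr < 0xC000) return bus->ram_enabled ? bus->eram[ram_offset(bus, addr)] : 0xFF;
    if (addr < 0xE000) return bus->wram[addr - 0xC000];
    if (addr < 0xFE00) return bus->wram[addr - 0xE000];  /* echo of 0xC000-0xDDFF */
    if (addr >= 0xFF80 && addr < 0xFFFF) return bus->hram[addr - 0xFF80];
    return 0xFF;                                         /* OAM, I/O, IE: stubbed */
}

static void bus_write(bus_t *bus, uint16_t addr, uint8_t val) {
    if (addr < 0x2000) {
        bus->ram_enabled = (val & 0x0F) == 0x0A;  /* 0x0A in the low nibble enables RAM */
    } else if (addr < 0x4000) {
        bus->bank1 = val & 0x1F;
        if (bus->bank1 == 0) bus->bank1 = 1;      /* writing 0 selects bank 1 */
    } else if (addr < 0x6000) {
        bus->bank2 = val & 0x03;
    } else if (addr < 0x8000) {
        bus->mode = val & 0x01;
    } else if (addr >= 0xA000 && addr < 0xC000) {
        if (bus->ram_enabled) bus->eram[ram_offset(bus, addr)] = val;
    } else if (addr >= 0xC000 && addr < 0xE000) {
        bus->wram[addr - 0xC000] = val;
    } else if (addr >= 0xE000 && addr < 0xFE00) {
        bus->wram[addr - 0xE000] = val;
    } else if (addr >= 0xFF80 && addr < 0xFFFF) {
        bus->hram[addr - 0xFF80] = val;
    }                                             /* VRAM, OAM, I/O, IE: ignored */
}
```

The if-chain keeps the region boundaries readable; a faster bus would more likely keep per-region pointers, as described above, and recompute them only on MBC writes. MBC3 and MBC5 follow the same pattern with wider bank registers.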

Where I Stopped: The PPU

The Game Boy PPU runs a pixel pipeline that produces a 160x144 frame about 59.7 times per second. It cycles through four modes (HBlank, VBlank, OAM search, pixel transfer) driven by a 4.19MHz dot clock, reads from VRAM and OAM, applies window and sprite priority, and outputs each pixel as one of four shades of green. Getting the timing even slightly wrong produces visual garbage.
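The 59.7 figure follows directly from the dot clock: every scanline takes 456 dots, and a frame is 144 visible lines plus 10 VBlank lines.

```latex
\text{dots per frame} = 456 \times (144 + 10) = 456 \times 154 = 70\,224

f_{\text{frame}} = \frac{4\,194\,304\ \text{dots/s}}{70\,224\ \text{dots}} \approx 59.73\ \text{Hz}
```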

I implemented the mode state machine and the LCD status register (STAT, 0xFF41), but pixel compositing (correctly handling sprite priority, window positioning, and the exact cycle counts for each mode) is where I hit the wall. Rather than ship a half-baked renderer, I decided to write up the parts that are solid.
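For reference, a minimal sketch of the timing half of such a mode state machine, with no pixels produced. It assumes a fixed 172-dot pixel transfer; real hardware stretches mode 3 depending on sprites, the window, and scrolling, which is exactly the hard part described above. `ppu_t`, `ppu_mode`, and `ppu_tick` are hypothetical names; the mode numbers are the values STAT reports in bits 1–0, and LY is the current-line register at 0xFF44. LY=LYC comparison and STAT interrupts are left out.

```c
#include <stdint.h>

/* STAT mode values (bits 1-0 of 0xFF41). */
enum { MODE_HBLANK = 0, MODE_VBLANK = 1, MODE_OAM = 2, MODE_DRAW = 3 };

enum {
    DOTS_PER_LINE = 456,
    OAM_DOTS      = 80,
    DRAW_DOTS     = 172,  /* minimum length; fixed here as a simplification */
    VISIBLE_LINES = 144,
    TOTAL_LINES   = 154
};

/* Hypothetical PPU timing state. */
typedef struct {
    uint16_t dot;   /* position within the current line, 0..455 */
    uint8_t  ly;    /* current line, as exposed at 0xFF44 */
    uint8_t  stat;  /* 0xFF41; only the mode bits are maintained here */
} ppu_t;

static uint8_t ppu_mode(const ppu_t *p) {
    if (p->ly >= VISIBLE_LINES) return MODE_VBLANK;
    if (p->dot < OAM_DOTS) return MODE_OAM;
    if (p->dot < OAM_DOTS + DRAW_DOTS) return MODE_DRAW;
    return MODE_HBLANK;
}

/* Advance one dot (one T-cycle); returns 1 on the dot where VBlank begins. */
static int ppu_tick(ppu_t *p) {
    int vblank_start = 0;
    if (++p->dot == DOTS_PER_LINE) {
        p->dot = 0;
        p->ly = (uint8_t)((p->ly + 1) % TOTAL_LINES);
        vblank_start = (p->ly == VISIBLE_LINES);
    }
    p->stat = (uint8_t)((p->stat & ~0x03) | ppu_mode(p));
    return vblank_start;
}
```

Deriving the mode from the dot and line counters, instead of storing it as separate state, keeps STAT consistent with LY by construction.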

Lessons

The CPU and memory bus were straightforward because they’re deterministic and well-documented. The instruction set is small (roughly 250 base opcodes, plus 256 0xCB-prefixed bit operations that follow a regular pattern) and every operation reduces to register moves, ALU ops, and memory accesses. The PPU is deterministic too, but it is where timing becomes visible: exact cycle counts matter, and emulation bugs produce subtly wrong output that’s hard to distinguish from correct output without reference frame data.
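As an example of what "ALU op" means in practice, here is a minimal sketch of 8-bit ADD and ADC. The function names are hypothetical; the flag rules (Z on a zero result, N cleared, H on carry out of bit 3, C on carry out of bit 7) are the LR35902's documented behaviour for these instructions. The half-carry flag is the easiest one to get wrong, and DAA depends on it.

```c
#include <stdint.h>

enum { FLAG_Z = 0x80, FLAG_N = 0x40, FLAG_H = 0x20, FLAG_C = 0x10 };

/* ADD A,n: returns the new A and writes the new F through *f (N always cleared). */
static uint8_t alu_add(uint8_t a, uint8_t n, uint8_t *f) {
    unsigned sum = (unsigned)a + n;
    uint8_t result = (uint8_t)sum;
    *f = (uint8_t)((result == 0 ? FLAG_Z : 0)
                   | (((a & 0x0F) + (n & 0x0F)) > 0x0F ? FLAG_H : 0)  /* carry out of bit 3 */
                   | (sum > 0xFF ? FLAG_C : 0));                      /* carry out of bit 7 */
    return result;
}

/* ADC A,n: like ADD, but the incoming carry joins both the nibble and byte sums. */
static uint8_t alu_adc(uint8_t a, uint8_t n, uint8_t *f) {
    unsigned carry = (*f & FLAG_C) ? 1u : 0u;
    unsigned sum = (unsigned)a + n + carry;
    uint8_t result = (uint8_t)sum;
    *f = (uint8_t)((result == 0 ? FLAG_Z : 0)
                   | (((a & 0x0F) + (n & 0x0F) + carry) > 0x0F ? FLAG_H : 0)
                   | (sum > 0xFF ? FLAG_C : 0));
    return result;
}
```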

I still plan to finish the PPU. But the CPU and bus are complete, tested with the Blargg test ROMs, and that’s worth writing about on its own.

Have a comment on this article? Send me an email.

The Segfault Garden

Lu

frgmntedflower@linux.com