CompanionCube32: DLL Injection, VTable Hooking, and a Custom Overlay

CompanionCube32 is a game overlay companion that injects into a running process and renders custom UI on top of its output. It uses three techniques that build on each other: DLL injection to get code running inside the target process, virtual table hooking to intercept rendering calls, and a custom overlay drawn via DirectX.

DLL Injection

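The entry point is CreateRemoteThread targeting LoadLibraryW. The injector process opens a handle to the target with PROCESS_ALL_ACCESS, writes the DLL path into remote memory via VirtualAllocEx/WriteProcessMemory, and spawns a remote thread that loads the DLL. Passing the injector's own address of LoadLibraryW works because kernel32.dll is mapped at the same base address in every process of the same bitness within a boot session, so the injector and the target must both be 32-bit or both be 64-bit. Error handling is minimal but effective: if CreateRemoteThread fails, we fall back to SetWindowsHookEx with a WH_GETMESSAGE hook, which causes the DLL to be loaded into the target the next time the hooked thread retrieves a message. This second path is slower but works on systems where CreateRemoteThread is blocked.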

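HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0,
    (LPTHREAD_START_ROUTINE)LoadLibraryW,
    pRemotePath, 0, NULL);
if (hThread) {
    WaitForSingleObject(hThread, INFINITE);  // wait for LoadLibraryW to return
    CloseHandle(hThread);                    // avoid leaking the thread handle
} else {
    // fallback: WH_GETMESSAGE hook; hookProc must live inside hDll
    HHOOK hHook = SetWindowsHookEx(WH_GETMESSAGE, hookProc, hDll, dwThreadId);
    if (!hHook) { /* both paths failed; inspect GetLastError() */ }
}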
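One detail that is easy to get wrong in the VirtualAllocEx/WriteProcessMemory step is the size of the path buffer: LoadLibraryW expects a null-terminated wide string, so the byte count has to cover the terminator and be scaled by sizeof(wchar_t). A minimal sketch of that calculation follows; the helper name remotePathBytes is hypothetical and not part of the original project.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper (an assumption, not the project's code): number of bytes
// to allocate and copy for a wide-character path, including the trailing L'\0'.
std::size_t remotePathBytes(const wchar_t* path) {
    assert(path != nullptr);
    return (std::wcslen(path) + 1) * sizeof(wchar_t);
}
```

The same value would be passed as the size to both the allocation and the write, so the remote copy is always terminated.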

VTable Hooking

Once the DLL lives inside the target, we need to intercept the game’s rendering pipeline. The approach is standard vtable patching: locate the virtual function table of the relevant object (IDXGISwapChain for Present, ID3D11DeviceContext for DrawIndexed), use VirtualProtect to make the entry writable with PAGE_READWRITE, swap the Present or DrawIndexed entry with a pointer to our hook function, and restore the original protection.

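DWORD oldProtect;
// Unprotect the slot we patch (not just slot 0); vtables sit in read-only memory.
if (VirtualProtect(&vtable[index], sizeof(void*), PAGE_READWRITE, &oldProtect)) {
    originalPresent = (PresentFn)vtable[index];   // saved so the hook can chain
    vtable[index] = (void*)hookPresent;
    DWORD ignored;
    VirtualProtect(&vtable[index], sizeof(void*), oldProtect, &ignored);
}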

The hook function receives the object's this pointer as its first argument (the swap chain, in the case of Present) and can chain back to the original. We keep it minimal: store the pointer, call through to the original, then trigger our overlay draw.
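The store, chain, draw sequence can be illustrated without any Windows or DirectX dependency. The sketch below is a portable model rather than the project's code: FakeSwapChain, FakeSwapChainVtbl, installHook, and removeHook are hypothetical names, and the object imitates the C layout of a COM interface (a pointer to a table of function pointers). Since this table is ordinary writable memory, no VirtualProtect step is needed here.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object: first member points to a function table.
struct FakeSwapChain;
using PresentFn = int (*)(FakeSwapChain* self, unsigned syncInterval, unsigned flags);

struct FakeSwapChainVtbl {
    PresentFn Present;
};

struct FakeSwapChain {
    FakeSwapChainVtbl* vtbl;
    int framesPresented;
};

// The "real" implementation the application would normally reach.
static int realPresent(FakeSwapChain* self, unsigned, unsigned) {
    ++self->framesPresented;
    return 0;
}

static PresentFn      g_originalPresent = nullptr;  // saved table entry
static FakeSwapChain* g_lastSwapChain   = nullptr;  // pointer stored by the hook
static int            g_overlayDraws    = 0;        // counts simulated overlay draws

// Hook: store the pointer, call through to the original, then "draw" the overlay.
static int hookPresent(FakeSwapChain* self, unsigned sync, unsigned flags) {
    g_lastSwapChain = self;
    const int result = g_originalPresent(self, sync, flags);
    ++g_overlayDraws;
    return result;
}

// Swap the table entry, remembering the original so it can be chained and restored.
void installHook(FakeSwapChain* sc) {
    assert(sc != nullptr && sc->vtbl != nullptr);
    g_originalPresent = sc->vtbl->Present;
    sc->vtbl->Present = hookPresent;
}

void removeHook(FakeSwapChain* sc) {
    assert(sc != nullptr && g_originalPresent != nullptr);
    sc->vtbl->Present = g_originalPresent;
}
```

Because the table is shared by every object of the same type, patching one entry affects every instance that uses that table, which is why a single patch is enough to see every Present call.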

The Overlay

Rendering happens through a separate DirectX 11 device and immediate context created on a dedicated thread. The overlay uses its own swap chain with DXGI_SWAP_CHAIN_FLAG_OVERLAY where supported, or otherwise renders a screen-aligned quad via the hooked device. Text rendering goes through ID3D11DeviceContext::Draw with a prebuilt font atlas bound as a shader resource view.
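To make the font-atlas step concrete, here is a rough sketch of how a string might be turned into textured quads on the CPU before being handed to Draw. Everything about the layout is an assumption for illustration: a fixed 16x16 grid of equally sized cells indexed by character code, a hypothetical Vertex struct, and a hypothetical buildTextQuads function. The original atlas format is not described in this article.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical vertex layout: screen position plus atlas texture coordinates.
struct Vertex { float x, y, u, v; };

// Assumed atlas layout: 16 columns by 16 rows, one cell per 8-bit character code.
constexpr int kAtlasCols = 16;
constexpr int kAtlasRows = 16;

// Emits two triangles (six vertices) per character, advancing the pen by glyphW.
std::vector<Vertex> buildTextQuads(const std::string& text, float penX, float penY,
                                   float glyphW, float glyphH) {
    assert(glyphW > 0.0f && glyphH > 0.0f);
    std::vector<Vertex> verts;
    verts.reserve(text.size() * 6);
    for (unsigned char c : text) {
        // Top-left and bottom-right texture coordinates of this character's cell.
        const float u0 = static_cast<float>(c % kAtlasCols) / kAtlasCols;
        const float v0 = static_cast<float>(c / kAtlasCols) / kAtlasRows;
        const float u1 = u0 + 1.0f / kAtlasCols;
        const float v1 = v0 + 1.0f / kAtlasRows;
        const float x0 = penX, y0 = penY;
        const float x1 = penX + glyphW, y1 = penY + glyphH;
        verts.push_back({x0, y0, u0, v0});  // first triangle
        verts.push_back({x1, y0, u1, v0});
        verts.push_back({x0, y1, u0, v1});
        verts.push_back({x1, y0, u1, v0});  // second triangle
        verts.push_back({x1, y1, u1, v1});
        verts.push_back({x0, y1, u0, v1});
        penX += glyphW;
    }
    return verts;
}
```

The resulting array would be uploaded to a vertex buffer and submitted with a single Draw call whose vertex count is six times the string length, with the atlas bound to the pixel shader.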

The overlay thread synchronises with the hook via a std::atomic<bool> that flips when the hooked Present returns, ensuring we never draw over a partially swapped frame.
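A minimal sketch of that handshake, assuming one flag that the hook sets and the overlay thread clears: the FrameGate type and the overlayTick function are hypothetical names, and the counter stands in for the actual overlay draw.

```cpp
#include <atomic>

// Hypothetical gate shared between the hooked Present and the overlay thread.
struct FrameGate {
    std::atomic<bool> frameReady{false};

    // Called by the hook right after the original Present has returned.
    void signalPresented() {
        frameReady.store(true, std::memory_order_release);
    }

    // Called by the overlay thread. Returns true at most once per signal,
    // because the exchange clears the flag in the same atomic step.
    bool tryConsume() {
        return frameReady.exchange(false, std::memory_order_acq_rel);
    }
};

// One iteration of the overlay loop: draw only if a completed frame is pending.
// Returns true when a draw happened; drawCount stands in for the real draw call.
bool overlayTick(FrameGate& gate, int& drawCount) {
    if (!gate.tryConsume()) {
        return false;  // no completed frame yet, skip this tick
    }
    ++drawCount;
    return true;
}
```

Using exchange rather than a separate load and store keeps the check and the reset atomic, so two signals that arrive before the overlay wakes up collapse into a single draw instead of being counted twice.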

Results

The injector loads in under 10 ms, the hook resolves the vtable in roughly a microsecond, and the overlay runs at the target’s native framerate with negligible overhead. The whole pipeline (inject, hook, render) fits in about 600 lines of C++ and has held up across a handful of DirectX 11 titles.

Have a comment on this article? Send me an email.

The Segfault Garden

Lu

frgmntedflower@linux.com