CompanionCube32: DLL Injection, VTable Hooking, and a Custom Overlay
CompanionCube32 is a game overlay companion that injects into a running process and renders custom UI on top of its output. It uses three techniques that build on each other: DLL injection to get code running inside the target process, virtual table hooking to intercept rendering calls, and a custom overlay drawn via DirectX.
DLL Injection
The primary path is CreateRemoteThread targeting LoadLibraryW. The
injector process opens a handle to the target with PROCESS_ALL_ACCESS,
writes the DLL path into remote memory via VirtualAllocEx/WriteProcessMemory,
and spawns a remote thread whose start routine is LoadLibraryW and whose
argument is that remote path. Error handling is minimal: if
CreateRemoteThread fails, we fall back to SetWindowsHookEx with a
WH_GETMESSAGE hook, which causes the DLL to be loaded the next time the
hooked thread retrieves a message. This second path is slower but works on
systems where CreateRemoteThread is blocked.
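HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0,
    (LPTHREAD_START_ROUTINE)LoadLibraryW,
    pRemotePath, 0, NULL);
if (hThread) {
    WaitForSingleObject(hThread, INFINITE);  // wait until LoadLibraryW returns
    CloseHandle(hThread);                    // do not leak the thread handle
} else {
    // fallback: WH_GETMESSAGE hook; hookProc must live in the module hDll
    HHOOK hHook = SetWindowsHookEx(WH_GETMESSAGE, hookProc, hDll, dwThreadId);
    if (!hHook) { /* both paths failed: report GetLastError() */ }
}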
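One detail in the "write the DLL path into remote memory" step is easy to get wrong: because the thread calls LoadLibraryW, the buffer holds wide characters and must include the terminating null. The helper below is a hypothetical illustration (the name remotePathBytes is not from the original code) of how that byte count could be computed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes to allocate in the target process and
// to copy across for a wide-character DLL path, including the trailing L'\0'.
std::size_t remotePathBytes(const std::wstring& dllPath) {
    return (dllPath.size() + 1) * sizeof(wchar_t);
}
```

The same value would be passed as the size to both VirtualAllocEx and WriteProcessMemory, with dllPath.c_str() as the source buffer, since c_str() is guaranteed to be null-terminated.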
VTable Hooking
Once the DLL lives inside the target, we need to intercept the game’s
rendering pipeline. The approach is standard vtable patching: locate the
virtual function table of the IDXGISwapChain or ID3D11DeviceContext object,
use VirtualProtect to make the target slot writable (PAGE_READWRITE), swap
the Present or DrawIndexed entry for our hook function, and restore the
original protection.
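DWORD oldProtect;
// Unprotect the slot being patched, not just the first entry of the table.
if (VirtualProtect(&vtable[index], sizeof(void*), PAGE_READWRITE, &oldProtect)) {
    originalPresent = (PresentFn)vtable[index];   // save the original for chaining
    vtable[index] = (void*)hookPresent;
    VirtualProtect(&vtable[index], sizeof(void*), oldProtect, &oldProtect);
}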
The hook function receives the interface pointer as its first argument (the swap chain, in the case of Present) and can chain back to the original. We keep it minimal: store the pointer, call through to the original, then trigger our overlay draw.
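The swap-and-chain pattern can be modelled without any Windows or DirectX dependency. The sketch below is not the project's code: FakeSwapChain, realPresent, kPresentIndex, installHook and removeHook are all hypothetical names, and a plain array of function pointers stands in for a real COM vtable. It only illustrates the order of operations: save the original entry, overwrite the slot, and call through from the hook.

```cpp
#include <cstddef>

struct FakeSwapChain;
using PresentFn = int (*)(FakeSwapChain* self, unsigned syncInterval);

// Stand-in for an interface object: its first member points at a table of
// function pointers, mirroring how a COM object is laid out.
struct FakeSwapChain {
    PresentFn* vtable;
    int framesPresented;
};

// Stand-in for the real Present implementation.
int realPresent(FakeSwapChain* self, unsigned) { return ++self->framesPresented; }

constexpr std::size_t kPresentIndex = 0;  // hypothetical slot index in this model

PresentFn g_originalPresent = nullptr;
FakeSwapChain* g_lastSwapChain = nullptr;
int g_overlayDraws = 0;

// Hook: store the pointer, chain to the original, then trigger the overlay.
int hookPresent(FakeSwapChain* self, unsigned syncInterval) {
    g_lastSwapChain = self;
    int result = g_originalPresent(self, syncInterval);
    ++g_overlayDraws;  // stand-in for the overlay draw trigger
    return result;
}

void installHook(PresentFn* vtable) {
    g_originalPresent = vtable[kPresentIndex];  // save before overwriting
    vtable[kPresentIndex] = hookPresent;
}

void removeHook(PresentFn* vtable) {
    vtable[kPresentIndex] = g_originalPresent;  // restore the saved entry
}
```

In this model, calling sc.vtable[kPresentIndex](&sc, 1) after installHook runs the hook, which still returns whatever the original returned. Keeping a removeHook counterpart is a design choice worth considering so the table can be restored before the DLL unloads.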
The Overlay
Rendering happens through a separate DirectX 11 device and immediate
context created on a dedicated thread. The overlay uses its own swap chain
with DXGI_SWAP_CHAIN_FLAG_FOREGROUND_LAYER where supported, or renders into a
screen-aligned quad via the hooked device. Text rendering goes
through ID3D11DeviceContext::Draw with a prebuilt font atlas stored as a
shader resource view.
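To make the font-atlas text path concrete, here is a minimal CPU-side sketch of building the vertices for a string. It rests on assumptions that are not stated in the original: a 16x16 grid atlas with one cell per byte value, fixed-width glyphs, and a hypothetical Vertex layout and buildTextVertices function. The actual atlas layout may well differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical vertex layout: screen position plus atlas texture coordinate.
struct Vertex { float x, y, u, v; };

// Assumption: the atlas is a 16x16 grid, one cell per byte value.
constexpr int kAtlasCols = 16;
constexpr int kAtlasRows = 16;

// Builds two triangles (six vertices) per character, advancing a fixed width.
std::vector<Vertex> buildTextVertices(const std::string& text,
                                      float originX, float originY,
                                      float glyphW, float glyphH) {
    std::vector<Vertex> out;
    out.reserve(text.size() * 6);
    const float cellU = 1.0f / kAtlasCols;
    const float cellV = 1.0f / kAtlasRows;
    float penX = originX;
    for (unsigned char c : text) {
        // Locate the glyph's cell in the grid.
        const float u0 = (c % kAtlasCols) * cellU;
        const float v0 = (c / kAtlasCols) * cellV;
        const float u1 = u0 + cellU;
        const float v1 = v0 + cellV;
        const float x0 = penX, x1 = penX + glyphW;
        const float y0 = originY, y1 = originY + glyphH;
        out.push_back({x0, y0, u0, v0});
        out.push_back({x1, y0, u1, v0});
        out.push_back({x0, y1, u0, v1});
        out.push_back({x1, y0, u1, v0});
        out.push_back({x1, y1, u1, v1});
        out.push_back({x0, y1, u0, v1});
        penX += glyphW;
    }
    return out;
}
```

The resulting vector would be uploaded to a vertex buffer, and its size is the vertex count passed to ID3D11DeviceContext::Draw with the atlas bound as the shader resource view.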
The overlay thread synchronises with the hook via a std::atomic<bool>
that flips when the hooked Present returns, ensuring we never draw over a
partially swapped frame.
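The handshake described above might look roughly like the following. This is a sketch under assumptions, not the project's code: the names g_frameReady, signalFramePresented, tryConsumeFrame and overlayLoop are hypothetical, and the draw itself is replaced by a counter.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> g_frameReady{false};  // set by the hook once Present has returned
std::atomic<bool> g_running{true};      // cleared to stop the overlay thread
std::atomic<int>  g_overlayFrames{0};   // stand-in for actual overlay draws

// Called at the end of the hooked Present in this sketch.
void signalFramePresented() {
    g_frameReady.store(true, std::memory_order_release);
}

// Atomically consumes the flag so one Present yields at most one overlay draw.
bool tryConsumeFrame() {
    return g_frameReady.exchange(false, std::memory_order_acq_rel);
}

// Overlay thread body: draw only after a completed Present has been signalled.
void overlayLoop() {
    while (g_running.load(std::memory_order_acquire)) {
        if (tryConsumeFrame()) {
            g_overlayFrames.fetch_add(1, std::memory_order_relaxed);
        } else {
            std::this_thread::yield();  // nothing to draw yet
        }
    }
}
```

Using exchange rather than a separate load and store means the flag is read and cleared in one step, so a Present that lands between the two cannot be lost. A spinning loop with yield is the simplest form; a condition variable or event would avoid burning CPU while idle.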
Results
The injector loads in under 10ms, the hook resolves the vtable in roughly a microsecond, and the overlay runs at the target’s native framerate with negligible overhead. The whole pipeline — inject, hook, render — fits in about 600 lines of C++ and has held up across a handful of DirectX 11 titles.