<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Articles tagged c at The Segfault Garden</title>
  <link rel="alternate" type="text/html"
        href="https://blog.segv.page/tags/c/"/>
  <link rel="self" type="application/atom+xml"
        href="https://blog.segv.page/tags/c/feed/"/>
  <updated>2026-06-02T01:40:41Z</updated>
  <id>urn:uuid:0ba7b921-5597-4fbc-8ceb-88afb378c637</id>

  <author>
    <name>Lu</name>
    <uri>https://blog.segv.page</uri>
    <email>frgmntedflower@linux.com</email>
  </author>

  
    
  <entry>
    <title>peachykeen32: Bare-Metal ARM Userspace Programming</title>
    <link rel="alternate" type="text/html" href="https://blog.segv.page/blog/2026/06/01/peachykeen32-bare-metal-ARM-userspace-programming/"/>
    <id>urn:uuid:1a9b8c3d-5e7f-4a62-b0d4-6c8e1f2a9d70</id>
    <updated>2026-06-01T02:00:00Z</updated>
    <category term="c"/><category term="arm"/><category term="linux"/>
    <content type="html">
      <![CDATA[<p>peachykeen32 is a bare-metal ARM32 userspace runtime — no libc, no crt,
no linker scripts. Everything runs directly on top of the Linux kernel
through syscalls. What started as a curiosity about what the minimum
viable userspace looks like turned into a toolkit with a handful of
genuinely useful commands.</p>

<h2 id="the-runtime">The Runtime</h2>

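<p>The entry point is a hand-written assembly <code class="language-plaintext highlighter-rouge">_start</code> trampoline
that points the stack pointer at a statically reserved stack (<code class="language-plaintext highlighter-rouge">stack_top</code>),
calls <code class="language-plaintext highlighter-rouge">main</code>, and passes its return value to <code class="language-plaintext highlighter-rouge">exit</code>. It does not need to
zero .bss, because the kernel’s ELF loader already maps it zero-filled. The trampoline is four
ARM32 instructions (16 bytes) plus one literal-pool word for <code class="language-plaintext highlighter-rouge">stack_top</code>.</p>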

<pre><code class="language-asm">.globl _start
_start:
    ldr sp, =stack_top
    bl main
    mov r7, #1
    svc #0
</code></pre>

<p>All I/O goes through <code class="language-plaintext highlighter-rouge">svc #0</code> with the syscall number in <code class="language-plaintext highlighter-rouge">r7</code>. The runtime
provides thin wrappers for <code class="language-plaintext highlighter-rouge">read</code>, <code class="language-plaintext highlighter-rouge">write</code>, <code class="language-plaintext highlighter-rouge">exit</code>, <code class="language-plaintext highlighter-rouge">mmap</code>, <code class="language-plaintext highlighter-rouge">open</code>, and
<code class="language-plaintext highlighter-rouge">close</code>. No buffering, no errno — just the raw kernel ABI.</p>

<h2 id="commands">Commands</h2>

<h3 id="cat"><code class="language-plaintext highlighter-rouge">cat</code></h3>

<p>Reads a file via <code class="language-plaintext highlighter-rouge">open</code>/<code class="language-plaintext highlighter-rouge">read</code>/<code class="language-plaintext highlighter-rouge">write</code> and dumps it to stdout. The buffer
is 4kB (one page), allocated with <code class="language-plaintext highlighter-rouge">mmap</code> at startup and reused across
calls. Error handling checks the return value of <code class="language-plaintext highlighter-rouge">open</code>; the
syscall-based error message is omitted from the excerpt below, which only returns a non-zero status.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">cmd_cat</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">path</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">long</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">sys_open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
    <span class="kt">long</span> <span class="n">n</span><span class="p">;</span>
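    <span class="kt">int</span> <span class="n">rc</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">rc</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="p">(</span><span class="n">n</span> <span class="o">=</span> <span class="n">sys_read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="mi">4096</span><span class="p">))</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// write may be short: keep going until all n bytes are out</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">long</span> <span class="n">off</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">w</span><span class="p">;</span> <span class="n">off</span> <span class="o">&lt;</span> <span class="n">n</span><span class="p">;</span> <span class="n">off</span> <span class="o">+=</span> <span class="n">w</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">w</span> <span class="o">=</span> <span class="n">sys_write</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">buf</span> <span class="o">+</span> <span class="n">off</span><span class="p">,</span> <span class="n">n</span> <span class="o">-</span> <span class="n">off</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">w</span> <span class="o">&lt;=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="n">rc</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span> <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="n">rc</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>  <span class="c1">// a negative n is -errno from read</span>
    <span class="n">sys_close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">rc</span><span class="p">;</span>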
<span class="p">}</span>
</code></pre></div></div>

<h3 id="hexdump"><code class="language-plaintext highlighter-rouge">hexdump</code></h3>

<p>Like <code class="language-plaintext highlighter-rouge">cat</code> but prints a hex+ASCII side-by-side view. Each line shows the
offset, sixteen hex bytes, and the printable characters. The implementation
reuses <code class="language-plaintext highlighter-rouge">cat</code>’s read loop and formats directly into the write buffer to
avoid a second copy.</p>
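
<p>As a rough illustration of that formatting step, the sketch below builds one output line in place. The name <code class="language-plaintext highlighter-rouge">hex_line</code>, the 8-digit offset, and the <code class="language-plaintext highlighter-rouge">|...|</code> framing of the ASCII column are assumptions made for the example, not necessarily peachykeen32’s actual layout.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Format up to 16 bytes as "OFFSET  hex bytes  |ascii|\n" into out.
 * out must hold at least 80 bytes. Returns the number of chars written.
 * Hypothetical helper: uses no libc calls, only integer math. */
static size_t hex_line(char *out, uint32_t offset, const uint8_t *data, size_t n)
{
    static const char digits[] = "0123456789abcdef";
    size_t o = 0;

    for (int shift = 28; shift &gt;= 0; shift -= 4)      /* 8-digit offset */
        out[o++] = digits[(offset &gt;&gt; shift) &amp; 0xf];
    out[o++] = ' ';
    out[o++] = ' ';

    for (size_t i = 0; i &lt; 16; i++) {                 /* hex column */
        if (i &lt; n) {
            out[o++] = digits[data[i] &gt;&gt; 4];
            out[o++] = digits[data[i] &amp; 0xf];
        } else {                                      /* pad a short last line */
            out[o++] = ' ';
            out[o++] = ' ';
        }
        out[o++] = ' ';
    }

    out[o++] = ' ';
    out[o++] = '|';
    for (size_t i = 0; i &lt; n &amp;&amp; i &lt; 16; i++)          /* ASCII column */
        out[o++] = (data[i] &gt;= 0x20 &amp;&amp; data[i] &lt; 0x7f) ? (char)data[i] : '.';
    out[o++] = '|';
    out[o++] = '\n';
    return o;
}
</code></pre></div></div>

<p>The returned length can be handed straight to <code class="language-plaintext highlighter-rouge">sys_write</code>, so a line never passes through an intermediate string.</p>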

<h3 id="sysinfo"><code class="language-plaintext highlighter-rouge">sysinfo</code></h3>

<p>Calls <code class="language-plaintext highlighter-rouge">sys_sysinfo()</code> and prints the result: uptime, total RAM, free RAM,
process count, and load averages. The <code class="language-plaintext highlighter-rouge">sysinfo</code> struct is defined locally
since there’s no <code class="language-plaintext highlighter-rouge">&lt;sys/sysinfo.h&gt;</code>.</p>
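
<p>One detail worth spelling out: the kernel reports each load average as a fixed-point number with 16 fractional bits (<code class="language-plaintext highlighter-rouge">SI_LOAD_SHIFT</code> in the kernel headers). A minimal sketch of the conversion, using integer math only since there is no <code class="language-plaintext highlighter-rouge">printf</code>; the helper name <code class="language-plaintext highlighter-rouge">load_split</code> is hypothetical:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Split a raw sysinfo load value (fixed-point, 16 fractional bits) into
 * a whole part and hundredths. Hypothetical helper for illustration. */
static void load_split(unsigned long raw, unsigned *whole, unsigned *hundredths)
{
    *whole      = (unsigned)(raw &gt;&gt; 16);
    *hundredths = (unsigned)(((raw &amp; 0xFFFFul) * 100ul) &gt;&gt; 16);
}
</code></pre></div></div>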

<h3 id="ls"><code class="language-plaintext highlighter-rouge">ls</code></h3>

<p>Reads a directory via <code class="language-plaintext highlighter-rouge">open</code> (<code class="language-plaintext highlighter-rouge">O_RDONLY|O_DIRECTORY</code>) and <code class="language-plaintext highlighter-rouge">getdents64</code>.
The <code class="language-plaintext highlighter-rouge">struct linux_dirent64</code> is defined manually; the command iterates
entries, skips <code class="language-plaintext highlighter-rouge">.</code> and <code class="language-plaintext highlighter-rouge">..</code>, and prints each name. This one touches more
of the kernel ABI than the others — <code class="language-plaintext highlighter-rouge">getdents64</code> is a less common syscall
that most libc-free experiments overlook.</p>
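
<p>The iteration described above might look roughly like this. The struct mirrors the kernel’s record layout; <code class="language-plaintext highlighter-rouge">walk_dirents</code> is a hypothetical name, and <code class="language-plaintext highlighter-rouge">strcmp</code> from <code class="language-plaintext highlighter-rouge">&lt;string.h&gt;</code> stands in for whatever hand-written comparison a libc-free runtime would use.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* Record layout filled in by getdents64. */
struct linux_dirent64 {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* offset to the next record */
    unsigned short d_reclen;  /* size of this record in bytes */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* NUL-terminated name */
};

/* Walk nread bytes returned by one getdents64 call and collect pointers to
 * every name except "." and "..". Returns how many names were stored. */
static int walk_dirents(const char *buf, long nread, const char **names, int max)
{
    int count = 0;
    long pos = 0;
    while (pos &lt; nread &amp;&amp; count &lt; max) {
        const struct linux_dirent64 *d = (const struct linux_dirent64 *)(buf + pos);
        if (d-&gt;d_reclen == 0)
            break;                      /* malformed record: do not spin forever */
        if (strcmp(d-&gt;d_name, ".") != 0 &amp;&amp; strcmp(d-&gt;d_name, "..") != 0)
            names[count++] = d-&gt;d_name;
        pos += d-&gt;d_reclen;             /* records are variable-length */
    }
    return count;
}
</code></pre></div></div>

<p>Stepping by <code class="language-plaintext highlighter-rouge">d_reclen</code> rather than <code class="language-plaintext highlighter-rouge">sizeof</code> matters because each record is padded to fit its name.</p>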

<h2 id="why-this-works">Why This Works</h2>

<p>Each command is a self-contained demonstration of a specific kernel
interface exercised from a minimal runtime. Together they show that a
useful userspace — file I/O, directory traversal, system introspection —
needs nothing more than the syscall interface and a willingness to define
your own struct layouts. The entire runtime is ~300 lines of C and asm,
and each command is under 50 lines. There is no startup overhead, no
dynamic linker, no PT_INTERP segment.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>gb-emu: Building a Game Boy CPU Emulator</title>
    <link rel="alternate" type="text/html" href="https://blog.segv.page/blog/2026/06/01/gb-emu-building-a-game-boy-cpu-emulator/"/>
    <id>urn:uuid:72f8d1c4-9e6b-4a0d-b573-8c1f5a2e7d90</id>
    <updated>2026-06-01T02:00:00Z</updated>
    <category term="c"/><category term="emulation"/>
    <content type="html">
      <![CDATA[<p>gb-emu is an emulator for the original Game Boy that implements the
LR35902 CPU and memory bus but stops short of the PPU. The PPU is the
hardest component in the system — everyone gets stuck there — and I wanted
to write up what <em>did</em> work rather than waiting until the whole thing is
finished. A working CPU with a cycle-accurate memory bus is already enough
to load, decode, and step through commercial ROMs up to the point they
start asking for video memory.</p>

<h2 id="the-lr35902-instruction-set">The LR35902 Instruction Set</h2>

<p>The LR35902 is a hybrid of the Intel 8080 and the Z80, with a slightly
different register file and a unique instruction encoding. The key
differences from a Z80:</p>

<ul>
  <li>No <code class="language-plaintext highlighter-rouge">IX</code>, <code class="language-plaintext highlighter-rouge">IY</code> index registers; instead we have the standard AF/BC/DE/HL
set plus a 16-bit stack pointer SP and program counter PC.</li>
  <li>The <code class="language-plaintext highlighter-rouge">LD</code> family covers most data movement; <code class="language-plaintext highlighter-rouge">LDH</code> is a short-form 8-bit load to and from the 0xFF00–0xFFFF high page (I/O registers and HRAM).</li>
  <li>Interrupt handling uses the IE/IF registers at 0xFFFF/0xFF0F.</li>
</ul>

<p>Conceptually the instruction decoder is a giant switch over the first opcode byte, with
a second switch for <code class="language-plaintext highlighter-rouge">0xCB</code>-prefixed instructions, where each case maps an opcode
to its mnemonic, operand width, and cycle count. The implementation instead maps each
opcode to a function pointer at init time, so dispatch is an indirect call
rather than a nested switch. The table has 512 entries, which is 4kB with 8-byte
pointers, and the CPU only touches the entries it needs.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="nf">void</span> <span class="p">(</span><span class="o">*</span><span class="n">opcode_fn</span><span class="p">)(</span><span class="n">lr35902_t</span> <span class="o">*</span><span class="n">cpu</span><span class="p">);</span>
<span class="k">static</span> <span class="n">opcode_fn</span> <span class="n">dispatch</span><span class="p">[</span><span class="mi">512</span><span class="p">];</span> <span class="c1">// 256 main + 256 cb-prefixed</span>

<span class="kt">void</span> <span class="nf">lr35902_step</span><span class="p">(</span><span class="n">lr35902_t</span> <span class="o">*</span><span class="n">cpu</span><span class="p">)</span> <span class="p">{</span>
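    <span class="kt">uint8_t</span> <span class="n">op</span> <span class="o">=</span> <span class="n">bus_read</span><span class="p">(</span><span class="n">cpu</span><span class="o">-&gt;</span><span class="n">bus</span><span class="p">,</span> <span class="n">cpu</span><span class="o">-&gt;</span><span class="n">pc</span><span class="o">++</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">op</span> <span class="o">==</span> <span class="mh">0xCB</span><span class="p">)</span>  <span class="c1">// prefix: next byte indexes the second half</span>
        <span class="n">dispatch</span><span class="p">[</span><span class="mi">256</span> <span class="o">+</span> <span class="n">bus_read</span><span class="p">(</span><span class="n">cpu</span><span class="o">-&gt;</span><span class="n">bus</span><span class="p">,</span> <span class="n">cpu</span><span class="o">-&gt;</span><span class="n">pc</span><span class="o">++</span><span class="p">)](</span><span class="n">cpu</span><span class="p">);</span>
    <span class="k">else</span>
        <span class="n">dispatch</span><span class="p">[</span><span class="n">op</span><span class="p">](</span><span class="n">cpu</span><span class="p">);</span>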
<span class="p">}</span>
</code></pre></div></div>

<h2 id="memory-bus">Memory Bus</h2>

<p>The Game Boy addresses 64kB of memory split into regions:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">0x0000–0x3FFF</code>: ROM bank 0 (16kB fixed)</li>
  <li><code class="language-plaintext highlighter-rouge">0x4000–0x7FFF</code>: ROM bank N (16kB switchable via MBC)</li>
  <li><code class="language-plaintext highlighter-rouge">0x8000–0x9FFF</code>: VRAM (8kB)</li>
  <li><code class="language-plaintext highlighter-rouge">0xA000–0xBFFF</code>: external RAM (8kB, battery-backed in cartridges)</li>
  <li><code class="language-plaintext highlighter-rouge">0xC000–0xDFFF</code>: WRAM (8kB)</li>
  <li><code class="language-plaintext highlighter-rouge">0xE000–0xFDFF</code>: echo RAM (mirrors WRAM, rarely used)</li>
  <li><code class="language-plaintext highlighter-rouge">0xFE00–0xFE9F</code>: OAM (sprite data)</li>
  <li><code class="language-plaintext highlighter-rouge">0xFF00–0xFF7F</code>: I/O registers</li>
  <li><code class="language-plaintext highlighter-rouge">0xFF80–0xFFFE</code>: HRAM (high RAM)</li>
  <li><code class="language-plaintext highlighter-rouge">0xFFFF</code>: interrupt enable register</li>
</ul>

<p>The bus struct contains pointers to each region and the MBC state machine.
Reads and writes route through a single <code class="language-plaintext highlighter-rouge">bus_read</code>/<code class="language-plaintext highlighter-rouge">bus_write</code> pair that
maps the address to the correct backing store. MBC1, MBC3, and MBC5
cartridge controllers are handled by swapping the ROM and RAM bank pointers
when the CPU writes to the control ranges inside ROM address space: on MBC1,
0x2000–0x3FFF selects the ROM bank and 0x4000–0x5FFF the RAM bank.</p>
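
<p>A minimal sketch of the read side of that routing, following the region list above. The <code class="language-plaintext highlighter-rouge">bus_t</code> field names are made up for the example, and returning <code class="language-plaintext highlighter-rouge">0xFF</code> for absent cartridge RAM and for the gap between OAM and the I/O registers is an assumption rather than something taken from gb-emu.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stdint.h&gt;

typedef struct {
    const uint8_t *rom0;     /* 0x0000-0x3FFF, fixed bank */
    const uint8_t *romN;     /* 0x4000-0x7FFF, bank currently selected by the MBC */
    uint8_t vram[0x2000];
    uint8_t *ext_ram;        /* current cartridge RAM bank, or NULL if none */
    uint8_t wram[0x2000];
    uint8_t oam[0xA0];
    uint8_t io[0x80];
    uint8_t hram[0x7F];
    uint8_t ie;              /* interrupt enable register at 0xFFFF */
} bus_t;

/* Map a 16-bit address to its backing store, lowest region first. */
static uint8_t bus_read(const bus_t *bus, uint16_t addr)
{
    if (addr &lt; 0x4000) return bus-&gt;rom0[addr];
    if (addr &lt; 0x8000) return bus-&gt;romN[addr - 0x4000];
    if (addr &lt; 0xA000) return bus-&gt;vram[addr - 0x8000];
    if (addr &lt; 0xC000) return bus-&gt;ext_ram ? bus-&gt;ext_ram[addr - 0xA000] : 0xFF;
    if (addr &lt; 0xE000) return bus-&gt;wram[addr - 0xC000];
    if (addr &lt; 0xFE00) return bus-&gt;wram[addr - 0xE000];   /* echo RAM mirrors WRAM */
    if (addr &lt; 0xFEA0) return bus-&gt;oam[addr - 0xFE00];
    if (addr &lt; 0xFF00) return 0xFF;                       /* gap after OAM (assumed value) */
    if (addr &lt; 0xFF80) return bus-&gt;io[addr - 0xFF00];
    if (addr &lt; 0xFFFF) return bus-&gt;hram[addr - 0xFF80];
    return bus-&gt;ie;
}
</code></pre></div></div>

<p>With this shape, a bank switch only has to repoint <code class="language-plaintext highlighter-rouge">romN</code> or <code class="language-plaintext highlighter-rouge">ext_ram</code>; the read path itself never changes.</p>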

<h2 id="where-i-stopped-the-ppu">Where I Stopped: The PPU</h2>

<p>The Game Boy PPU runs on a pixel pipeline that produces a 160x144 frame 59.7
times per second. It has four modes (HBlank, VBlank, OAM search, pixel
transfer) driven by a 4.19MHz dot clock, reads from VRAM and OAM, applies
window and sprite priority, and mixes four shades of green. Getting the
timing even slightly wrong produces visual garbage.</p>

<p>I implemented the mode state machine and the LCD status register (<code class="language-plaintext highlighter-rouge">0xFF41</code>),
but pixel compositing — correctly handling sprite priority, window
positioning, and the exact cycle counts for each mode — is where I hit the
wall. Rather than ship a half-baked renderer, I decided to write up the
parts that are solid.</p>

<h2 id="lessons">Lessons</h2>

<p>The CPU and memory bus were straightforward because they’re deterministic
and well-documented. The instruction set is small (roughly 500 opcodes
across the main and 0xCB-prefixed tables) and every operation reduces to register moves, ALU ops, and
memory access. The PPU is where determinism meets analogue — exact timing
matters, and emulation bugs produce subtly wrong output that’s hard to
distinguish from correct output without reference frame data.</p>

<p>I still plan to finish the PPU. But the CPU and bus are complete, tested
with the Blargg test ROMs, and that’s worth writing about on its own.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>comfy-lang: A Compiler Pipeline for ARM32</title>
    <link rel="alternate" type="text/html" href="https://blog.segv.page/blog/2026/06/01/comfy-lang-a-compiler-pipeline-for-ARM32/"/>
    <id>urn:uuid:8d4f1a7c-3e6b-4c90-b2d5-9a0e7f8c1d23</id>
    <updated>2026-06-01T02:00:00Z</updated>
    <category term="c"/><category term="compilers"/><category term="arm"/>
    <content type="html">
      <![CDATA[<p>comfy-lang is a small compiled language targeting ARM32 Linux. The
front-end produces an AST, the middle-end lowers it through a series of
tree rewrites, and the back-end emits ARM32 machine code directly — no
assembler, no linker. The compiler is about 2000 lines of C and the
interesting parts that actually work are the ARM32 code generator and the
syscall wrappers.</p>

<h2 id="front-end">Front-End</h2>

<p>The lexer feeds a single-pass recursive-descent parser. The
grammar is expression-oriented — everything returns a value, including
blocks and conditionals. Types are inferred bottom-up; there is no type
checker pass, just a <code class="language-plaintext highlighter-rouge">typeof()</code> that walks the AST at parse time.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">enum</span> <span class="p">{</span>
    <span class="n">NODE_INT</span><span class="p">,</span> <span class="n">NODE_IDENT</span><span class="p">,</span> <span class="n">NODE_BINOP</span><span class="p">,</span>
    <span class="n">NODE_BLOCK</span><span class="p">,</span> <span class="n">NODE_CALL</span><span class="p">,</span> <span class="n">NODE_IF</span><span class="p">,</span>
<span class="p">}</span> <span class="n">node_kind</span><span class="p">;</span>

<span class="k">typedef</span> <span class="k">struct</span> <span class="n">node</span> <span class="p">{</span>
    <span class="n">node_kind</span> <span class="n">kind</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">node</span> <span class="o">*</span><span class="n">kids</span><span class="p">[</span><span class="mi">3</span><span class="p">];</span>
    <span class="kt">int</span> <span class="n">ival</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">sval</span><span class="p">;</span>
<span class="p">}</span> <span class="n">node</span><span class="p">;</span>
</code></pre></div></div>

<h2 id="middle-end-rewrites">Middle-End: Rewrites</h2>

<p>The middle-end applies a fixed set of tree rewrites before codegen:
constant folding, dead-branch elimination, and tail-call identification.
Each rewrite is a recursive walk that returns a (possibly new) node.
The passes are cheap enough that we run them to fixpoint — typically two
iterations exhaust all opportunities.</p>
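
<p>A constant-folding pass in that style could be sketched as follows, reusing the <code class="language-plaintext highlighter-rouge">node</code> layout from the front-end. It assumes, as the code generator’s operator switch suggests, that a <code class="language-plaintext highlighter-rouge">NODE_BINOP</code> keeps its operator character in <code class="language-plaintext highlighter-rouge">ival</code>; rewriting in place and the name <code class="language-plaintext highlighter-rouge">fold</code> are choices made for the example.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;

typedef enum {
    NODE_INT, NODE_IDENT, NODE_BINOP,
    NODE_BLOCK, NODE_CALL, NODE_IF,
} node_kind;

typedef struct node {
    node_kind kind;
    struct node *kids[3];
    int ival;      /* literal value, or operator character for NODE_BINOP */
    char *sval;
} node;

/* Fold constant subtrees bottom-up. Rewrites in place and returns n. */
static node *fold(node *n)
{
    if (n == NULL) return NULL;
    for (int i = 0; i &lt; 3; i++)
        n-&gt;kids[i] = fold(n-&gt;kids[i]);

    if (n-&gt;kind != NODE_BINOP) return n;
    node *l = n-&gt;kids[0], *r = n-&gt;kids[1];
    if (!l || !r || l-&gt;kind != NODE_INT || r-&gt;kind != NODE_INT) return n;

    /* Unsigned math so overflow wraps instead of being undefined. */
    unsigned a = (unsigned)l-&gt;ival, b = (unsigned)r-&gt;ival, v;
    switch (n-&gt;ival) {
    case '+': v = a + b; break;
    case '-': v = a - b; break;
    case '*': v = a * b; break;
    default:  return n;              /* unknown operator: leave untouched */
    }
    n-&gt;kind = NODE_INT;
    n-&gt;ival = (int)v;
    n-&gt;kids[0] = n-&gt;kids[1] = NULL;  /* operands are no longer referenced */
    return n;
}
</code></pre></div></div>

<p>Because children are folded before their parent, <code class="language-plaintext highlighter-rouge">2 + 3 * 4</code> collapses in a single walk; further iterations only pay off when another pass, such as dead-branch elimination, exposes new constants.</p>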

<h2 id="arm32-code-generation">ARM32 Code Generation</h2>

<p>The back-end walks the AST and emits instructions into a fixed-size buffer.
Registers are assigned on the fly with a simple stack-based scheme rather than a true
linear-scan allocator: <code class="language-plaintext highlighter-rouge">r0</code>–<code class="language-plaintext highlighter-rouge">r3</code> serve as the working set, and
anything beyond that spills to the stack.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span> <span class="nf">emit</span><span class="p">(</span><span class="n">node</span> <span class="o">*</span><span class="n">n</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">switch</span> <span class="p">(</span><span class="n">n</span><span class="o">-&gt;</span><span class="n">kind</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="n">NODE_INT</span><span class="p">:</span>
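        <span class="n">emit_mov_imm</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">n</span><span class="o">-&gt;</span><span class="n">ival</span><span class="p">);</span>  <span class="c1">// (reg, imm), matching emit_syscall</span>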
        <span class="k">break</span><span class="p">;</span>
    <span class="k">case</span> <span class="n">NODE_BINOP</span><span class="p">:</span>
        <span class="n">emit</span><span class="p">(</span><span class="n">n</span><span class="o">-&gt;</span><span class="n">kids</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
        <span class="n">push</span><span class="p">();</span>
        <span class="n">emit</span><span class="p">(</span><span class="n">n</span><span class="o">-&gt;</span><span class="n">kids</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
        <span class="n">pop_into_r1</span><span class="p">();</span>
        <span class="k">switch</span> <span class="p">(</span><span class="n">n</span><span class="o">-&gt;</span><span class="n">ival</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">case</span> <span class="sc">'+'</span><span class="p">:</span> <span class="n">emit_add</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
        <span class="k">case</span> <span class="sc">'-'</span><span class="p">:</span> <span class="n">emit_sub</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
        <span class="k">case</span> <span class="sc">'*'</span><span class="p">:</span> <span class="n">emit_mul</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="k">break</span><span class="p">;</span>
    <span class="k">case</span> <span class="n">NODE_CALL</span><span class="p">:</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">3</span> <span class="o">&amp;&amp;</span> <span class="n">n</span><span class="o">-&gt;</span><span class="n">kids</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>  <span class="c1">// kids[] has 3 slots</span>
            <span class="n">emit</span><span class="p">(</span><span class="n">n</span><span class="o">-&gt;</span><span class="n">kids</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
        <span class="n">emit_bl</span><span class="p">(</span><span class="n">n</span><span class="o">-&gt;</span><span class="n">sval</span><span class="p">);</span>
        <span class="k">break</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Each <code class="language-plaintext highlighter-rouge">emit_*</code> function writes one or more 4-byte ARM32 instruction words into the buffer and advances the
cursor. The final buffer is a valid <code class="language-plaintext highlighter-rouge">.text</code> segment that can be <code class="language-plaintext highlighter-rouge">mprotect</code>ed
to <code class="language-plaintext highlighter-rouge">PROT_READ|PROT_EXEC</code> and, once the instruction cache has been flushed, called directly.</p>

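<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint8_t</span> <span class="o">*</span><span class="n">buf</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="mi">4096</span><span class="p">,</span> <span class="n">PROT_READ</span><span class="o">|</span><span class="n">PROT_WRITE</span><span class="p">,</span>
                     <span class="n">MAP_PRIVATE</span><span class="o">|</span><span class="n">MAP_ANONYMOUS</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">buf</span> <span class="o">==</span> <span class="n">MAP_FAILED</span><span class="p">)</span> <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="c1">// ... emit into buf ...</span>
<span class="c1">// ARM keeps separate I and D caches: flush before running fresh code</span>
<span class="n">__builtin___clear_cache</span><span class="p">((</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">,</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">buf</span> <span class="o">+</span> <span class="mi">4096</span><span class="p">);</span>
<span class="n">mprotect</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="mi">4096</span><span class="p">,</span> <span class="n">PROT_READ</span><span class="o">|</span><span class="n">PROT_EXEC</span><span class="p">);</span>
<span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">fn</span><span class="p">)(</span><span class="kt">int</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="p">)(</span><span class="kt">int</span><span class="p">))</span><span class="n">buf</span><span class="p">;</span>
<span class="n">fn</span><span class="p">(</span><span class="mi">42</span><span class="p">);</span>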
</code></pre></div></div>

<h2 id="syscall-wrappers">Syscall Wrappers</h2>

<p>The runtime library replaces the C standard library with direct Linux
syscalls via <code class="language-plaintext highlighter-rouge">SVC</code> instructions. Each wrapper follows the Linux ARM EABI syscall
convention, which lines up with AAPCS for the first four arguments: arguments in
<code class="language-plaintext highlighter-rouge">r0</code>–<code class="language-plaintext highlighter-rouge">r6</code>, syscall number in <code class="language-plaintext highlighter-rouge">r7</code>, <code class="language-plaintext highlighter-rouge">SVC #0</code>
to trap, return value in <code class="language-plaintext highlighter-rouge">r0</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// _exit(0)  →  mov r0, #0; mov r7, #1; svc #0</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">emit_syscall</span><span class="p">(</span><span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">emit_mov_imm</span><span class="p">(</span><span class="mi">7</span><span class="p">,</span> <span class="n">n</span><span class="p">);</span>   <span class="c1">// r7 = syscall number</span>
    <span class="n">emit_svc</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>           <span class="c1">// svc #0</span>
<span class="p">}</span>
</code></pre></div></div>
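
<p>For reference, the two emit calls above come down to OR-ing fields into fixed A32 bit patterns. The sketch below is one way to write them; the <code class="language-plaintext highlighter-rouge">code</code> buffer, the <code class="language-plaintext highlighter-rouge">put</code> helper, and the restriction to immediates 0–255 are assumptions for the example, not comfy-lang’s actual encoder.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

#define CODE_WORDS 64
static uint32_t code[CODE_WORDS];   /* emission buffer of 32-bit A32 words */
static size_t   cursor;

/* Append one instruction word, silently dropping it if the buffer is full. */
static void put(uint32_t word)
{
    if (cursor &lt; CODE_WORDS)
        code[cursor++] = word;
}

/* MOV rd, #imm with condition AL. Only immediates 0..255 are handled;
 * larger values need the rotated-immediate form or a literal load. */
static void emit_mov_imm(unsigned rd, unsigned imm)
{
    put(0xE3A00000u | ((rd &amp; 0xFu) &lt;&lt; 12) | (imm &amp; 0xFFu));
}

/* SVC #imm24 with condition AL. */
static void emit_svc(unsigned imm)
{
    put(0xEF000000u | (imm &amp; 0x00FFFFFFu));
}

static void emit_syscall(unsigned n)
{
    emit_mov_imm(7, n);   /* r7 = syscall number */
    emit_svc(0);          /* trap into the kernel */
}
</code></pre></div></div>

<p>Emitting <code class="language-plaintext highlighter-rouge">emit_mov_imm(0, 0)</code> followed by <code class="language-plaintext highlighter-rouge">emit_syscall(1)</code> yields the three words <code class="language-plaintext highlighter-rouge">0xE3A00000</code>, <code class="language-plaintext highlighter-rouge">0xE3A07001</code>, <code class="language-plaintext highlighter-rouge">0xEF000000</code>, which is the <code class="language-plaintext highlighter-rouge">_exit(0)</code> sequence from the comment above.</p>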

<p>The linker stub resolves symbolic calls like <code class="language-plaintext highlighter-rouge">print_int</code> to the
corresponding syscall (<code class="language-plaintext highlighter-rouge">write</code> to stdout), so user code never needs to
know the syscall table.</p>

<h2 id="status">Status</h2>

<p>The front-end handles the core language. The ARM32 code generator produces
correct output for integer arithmetic, conditionals, and function calls.
The syscall wrappers cover <code class="language-plaintext highlighter-rouge">exit</code>, <code class="language-plaintext highlighter-rouge">write</code>, <code class="language-plaintext highlighter-rouge">read</code>, and <code class="language-plaintext highlighter-rouge">mmap</code>. The missing
pieces — structs, floats, a proper register allocator — are standard
compiler engineering that builds on the pipeline already in place.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>NetMoon: Raw Sockets and Network Monitoring</title>
    <link rel="alternate" type="text/html" href="https://blog.segv.page/blog/2026/06/01/NetMoon-raw-sockets-and-network-monitoring/"/>
    <id>urn:uuid:3f7b2e9a-1c5d-4a80-b6e3-8d0f9c4a1e57</id>
    <updated>2026-06-01T02:00:00Z</updated>
    <category term="c"/><category term="networking"/><category term="linux"/>
    <content type="html">
      <![CDATA[<p>NetMoon is a network monitoring tool built on Linux raw sockets. It
captures packets, parses TCP headers, and presents connection-level
metrics in real time. The implementation is about 700 lines of C and
demonstrates how far you can get with nothing more than a well-chosen
syscall and a couple of struct definitions.</p>

<h2 id="raw-socket-setup">Raw Socket Setup</h2>

<p>Raw sockets on Linux require <code class="language-plaintext highlighter-rouge">CAP_NET_RAW</code> (or root). The call is
straightforward: <code class="language-plaintext highlighter-rouge">socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP))</code>
delivers every IPv4 frame the host receives, link-layer header included.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">sock</span> <span class="o">=</span> <span class="n">socket</span><span class="p">(</span><span class="n">AF_PACKET</span><span class="p">,</span> <span class="n">SOCK_RAW</span><span class="p">,</span> <span class="n">htons</span><span class="p">(</span><span class="n">ETH_P_IP</span><span class="p">));</span>
<span class="k">if</span> <span class="p">(</span><span class="n">sock</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">perror</span><span class="p">(</span><span class="s">"socket"</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The socket is placed into promiscuous mode via <code class="language-plaintext highlighter-rouge">PACKET_ADD_MEMBERSHIP</code>
with <code class="language-plaintext highlighter-rouge">PACKET_MR_PROMISC</code>. This tells the NIC to forward all frames, not
just those addressed to our MAC, so we see traffic from other hosts on the
same broadcast domain.</p>
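
<p>The membership request is a single <code class="language-plaintext highlighter-rouge">setsockopt</code> call. A minimal sketch, assuming glibc headers; the wrapper name <code class="language-plaintext highlighter-rouge">enable_promisc</code> and the choice to pass the interface by name are illustrative:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;string.h&gt;
#include &lt;sys/socket.h&gt;
#include &lt;net/if.h&gt;
#include &lt;netpacket/packet.h&gt;

/* Ask the kernel to put ifname into promiscuous mode for as long as
 * sock stays open. Returns 0 on success, -1 on failure. */
static int enable_promisc(int sock, const char *ifname)
{
    struct packet_mreq mr;
    memset(&amp;mr, 0, sizeof mr);
    mr.mr_ifindex = (int)if_nametoindex(ifname);
    if (mr.mr_ifindex == 0)
        return -1;                      /* no such interface */
    mr.mr_type = PACKET_MR_PROMISC;
    return setsockopt(sock, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &amp;mr, sizeof mr);
}
</code></pre></div></div>

<p>Tying the setting to the socket means the kernel drops the membership when the socket closes, so a crash does not leave the interface promiscuous.</p>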

<h2 id="packet-capture-loop">Packet Capture Loop</h2>

<p>The capture thread reads from the raw socket into a fixed 64kB buffer and
hands the buffer to a parser running in a second thread. The split keeps
the capture loop fast, but it is not lossless: if the parser lags, the socket’s
receive buffer fills and the kernel drops packets there, before they reach userspace.
The loop below is simplified to call the parser inline.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
    <span class="kt">ssize_t</span> <span class="n">n</span> <span class="o">=</span> <span class="n">recvfrom</span><span class="p">(</span><span class="n">sock</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">),</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>
    <span class="n">parse_frame</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">n</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="tcp-header-parsing">TCP Header Parsing</h2>

<p><code class="language-plaintext highlighter-rouge">parse_frame</code> walks the protocol stack: Ethernet header → IP header → TCP
header. Each step checks the relevant length field before advancing.</p>

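<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">parse_frame</span><span class="p">(</span><span class="kt">uint8_t</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">len</span> <span class="o">&lt;</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">ethhdr</span><span class="p">)</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">iphdr</span><span class="p">))</span> <span class="k">return</span><span class="p">;</span>
    <span class="k">struct</span> <span class="n">ethhdr</span> <span class="o">*</span><span class="n">eth</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">ethhdr</span> <span class="o">*</span><span class="p">)</span><span class="n">buf</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ntohs</span><span class="p">(</span><span class="n">eth</span><span class="o">-&gt;</span><span class="n">h_proto</span><span class="p">)</span> <span class="o">!=</span> <span class="n">ETH_P_IP</span><span class="p">)</span> <span class="k">return</span><span class="p">;</span>

    <span class="k">struct</span> <span class="n">iphdr</span> <span class="o">*</span><span class="n">ip</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">iphdr</span> <span class="o">*</span><span class="p">)(</span><span class="n">buf</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">ethhdr</span><span class="p">));</span>
    <span class="kt">size_t</span> <span class="n">ip_hlen</span> <span class="o">=</span> <span class="n">ip</span><span class="o">-&gt;</span><span class="n">ihl</span> <span class="o">*</span> <span class="mi">4</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ip_hlen</span> <span class="o">&lt;</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">iphdr</span><span class="p">))</span> <span class="k">return</span><span class="p">;</span>  <span class="c1">// ihl below 5 is malformed</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ip</span><span class="o">-&gt;</span><span class="n">protocol</span> <span class="o">!=</span> <span class="n">IPPROTO_TCP</span><span class="p">)</span> <span class="k">return</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">len</span> <span class="o">&lt;</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">ethhdr</span><span class="p">)</span> <span class="o">+</span> <span class="n">ip_hlen</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">tcphdr</span><span class="p">))</span> <span class="k">return</span><span class="p">;</span>

    <span class="k">struct</span> <span class="n">tcphdr</span> <span class="o">*</span><span class="n">tcp</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">tcphdr</span> <span class="o">*</span><span class="p">)((</span><span class="kt">uint8_t</span> <span class="o">*</span><span class="p">)</span><span class="n">ip</span> <span class="o">+</span> <span class="n">ip_hlen</span><span class="p">);</span>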
    <span class="c1">// extract src_port, dst_port, seq, ack, flags</span>
    <span class="c1">// update connection table</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="connection-tracking">Connection Tracking</h2>

<p>The parser maintains a hash table of active connections keyed by the 4-tuple
<code class="language-plaintext highlighter-rouge">(src_ip, src_port, dst_ip, dst_port)</code>. Each entry tracks byte counts,
packet counts, the current TCP state (from flags), and a rough RTT measured
from SYN/SYN-ACK timing. Expired entries — those with no activity for 60
seconds — are evicted on every tenth iteration to keep the table bounded.</p>
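<p>As a rough illustration of the table described above, here is a minimal sketch in C. Nearly everything in it is an assumption made for the example rather than the parser's actual code: the names <code>conn_touch</code> and <code>conn_evict</code>, the 1024-slot capacity, open addressing with tombstones, and the FNV-1a hash. Only the 4-tuple key, the byte and packet counters, and the 60-second timeout come from the description.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONN_SLOTS   1024u  /* hypothetical capacity, must be a power of two */
#define CONN_TIMEOUT 60u    /* seconds without activity before eviction */

static_assert((CONN_SLOTS & (CONN_SLOTS - 1u)) == 0, "CONN_SLOTS must be a power of two");

enum slot_state { SLOT_EMPTY = 0, SLOT_LIVE, SLOT_DEAD };

struct conn_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct conn {
    struct conn_key key;
    uint64_t bytes, packets;
    uint64_t last_seen;      /* seconds, from whatever clock the caller uses */
    enum slot_state state;
};

static struct conn table[CONN_SLOTS];

/* FNV-1a over the twelve key bytes (hash choice is an assumption). */
static uint32_t conn_hash(const struct conn_key *k) {
    const uint32_t w[3] = { k->src_ip, k->dst_ip,
                            ((uint32_t)k->src_port << 16) | k->dst_port };
    uint32_t h = 2166136261u;
    for (int i = 0; i < 3; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (w[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

static int key_eq(const struct conn_key *a, const struct conn_key *b) {
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Find or create the entry for k and account for one packet of len bytes.
 * Returns NULL only when the table has no free or dead slot left. */
struct conn *conn_touch(const struct conn_key *k, uint32_t len, uint64_t now) {
    uint32_t i = conn_hash(k) & (CONN_SLOTS - 1u);
    struct conn *reuse = NULL;
    for (uint32_t n = 0; n < CONN_SLOTS; n++, i = (i + 1u) & (CONN_SLOTS - 1u)) {
        struct conn *c = &table[i];
        if (c->state == SLOT_LIVE && key_eq(&c->key, k)) {
            c->bytes += len;
            c->packets++;
            c->last_seen = now;
            return c;
        }
        if (c->state == SLOT_DEAD && reuse == NULL) reuse = c;
        if (c->state == SLOT_EMPTY) {
            if (reuse == NULL) reuse = c;
            break;  /* an empty slot ends the probe chain */
        }
    }
    if (reuse == NULL) return NULL;
    *reuse = (struct conn){ .key = *k, .bytes = len, .packets = 1,
                            .last_seen = now, .state = SLOT_LIVE };
    return reuse;
}

/* Mark entries idle for more than CONN_TIMEOUT seconds as dead.
 * Dead slots act as tombstones so later lookups keep probing past them. */
size_t conn_evict(uint64_t now) {
    size_t evicted = 0;
    for (uint32_t i = 0; i < CONN_SLOTS; i++) {
        struct conn *c = &table[i];
        if (c->state == SLOT_LIVE && now - c->last_seen > CONN_TIMEOUT) {
            c->state = SLOT_DEAD;
            evicted++;
        }
    }
    return evicted;
}
```

<p>In a capture loop shaped like the one described here, <code>conn_touch</code> would run once per parsed TCP segment and <code>conn_evict</code> on every tenth pass.</p>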

<h2 id="real-time-display">Real-Time Display</h2>

<p>A curses-based UI refreshes the connection table once per second, printing
per-connection bandwidth as a bar chart and flags as human-readable state:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>192.168.1.20:44012 → 10.0.0.1:443  (ESTABLISHED)  1.2 MB   ████████░░
192.168.1.20:44013 → 10.0.0.1:443  (ESTABLISHED)  340 kB   ██░░░░░░░░
192.168.1.30:22   → 10.0.0.2:53041 (ESTABLISHED)  4.1 MB   ██████████
</code></pre></div></div>

<p>Packet capture at 60% with TCP header parsing already gives a useful
picture of what crosses the wire. The missing pieces — reassembly, deeper
protocol dissection, and a filter language — are natural extensions once
the core loop is solid.</p>

]]>
    </content>
  </entry>
    
  
    
  <entry>
    <title>MilkyLogger: A Minimalist Logging Mechanism</title>
    <link rel="alternate" type="text/html" href="https://blog.segv.page/blog/2026/06/01/MilkyLogger-a-minimalist-logging-mechanism/"/>
    <id>urn:uuid:5c9a8e2b-7f4d-4b16-a803-d91e6c5f2a74</id>
    <updated>2026-06-01T02:00:00Z</updated>
    <category term="c"/>
    <content type="html">
      <![CDATA[<p>MilkyLogger is a logging library built around a single idea: a lock-free
ring buffer shared between a producer (the application) and a consumer (a
dedicated writer thread). There is no format string parsing at the call
site, no dynamic allocation in the hot path, and no configuration files.
The mechanism is the interesting part, not the feature set.</p>

<h2 id="the-ring-buffer">The Ring Buffer</h2>

<p>The ring buffer is a fixed-size array of <code class="language-plaintext highlighter-rouge">log_entry</code> structs, an atomic
write index, and an atomic read index. The producer increments the write
index to claim a slot, masks it to find the slot's position in the array,
copies its data in, and then publishes the entry. The consumer reads only
published entries, strictly behind the write cursor.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">log_entry</span> <span class="p">{</span>
    <span class="kt">uint64_t</span> <span class="n">timestamp</span><span class="p">;</span>
    <span class="kt">uint8_t</span>  <span class="n">level</span><span class="p">;</span>
    <span class="kt">uint8_t</span>  <span class="n">len</span><span class="p">;</span>
    <span class="kt">char</span>     <span class="n">data</span><span class="p">[</span><span class="n">LOG_MAX_ENTRY</span><span class="p">];</span>
<span class="p">};</span>

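<span class="k">struct</span> <span class="n">ring</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">log_entry</span> <span class="n">buf</span><span class="p">[</span><span class="n">LOG_RING_SIZE</span><span class="p">];</span>
    <span class="k">_Atomic</span> <span class="kt">uint32_t</span> <span class="n">head</span><span class="p">;</span>     <span class="c1">// next slot to claim (producer)</span>
    <span class="k">_Atomic</span> <span class="kt">uint32_t</span> <span class="n">head_vis</span><span class="p">;</span> <span class="c1">// entries below this are committed</span>
    <span class="k">_Atomic</span> <span class="kt">uint32_t</span> <span class="n">tail</span><span class="p">;</span>     <span class="c1">// next slot to read (consumer)</span>
    <span class="k">_Atomic</span> <span class="kt">int</span>      <span class="n">running</span><span class="p">;</span>  <span class="c1">// cleared to stop the writer thread</span>
<span class="p">};</span>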
</code></pre></div></div>

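<p>The head and tail are both monotonic counters that are never reduced modulo
the capacity; only the index into the buffer is masked. A counter value is
therefore not reused until the 32-bit counter itself wraps, which sidesteps
the ABA problem in practice and makes the single-producer single-consumer
case lock-free with just
<code class="language-plaintext highlighter-rouge">atomic_store</code> / <code class="language-plaintext highlighter-rouge">atomic_load</code> on release / acquire ordering.
This relies on the capacity being a power of two (which the use of <code class="language-plaintext highlighter-rouge">MASK</code>
implies), so that masking stays consistent when the counter wraps.</p>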
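<p>To make the counter scheme concrete, here is a minimal single-producer single-consumer sketch with explicit acquire / release ordering. It is not MilkyLogger's code: the <code>spsc</code> type, the <code>spsc_push</code> / <code>spsc_pop</code> names, the 8-slot capacity, and the plain <code>uint32_t</code> payload are all assumptions for illustration. Unlike the logger, this sketch refuses to push into a full ring rather than overwriting.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u                /* hypothetical; must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

struct spsc {
    uint32_t slots[RING_SIZE];
    _Atomic uint32_t head;          /* written by the producer only */
    _Atomic uint32_t tail;          /* written by the consumer only */
};

/* Producer side: returns false when the ring is full. */
bool spsc_push(struct spsc *q, uint32_t v) {
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    /* Unsigned subtraction gives the fill level even after head wraps. */
    if (head - tail == RING_SIZE) return false;
    q->slots[head & RING_MASK] = v;
    /* Release: the slot write above becomes visible before the new head. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
bool spsc_pop(struct spsc *q, uint32_t *out) {
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail) return false;
    *out = q->slots[tail & RING_MASK];
    /* Release: the slot read above completes before the slot is handed back. */
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}
```

<p>Comparing the counters by difference and by equality, rather than with <code>&lt;</code>, is what keeps the sketch correct when the 32-bit counters wrap.</p>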

<h2 id="producer-path">Producer Path</h2>

<p>A call to <code class="language-plaintext highlighter-rouge">log_info("message")</code> expands to a macro that computes the
timestamp via <code class="language-plaintext highlighter-rouge">clock_gettime(CLOCK_MONOTONIC)</code> (wrapped here as <code class="language-plaintext highlighter-rouge">get_ns()</code>), copies the string literal
into the claimed entry, and stores the length. No formatting, no heap:
the macro expects a string literal, so the message lands in .rodata,
<code class="language-plaintext highlighter-rouge">sizeof</code> yields its size at compile time, and the copy is a fixed-size <code class="language-plaintext highlighter-rouge">memcpy</code>.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define log_info(msg) do {                           \
    _Static_assert(sizeof(msg) &lt;= LOG_MAX_ENTRY,      \
                   "log message too long");          \
    uint32_t slot = atomic_fetch_add(&amp;ring.head, 1); \
    struct log_entry *e = &amp;ring.buf[slot &amp; MASK];    \
    e-&gt;timestamp = get_ns();                         \
    e-&gt;level     = LOG_INFO;                         \
    e-&gt;len       = sizeof(msg);                      \
    memcpy(e-&gt;data, msg, sizeof(msg));               \
    atomic_store(&amp;ring.head_vis, slot + 1);          \
} while (0)
</span></code></pre></div></div>

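<p>The store to <code class="language-plaintext highlighter-rouge">head_vis</code> is the commit; the consumer can safely read any
slot from <code class="language-plaintext highlighter-rouge">tail</code> up to, but not including, <code class="language-plaintext highlighter-rouge">head_vis</code>. This holds for a
single producer. With several producers, commits would have to land in claim
order, which this macro does not enforce.</p>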
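<p>One way to enforce the string-literal requirement at compile time is sketched below. This is an assumption about how such a check could look, not the library's macro: <code>LOG_FILL</code> is a hypothetical helper that only fills an entry, and the value 64 for <code>LOG_MAX_ENTRY</code> is made up. The trick is that <code>"" msg ""</code> only compiles when <code>msg</code> is a string literal, because adjacent literals concatenate and nothing else does.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOG_MAX_ENTRY 64  /* hypothetical value; must stay <= 256 for uint8_t len */

struct log_entry {
    uint64_t timestamp;
    uint8_t  level;
    uint8_t  len;
    char     data[LOG_MAX_ENTRY];
};

/* Copies a string literal into an entry.
 * - "" msg "" fails to compile for anything that is not a string literal.
 * - The static assertion rejects literals that do not fit, terminator included.
 * - len stores the length without the terminating NUL (a choice made here). */
#define LOG_FILL(e, msg) do {                                      \
    static_assert(sizeof("" msg "") <= LOG_MAX_ENTRY,               \
                  "log message does not fit in LOG_MAX_ENTRY");     \
    (e)->len = (uint8_t)(sizeof("" msg "") - 1);                    \
    memcpy((e)->data, "" msg "", sizeof("" msg ""));                \
} while (0)
```

<p>Passing a <code>char *</code> variable or a literal of 64 characters or more to <code>LOG_FILL</code> stops the build instead of overflowing <code>data</code> at run time.</p>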

<h2 id="consumer-thread">Consumer Thread</h2>

<p>The writer thread spins while <code class="language-plaintext highlighter-rouge">tail</code> trails <code class="language-plaintext highlighter-rouge">head_vis</code>, drains each entry into a file
descriptor with a single <code class="language-plaintext highlighter-rouge">writev</code> (one iovec per field), and advances.
Flushing is implicit: the consumer runs at a fixed priority and yields
after emptying the ring.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// returns int so it matches the C11 thrd_start_t signature</span>
<span class="kt">int</span> <span class="nf">logger_thread</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">arg</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">struct</span> <span class="n">ring</span> <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="n">arg</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">r</span><span class="o">-&gt;</span><span class="n">running</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// != instead of &lt; keeps the test correct when the counters wrap</span>
        <span class="k">while</span> <span class="p">(</span><span class="n">r</span><span class="o">-&gt;</span><span class="n">tail</span> <span class="o">!=</span> <span class="n">atomic_load</span><span class="p">(</span><span class="o">&amp;</span><span class="n">r</span><span class="o">-&gt;</span><span class="n">head_vis</span><span class="p">))</span> <span class="p">{</span>
            <span class="kt">uint32_t</span> <span class="n">idx</span> <span class="o">=</span> <span class="n">r</span><span class="o">-&gt;</span><span class="n">tail</span> <span class="o">&amp;</span> <span class="n">MASK</span><span class="p">;</span>
            <span class="k">struct</span> <span class="n">log_entry</span> <span class="o">*</span><span class="n">e</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">r</span><span class="o">-&gt;</span><span class="n">buf</span><span class="p">[</span><span class="n">idx</span><span class="p">];</span>
            <span class="n">write_log_entry</span><span class="p">(</span><span class="n">e</span><span class="p">);</span>
            <span class="n">r</span><span class="o">-&gt;</span><span class="n">tail</span><span class="o">++</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="n">thrd_yield</span><span class="p">();</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
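<p>The post does not show <code>write_log_entry</code>, so the following is only a guess at what "one iovec per field" could look like. The function name <code>write_log_entry_fd</code>, its explicit <code>fd</code> parameter, the raw binary output layout, and the value of <code>LOG_MAX_ENTRY</code> are assumptions for this sketch.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

#define LOG_MAX_ENTRY 64  /* hypothetical value */

struct log_entry {
    uint64_t timestamp;
    uint8_t  level;
    uint8_t  len;
    char     data[LOG_MAX_ENTRY];
};

/* Writes one entry as timestamp | level | len | data[0..len) with a single
 * writev call, so the record reaches the descriptor in one syscall.
 * Returns the number of bytes written, or -1 on error (as writev does). */
ssize_t write_log_entry_fd(int fd, const struct log_entry *e) {
    assert(e->len <= LOG_MAX_ENTRY);
    struct iovec iov[4] = {
        { .iov_base = (void *)&e->timestamp, .iov_len = sizeof e->timestamp },
        { .iov_base = (void *)&e->level,     .iov_len = sizeof e->level },
        { .iov_base = (void *)&e->len,       .iov_len = sizeof e->len },
        { .iov_base = (void *)e->data,       .iov_len = e->len },
    };
    return writev(fd, iov, 4);
}
```

<p>Gathering the fields with iovecs avoids packing them into a temporary buffer first, and skips the struct padding between <code>len</code> and the end of <code>data</code>. A production version would also have to handle short writes.</p>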

<h2 id="why-this-works">Why This Works</h2>

<p>The design makes three trade-offs. First, fixed-size entries: a message
longer than <code class="language-plaintext highlighter-rouge">LOG_MAX_ENTRY</code> does not fit in a slot. Because messages are
string literals, their size is known at compile time and the macro can catch
this then, instead of truncating at run time. Second, the ring is static: it
never grows, which means under extreme load old entries are overwritten
before the consumer has written them out. Third, the
consumer is single-threaded, so the file descriptor never contends.</p>

<p>The result is a logger with a producer cost of roughly a dozen nanoseconds
and zero allocations, suitable for real-time audio callbacks or inner
game loops where <code class="language-plaintext highlighter-rouge">fprintf</code> would cause audible stutter.</p>

]]>
    </content>
  </entry>
    
  
    
  

</feed>
