the 1bit monster docs
Plain-English explanations of the architectural calls we made — one page per decision, citations where they exist. Rendered from the docs/wiki/ source in the 1bit-systems monorepo.
decisions
Why we built it this way, in order of load-bearing-ness.
Why 1.58-bit ternary? the weight format and what ternary buys us
Why Rust above + C++ below? the language split, hard-lined at the HIP boundary
Why Strix Halo (gfx1151)? the APU bet: 128 GB shared LPDDR5 at 256 GB/s
Why shadow-burnin? continuous parity check, cutover with evidence
Why our own .h1b format? tilt vs GGUF: mmap-first, ternary-native
Why Caddy + systemd? the ops layer, no containers, no interpreters
Why 1bit-agents? self-maintaining mesh of 17 specialists
Why no Python at runtime? Rule A: binaries boot in 200 ms, not 10 s
Why parity gates? PPL + byte-exact before we cut over
Why no NPU yet? XDNA 2 status and the wait for Linux
Why halo-power? the RyzenAdj-based power CLI
Why this way + how? long-form walkthrough, end-to-end path
integrations + plans
Hermes agent: Nous Research external client
AMD GAIA: Lemonade-compat gateway on :8200
Medusa heads: speculative decoding plan
peak projection: ~280 tok/s, NPU prefill crossover at L≥33
whisper streaming: STT partials plan
NPU kernel design: Peano C++ + libxrt + aie-rt
ternary on AIE: packing plan for XDNA 2
CPU lane plan: sampler + dispatcher off the iGPU