the 1bit monster docs

Plain-English explanations of the architectural calls we made — one page per decision, citations where they exist. Rendered from the docs/wiki/ source in the 1bit-systems monorepo.

decisions

Why we built it this way, in order of load-bearing-ness.

Why 1.58-bit ternary? the weight format and what ternary buys us
Why Rust above + C++ below? the language split, hard-lined at the HIP boundary
Why Strix Halo (gfx1151)? the APU bet: 128 GB shared LPDDR5 at 256 GB/s
Why shadow-burnin? continuous parity check, cutover with evidence
Why our own .h1b format? tilt vs GGUF: mmap-first, ternary-native
Why Caddy + systemd? the ops layer, no containers, no interpreters
Why 1bit-agents? self-maintaining mesh of 17 specialists
Why no Python at runtime? Rule A: binaries boot in 200 ms, not 10 s
Why parity gates? PPL + byte-exact before we cut over
Why no NPU yet? XDNA 2 status and the wait for Linux
Why halo-power? the RyzenAdj-based power CLI
Why this way + how? long-form walkthrough, end-to-end path

integrations + plans

Hermes agent: Nous Research external client
AMD GAIA: Lemonade-compat gateway on :8200
Medusa heads: speculative decoding plan
peak projection: ~280 tok/s, NPU prefill crossover at L≥33
whisper streaming: STT partials plan
NPU kernel design: Peano C++ + libxrt + aie-rt
ternary on AIE: packing plan for XDNA 2
CPU lane plan: sampler + dispatcher off the iGPU

reference

Wiki home: full index, as rendered from Home.md
FAQ: short answers to common questions
Benchmarks: live tok/s, PPL, what we measure