1bit-systems wiki
Plain-English explanations of the architectural calls we made — one page per decision, citations where they exist.
Decisions
- Why 1.58-bit ternary? — the weight format
- Why Rust above + C++ below? — the language split
- Why Strix Halo (gfx1151)? — the hardware target
- Why shadow-burnin? — the cutover discipline
- Why our own .h1b format? — vs GGUF
- Why Caddy + systemd? — the ops layer
- Why 1bit-agents? — the self-maintaining mesh
- Why no Python at runtime? — Rule A
- Why shadow-traffic parity gates? — cutover criteria
- Why no NPU yet? — XDNA 2 status + ONNX/FastFlowLM/IREE evaluation
- Why this way + how? — long-form walkthrough of all the decisions + end-to-end request path
Integrations
- Hermes Agent integration — Nous Research's agent as external client on 1bit-server; feature-port list for 1bit-agents
Benchmarks + proof
- Live tok/s + PPL — what we measure, what it means
- FAQ — short answers to common questions
Pointers
- Architectural data-flow: ../../ARCHITECTURE.md
- Cutover runbook: ../../CUTOVER.md
- Demo script: ../../DEMO.md
- Contributing: ../../CONTRIBUTING.md
- Repo conventions for AI agents: ../../CLAUDE.md