1bit monster · strix halo unlocked · bare metal · zero cloud

Wake the 1bit monster sleeping in your desktop.

AMD Ryzen AI MAX+ 395 is a CPU + iGPU + NPU sharing 128 GB LPDDR5 at 256 GB/s. Most benchmark it like a laptop and miss what it is. We run ternary BitNet-b1.58 on native HIP kernels above our own C++ — 83 tok/s decode today, ~280 tok/s on the roadmap, sub-second voice mouth-to-ear. No datacenter GPU. No subscription. One box, one drive, one mesh.

Target ceiling: ~280 tok/s decode · NPU prefill crossover at 33-token prompts · 256 GB/s LPDDR5 wall. how we get there →

The APU play

Unified CPU + iGPU + NPU on one 128 GB LPDDR5 pool at 256 GB/s. No PCIe tax, no host-device copy, no discrete-card bottleneck. Most people miss the point when they benchmark the iGPU against a discrete card — the whole box is the accelerator, not one tile of it.

iGPU — shipping

Radeon 8060S

gfx1151, RDNA 3.5. Native HIP ternary GEMV at 92% of LPDDR5 peak. Our kernels, no hipBLAS.

NPU — evaluating

XDNA 2, 50 TOPS

Lemonade 10.2 Linux NPU shipped 2026-04-20 (Q4 today, ternary→INT8 soon). Target: prefill offload.

CPU — next

16 Zen5 cores

Sitting idle while iGPU grinds. Move sampler + dispatcher + tokenizer off iGPU critical path. +5-15% tok/s without a kernel change.

Honest numbers · live

Last burnin pass 2026-04-20 · 14,344 rounds · wikitext-103 · post-idx=7-tiebreak.

83
tok/s decode @64
33
tok/s decode @1024
+0.02
PPL Δ vs baseline
~98.9%
byte-exact burn-in
92%
of LPDDR5 peak
6.78×
attn @2048 vs prior
1.23 s
first audio (TTS)
269
tests · 0 fail
2.4 MB
server binary
128 GB
shared LPDDR5

Install

curl -fsSL https://1bit.systems/install.sh | bash
  • First token in ~60 seconds on a primed Strix Halo box.
  • No sudo password leaves the machine. Script runs unprivileged, prompts locally.
  • Idempotent — rerun anytime to repair or upgrade.

Prefer git? Clone the monorepo and run ./install.sh directly.

The 1bit family

Fourteen crates, one monorepo, one job each. Rust above the HIP line, C++ below. No Python in the hot path. No container overhead. Everything runs as a user-systemd unit.

1bit-server

OpenAI-compat HTTP. /v1/chat/completions with SSE. Bearer-gated via Caddy. 2.4 MB binary.

1bit-router

Backend dispatcher: iGPU / NPU / CPU. HALO_BACKEND=xdna wires FLM subprocess for NPU prefill.

1bit-hip

FFI into rocm-cpp HIP kernels. 92%-LPDDR5-peak ternary GEMV lives on the other side.

1bit-xdna

FFI into libxrt for XDNA 2 NPU. FastFlowLM subprocess bridge; ternary-INT8 mapping pending AMD.

1bit-voice

Sentence-boundary streaming TTS. 1.23 s first audio. 17 unit tests.

1bit-echo

WebSocket voice server, Opus 20 ms frames. Cancellation path on disconnect. 11 tests.

1bit-whisper

STT via whisper.cpp on gfx1151. Streaming partials on roadmap.

1bit-kokoro

TTS, 24 kHz mono WAV, onnxruntime. af_sky / am_michael / bf_emma / bm_george.

1bit-mcp

Stdio MCP bridge. 19 tools: 17 specialists + skill_manage + memory_manage.

1bit-agents

17 typed specialists. 1bit-watch-discord + 1bit-watch-github keep the fleet online.

1bit-cli

1bit status · doctor · install · skill · memory · npu · burnin · power. One command, whole stack.

1bit-landing

LAN dashboard on :8190. Live metrics, /_live/stats SSE.

1bit-lemonade

OpenAI-compat gateway on :8200. Hermes + AMD GAIA interop.

1bit-helm

Native desktop pane (egui, Rust). No Electron, no web-view.

1bit-core

Pure parsers. mmap, zero I/O beyond the file handle. Only crate with zero deps beyond std.