Radeon 8060S
gfx1151, RDNA 3.5. Native HIP ternary GEMV at 92% of LPDDR5 peak. Our kernels, no hipBLAS.
AMD Ryzen AI MAX+ 395 is a CPU + iGPU + NPU sharing 128 GB LPDDR5 at 256 GB/s. Most benchmark it like a laptop and miss what it is. We run ternary BitNet-b1.58 on native HIP kernels above our own C++ — 83 tok/s decode today, ~280 tok/s on the roadmap, sub-second voice mouth-to-ear. No datacenter GPU. No subscription. One box, one drive, one mesh.
Target ceiling: ~280 tok/s decode · NPU prefill crossover at 33-token prompts · 256 GB/s LPDDR5 wall. how we get there →
Unified CPU + iGPU + NPU on one 128 GB LPDDR5 pool at 256 GB/s. No PCIe tax, no host-device copy, no discrete-card bottleneck. Most people miss the point when they benchmark the iGPU against a discrete card — the whole box is the accelerator, not one tile of it.
gfx1151, RDNA 3.5. Native HIP ternary GEMV at 92% of LPDDR5 peak. Our kernels, no hipBLAS.
Lemonade 10.2 Linux NPU shipped 2026-04-20 (Q4 today, ternary→INT8 soon). Target: prefill offload.
Sitting idle while iGPU grinds. Move sampler + dispatcher + tokenizer off iGPU critical path. +5-15% tok/s without a kernel change.
Last burnin pass 2026-04-20 · 14,344 rounds · wikitext-103 · post-idx=7-tiebreak.
curl -fsSL https://1bit.systems/install.sh | bash
Prefer git? Clone the monorepo and run ./install.sh directly.
Fourteen crates, one monorepo, one job each. Rust above the HIP line, C++ below. No Python in the hot path. No container overhead. Everything runs as a user-systemd unit.
OpenAI-compat HTTP. /v1/chat/completions with SSE. Bearer-gated via Caddy. 2.4 MB binary.
Backend dispatcher: iGPU / NPU / CPU. HALO_BACKEND=xdna wires FLM subprocess for NPU prefill.
FFI into rocm-cpp HIP kernels. 92%-LPDDR5-peak ternary GEMV lives on the other side.
FFI into libxrt for XDNA 2 NPU. FastFlowLM subprocess bridge; ternary-INT8 mapping pending AMD.
Sentence-boundary streaming TTS. 1.23 s first audio. 17 unit tests.
WebSocket voice server, Opus 20 ms frames. Cancellation path on disconnect. 11 tests.
STT via whisper.cpp on gfx1151. Streaming partials on roadmap.
TTS, 24 kHz mono WAV, onnxruntime. af_sky / am_michael / bf_emma / bm_george.
Stdio MCP bridge. 19 tools: 17 specialists + skill_manage + memory_manage.
17 typed specialists. 1bit-watch-discord + 1bit-watch-github keep the fleet online.
1bit status · doctor · install · skill · memory · npu · burnin · power. One command, whole stack.
LAN dashboard on :8190. Live metrics, /_live/stats SSE.
OpenAI-compat gateway on :8200. Hermes + AMD GAIA interop.
Native desktop pane (egui, Rust). No Electron, no web-view.
Pure parsers. mmap, zero I/O beyond the file handle. Only crate with zero deps beyond std.