1bit.systems

Why no Python at runtime? (Rule A)

One-line answer: Python is fine as a caller or a build-time tool. It's banned inside any systemd service, any HTTP endpoint, any hot-path process. We've been burned too many times by dependency hell, silent import failures, and 300 MB interpreter images shipped to run a 4 GB model.

Scope

Banned at runtime: anything that's the target of a systemd unit, anything serving an HTTP request, anything on the startup path of the stack.

Allowed elsewhere:

What we tried first

Gen-1 of 1bit systems was Python (AMD Lemonade + transformers + a custom FastAPI layer). It was the natural starting point — Microsoft's reference is Python, MLX is Python, lemonade is Python, every BitNet community project is Python.

What we hit:

  1. Cold-start latency — 4-12 seconds to import torch + transformers + numpy before the first inference. On a service meant to serve LAN requests, that's a restart outage. Rust 1bit-server cold-start is ~200 ms.
  2. Dependency churnpip install resolvers pick different versions day to day. One morning the Lemonade server wouldn't start because a transitive dep bumped past a compatible minor. Rust cargo's lockfile prevents this class of bug.
  3. Silent failure modes — Python typing is advisory. JSON parsing fails at the first bad field, which may be six levels deep in an error the user sees as a 500. Rust serde fails at deserialization with a precise path.
  4. Multiple Pythons — the dev box had python3.12, the install script pulled python3.11, the AUR package expected python3.13. Three stacks. Three venv roots. Three reasons the service won't start on a fresh box.
  5. Binary size — the gen-1 install weighed 1.2 GB of Python + wheels. Gen-2 Rust is 15 MB of static-ish binaries.
  6. Global interpreter lock — Python's concurrency story is either subprocess (slow) or asyncio (complicated, doesn't parallelize CPU-bound code). Rust tokio handles 1000+ concurrent HTTP connections on one thread.

The immediate wins of the Rust rewrite

But what about speed of development?

Genuinely, Python's write-time is faster. We accept ~1.5× the implementation time in exchange for ~10× the deploy reliability. For a stack that's going to run unattended in a closet for months, the trade favors Rust.

"What about the kernels?"

Kernels are C++. Rule B, not Rule A. C++ at the kernel layer is specifically blessed because:

"What about the Python requantizer?"

Build-time, not runtime. requantize-h1b.py reads Microsoft's .safetensors, writes our .h1b format, exits. Runs once per model release. Never shipped with the binary, never called by a service.

If we ever needed a runtime requantize (which we don't — the output is cached to disk), we'd port to Rust before running it under a systemd unit.

Enforcement

Citations