1bit.systems

AMD ROCm AI Dev Hub — Scan for 1-bit BitNet on Strix Halo

Date: 2026-04-20
Source root: <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai.html>
Fetch method: WebSearch (WebFetch permission denied); link anchors verified via search snippets, not a DOM walk.
Scope: ROCm AI landing page + top 10 sub-resources relevant to low-bit inference.

Root page inventory

The dev-ai hub surfaces five buckets: Tutorials (Jupyter), Performance Results, ROCm Docs, Infinity Hub containers, and AMD Developer Cloud / AI Dev Program. No direct BitNet / ternary references appear on the landing page; everything is Instinct-first marketing with Ryzen AI as a secondary tier.

Resources crawled

| Name | URL | Purpose | License | Python-only? | gfx1151? |
| --- | --- | --- | --- | --- | --- |
| AI Developer Hub root | amd.com/.../dev-ai.html | Landing page; links to all below | marketing | n/a | no |
| Performance Results | .../dev-ai/performance-results.html | MI300X/MI325X/MI355 benchmarks | marketing | n/a | no (Instinct only) |
| Tutorials for AI developers | rocm.docs.amd.com/.../ai-developer-hub/latest | Jupyter notebooks hub | MIT (repo) | yes (ipynb) | no |
| ROCm/gpuaidev (notebook source) | github.com/ROCm/gpuaidev | Git source for the above | MIT | yes | no |
| AMD Quark quantizer | quark.docs.amd.com/latest | Quantization toolkit (FP8/MXFP4/INT4/INT3) | proprietary, redistributable | yes (PyTorch/ONNX) | no |
| Quark MXFP4 for vLLM tutorial | .../mxfp4_quantization_quark_vllm.html | Llama3.3-70B MXFP4 recipe | MIT | yes | no |
| ROCm-LLMExt | github.com/ROCm/ROCm-LLMExt + docs | LLM reference stack (train→infer→orch) | MIT | yes | not called out |
| AITER (AI Tensor Engine) | github.com/ROCm/aiter | Centralized high-perf op library; MLA/all-gather/etc. | MIT | Python bindings; kernels C++/Triton | no (MI300/MI350 focus) |
| Composable Kernel (CK) | github.com/ROCm/composable_kernel (now in rocm-libraries) | Templated C++ GEMM/reduction device library; pk_int4_t | MIT | C++ | gfx1153 listed, gfx1151 not |
| Strix Halo system optimization | rocm.docs.amd.com/.../system-optimization/strixhalo.html | TTM/GTT tuning, amd-ttm, kernel requirements | docs | n/a | yes |
| llama.cpp on ROCm (official) | rocm.docs.amd.com/projects/llama-cpp/en/docs-26.02 | AMD-hosted llama.cpp branch + prebuilt binaries for gfx1151 | MIT | C++ | yes |
| AI Developer Program / Cloud | amd.com/.../ai-dev-program.html | $100 free cloud credits (MI-class) | TOS | n/a | no (cloud = Instinct) |
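For context on the CK `pk_int4_t` entry: it is CK's packed-int4 element type. As a caller-side illustration only (plain Python, not CK's actual API — function names here are ours), two signed 4-bit values share one byte, upper and lower nibble:

```python
# Caller-side sketch of pk_int4_t-style packing: two signed int4 values
# (range -8..7) per byte. Not CK's API; helper names are illustrative.

def pack_int4_pair(lo: int, hi: int) -> int:
    """Pack two signed 4-bit ints into one byte: hi nibble, lo nibble."""
    assert -8 <= lo <= 7 and -8 <= hi <= 7
    return ((hi & 0xF) << 4) | (lo & 0xF)

def unpack_int4_pair(b: int) -> tuple[int, int]:
    """Recover (lo, hi) from a packed byte, sign-extending each nibble."""
    def sext4(n: int) -> int:
        return n - 16 if n >= 8 else n
    return sext4(b & 0xF), sext4(b >> 4)

packed = pack_int4_pair(-3, 5)              # 0x5D
assert unpack_int4_pair(packed) == (-3, 5)  # round-trips
```

Nothing in the official tree packs below this 4-bit granularity, which is the point made under "New-to-us" below.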

Rule-A (no-Python-runtime) verdict

Service-side-safe: CK, llama.cpp on ROCm, the Strix Halo optimization doc, and AITER's core kernels (the bindings are Python, but the kernels themselves compile to native code; the vLLM/SGLang integration is still Python glue).
Service-side-blocked: Quark, the gpuaidev notebooks, ROCm-LLMExt, and all Jupyter tutorials; these are useful for caller-side reference/design only.

Ternary / sub-byte tooling findings

gfx1151-specific mentions

Two real ones:

  1. Strix Halo system optimization page — TTM/GTT knobs, the amd-ttm tool, and VRAM-vs-shared-memory guidance (keep the BIOS VRAM carve-out at 0.5 GB, push the TTM page limit to ~100 GB). We should sanity-check our current TTM settings against this.
  2. AMD-hosted llama.cpp prebuilt binaries for Ubuntu 24.04 on gfx1150/gfx1151. Build flags documented: -DGGML_HIP=ON -DGGML_HIP_ROCWMMA_FATTN=ON -DAMDGPU_TARGETS=gfx1151. We already have our own HIP kernels, but their rocWMMA Flash-Attention flag is worth diffing against our split-KV FD kernel.
  3. 
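The documented flags in item 2 map onto a standard llama.cpp CMake invocation. A sketch, assuming a ROCm install with hipcc on the box; the source-tree path and job count are illustrative, only the three -D flags come from the AMD doc:

```shell
# Configure llama.cpp for HIP on gfx1151 with AMD's documented flags.
# ("llama.cpp" checkout path and -j count are ours, not from the doc.)
cmake -S llama.cpp -B build \
  -DGGML_HIP=ON \
  -DGGML_HIP_ROCWMMA_FATTN=ON \
  -DAMDGPU_TARGETS=gfx1151
cmake --build build --config Release -j"$(nproc)"
```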

Everything else in the dev-ai hub is MI300/MI325/MI355 or generic RDNA3.5 (gfx1150/51/52 lumped).

New-to-us list

Things not previously documented in memory:

  1. ROCm-LLMExt (ROCm/ROCm-LLMExt) — AMD's own reference LLM stack. Python/vLLM-heavy → caller-side reference only, but worth reading for recipe parity.
  2. AITER (ROCm/aiter) — centralized AMD op library; kernels land in vLLM/SGLang upstream. No gfx1151 target today but the MLA decode pattern is relevant once we scale context.
  3. amd-ttm tool + Strix Halo optimization doc — official knob for TTM/GTT page limit. Action item: verify our /etc/modprobe.d/ttm.conf matches AMD's guidance.
  4. AMD-hosted llama.cpp branch at rocm.docs.amd.com/projects/llama-cpp with gfx1151 prebuilt binaries — diff their HIP build flags vs ours.
  5. AMD Developer Cloud $100 free credits via AI Dev Program — MI-class only, useful for distillation runs if Battlemage slips.
  6. CK pk_int4_t preshuffle GEMM — confirms 4-bit is the floor AMD upstream cares about; our ternary kernel has no competition in the official tree.
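The action item in point 3 can be made concrete. A hedged sketch, assuming the upstream ttm kernel module's pages_limit / page_pool_size parameters (counted in 4 KiB pages, so ~100 GiB ≈ 26214400 pages); exact knob names and values should be confirmed against the AMD Strix Halo doc and amd-ttm output before adopting:

```shell
# Sketch only: raise the TTM page limit to ~100 GiB (26214400 x 4 KiB pages).
# Parameter names are the upstream ttm module options; verify against
# AMD's guidance before writing this on a real box.
cat <<'EOF' | sudo tee /etc/modprobe.d/ttm.conf
options ttm pages_limit=26214400 page_pool_size=26214400
EOF
sudo update-initramfs -u   # Ubuntu: rebuild initramfs so it applies at boot
```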

Honesty notes