1bit.systems

AMD AI/ML SDK Tools Scan

Scope: catalog AMD-published AI/ML/inference/quantization/model-serving SDKs from amd.com/en/developer/browse-by-resource-type/software-tools.html and adjacent AMD pages. FPGA-only (Vitis/Versal) excluded — another agent owns that lane. Filter applied: Rule A (no Python at runtime), Rule B (C++ for kernels, Rust above).

AMD's top-level "Software Tools" page appears to be mostly a tile-grid marketing hub (WebFetch was denied; content reconstructed from cited AMD sub-pages and GitHub). The real signal lives on the product sub-pages. The catalogue below is deduplicated to distinct tools: libraries that are just sub-libraries of ROCm are rolled up into a single row.

Catalogue

| Name | Purpose | License | Rule A safe? | Strix Halo today? | Worth evaluating? |
|---|---|---|---|---|---|
| ROCm 7 (umbrella) | GPU compute stack (HIP, LLVM, clang, runtime) | MIT-ish / open | Yes | Yes (gfx1151) | Already in use |
| AITER | ROCm "AI Tensor Engine": fused FA/MLA/MoE kernels | MIT (ROCm/aiter) | C++/HIP core, Py dispatch | Instinct-tuned; gfx1151 untested | Steal (kernels only) |
| Composable Kernel (CK) | Header-only C++ GEMM/conv kernel templates | MIT | Yes (pure C++) | gfx1151 builds (~1h) | Steal: fused ternary GEMV candidate |
| MIOpen | cuDNN-analog (conv/rnn/batchnorm) | MIT | Yes | Yes | Skip: we bypass it |
| MIGraphX | Graph-mode inference runtime (ONNX/TF in) | MIT | Yes (C++ core) | Yes | Skip: no ternary op |
| hipBLAS / hipBLASLt | BLAS façade, MXFP8/MXFP4 path | MIT | Yes | Yes | Banned (CLAUDE.md Rule C) |
| rocBLAS / Tensile | GEMM kernels under hipBLAS | MIT | Yes | Yes | Skip (dragged in by ban) |
| rocFFT / rocSPARSE / rocRAND / rocSOLVER / rocPRIM / rocThrust / rocWMMA / hipCUB / RCCL | Math/parallel primitives | MIT | Yes | Yes | Available if needed; not core |
| AMD Quark | Model quantization toolkit (PTQ/QAT, ONNX/PyTorch) | MIT | No: Python | N/A (dev-box only) | Watch: offline compile OK |
| ZenDNN / ZenDNNL | Zen CPU DNN kernels (BF16, INT8, exp INT4) | Apache-2.0 (fork of oneDNN) | Yes (C++ lib) | CPU lane on Strix Halo Zen5 | Steal: CPU fallback + llama.cpp backend |
| AOCL (BLIS, libFLAME, FFTW, Sparse, libm) | CPU math | Mixed: BLIS/libFLAME Apache-ish, some closed EULA | Mostly yes | Zen5 optimized | Evaluate for non-AI host math only |
| AOCL-DLP | CPU deep-learning primitives (low-prec GEMM) | AMD EULA (not open) | C lib, Rule-A safe at runtime | Zen5 | Watch: closed license is a negative |
| ROCm Triton | Python DSL → GPU kernels | MIT | No: Python compile | gfx1151 (patchy) | Skip (Rule A + GEAK is Py too) |
| GEAK-OptimAgent / GEAK-OpenEvolve | LLM-agent Triton kernel auto-tuner | Research | Python | MI-class only | Skip |
| ROCprofiler-SDK + rocpd | SQLite-backed profiler replacing rocprof | MIT | Yes (C API) | gfx1151 fork by woct0rdho | Steal: upgrade path from rocprof |
| AMD SMI | GPU telemetry CLI + C lib | MIT | Yes | Yes | Available |
| AMD uProf | CPU/GPU sampling profiler (IBS, PMC) | Freeware (proprietary) | Yes (binary) | Zen5 + Instinct only | Evaluate: no gfx1151 GPU path |
| Ryzen AI Software 1.7.1 | NPU SDK (ONNX + Vitis AI EP) | Proprietary | No: Python wrappers | STX-H unsupported | Already rejected |
| MIVisionX / RPP | CV graph + preprocessing | MIT | Yes | Yes | Not AI/LLM: skip |
| Infinity Hub | Catalog of Docker containers | N/A | Containers violate bare-metal rule | Instinct-oriented | Skip |
| AMD AI Developer Portal | Website / learning hub | N/A | N/A | N/A | Reference only |
| Adrenalin AI Bundle | Desktop installer (Ollama + PyTorch + ComfyUI) | Proprietary shell | No | Radeon-dGPU | Skip |

Count: 22 distinct tools / SDK families catalogued after dedup.

Verdicts

Steal (adopt now, fills a real gap)

  1. Composable Kernel (CK), ROCm/composable_kernel: header-only C++ template kernels; the gfx1151 target already compiles. Candidate backbone for a fused ternary-GEMV + Sherry 3:4 unpack kernel, staying inside Rule A/B.

  2. AITER, ROCm/aiter: reference implementations of Flash-Decoding, MLA, and MoE dispatch. Even though the Python dispatcher violates Rule A, the HIP kernel sources are MIT-licensed and directly portable into our Rust-driven loader.

  3. ZenDNN 5.2: ships a llama.cpp backend and an experimental INT4 path. Drop-in CPU lane for Strix Halo Zen5 cores when the iGPU is saturated or during fallback. Apache-2.0 fork of oneDNN; pure C++; zero Python at runtime.
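
To make the fused ternary-GEMV idea concrete, here is a minimal host-side reference in Rust that a future CK/HIP kernel could be checked against. It is a sketch under stated assumptions, not the planned kernel: the 2-bit packing (four weights per byte) and the names `pack_ternary`, `unpack_at`, and `ternary_gemv` are hypothetical, and it does not implement the Sherry 3:4 layout, whose exact format is not described here.

```rust
/// Pack ternary weights (-1, 0, +1) four per byte, two bits each.
/// Encoding is an assumption for this sketch: 0b00 = 0, 0b01 = +1, 0b10 = -1.
fn pack_ternary(weights: &[i8]) -> Vec<u8> {
    let mut packed = vec![0u8; (weights.len() + 3) / 4];
    for (i, &w) in weights.iter().enumerate() {
        let code: u8 = match w {
            0 => 0b00,
            1 => 0b01,
            -1 => 0b10,
            _ => panic!("weight {} is not ternary", w),
        };
        packed[i / 4] |= code << ((i % 4) * 2);
    }
    packed
}

/// Decode the i-th 2-bit code back to -1, 0, or +1 (unused code 0b11 reads as 0).
fn unpack_at(packed: &[u8], i: usize) -> i32 {
    match (packed[i / 4] >> ((i % 4) * 2)) & 0b11 {
        0b01 => 1,
        0b10 => -1,
        _ => 0,
    }
}

/// Compute y = scale * (W x), where each row of W is packed separately.
/// Unpack and accumulate happen in one loop, so no multiply is needed:
/// a weight of +1 adds the activation, -1 subtracts it, 0 skips it.
fn ternary_gemv(packed_rows: &[Vec<u8>], cols: usize, x: &[i8], scale: f32) -> Vec<f32> {
    assert_eq!(x.len(), cols, "activation length must match column count");
    packed_rows
        .iter()
        .map(|row| {
            let mut acc: i32 = 0;
            for c in 0..cols {
                match unpack_at(row, c) {
                    1 => acc += x[c] as i32,
                    -1 => acc -= x[c] as i32,
                    _ => {}
                }
            }
            acc as f32 * scale
        })
        .collect()
}
```

Packing each row with `pack_ternary` and passing the rows to `ternary_gemv` gives a scalar oracle: a GPU kernel that fuses the same unpack and accumulate steps should match its output for integer activations, up to the final scaling.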

Watch (not ready but roadmap-relevant)

  1. AMD Quark: the only AMD quantizer with real LLM INT4 / UINT4 + AutoSearch. Python-only, so we'd treat it as an offline compile tool, never runtime. Watch for a ternary / 1.58-bit codepath (currently none). MIT licensed.

  2. ROCprofiler-SDK + rocpd: successor to the rocprof we already use. SQLite output is a big win for kernel-level regression tracking on bitnet_decode. A community fork has the gfx1151 PC-sampling patch (woct0rdho/rocm-systems).
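
The kind of kernel-level regression check that per-kernel profiler output would enable can be sketched in Rust as follows. This assumes the timings have already been exported from the profiler database into one aggregated row per kernel; the `KernelStat` struct, its fields, and `find_regressions` are hypothetical and do not mirror the actual rocpd schema or any ROCprofiler-SDK API.

```rust
use std::collections::HashMap;

/// One aggregated row per kernel (hypothetical shape, not the rocpd schema).
#[derive(Debug, Clone, PartialEq)]
struct KernelStat {
    name: String,
    mean_ns: f64,
}

/// Return (kernel name, slowdown ratio) for every kernel whose mean duration
/// grew by more than `tolerance` (0.05 means 5%) relative to the baseline.
/// Kernels missing from the baseline are ignored; results are sorted worst first.
fn find_regressions(
    baseline: &[KernelStat],
    current: &[KernelStat],
    tolerance: f64,
) -> Vec<(String, f64)> {
    let base: HashMap<&str, f64> = baseline
        .iter()
        .map(|k| (k.name.as_str(), k.mean_ns))
        .collect();

    let mut out: Vec<(String, f64)> = current
        .iter()
        .filter_map(|k| {
            let b = *base.get(k.name.as_str())?;
            if b <= 0.0 {
                return None; // skip degenerate baseline entries
            }
            let ratio = k.mean_ns / b;
            if ratio > 1.0 + tolerance {
                Some((k.name.clone(), ratio))
            } else {
                None
            }
        })
        .collect();

    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}
```

Run against a stored baseline after each kernel change, a check like this would flag a bitnet_decode slowdown by name instead of leaving it buried in an end-to-end tokens-per-second number.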

Skip (closed, Python-only runtime, or already owned)

Notes on source quality

The WebFetch of AMD's software-tools.html was permission-blocked, and the AI Developer Portal (March 2026 launch) is mostly a tile-grid. The real catalogue lives at rocm.docs.amd.com/.../api-libraries.html and in the individual GitHub orgs and repos (ROCm/*, amd/Quark, amd/ZenDNN). Nothing surprising surfaced that isn't already in ROCm or a named Zen/Ryzen SDK; AMD is not hiding a secret low-precision library.