AMD AI/ML SDK Tools Scan

Scope: catalog AMD-published AI/ML/inference/quantization/model-serving SDKs from amd.com/en/developer/browse-by-resource-type/software-tools.html and adjacent AMD pages. FPGA-only (Vitis/Versal) excluded — another agent owns that lane. Filter applied: Rule A (no Python at runtime), Rule B (C++ for kernels, Rust above).

AMD's top-level "Software Tools" page itself is mostly a tile-grid marketing hub (WebFetch was denied; content reconstructed from cited AMD sub-pages and GitHub). The real signal lives on the product sub-pages. Catalogue below is deduplicated to distinct tools (libraries that are just sub-libraries of ROCm rolled up).

Catalogue

name	purpose	license	Rule A safe?	Strix Halo today?	worth evaluating?
ROCm 7 (umbrella)	GPU compute stack (HIP, LLVM, clang, runtime)	MIT-ish / open	Yes	Yes (gfx1151)	Already in use
AITER	ROCm "AI Tensor Engine" — fused FA/MLA/MoE kernels	MIT (ROCm/aiter)	C++/HIP core, Py dispatch	Instinct-tuned; gfx1151 untested	Steal (kernels only)
Composable Kernel (CK)	Header-only C++ GEMM/conv kernel templates	MIT	Yes (pure C++)	gfx1151 builds (~1h)	Steal — fused ternary GEMV candidate
MIOpen	cuDNN-analog (conv/rnn/batchnorm)	MIT	Yes	Yes	Skip — we bypass it
MIGraphX	Graph-mode inference runtime (ONNX/TF in)	MIT	Yes (C++ core)	Yes	Skip — no ternary op
hipBLAS / hipBLASLt	BLAS façade, MXFP8/MXFP4 path	MIT	Yes	Yes	Banned (CLAUDE.md Rule C)
rocBLAS / Tensile	GEMM kernels under hipBLAS	MIT	Yes	Yes	Skip (dragged in by ban)
rocFFT / rocSPARSE / rocRAND / rocSOLVER / rocPRIM / rocThrust / rocWMMA / hipCUB / RCCL	Math/parallel primitives	MIT	Yes	Yes	Available if needed; not core
AMD Quark	Model quantization toolkit (PTQ/QAT, ONNX/PyTorch)	MIT	No — Python	N/A (dev-box only)	Watch — offline compile OK
ZenDNN / ZenDNNL	Zen CPU DNN kernels (BF16, INT8, exp INT4)	Apache-2.0 (fork of oneDNN)	Yes (C++ lib)	CPU lane on Strix Halo Zen5	Steal — CPU fallback + llama.cpp backend
AOCL (BLIS, libFLAME, FFTW, Sparse, libm)	CPU math	Mixed: BLIS/libFLAME Apache-ish, some closed EULA	Mostly yes	Zen5 optimized	Evaluate for non-AI host math only
AOCL-DLP	CPU deep-learning primitives (low-prec GEMM)	AMD EULA (not open)	C lib, Rule-A safe at runtime	Zen5	Watch — closed licence is a negative
ROCm Triton	Python DSL → GPU kernels	MIT	No — Python compile	gfx1151 (patchy)	Skip (Rule A + GEAK is Py too)
GEAK-OptimAgent / GEAK-OpenEvolve	LLM-agent Triton kernel auto-tuner	Research	Python	MI-class only	Skip
ROCprofiler-SDK + rocpd	SQLite-backed profiler replacing rocprof	MIT	Yes (C API)	gfx1151 fork by woct0rdho	Steal — upgrade path from rocprof
AMD SMI	GPU telemetry CLI + C lib	MIT	Yes	Yes	Available
AMD uProf	CPU/GPU sampling profiler (IBS, PMC)	Freeware (proprietary)	Yes (binary)	Zen5 + Instinct only	Evaluate — no gfx1151 GPU path
Ryzen AI Software 1.7.1	NPU SDK (ONNX + Vitis AI EP)	Proprietary	No — Python wrappers	STX-H unsupported	Already rejected
MIVisionX / RPP	CV graph + preprocessing	MIT	Yes	Yes	Not AI/LLM — skip
Infinity Hub	Catalog of Docker containers	N/A	Containers violate bare-metal rule	Instinct-oriented	Skip
AMD AI Developer Portal	Website / learning hub	N/A	N/A	N/A	Reference only
Adrenalin AI Bundle	Desktop installer (Ollama + PyTorch + ComfyUI)	Proprietary shell	No	Radeon-dGPU	Skip

Count: 22 distinct tools / SDK families catalogued after dedup.

Verdicts

Steal (adopt now, fills a real gap)

Composable Kernel (CK) — ROCm/composable_kernel. Header-only C++ template

kernels; gfx1151 target already compiles. Candidate backbone for a fused ternary-GEMV + Sherry 3:4 unpack kernel, staying inside Rule A/B.

AITER — ROCm/aiter. Reference implementations of Flash-Decoding, MLA,

MoE dispatch. Even though the Python dispatcher violates Rule A, the HIP kernel sources are MIT and directly portable into our Rust-driven loader.

ZenDNN 5.2 — ships a llama.cpp backend and INT4 experimental path. Drop-in

CPU lane for Strix Halo Zen5 cores when iGPU is saturated or during fallback. Apache-2.0 fork of oneDNN; pure C++; zero Python at runtime.

Watch (not ready but roadmap-relevant)

AMD Quark — only AMD quantizer with real LLM INT4 / UINT4 + AutoSearch.

Python-only, so we'd treat it as an offline compile tool, never runtime. Watch for a ternary / 1.58-bit codepath (currently none). MIT licensed.

ROCprofiler-SDK + rocpd — successor to the rocprof we already use. SQLite

output is a big win for kernel-level regression tracking on bitnet_decode. Community fork has the gfx1151 PC-sampling patch (woct0rdho/rocm-systems).

Skip (closed, Python-only runtime, or already owned)

Ryzen AI SDK 1.7.1 (no STX-H Linux, already rejected).
hipBLAS/hipBLASLt (Rule C ban).
MIGraphX / MIOpen (no ternary op, we bypass with native kernels).
Triton + GEAK (Python compile-time; GEAK targets MI only).
Infinity Hub (containers violate bare-metal lock-in).
Adrenalin AI Bundle (consumer-desktop, Windows-first).
AOCL-DLP (EULA is not redistributable — BLIS is fine, DLP is not).

Notes on source quality

AMD's software-tools.html returned as a permission-blocked WebFetch and the AI Developer Portal (March 2026 launch) is mostly a tile-grid. The real catalogue lives at rocm.docs.amd.com/.../api-libraries.html and in the individual GitHub orgs (ROCm/*, amd/Quark, amd/ZenDNN). Nothing surprising surfaced that isn't already in ROCm or a named Zen/Ryzen SDK — AMD is not hiding a secret low-precision library.