1bit.systems

AMD Compilers, Analyzers, Debuggers — 2026-04-20 scan

Lens: what improves 1bit.systems' AIE kernel authoring, HIP performance, or ROCm profiling beyond our current stack (gcc 14, clang 20, hipcc, Peano/llvm-aie, rocprof, rocgdb, bindgen). Landing page: https://www.amd.com/en/developer/browse-by-resource-type/software-tools.html (the fetch was blocked, so the table below is compiled from GPUOpen and ROCm docs).

Catalog

| Tool | What it does | Linux today? | License | Fits our stack? |
|---|---|---|---|---|
| AOCC | Zen-tuned Clang/Flang (LLVM 17), no GPU offload | Yes | AMD EULA | Overlaps clang 20; Zen5 gains over CachyOS znver4 builds are marginal. Skip. |
| AOMP | LLVM + OpenMP GPU-offload staging compiler | Yes | Apache-2.0 | hipcc covers us. Defer. |
| hipcc / ROCm LLVM | HIP + amdclang++ | Yes | NCSA/MIT | Using. |
| ROCgdb | GDB with AMDGPU wavefront debugging | Yes | GPL | Using. |
| rocprof / rocprofv2 | HW counters, kernel traces (EOL 2026-Q2) | Yes | MIT | Using. Migrate in Q2. |
| rocprofiler-sdk / rocprofv3 | Successor C++ API | Yes | MIT | Adopt on next bump. |
| ROCm Compute Profiler (rocprof-compute, ex-Omniperf) | Roofline, L1/L2/LDS counters, Grafana UI | Yes (ROCm ≥ 6.2) | MIT | Adopt. Serves the Gerganov L1 lead and Sherry bytes-read validation. |
| ROCm Systems Profiler (ex-Omnitrace) | Whole-app timeline, GPU + CPU sampling | Yes | MIT | Adopt for the voice lane. |
| Radeon GPU Analyzer (RGA) | Offline compiler + ISA inspector (Vulkan/DX/GL/OpenCL); VGPR/SGPR/occupancy | Yes (CLI + VS Code extension) | MIT | gfx1151 supported (2.12+ adds gfx1150/1151/1152/1201). No native HIP input, but the OpenCL path plus ISA dump fills a real gap that rocprof leaves. Adopt. |
| Radeon GPU Profiler (RGP) | Vulkan/DX12 frame profiler | Yes | MIT | Graphics only. Skip. |
| Radeon Raytracing Analyzer (RRA) | BVH inspector | Yes | MIT | Skip. |
| Radeon Memory Visualizer (RMV) | VRAM residency timeline | Yes (AMDVLK only) | MIT | 25.20 makes RADV the default, which breaks capture. Skip for now. |
| AMD uProf | Zen CPU profiler: IBS, branch mispredicts, LLC misses, AVX-512 retirement; also MI metrics | Yes (AUR: amduprof) | AMD EULA | Adopt for the CPU offload lane. Only tool with clean Zen5 PMC coverage on Linux. |
| GPU PerfStudio / RCP | Deprecated | | | Skip. |
| llvm-aie (llvm-mc, llvm-objdump) | AIE2/AIE2P assembler + disassembler in the Peano fork | Yes (/opt/peano/bin) | Apache-2.0 | Already installed but not yet invoked. llvm-objdump -d --triple=aie2p-none-unknown-elf disassembles AIE2P ELFs. Use for AIE debugging. |
| mlir-aie / IRON | MLIR AIE dialect + Python IRON front end | Yes | Apache-2.0 | Authoring-time Python is OK (Rule A covers runtime). Defer until the first AIE2P kernel. |
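The llvm-objdump invocation from the llvm-aie row is easy to wrap for repeated AIE debugging. Below is a minimal Python sketch for debug-time use only, not runtime. It assumes the Peano binaries live in /opt/peano/bin as listed above. The helper names objdump_argv and disassemble are hypothetical and are not part of Peano or mlir-aie.

```python
import shutil
import subprocess
from pathlib import Path

# Peano install prefix from the catalog above (assumption: adjust if yours differs).
PEANO_BIN = Path("/opt/peano/bin")
AIE2P_TRIPLE = "aie2p-none-unknown-elf"


def objdump_argv(elf: Path, peano_bin: Path = PEANO_BIN) -> list[str]:
    """Build the llvm-objdump command line that disassembles an AIE2P ELF."""
    return [str(peano_bin / "llvm-objdump"), "-d", f"--triple={AIE2P_TRIPLE}", str(elf)]


def disassemble(elf: Path, peano_bin: Path = PEANO_BIN) -> str:
    """Run Peano's llvm-objdump on `elf` and return the disassembly text."""
    argv = objdump_argv(elf, peano_bin)
    # Fail early with a clear message if the Peano prefix is wrong.
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"llvm-objdump not found at {argv[0]}")
    # check=True raises CalledProcessError if objdump rejects the ELF.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, disassemble(Path("build/kernel.elf")) returns the disassembly as a string (the path is hypothetical). Keeping the argv builder separate lets the exact command be logged or pasted into a shell when a kernel misbehaves.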

Special-focus answers

Adopt order (hot-path impact)

  1. RGA: offline gfx1151 ISA, occupancy, and register-pressure inspection of the ternary GEMV (rocprof doesn't show these).
  2. rocprof-compute: roofline validates "92% of LPDDR5 peak" and Sherry's projected bytes-read drop.
  3. AMD uProf: CPU offload lane; the only clean Zen5 PMC coverage on Linux.
  4. llvm-objdump --triple=aie2p: free, already on disk.
  5. rocprofiler-sdk: migrate before the rocprof EOL in 2026-Q2.
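Item 2's "92% of LPDDR5 peak" check comes down to simple roofline arithmetic. It needs two quantities from a profiler run: bytes moved and kernel duration. The exact rocprof-compute metric names are not assumed here. In the Python sketch below, the peak bandwidth, compute roof, GEMV shape, 2-bit weight packing, and duration are all hypothetical placeholders, not measured gfx1151 figures. Substitute measured values before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class KernelSample:
    """One kernel measurement, with values taken from a profiler run."""
    bytes_moved: float  # DRAM bytes read + written by the kernel
    flops: float        # arithmetic operations performed
    seconds: float      # kernel duration


def arithmetic_intensity(s: KernelSample) -> float:
    """FLOPs per DRAM byte: the x-axis of a roofline plot."""
    return s.flops / s.bytes_moved


def roofline_bound(ai: float, peak_flops: float, peak_bw: float) -> float:
    """Attainable FLOP/s: the lower of the compute roof and the bandwidth slope."""
    return min(peak_flops, ai * peak_bw)


def bandwidth_fraction(s: KernelSample, peak_bw: float) -> float:
    """Achieved DRAM bandwidth as a fraction of peak (the '% of LPDDR5 peak' figure)."""
    return (s.bytes_moved / s.seconds) / peak_bw


# Placeholder numbers for illustration only; NOT measured gfx1151 figures.
PEAK_BW = 250e9     # bytes/s, hypothetical LPDDR5 peak
PEAK_FLOPS = 50e12  # FLOP/s, hypothetical compute roof

# Hypothetical 4096x4096 ternary GEMV, weights assumed packed at 2 bits each.
# Only weight traffic is counted; activation and output traffic is ignored.
M = K = 4096
gemv = KernelSample(bytes_moved=M * K * 2 / 8, flops=2 * M * K, seconds=20e-6)

ai = arithmetic_intensity(gemv)
print(f"AI = {ai:.1f} FLOP/B")
print(f"roofline bound = {roofline_bound(ai, PEAK_FLOPS, PEAK_BW):.2e} FLOP/s")
print(f"achieved = {bandwidth_fraction(gemv, PEAK_BW):.0%} of peak bandwidth")
```

The ridge point is PEAK_FLOPS / PEAK_BW, which is 200 FLOP/B with these placeholders. A ternary GEMV sits far below it, so the kernel is bandwidth-bound. That is why the bandwidth fraction, not FLOP/s, is the figure to validate, and why a cut in bytes read should show up directly in kernel time.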

Skips: AOCC and AOMP (overlap; AOMP is deferred rather than ruled out); RGP/RRA/RMV (wrong workload); GPU PerfStudio, RCP (deprecated).

URLs