AMD ALERT 🚀 MI355 is now 40% cheaper than B200 on the GLM5 architecture for single-node FP8 serving, 14 weeks after the initial launch of GLM5. This holds for both non-MTP and MTP with spec decode, on SGLang v0.12 for both CUDA & ROCm. SPEED IS THE MOAT!! Great work by Anush E., Ramine Roane, Henry X. & his team! The next step is for MI355X to catch up to CUDA when composing production inference optimizations like FP4, and on distributed inference, where you can gang up MI355 boxes so that per-GPU performance goes up and thus the cost per million tokens goes down.
How are we doing with ROCm support on Triton/Gluon? That's been my reason for not owning any AMD GPU thus far. Currently I "only" have NVIDIA, BrainChip, and Google Coral in play. (Please sell me something, Akhetonics)
Hi SemiAnalysis, what’s your analysis of AMD software finally beginning to match their hardware after years of a painfully obvious disconnect in this space?
Oh, that's really cool, it's becoming more and more helpful to optimise code on ROCm!
It's all coming together... Or crashing down depending on who you ask 😂 Glad the American Triopoly/Monopolies across the supply chain are collapsing.