SemiAnalysis · 39,560 followers · 23h

AMD ALERT 🚀 MI355 is now 40% cheaper than B200 for single-node FP8 serving of the GLM5 architecture, 14 weeks after GLM5's initial launch on SGLang v0.12 for both CUDA and ROCm, in both non-MTP and MTP (speculative decoding) configurations. SPEED IS THE MOAT!! Great work to Anush E., Ramine Roane, Henry X. & his team! The next step is for MI355X to catch up to CUDA on composing production inference optimizations such as FP4, and on distributed inference, where multiple MI355 boxes are ganged together so that per-GPU performance goes up and the cost per million tokens goes down.
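To make the per-GPU-throughput-to-cost link in the post concrete, here is a minimal sketch of the usual cost-per-million-tokens arithmetic. It assumes cost is driven only by an hourly price per GPU and by per-GPU token throughput; the function name and every price and throughput figure below are hypothetical placeholders, not SemiAnalysis numbers.

```python
# Hypothetical sketch: cost per million tokens for an inference deployment.
# Assumption: cost depends only on the hourly price per GPU and per-GPU throughput.
# All numbers below are illustrative placeholders, not measured results.

def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_sec_per_gpu: float) -> float:
    """Return USD per 1M tokens given an hourly GPU price and per-GPU throughput."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600.0
    return gpu_hour_usd / tokens_per_hour * 1_000_000


# Same hourly price per GPU; the multi-node setup is assumed to raise per-GPU throughput.
single_node = cost_per_million_tokens(gpu_hour_usd=2.0, tokens_per_sec_per_gpu=1000.0)
multi_node = cost_per_million_tokens(gpu_hour_usd=2.0, tokens_per_sec_per_gpu=1500.0)

print(f"single node: ${single_node:.3f} per 1M tokens")
print(f"multi node:  ${multi_node:.3f} per 1M tokens")
```

Because cost per token scales with the inverse of per-GPU throughput, any configuration that raises tokens/s per GPU at the same hourly price lowers the cost per million tokens proportionally.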

5 Comments

Koosha Paridehpour 16h

It's all coming together... or crashing down, depending on who you ask 😂 Glad the American triopolies/monopolies across the supply chain are collapsing.

Wilson Bilkovich 13h

How are we doing with ROCm support on Triton/Gluon? That's been my reasoning for not owning any AMD GPU thus far. Currently I "only" have nVidia, BrainChip, and Google Coral in play. (Please sell me something, Akhetonics)

Aaron A. Glenn 10h

Hi SemiAnalysis, what’s your analysis of AMD software finally beginning to match its hardware after years of a painfully obvious disconnect in this space?

Alexey Manakonov 22h

Oh, it's really cool; it's becoming more and more helpful to optimise code on ROCm!

Omar Shehab 14h

Rajib Anwar


SemiAnalysis’ Post
