Sources have told SemiAnalysis that Mark Zuckerberg posted internally at Meta this morning: "I want to be clear that we do not expect other company-wide layoffs this year."
SemiAnalysis
Semiconductors
Bridging the gap between business and the world's most important industry.
About us
Bridging the gap between business and the world's most important industry.
- Website
- www.semianalysis.com/
- Industry
- Semiconductors
- Company size
- 51-200 employees
- Type
- Privately Held
Updates
-
Warren Buffett's Berkshire Hathaway first invested in Google in Q3 2025, coincidentally the same time that SemiAnalysis called out a huge increase in TPU purchases from Anthropic. In Q1 Berkshire added more to their position. Buffett actually said this about his own diligence into Google's AI Infrastructure supremacy: "I don't buy what I can't understand. So Greg sat me down with the TPU v5p spec. 8,960 chips wired in a 3D torus, every chip talking to six neighbors over ICI links at 4.8 Tbps a pop, wraparound rails so no chip is ever at the end of the line, and optical circuit switches throwing the junctions mid-job to carve out whatever submesh your sharded matmul needs. Add in ring all-reduce running consists in both directions along each torus axis, collective permutes shuffling shards between sidings, and bandwidth-optimal SPMD partitioning across the data, model, and pipeline dimensions. Folks, it's just BNSF. Six neighbors, scheduled consists, a yardmaster throwing switches, trains that never stop. And I've been understanding railroads since 1942." Want to understand the TPU system architecture as deeply as Buffett? Read more here: https://lnkd.in/dAvjCBJt
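For readers who want the "six neighbors, no end of the line" part of the post spelled out, here is a minimal sketch of neighbor lookup in a 3D torus with wraparound. The function name `torus_neighbors` and the 16 x 20 x 28 shape are assumptions for illustration: the post only gives the 8,960 chip count, and this is one factorization that produces it. The sketch does not model ICI links or optical circuit switching.

```python
from itertools import product

def torus_neighbors(coord, shape):
    """Return the six wraparound neighbors of `coord` in a 3D torus.

    Each axis contributes two neighbors (one step back, one step forward).
    The modulo makes the edges wrap, so no chip sits at the end of a line.
    """
    neighbors = []
    for axis in range(len(shape)):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % shape[axis]
            neighbors.append(tuple(n))
    return neighbors

# Assumed shape: 16 * 20 * 28 = 8,960 chips (hypothetical layout for this sketch).
SHAPE = (16, 20, 28)
num_chips = SHAPE[0] * SHAPE[1] * SHAPE[2]

# Even a "corner" chip has six distinct neighbors thanks to the wraparound.
corner = torus_neighbors((0, 0, 0), SHAPE)

# Check the same property for every chip in the torus.
all_have_six = all(
    len(set(torus_neighbors(c, SHAPE))) == 6
    for c in product(*(range(d) for d in SHAPE))
)

print(num_chips, corner, all_have_six)
```

The ring all-reduce mentioned in the quote runs along exactly these per-axis rings: following the forward neighbor on one axis repeatedly returns to the starting chip after `shape[axis]` hops.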
-
AMD ALERT 🚀 MI355 is now 40% cheaper than B200 on the GLM5 architecture for single-node FP8 serving, 14 weeks after the initial launch of GLM5, on both non-MTP & MTP with spec decode, on SGLang v0.12 for both CUDA & ROCm. SPEED IS THE MOAT!! Great work by Anush E., Ramine Roane, Henry X. & his team! The next step is for MI355X to catch up to CUDA when composing production inference optimizations like FP4, and on distributed inference, where you can gang up MI355 boxes such that per-GPU performance goes up and thus the cost per million tokens goes down.
-
While you were posting hot takes on Twitter, I studied the die shot. While you were chasing clout with your "AI will change everything" & "Top 10 GPT-3 Prompt tips" LinkedIn posts, I mastered the JAX first principles bible coauthored by my cousin. While you wasted your days reposting Sam Altman tweets in pursuit of engagement, I cultivated deep knowledge of MoE routing, KV cache sizing, and FP4 GEMM throughput on tcgen05 MMA uarch benchmarking. While you were partying at NeurIPS afterparties, I was SSH'd into a Slurm cluster at 3am debugging enroot squashfs errors & driving to the colocation to replace broken SXM modules. And now that Rubin is launching and your clients are asking about inference cost modelling and rack-level power delivery and modelling interactivity, which are NVLink bandwidth bound, you have the audacity to come to me for help.
-
Our SemiAnalysis Weekly Podcast often asks: is the AI cycle truly different this time from other cycles? Based on our analysis, we think the return from AI is real and that it looks like a structural trend that is truly different from other cycles. We tracked token spend vs human labor cost across 9 real workflows at SemiAnalysis. These are tasks our analyst team does consistently to stay on top of the industry: company initiations, earnings recaps, conference transcript mining, financial data pulls. The ROI on every single task was over 10x. Most were 60-90x. This is why the demand isn't cyclical: once you see that a 20-hour task costs $21 in tokens, you can never go back to doing it by hand. The workflow is just permanently different and provides multiples in terms of returns from time savings. And we at SemiAnalysis represent only the tip of the iceberg. We track Claude Code GitHub commits daily, and the image below shows what it looked like in Feb 2026 (hint: our daily tracker shows the line has only gone further up and to the right since this publicly available chart from Feb). But we still think we are in the early innings. The banks aren't using it. Enterprises are still figuring out how to use it effectively. Compliance and IT teams are still figuring out guardrails. The gap between "this is obviously useful" and "we've actually wired it into how our analysts work every day" is still massive at most places.
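The arithmetic behind the ROI claim can be sketched as follows. This assumes ROI is defined as human labor cost divided by token cost, and that the 20-hour, $21 example falls inside the quoted 60-90x range; neither is stated explicitly in the post, and the function names are hypothetical. Under those assumptions, the quoted range implies a fully loaded labor cost of roughly $63 to $94.50 per hour.

```python
def roi_multiple(hours, hourly_rate, token_cost):
    """Labor cost of doing the task by hand divided by token spend (assumed ROI definition)."""
    return hours * hourly_rate / token_cost

def implied_hourly_rate(roi, hours, token_cost):
    """Hourly labor cost that would produce a given ROI multiple."""
    return roi * token_cost / hours

# Figures from the post: a 20-hour task costing $21 in tokens.
HOURS, TOKEN_COST = 20, 21

low_rate = implied_hourly_rate(60, HOURS, TOKEN_COST)   # 63.0
high_rate = implied_hourly_rate(90, HOURS, TOKEN_COST)  # 94.5

print(f"60x implies ${low_rate:.2f}/hour, 90x implies ${high_rate:.2f}/hour")
print(f"Check: {roi_multiple(HOURS, low_rate, TOKEN_COST):.0f}x at the low rate")
```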
-
Join our own Jordan Nanos from SemiAnalysis and Nebius on May 20 at 9 AM PT / 6 PM CEST for a technical deep dive into what really determines GPU infrastructure efficiency in production. What we’ll cover:
→ How SemiAnalysis ClusterMAX™ evaluates GPU cloud providers
→ Why reliability failures become hidden infrastructure costs
→ How Nebius designs fault-tolerant GPU clusters
→ Results from independent live-cluster testing
→ TCO modeling across 3 real AI workloads
Register here: https://lnkd.in/ese3D2_S. You don't want to miss it.