Will the Metal 4 update bring significant optimizations to future PyTorch MPS performance and compatibility?

Hi there, I’m a Mac user working with PyTorch. As I understand it, PyTorch’s Metal backend is implemented on top of Metal Performance Shaders (MPS). At WWDC25 I noticed that the new Metal 4 has been heavily optimized for machine learning and now natively supports tensors. In my mind, this should make it much easier to add MPS support for PyTorch operations and lead to a huge performance boost!
This issue is only meant for discussing the possible performance improvements from Metal 4. If I have misrepresented anything, please point it out and I will post a correction!
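For readers less familiar with the MPS backend: from Python it is exposed as the `"mps"` device. A minimal sketch of selecting that device when it is usable, and falling back to CPU otherwise, could look like this. `pick_device` is a hypothetical helper name; `torch.backends.mps.is_built()` and `torch.backends.mps.is_available()` are existing PyTorch functions.

```python
def pick_device() -> str:
    """Hypothetical helper: return "mps" if PyTorch can use Metal here, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is nothing to dispatch to Metal.
        return "cpu"
    # is_built(): this PyTorch binary was compiled with MPS support.
    # is_available(): the running macOS version and hardware can actually use it.
    if torch.backends.mps.is_built() and torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```

Tensors created with, for example, `torch.randn(64, 64, device=pick_device())` would then have their ops dispatched to the MPS backend whenever the helper returns `"mps"`.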
Quote from malfet:

Thank you for creating the issue, but as you’ve stated, this feels like a discussion at this point, so please do not hesitate to continue it at https://dev-discuss.pytorch.org. My quick experiments showed that the new API is limited to 4D tensors and will not be available to the general public until September, and I have not noticed significant performance improvements (which will undoubtedly change in the future).
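Comparing performance across Metal or PyTorch versions is easy to get wrong on MPS, because GPU work is queued asynchronously and a naive timer may only measure how long it takes to enqueue the kernels. A minimal timing sketch, assuming the caller supplies a `sync` callback that blocks until queued GPU work finishes, might look like this; `bench` and its parameters are hypothetical names, not a PyTorch API.

```python
import time
from statistics import median
from typing import Callable


def bench(fn: Callable[[], object],
          sync: Callable[[], None] = lambda: None,
          warmup: int = 3,
          iters: int = 10) -> float:
    """Hypothetical helper: median wall-clock seconds per call of fn.

    sync() must block until all queued device work is done; on the
    default no-op it only measures host-side time.
    """
    # Warm-up runs absorb one-time costs such as kernel/pipeline compilation.
    for _ in range(warmup):
        fn()
    sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # include the time for the device to actually finish the work
        times.append(time.perf_counter() - start)
    return median(times)
```

On a Mac with a recent PyTorch, one could pass something like `lambda: a @ b` for tensors on the `"mps"` device together with `sync=torch.mps.synchronize`; using the median rather than the mean keeps a single slow iteration from skewing the result.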