Release Notes: Release 2.5

Key Features and Enhancements

  • [Jax] Added support for sliding window attention (SWA) in context parallel ring attention using THD format and striped sharding.

  • [Jax] Improved performance for per-tensor scaling FP8 recipe.

  • [PyTorch] Enabled FP8 tensor-parallel communication for FP8 block scaling recipe for Hopper by supporting coalesced gather of FP8 quantized tensors.

  • [PyTorch] Optimized MXFP8 Userbuffers implementation by overlapping wgrad NCCL all-gather with dgrad GEMM.

  • [PyTorch] Added support for CPU offloading when using FP8 parameters.

  • [PyTorch] Added support for Context Parallel for Multi Latent Attention (MLA).

  • [PyTorch] Reduced CPU overhead in MoE.

  • [C][PyTorch] Improved performance for FP8 padding and unpadding kernels for MoE.

  • [Jax] Added MXFP8 support for the GroupedDense module and handled the case of zero input tokens.

  • Added support for Python 3.12+.

  • Added support for attention head dimensions (head_dim) greater than 128 on all architectures.

  • [PyTorch] Added support for FP8 current scaling in operation-based API.
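
The first [Jax] item in the list above combines sliding window attention with striped sharding across context-parallel ranks. As a rough illustration of those two ideas only, the sketch below is plain Python and does not use any Transformer Engine API. The helper names (striped_shard, in_window, attended_pairs) are hypothetical, and the window convention (a query sees itself plus up to `left` earlier tokens) is an assumption made for the example.

```python
def striped_shard(seq_len, cp_size):
    """Assign token index i to rank i % cp_size (round-robin striping)."""
    return [list(range(rank, seq_len, cp_size)) for rank in range(cp_size)]


def in_window(q_pos, kv_pos, left):
    """Causal sliding window (assumed convention): a query attends to
    itself and to at most `left` tokens before it."""
    return q_pos - left <= kv_pos <= q_pos


def attended_pairs(q_positions, seq_len, left):
    """Count the (query, key) pairs a rank must compute for its queries."""
    return sum(in_window(q, k, left) for q in q_positions for k in range(seq_len))


# Hypothetical toy sizes: 8 tokens, 2 context-parallel ranks, window of 2.
shards = striped_shard(seq_len=8, cp_size=2)
work = [attended_pairs(shard, 8, 2) for shard in shards]
print(shards)  # token indices owned by each rank
print(work)    # attended pairs per rank
```

With striping, each rank holds tokens from every region of the sequence, so the per-rank counts in this toy case stay close to each other. How the library actually schedules the ring exchange is not shown here.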

Fixed Issues

  • [Jax] Fixed a numerical error in the scaled masked softmax kernel.

  • [Jax] Fixed output dtype for FP8 GEMM.

  • [PyTorch] Fixed a bug that appeared when the FP8 recipe was changed between training steps.

  • [PyTorch] Made miscellaneous fixes in TransformerLayer: passed the missing arguments cu_seqlens and max_seqlen to cross-attention and allowed attn_input_format=thd.

  • [PyTorch] Fixed a crash when loading checkpoints generated by previous Transformer Engine versions.

  • [PyTorch] Made miscellaneous fixes in CPU offloading logic.

  • [PyTorch] Fixed a numerical issue in cross-entropy loss.

  • [C][PyTorch][Jax] Fixed source installation when using NVTE_FRAMEWORK=all.

  • [PyTorch] Fixed a crash in GroupedLinear when using CUDA graphs.

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

There are no deprecated features in this release.

Miscellaneous

There are no miscellaneous issues in this release.