cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication#
NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a structured sparse matrix with a 50% sparsity ratio:
$$D = Activation(\alpha \, op(A) \cdot op(B) + \beta \, op(C) + bias) \cdot scale$$
where $op(A)$/$op(B)$ refers to in-place operations such as transpose/non-transpose, and $\alpha$, $\beta$, $scale$ are scalars or vectors.
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
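To make the shape of the operation concrete, the following is a minimal CPU reference sketch, not a cuSPARSELt API. It assumes row-major storage, non-transposed operands, scalar `alpha` and `beta`, a ReLU activation, one bias entry per output row, and no output scaling. The function name `referenceMatmul` and its parameters are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for D = ReLU(alpha * A * B + beta * C + bias).
// Illustrative only: this is not part of the cuSPARSELt API.
// Assumptions: row-major layout, no transposition, scalar alpha/beta,
// and one bias entry per row of D.
std::vector<float> referenceMatmul(const std::vector<float>& A,     // m x k
                                   const std::vector<float>& B,     // k x n
                                   const std::vector<float>& C,     // m x n
                                   const std::vector<float>& bias,  // m entries
                                   std::size_t m, std::size_t n, std::size_t k,
                                   float alpha, float beta) {
    assert(A.size() == m * k && B.size() == k * n);
    assert(C.size() == m * n && bias.size() == m);

    std::vector<float> D(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Dot product of row i of A with column j of B.
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            // Epilogue: scaling, accumulation with C, bias, then ReLU.
            const float value = alpha * acc + beta * C[i * n + j] + bias[i];
            D[i * n + j] = value > 0.0f ? value : 0.0f;
        }
    }
    return D;
}
```

A reference like this is mainly useful for checking the results of a GPU run on small inputs; it does not exploit sparsity in any way.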
Download: developer.nvidia.com/cusparselt/downloads
Provide Feedback: Math-Libs-Feedback@nvidia.com
Examples: cuSPARSELt Example 1, cuSPARSELt Example 2
Blog posts:
Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt
Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines
Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture
Key Features#
NVIDIA Sparse MMA tensor core support
Mixed-precision computation support:
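| Input A/B | Input C | Output D | Compute | Block scaled | Supported SM arch |
|---|---|---|---|---|---|
| FP32 | FP32 | FP32 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| BF16 | BF16 | BF16 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| FP16 | FP16 | FP16 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| FP16 | FP16 | FP16 | FP16 | No | 9.0 |
| INT8 | INT8 | INT8 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| INT8 | INT32 | INT32 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| INT8 | FP16 | FP16 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| INT8 | BF16 | BF16 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP16 | E4M3 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | BF16 | E4M3 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP16 | FP16 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | BF16 | BF16 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP32 | FP32 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E5M2 | FP16 | E5M2 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E5M2 | BF16 | E5M2 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E5M2 | FP16 | FP16 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E5M2 | BF16 | BF16 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E5M2 | FP32 | FP32 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP16 | E4M3 | FP32 | A/B/D_OUT_SCALE = VEC64_UE8M0, D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | BF16 | E4M3 | FP32 | A/B/D_OUT_SCALE = VEC64_UE8M0, D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP16 | FP16 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | BF16 | BF16 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP32 | FP32 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E2M1 | FP16 | E2M1 | FP32 | A/B/D_SCALE = VEC32_UE4M3, D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E2M1 | BF16 | E2M1 | FP32 | A/B/D_SCALE = VEC32_UE4M3, D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E2M1 | FP16 | FP16 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E2M1 | BF16 | BF16 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E2M1 | FP32 | FP32 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.1, 11.0, 12.0, 12.1 |

Matrix pruning and compression functionalities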
Activation functions, bias vector, and output scaling
Batched computation (multiple matrices in a single run)
GEMM Split-K mode
Auto-tuning functionality (see cusparseLtMatmulSearch())
NVTX ranging and Logging functionalities
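As an illustration of what matrix pruning to a 50% structured sparsity ratio can look like, here is a minimal CPU sketch. It assumes the 2:4 pattern (at most two nonzeros in every group of four consecutive values along a row), which is one way to reach 50%; the text above does not specify the pattern, and the library's own pruning routines may select values differently. The names `prune2of4` and `is2of4Sparse` are hypothetical helpers, not cuSPARSELt functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Keep the two largest-magnitude values in every group of four consecutive
// elements of each row and zero the other two (row-major storage).
// Illustrative only: not a cuSPARSELt routine. Assumes cols is a multiple of 4.
std::vector<float> prune2of4(const std::vector<float>& dense,
                             std::size_t rows, std::size_t cols) {
    assert(cols % 4 == 0 && dense.size() == rows * cols);
    std::vector<float> pruned(dense);
    // Because cols is a multiple of 4, groups never straddle two rows.
    for (std::size_t g = 0; g < dense.size(); g += 4) {
        // Track the positions of the largest and second-largest magnitudes.
        std::size_t first = 0, second = 1;
        if (std::fabs(dense[g + 1]) > std::fabs(dense[g])) {
            first = 1;
            second = 0;
        }
        for (std::size_t i = 2; i < 4; ++i) {
            const float mag = std::fabs(dense[g + i]);
            if (mag > std::fabs(dense[g + first])) {
                second = first;
                first = i;
            } else if (mag > std::fabs(dense[g + second])) {
                second = i;
            }
        }
        for (std::size_t i = 0; i < 4; ++i) {
            if (i != first && i != second) pruned[g + i] = 0.0f;
        }
    }
    return pruned;
}

// Check that every group of four consecutive values has at most two nonzeros.
bool is2of4Sparse(const std::vector<float>& matrix,
                  std::size_t rows, std::size_t cols) {
    if (cols % 4 != 0 || matrix.size() != rows * cols) return false;
    for (std::size_t g = 0; g < matrix.size(); g += 4) {
        int nonzeros = 0;
        for (std::size_t i = 0; i < 4; ++i) {
            if (matrix[g + i] != 0.0f) ++nonzeros;
        }
        if (nonzeros > 2) return false;
    }
    return true;
}
```

Keeping the largest magnitudes is a simple heuristic for limiting the change to the matrix; in practice, pruning is often followed by fine-tuning to recover accuracy.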
Support#
Supported SM Architectures:
SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0, SM 10.0, SM 10.1, SM 11.0, SM 12.0, SM 12.1
Supported CPU architectures and operating systems:
| OS | CPU archs |
|---|---|
|  |  |
|  |  |
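An application may want to check a device's compute capability against the supported SM architectures listed above before using the library. The following is a small host-side sketch of such a check; the function `isListedSmArchitecture` and the `SmVersion` struct are hypothetical, and the table is simply copied from the list in this section, so it would need updating for other library versions.

```cpp
#include <cassert>
#include <cstddef>

// Compute capability as a (major, minor) pair, e.g. SM 8.6 -> {8, 6}.
struct SmVersion {
    int smMajor;
    int smMinor;
};

// Returns true if the given compute capability appears in the list of
// supported SM architectures from this section. Illustrative only:
// this helper is not part of cuSPARSELt.
bool isListedSmArchitecture(int smMajor, int smMinor) {
    static const SmVersion kListed[] = {
        {8, 0}, {8, 6}, {8, 7}, {8, 9}, {9, 0},
        {10, 0}, {10, 1}, {11, 0}, {12, 0}, {12, 1},
    };
    const std::size_t count = sizeof(kListed) / sizeof(kListed[0]);
    for (std::size_t i = 0; i < count; ++i) {
        if (kListed[i].smMajor == smMajor && kListed[i].smMinor == smMinor) {
            return true;
        }
    }
    return false;
}
```

On a CUDA system, the major and minor values would typically come from the `major` and `minor` fields of `cudaDeviceProp`, filled in by `cudaGetDeviceProperties`. Note that individual data type combinations are restricted further, as shown in the mixed-precision table under Key Features.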