Trending

See what the GitHub community is most excited about this month.

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda, 8,243 stars, 834 forks

528 stars this month

thu-ml / SageAttention

Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.

Cuda, 1,859 stars, 141 forks

230 stars this month

nerfstudio-project / gsplat

CUDA-accelerated rasterization of Gaussian splatting

Cuda, 3,243 stars, 475 forks

203 stars this month

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda, 27,043 stars, 3,110 forks

293 stars this month

NVIDIA / CUDALibrarySamples

CUDA Library Samples

Cuda, 1,999 stars, 396 forks

41 stars this month

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda, 3,292 stars, 360 forks

200 stars this month

Infatoshi / cuda-course

Cuda, 1,239 stars, 224 forks

86 stars this month

ROCm / rccl-tests

RCCL Performance Benchmark Tests

Cuda, 69 stars, 49 forks

2 stars this month

NVIDIA / nccl-tests

NCCL Tests

Cuda, 1,166 stars, 291 forks

37 stars this month

mit-han-lab / torchsparse

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda, 1,368 stars, 169 forks

23 stars this month

NVIDIA / AMGX

Distributed multigrid linear solver library on GPU

Cuda, 575 stars, 157 forks

10 stars this month

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda, 2,491 stars, 158 forks

75 stars this month

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Cuda, 16,721 stars, 1,988 forks

104 stars this month

rahul-goel / fused-ssim

Lightning fast differentiable SSIM.

Cuda, 143 stars, 29 forks

7 stars this month

princeton-vl / lietorch

Cuda, 776 stars, 77 forks

9 stars this month

tgale96 / grouped_gemm

PyTorch bindings for CUTLASS grouped GEMM.

Cuda, 100 stars, 66 forks

7 stars this month

Dao-AILab / causal-conv1d

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda, 503 stars, 110 forks

33 stars this month

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Cuda, 672 stars, 79 forks

20 stars this month

rapidsai / cuvs

cuVS - a library for vector search and clustering on the GPU

Cuda, 451 stars, 111 forks

25 stars this month