Member of Technical Staff, Kernel Engineering
Inferact
San Francisco, CA
About The Role
We're looking for a performance engineer to squeeze every FLOP out of modern accelerators. You'll write the kernels and low-level optimizations that make vLLM the fastest inference engine in the world. Your code will run on hundreds of accelerator types, from NVIDIA GPUs to emerging silicon. When hardware vendors develop new chips, they integrate with vLLM. You'll work directly with these teams to ensure we're extracting maximum performance from every generation of hardware.
Skills And Qualifications
Minimum qualifications:
- Bachelor's degree or equivalent experience in computer science, engineering, or a similar field.
- Deep experience writing CUDA kernels or equivalent (CuTeDSL, Triton, TileLang, Pallas).
- Strong understanding of GPU architecture: memory hierarchy, warp scheduling, tiling, tensor cores.
- Proficiency in C++ and Python with demonstrated ability to write high-performance code.
- Experience with profiling tools (Nsight, rocprof) and performance optimization methodologies.
- Obsession with benchmarks and squeezing every percentage point of speedup.
Preferred qualifications:
- Experience with ML-specific kernel optimization (FlashAttention, fused kernels).
- Knowledge of quantization techniques (INT8, FP8, mixed-precision).
- Familiarity with multiple accelerator platforms (NVIDIA, AMD, TPU, Intel).
- Experience with compiler technologies (LLVM, MLIR, XLA).
- Kernel-related contributions to vLLM or other inference engine projects.
- Contributions to open-source GPU, ML systems, or compiler optimization projects.
- Authorship of in-depth technical blog posts on GPU optimization.
Logistics:
- Location: This role is based in San Francisco, California. Will consider remote in the US for exceptional candidates.
- Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $200,000 - $400,000 USD + equity.
- Visa sponsorship: We sponsor visas on a case-by-case basis.
- Benefits: Inferact offers generous health, dental, and vision benefits as well as 401(k) company match.
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Engineering and Information Technology
- Industries: Software Development