About NVIDIA
NVIDIA is a leading technology company specializing in AI and high-performance computing.
Job Summary
We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency.
Key Responsibilities
- Contribute features to vLLM that enable the newest models to take advantage of the latest NVIDIA GPU hardware features.
- Profile and optimize the inference framework (vLLM) with methods such as speculative decoding; data, tensor, expert, and pipeline parallelism; and prefill-decode disaggregation.
- Develop, optimize, and benchmark GPU kernels using techniques such as fusion, autotuning, and memory/layout optimization (see the fusion sketch after this list).
- Define and build inference benchmarking methodologies and tools.
- Architect the scheduling and orchestration of containerized large-scale inference deployments on GPU clusters across clouds.
- Conduct and publish original research that pushes the Pareto frontier of the ML systems field.
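As a minimal illustration of the kernel-fusion work described above, the sketch below fuses an elementwise bias-add and ReLU into a single CUDA kernel so the intermediate tensor never round-trips through global memory. The kernel name, shapes, and launch parameters are assumptions chosen for the example, not part of any existing codebase.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// An unfused pipeline would launch two kernels (bias-add, then ReLU) and
// write the intermediate result to global memory between them.
// This fused kernel does both operations in one pass, keeping the
// intermediate value in a register.
__global__ void fused_bias_relu(const float* __restrict__ x,
                                const float* __restrict__ bias,
                                float* __restrict__ out,
                                int rows, int cols) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int total = rows * cols;
    if (idx < total) {
        float v = x[idx] + bias[idx % cols];  // bias broadcast across rows
        out[idx] = v > 0.0f ? v : 0.0f;       // ReLU applied in-register
    }
}

int main() {
    const int rows = 1024, cols = 4096;      // illustrative problem size
    const int total = rows * cols;
    float *x, *bias, *out;
    cudaMallocManaged(&x, total * sizeof(float));
    cudaMallocManaged(&bias, cols * sizeof(float));
    cudaMallocManaged(&out, total * sizeof(float));
    for (int i = 0; i < total; ++i) x[i] = (i % 7) - 3.0f;
    for (int j = 0; j < cols; ++j) bias[j] = 0.5f;

    int threads = 256;
    int blocks = (total + threads - 1) / threads;
    fused_bias_relu<<<blocks, threads>>>(x, bias, out, rows, cols);
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);

    cudaFree(x); cudaFree(bias); cudaFree(out);
    return 0;
}
```

Fusion of this kind reduces kernel-launch overhead and global-memory traffic, which is typically where the benchmarking and profiling work in this role would measure its gains.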
Requirements
- Bachelor’s degree (or equivalent experience) in Computer Science, Computer Engineering, or Software Engineering with 7+ years of experience; a Master’s degree with 5+ years of experience; or a PhD with top-tier publications in ML systems.
- Strong programming skills in Python and C/C++; experience with Go or Rust is a plus.
- Solid CS fundamentals: algorithms & data structures, operating systems, computer architecture, parallel programming, distributed systems, and deep learning theory.
- Knowledge of and passion for performance engineering in ML frameworks.
- Familiarity with GPU programming and performance: CUDA, the GPU memory hierarchy, streams, and NCCL (see the streams sketch after this list).
- Excellent debugging, problem-solving, and communication skills.
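To give a concrete flavor of the CUDA streams item above, here is a minimal sketch, assuming two independent batches of data, that issues each batch on its own stream so host-to-device copies can overlap with kernel execution on hardware with separate copy engines. All names and sizes are illustrative.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;                // illustrative batch size
    const size_t bytes = n * sizeof(float);

    // Pinned host memory is required for truly asynchronous copies.
    float *h[2], *d[2];
    cudaStream_t stream[2];
    for (int s = 0; s < 2; ++s) {
        cudaMallocHost(&h[s], bytes);
        cudaMalloc(&d[s], bytes);
        cudaStreamCreate(&stream[s]);
        for (int i = 0; i < n; ++i) h[s][i] = 1.0f;
    }

    // Each batch runs on its own stream: batch 1's copy can overlap
    // with batch 0's kernel instead of serializing on the default stream.
    for (int s = 0; s < 2; ++s) {
        cudaMemcpyAsync(d[s], h[s], bytes, cudaMemcpyHostToDevice, stream[s]);
        scale<<<(n + 255) / 256, 256, 0, stream[s]>>>(d[s], n, 2.0f);
        cudaMemcpyAsync(h[s], d[s], bytes, cudaMemcpyDeviceToHost, stream[s]);
    }
    cudaDeviceSynchronize();
    printf("h[0][0] = %f, h[1][0] = %f\n", h[0][0], h[1][0]);

    for (int s = 0; s < 2; ++s) {
        cudaStreamDestroy(stream[s]);
        cudaFree(d[s]);
        cudaFreeHost(h[s]);
    }
    return 0;
}
```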
Ways to stand out from the crowd
- Experience building and optimizing LLM inference engines.
- Hands-on work with ML compilers and DSLs.
To apply for this job, please visit nvidia.wd5.myworkdayjobs.com.