About Nvidia
NVIDIA is the world leader in accelerated computing. NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and digital twins is transforming the world’s largest industries and profoundly impacting society.
Job Summary
NVIDIA is looking for a passionate engineer to join its Compute Developer Technology (DevTech) team. In this role, you will research and develop techniques to GPU-accelerate leading applications in high-performance computing, spanning machine learning and deep learning, scientific computing, and data processing. You will perform in-depth analysis and optimization to ensure the best possible performance on current- and next-generation GPU architectures.
Key Responsibilities
- Working directly with key application developers (especially those building LLMs) to understand the current and future problems they are solving.
- Creating and optimizing core parallel algorithms and data structures to provide the best solutions using GPUs, through both library development and direct contribution to the applications.
- Optimizing training and inference for large language models, contributing directly to frameworks such as Megatron, TRTLLM, SGLang, and vLLM.
- Collaborating closely with the architecture, research, libraries, tools, and system software teams at NVIDIA to influence the design of next-generation architectures, software platforms, and programming models.
- Engaging in deep optimization of high-performance operators, including but not limited to CUDA kernel optimization and instruction-level and compiler optimization.
Requirements
- Currently pursuing an MS or PhD at a leading university in engineering, computer science, or a related discipline.
- Strong knowledge of C/C++ and/or Fortran.
- Knowledge of software design, programming techniques, and algorithms.
- Knowledge of LLM training/inference optimization, including but not limited to development and optimization experience with distributed training/inference, NCCL, NVSHMEM, InfiniBand (IB), and RoCE.
- Strong mathematical fundamentals, including linear algebra and numerical methods.
- Experience with parallel programming, ideally CUDA C/C++ and OpenACC.
- Good communication and organization skills, a logical approach to problem solving, and good time management and task prioritization.
To apply for this job, please visit nvidia.wd5.myworkdayjobs.com.