Design, implement, and optimize GPU kernels for complex computational workloads such as AI inference. Research and develop novel optimization techniques for GPU kernel generation. Document optimization strategies and maintain performance benchmarks.
Responsibilities
Profile and analyze kernel performance using advanced diagnostic tools. Develop automated solutions for kernel optimization and tuning. Collaborate with other researchers to improve model performance. Contribute to the development of internal GPU computing frameworks.
Required Qualifications
Doctorate in a relevant field, or equivalent experience.
Solid understanding of GPU architecture, memory hierarchies, parallel computing, and algorithm optimization.
Hands-on experience in GPU programming, including performance profiling and optimization tools.
Advanced C++ programming skills.
Other Requirements
5+ years of experience in GPU programming and optimization.
Expert knowledge of CUDA, ROCm, Triton, PTX, CUTLASS, or similar GPU programming frameworks.
Experience with machine learning frameworks (PyTorch, TensorFlow).
Familiarity with compiler optimization techniques, and a background in auto-tuning and automated code generation.
Publication record in relevant conferences or journals (MLSys, NeurIPS, ICML, ICLR, AISTATS, ACL, EMNLP, NAACL, ISCA, MICRO, ASPLOS, HPCA, SOSP, OSDI, NSDI, etc.).
Original Posting
This role is sourced from Microsoft.