Responsibilities
Build and optimize routing services that handle large-scale inference across heterogeneous models.
Develop robust APIs and orchestration layers for model lifecycle management and for balancing cost-quality trade-offs.
Collaborate with applied scientists to integrate fine-tuning, evaluation, and adaptive routing algorithms into production systems.
Ensure reliability, scalability, and security for mission-critical AI workloads.
Required Qualifications
Bachelor's or master's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of professional experience, including 2+ years with Python and ML frameworks such as PyTorch or TensorFlow.
Hands-on experience training or fine-tuning LLMs or multimodal models.
Familiarity with production ML systems and concepts such as model serving, caching, batching, and monitoring.
Understanding of distributed systems and cloud-based infrastructure.
Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
Exposure to MLOps or DevOps practices (CI/CD, automated testing, deployment).
Interest in generative AI and open-source model ecosystems.
Ability to work in a fast-paced, collaborative environment with a growth mindset.
Strong communication and documentation skills.
Original Posting
This role is sourced from Microsoft. Apply on the Microsoft Careers page.