Own Search Quality & Evals Strategy: Define the evaluation framework for Teams Search, covering both classical IR metrics (NDCG, precision@k) and LLM-era metrics (relevance, groundedness, faithfulness, helpfulness). Represent Teams Search in cross-org quality reviews, red-lines, and AI safety reviews. Mentor junior PMs on the team; model strong PM craft: crisp specs, rigorous eval thinking, and structured decision-making.
Responsibilities
Lead the design of human evaluation pipelines, including task design, rater guidelines, inter-rater reliability, and feedback loops into model/ranking improvements.
Build and maintain an offline eval harness: query sets, golden datasets, annotation workflows, and regression benchmarks used to gate every major search update.
Partner with Applied Science and engineering to translate eval signals into roadmap priorities and quality SLAs.
Work with the Copilot/AI team to integrate retrieval-augmented generation (RAG) patterns into Teams Search, ensuring quality, safety, and latency bars are met before ship.
Build a culture of evidence-based decision-making; push back on intuition-only decisions with data.
Collaborate with partner teams across M365 Search and Copilot to align on shared infrastructure, signals, and quality bars.
Required Qualifications
Bachelor's Degree AND 8+ years of experience in product/service/program management or software development, OR equivalent experience.
8+ years of product management experience, with at least 3 years in search, information retrieval, or a related ML/AI-powered product area.
Deep, hands-on experience with evaluation frameworks for search or AI systems. You have personally designed eval sets, defined metrics, and used eval results to drive product decisions.
Solid understanding of IR fundamentals: ranking, relevance, query understanding, recall/precision trade-offs.
Experience with LLM-era eval challenges: hallucination detection, groundedness, response quality, and human eval design for generative outputs.
Demonstrated ability to work cross-functionally with applied scientists, ML engineers, and data teams. You are comfortable reading model cards, experiment results, and offline eval reports.
Experience shipping search or retrieval features in a large-scale consumer or enterprise product (100M+ users).
Familiarity with RAG pipelines, vector databases, hybrid retrieval architectures, or semantic search systems.
Experience with annotation platforms (Scale AI, Appen, internal tools) and managing human evaluation programs.
Prior experience in a Hyderabad/Bangalore engineering or PM role, with the ability to influence across time zones.
MBA or Master's in a technical field.

High impact: Search is used by every Teams user, every day. Quality improvements are felt immediately at 300M+ scale.
IDC ownership: This is not a satellite role. IDC Bangalore has full feature ownership within Teams Search, and you will lead strategy, not just execute.
Growth: Evals discipline is one of the fastest-growing skill sets in the industry. This role makes you a domain expert at one of the world's leading AI companies.
Original Posting
This role is sourced from Microsoft. Apply on the Microsoft careers page.