Responsibilities
Design and implement real-time streaming ETL / feature pipelines (e.g., Flink or Spark Structured Streaming) that meet strict freshness and correctness constraints.
Own data contracts between producers, pipelines, and consumers: schema evolution, versioning, compatibility, validation, and safe rollout.
Build and operate reliable messaging and ingestion with Kafka/Pulsar (partitioning strategy, retries, ordering guarantees, DLQs, backpressure handling).
Implement production-grade backfill/replay workflows.
Define and meet SLOs using OpenTelemetry/Prometheus/Grafana for metrics, tracing, dashboards, alerting, and incident-response readiness.
Integrate pipelines with online stores/caches and ML consumers (feature stores, embedding pipelines, LLM API calls, online/offline consistency patterns).
Partner with applied scientists on feature/embedding definitions, validation, and end-to-end quality measurement.
Optimize end-to-end performance and efficiency: CPU/memory/I/O, serialization, caching, network overhead, concurrency, and pipeline compute cost.
Contribute to serving/inference integrations where needed (e.g., Triton/ONNX Runtime/TensorRT), including batching and latency/cost tradeoffs.
Ship safely with CI/CD, automated testing (unit/integration/data quality), and operational playbooks/runbooks.
Required Qualifications
Bachelor's or Master's degree in Computer Science, Electrical/Computer Engineering, or a related field, with 6+ years of related experience.
Strong programming skills in C++, C#, or Python (at least one required).
Hands-on experience in one or more of the following: building and operating streaming data pipelines in production (Flink or Spark Structured Streaming); distributed systems engineering with strong reliability and operational rigor; messaging systems such as Kafka/Pulsar.
Experience operating services with Kubernetes/containers and production-readiness practices (deployments, scaling, rollbacks).
Experience with observability stacks such as OpenTelemetry, Prometheus, and Grafana.
Experience with feature stores, embedding pipelines, and online/offline consistency (freshness guarantees, correctness validation).
Experience with data lakehouse/table formats and optimizations, e.g., partitioning, compaction, and incremental processing.
Experience with GPU inference serving (Triton, ONNX Runtime/TensorRT) and performance techniques (batching, request shaping, tail-latency reduction).
Background in cost/performance modeling, capacity planning, and reliability improvements for high-scale data platforms.
Experience in ads/search/recommendations or other high-scale systems where freshness, latency, and cost are critical.
Original Posting
This role is sourced from Microsoft. Apply on the Microsoft careers page.