
Uber AI Engineer Case Interview: Real‑Time ETA and Courier-Dispatch ML System for Delivery
What this covers: A realistic, end‑to‑end ML system design case patterned after Uber’s marketplace problem spaces (Mobility, Delivery, and Freight), with emphasis on Uber Eats–style ETA accuracy and courier‑dispatch decisions at global scale. The interviewer assesses depth in ML/AI, large‑scale systems, experimentation in networked marketplaces, and Uber‑specific pragmatism (bias for action, safety, ownership, and metrics‑driven decisions).

Case prompt (given at start): “You’re the AI engineer leading an initiative to improve the promised‑by times shown to consumers and the dispatch of couriers for Uber’s Delivery segment in a new metro. Design an ML system that predicts end‑to‑end delivery time (prep + pickup + travel + handoff) and informs real‑time dispatch decisions. The system must handle high request volumes, heterogeneous geographies, and rapidly changing conditions while protecting user safety and maintaining marketplace balance.”

Key focus areas the interviewer will probe:

- Problem formulation and objectives: Define primary metrics (e.g., ETA MAE/MAPE, on‑time rate, p95/p99 lateness), guardrails (cancellations, courier idle time, restaurant wait time), and business KPIs (conversion, repeat rate, GMV). Show how you’d set explicit SLOs (e.g., p99 inference latency targets and uptime) and calibration goals (well‑calibrated ETAs by product and region).
- Data + features at Uber scale: Identify signal sources (order/dispatch events, courier GPS traces, restaurant prep histories, traffic, weather, holidays, road restrictions). Discuss geospatial representation (H3 hex indexing, commonly used at Uber) and dynamic marketplace features (supply/demand ratios by H3 ring, queue depth at merchants, real‑time traffic speeds). Cover data quality, leakage avoidance, and privacy considerations.
- Modeling approach: Compare pragmatic baselines (gradient‑boosted trees) vs. deep architectures (sequence models, spatiotemporal nets, mixture‑of‑experts).
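A pragmatic gradient‑boosted baseline of the kind mentioned above can be sketched with scikit‑learn’s quantile loss, which lets the promise lean conservative (e.g., a p80 ETA) to trade promise length for on‑time rate. All feature names and the synthetic data below are illustrative assumptions, not Uber’s actual features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
# Hypothetical features: merchant median prep time (min), trip distance (km),
# supply/demand ratio in the pickup H3 cell, hour of day.
X = np.column_stack([
    rng.uniform(5, 25, n),      # prep_median_min
    rng.uniform(0.5, 8, n),     # distance_km
    rng.uniform(0.3, 2.0, n),   # supply_demand_ratio
    rng.integers(0, 24, n),     # hour_of_day
])
# Synthetic end-to-end delivery time (minutes) with skewed noise.
y = X[:, 0] + 3.0 * X[:, 1] + 5.0 / X[:, 2] + rng.gamma(2.0, 2.0, n)

# Quantile loss at alpha=0.8 targets the 80th percentile of delivery time,
# so roughly 80% of orders should beat the promised ETA.
model = GradientBoostingRegressor(loss="quantile", alpha=0.8, n_estimators=200)
model.fit(X, y)

on_time_rate = float(np.mean(y <= model.predict(X)))
print(round(on_time_rate, 2))
```

In practice the quantile target itself becomes a product lever: raising alpha improves on‑time rate at the cost of longer promises and lower conversion.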
Explain label construction (true end‑to‑end time), handling of censored/late data, cold‑start strategies (similar‑merchant backfills, neighborhood priors), and calibration (Platt/isotonic). Address fairness and safety (do not systematically over‑ or under‑promise by neighborhood or merchant class).
- Training/serving platform thinking: Outline offline training on a data lake and a feature store, plus an online feature store for low‑latency retrieval. Discuss streaming updates (near‑real‑time prep‑time signals, traffic), canary/shadow deployments, autoscaling stateless inference services, and caching strategies for hot H3 cells. Mention model/feature versioning and rollback playbooks.
- Decisioning + control: Describe how ETA prediction feeds dispatch and batching policies; articulate trade‑offs between consumer promise accuracy and courier/restaurant efficiency. Propose fallback logic when models or upstreams degrade, and safe‑ops procedures for severe events.
- Experimentation in marketplaces: Propose A/B and switchback designs to avoid spillovers in two‑sided networks. Define success metrics, power, ramp criteria, and guardrail dashboards. Cover offline replay/simulation to de‑risk before field tests.
- Monitoring & reliability: Define end‑to‑end observability: data‑freshness SLAs, feature drift checks, calibration/over‑under bias by segment, and alerting on latency/error budgets. Include on‑call preparedness and incident response (e.g., auto‑revert to a heuristic ETA).

What “Uber‑specific” looks like in this case: Candidates who connect geospatial thinking (H3), pragmatic model choices, and marketplace‑aware experimentation tend to align with Uber’s culture. Expect the interviewer to push for measurable impact, safety considerations for all sides of the trip, and crisp trade‑offs over purely academic solutions.
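The isotonic calibration step named in the focus areas can be illustrated with scikit‑learn: fit a monotone mapping from raw model ETAs to observed actual times, then apply it at serving time. The bias pattern in the synthetic data (the raw model under‑promising long orders) is an assumption for the sake of the example.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(1)
# Raw model ETAs (minutes) that systematically under-promise: actual
# delivery times run ~15% above the raw prediction, plus noise.
raw_eta = rng.uniform(10, 60, 1000)
actual = raw_eta * 1.15 - 2.0 + rng.normal(0, 3.0, 1000)

# Isotonic regression learns a monotone correction curve; "clip" keeps
# out-of-range ETAs at the boundary values instead of extrapolating.
calib = IsotonicRegression(out_of_bounds="clip")
calib.fit(raw_eta, actual)

corrected = calib.predict(np.array([20.0, 40.0, 55.0]))
print(np.round(corrected, 1))  # tracks ~1.15 * raw - 2, not the raw ETA
```

In production you would fit such a curve per segment (product, region, daypart) and monitor it for drift, since a single global correction can hide offsetting biases between segments.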
Familiarity with internal‑style platforms (e.g., ML platform + feature store concepts, continuous delivery, robust experimentation) and large‑scale, low‑latency serving is rewarded.

Format and timing (guideline):
- 5 min: Clarify goals, constraints, and success metrics.
- 30–35 min: System/ML design (data, features, models, training/serving, decisioning loop).
- 15–20 min: Deep dives (experimentation design, drift/cold start, failure modes, safety/fairness).
- 5–10 min: Trade‑offs, cost/latency sizing, rollout plan, and Q&A.

Evaluation rubric (what the interviewer scores):
- Product + marketplace framing (clear objectives, correct guardrails, stakeholder awareness).
- Technical depth (feature engineering, modeling choices, calibration, drift mitigation).
- Systems engineering (feature store usage, streaming vs. batch, low‑latency serving, reliability/SLOs).
- Experimentation rigor (A/B vs. switchback, metrics, ramp/rollback, interpreting results).
- Uber culture signals (ownership, data‑driven decisions, safety, bias for action, pragmatism).
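The primary metrics listed in the focus areas (ETA MAE/MAPE, on‑time rate, p95 lateness) are cheap to compute once promised and actual times are joined. A minimal sketch, assuming one plausible definition of “on time” (actual time within a grace window past the promise; real teams vary on this):

```python
import numpy as np

def eta_metrics(promised_min, actual_min, grace_min=5.0):
    """Summarize ETA quality from promised vs. actual delivery minutes.
    'On time' here means arriving no later than promise + grace window."""
    promised = np.asarray(promised_min, dtype=float)
    actual = np.asarray(actual_min, dtype=float)
    err = actual - promised  # positive = late, negative = early
    return {
        "mae": float(np.mean(np.abs(err))),
        "mape": float(np.mean(np.abs(err) / actual)),
        "on_time_rate": float(np.mean(err <= grace_min)),
        # Lateness clips early arrivals to zero before taking the tail.
        "p95_lateness": float(np.percentile(np.maximum(err, 0.0), 95)),
    }

print(eta_metrics([30, 25, 40, 35], [32, 24, 55, 41]))
```

A guardrail dashboard would slice these by segment (H3 region, merchant class, product) precisely because aggregate MAE can look flat while one neighborhood is systematically under‑promised.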
75 minutes
About This Interview
Interview Type
ML SYSTEM DESIGN
Difficulty Level
4/5