Databricks

Databricks Software Engineer Case Interview — Lakehouse-Scale Streaming + Delta Lake System Design

Scenario: You’re the on-call SWE joining a Databricks team to build a multi-tenant, real-time analytics and ML feature pipeline on the Databricks Lakehouse. The system must ingest 1–2M events/sec of clickstream and transactions across regions, unify streaming and batch, power near-real-time dashboards and model features, and share curated data with internal teams and external partners.

What this case covers (Databricks-specific focus; illustrative PySpark sketches of several of these patterns appear at the end of this section):

- Lakehouse architecture: Propose a Bronze/Silver/Gold layout on Delta Lake; justify partitioning, file size targets, and compaction strategy (OPTIMIZE + Z-ORDER vs partition columns; data skipping semantics). Explain schema enforcement/evolution, time travel, VACUUM retention trade-offs, and MERGE for CDC/upserts.
- Streaming design: Use Auto Loader and Structured Streaming with checkpointing and exactly-once behavior to land data into Delta. Handle late/duplicate events with watermarks, idempotency keys, and state store tuning; discuss backpressure and recovery. Decide between DLT (Delta Live Tables) with expectations vs Jobs/Workflows for orchestration.
- Performance & cost on Databricks: Explain join strategies (broadcast vs sort-merge), skew mitigation (salting/AQE), caching, and Photon acceleration for SQL/DataFrame workloads. Discuss autoscaling, cluster sizing, and DBU/cost controls; when to prefer Serverless SQL warehouses vs Jobs compute.
- Governance & sharing: Apply Unity Catalog for fine-grained permissions, lineage, and isolation across workspaces; design secure data sharing with Delta Sharing. Cover multi-tenant concerns, row/column-level access via views/policies, PII handling, and audit/event logs.
- Reliability & operations: Define SLOs (e.g., p95 < 3s for aggregates; 99.9% pipeline availability), alerting/observability (Spark UI, event logs, metrics), a checkpoint-corruption playbook, schema drift handling, backfills, and blue/green pipeline deploys. Incorporate data quality with expectations/tests in DLT or custom checks.
- ML/AI tie-in: Produce online/offline features consistently (feature tables on Delta), track lineage/metrics with MLflow, and support batch scoring and near-real-time inference triggers.

How the interview runs (75 min):

- 10 min clarifying requirements: traffic, SLAs, data shapes, tenancy, compliance, clouds/regions.
- 30–35 min architecture: Whiteboard an end-to-end design using core Databricks components (Auto Loader → Bronze Delta → transformations via DLT/Structured Streaming → Silver/Gold Delta → Serverless SQL/BI + feature tables + Delta Sharing). Call out storage layout, job boundaries, retries, and failure modes.
- 15 min deep dives: Partition/Z-ORDER rationale; watermarking and state management; MERGE semantics and file compaction; AQE/skew handling; Photon vs non-Photon choices; Unity Catalog policy design.
- 10 min trade-offs & cost: Present alternatives (e.g., Kinesis vs Kafka source; DLT vs Jobs), DBU/cost estimates, and how you’d iterate safely.
- 5 min Q&A: Risks, next steps, and what you’d measure first.

What strong answers look like (aligned with Databricks culture):

- Customer-obsessed scoping of SLAs and data consumers; a simple, pragmatic architecture that can be owned end-to-end.
- Clear reasoning about Spark internals (shuffle, joins, AQE), Delta Lake mechanics (transaction log, OPTIMIZE/VACUUM, time travel), and operational guardrails (checkpointing, expectations, lineage).
- Thoughtful governance with Unity Catalog and secure external sharing; explicit cost/performance trade-offs using Photon, autoscaling, and file layout.
- Communication that raises the bar: crisp diagrams, explicit assumptions, and measurable success criteria.
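As a concrete illustration of the Auto Loader → Bronze step, here is a minimal sketch assuming a Databricks notebook context (an existing `spark` session) and hypothetical paths and table names (`/mnt/landing/clickstream`, `main.bronze.clickstream`). The checkpoint location plus the Delta transaction log is what gives the sink its exactly-once behavior per micro-batch.

```python
from pyspark.sql import functions as F

# Auto Loader (cloudFiles) incrementally discovers new files in the landing zone.
bronze_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/chk/clickstream/schema")  # schema tracking/evolution
    .load("/mnt/landing/clickstream")
    .withColumn("_ingest_ts", F.current_timestamp())
)

# Checkpointing makes restarts idempotent; each micro-batch commits atomically to Delta.
(bronze_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/chk/clickstream/bronze")
    .trigger(availableNow=True)  # batch-style catch-up; use processingTime for continuous runs
    .toTable("main.bronze.clickstream"))
```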
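For the late/duplicate-event discussion, a sketch of watermarked deduplication into Silver, again with hypothetical column and table names (`event_id` as the idempotency key, `event_ts` as the event-time column). The watermark bounds how long deduplication state is retained.

```python
from pyspark.sql import functions as F

events = spark.readStream.table("main.bronze.clickstream")

deduped = (
    events
    .withColumn("event_time", F.col("event_ts").cast("timestamp"))
    .withWatermark("event_time", "30 minutes")        # tolerate up to 30 minutes of lateness
    .dropDuplicates(["event_id", "event_time"])       # idempotency key + bounded state
)

(deduped.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/chk/clickstream/silver")
    .toTable("main.silver.clickstream"))
```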
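MERGE-based CDC into Silver is commonly expressed with `foreachBatch`, so each micro-batch is applied as one atomic upsert. A sketch under the assumption of a `txn_id` business key and hypothetical table names:

```python
from delta.tables import DeltaTable

def upsert_transactions(batch_df, batch_id):
    # Collapse duplicates within the micro-batch, then upsert on the business key.
    latest = batch_df.dropDuplicates(["txn_id"])
    (DeltaTable.forName(spark, "main.silver.transactions").alias("t")
        .merge(latest.alias("s"), "t.txn_id = s.txn_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream.table("main.bronze.transactions")
    .writeStream
    .foreachBatch(upsert_transactions)
    .option("checkpointLocation", "/mnt/chk/transactions/silver")
    .start())
```

Because the MERGE is keyed and idempotent, replaying a micro-batch after a failure does not produce duplicate rows.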
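File compaction and retention can run as a scheduled job. The Z-ORDER columns and retention window below are illustrative starting points; a shorter VACUUM retention trades time-travel depth for storage cost.

```python
# Compact small files and co-locate data on common filter columns to improve data skipping.
spark.sql("OPTIMIZE main.silver.clickstream ZORDER BY (tenant_id, event_time)")

# Keep 7 days of history for time travel and rollbacks; older unreferenced files are removed.
spark.sql("VACUUM main.silver.clickstream RETAIN 168 HOURS")
```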
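For the join-strategy and skew deep dive, most of the levers are configuration plus an explicit broadcast hint. The threshold value is an illustrative starting point, and the fact/dimension table names are hypothetical.

```python
from pyspark.sql import functions as F

# Adaptive Query Execution: runtime re-planning, including automatic skew-join splitting.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# Raise the broadcast threshold so small dimension tables avoid a shuffle (sort-merge) join.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(64 * 1024 * 1024))  # 64 MB

# Or force the choice explicitly for a dimension known to be small.
large_fact = spark.table("main.silver.clickstream")
small_dim = spark.table("main.silver.tenant_dim")
joined = large_fact.join(F.broadcast(small_dim), "tenant_id")
```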
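If the pipeline is built as Delta Live Tables, data-quality expectations attach declaratively to the table definition. A sketch assuming an upstream DLT table named `bronze_clickstream` (hypothetical):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Clickstream cleaned and validated for Silver")
@dlt.expect_or_drop("has_event_id", "event_id IS NOT NULL")         # drop rows that fail
@dlt.expect("reasonable_ts", "event_time <= current_timestamp()")   # record violations, keep rows
def silver_clickstream():
    return (
        dlt.read_stream("bronze_clickstream")
        .withColumn("event_time", F.col("event_ts").cast("timestamp"))
    )
```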
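Unity Catalog permissions and simple row-level isolation for multi-tenancy can be expressed in SQL (issued here through `spark.sql`). Group, catalog, and table names are hypothetical, and a production design might use row filters and column masks rather than views.

```python
# Grant read access on the curated Gold table to an analyst group.
spark.sql("GRANT SELECT ON TABLE main.gold.daily_aggregates TO `analysts`")

# Tenant-scoped view: rows are visible only to members of the matching account group.
spark.sql("""
CREATE OR REPLACE VIEW main.gold.daily_aggregates_by_tenant AS
SELECT *
FROM main.gold.daily_aggregates
WHERE is_account_group_member(CONCAT('tenant_', tenant_id))
""")

# Views are granted like tables in Unity Catalog.
spark.sql("GRANT SELECT ON TABLE main.gold.daily_aggregates_by_tenant TO `partner_readers`")
```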

engineering

8 minutes

Practice with our AI-powered interview system to improve your skills.

About This Interview

Interview Type

SYSTEM DESIGN

Difficulty Level

4/5

Interview Tips

• Research the company thoroughly

• Practice common questions

• Prepare your STAR method responses

• Dress appropriately for the role