Overview
A leading UK insurance provider needed to modernize its data platform to support real-time analytics and scale with growing data volumes.
The Challenge
The existing ETL system, built on SQL Server Integration Services, was struggling to keep pace with 10 million daily records. Pipeline failures were common, requiring manual intervention and causing reporting delays. The business needed a solution that could scale with growing volumes, recover from failures without manual intervention, and enforce data quality.
The Approach
We architected a Databricks-based lakehouse using Delta Live Tables for automated pipeline orchestration. The medallion architecture (bronze, silver, gold) provided clear separation of concerns and enabled incremental processing.
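The bronze-silver-gold flow can be illustrated with a minimal pure-Python sketch. This is a conceptual model only, not the project's Databricks code: the record fields (policy_id, premium, region) and function names are hypothetical, and in the real pipeline each layer would be a Delta Live Tables dataset rather than a plain function.

```python
# Conceptual sketch of medallion-style processing in plain Python.
# Fields and function names are illustrative, not from the project.

def bronze_ingest(raw_rows):
    """Bronze: land raw records as-is, tagging each with a source marker."""
    return [dict(row, _source="ingest") for row in raw_rows]

def silver_cleanse(bronze_rows):
    """Silver: keep only records that pass basic validation."""
    return [
        r for r in bronze_rows
        if r.get("policy_id")
        and isinstance(r.get("premium"), (int, float))
        and r["premium"] > 0
    ]

def gold_aggregate(silver_rows):
    """Gold: business-ready aggregate -- total premium per region."""
    totals = {}
    for r in silver_rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["premium"]
    return totals

raw = [
    {"policy_id": "P1", "premium": 120.0, "region": "London"},
    {"policy_id": None, "premium": 80.0, "region": "Leeds"},  # fails validation
    {"policy_id": "P3", "premium": 200.0, "region": "London"},
]
print(gold_aggregate(silver_cleanse(bronze_ingest(raw))))  # {'London': 320.0}
```

The key property this models is separation of concerns: each layer has one job, so a validation change in silver never touches ingestion or reporting logic.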
Implementation
Key components included:
- Bronze layer: Raw data ingestion using Auto Loader
- Silver layer: Cleansed and validated data with DLT expectations
- Gold layer: Business-ready aggregations for reporting
- Unity Catalog: Enterprise-grade data governance
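The silver layer's "expectations" behave like named data-quality rules: each rule is checked per record, failing rows are dropped, and failure counts are surfaced for monitoring. The sketch below mimics that expect-or-drop behavior in plain Python; the rule names and fields are illustrative assumptions, and the real pipeline would declare them with DLT decorators instead.

```python
# Conceptual sketch of expectation-style validation: named rules applied
# per record, with failing rows dropped and counted per rule.
# Rule names and record fields are hypothetical, not from the project.

expectations = {
    "valid_policy_id": lambda r: bool(r.get("policy_id")),
    "positive_premium": lambda r: isinstance(r.get("premium"), (int, float))
    and r["premium"] > 0,
}

def apply_expectations(rows, rules):
    """Return (kept_rows, per-rule drop counts)."""
    kept, dropped = [], {name: 0 for name in rules}
    for row in rows:
        failures = [name for name, check in rules.items() if not check(row)]
        if failures:
            for name in failures:
                dropped[name] += 1
        else:
            kept.append(row)
    return kept, dropped

rows = [
    {"policy_id": "P1", "premium": 100},
    {"policy_id": "", "premium": 100},   # fails valid_policy_id
    {"policy_id": "P3", "premium": -5},  # fails positive_premium
]
kept, dropped = apply_expectations(rows, expectations)
print(len(kept), dropped)  # 1 {'valid_policy_id': 1, 'positive_premium': 1}
```

Surfacing per-rule drop counts is what made "invest in data quality expectations early" pay off: bad upstream data shows up as a metric rather than a midnight pipeline failure.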
Results
- 60% faster pipeline processing compared to legacy system
- 40% cost reduction through optimized Spark configurations
- Zero manual intervention for standard pipeline runs
- Single source of truth for analytics across the organization
Key Lessons
- Start with a well-defined medallion architecture
- Invest in data quality expectations early
- Use Unity Catalog from day one for governance
- Document runbooks for common failure scenarios