Insurance

Migrating Insurance Platform to Databricks at Scale

The Challenge

Legacy ETL couldn't handle 10M+ daily records. Pipeline failures caused reporting delays and required manual intervention.

The Outcome

60% faster processing, 40% cost reduction, unified lakehouse architecture

Technologies

Databricks, Delta Live Tables, Unity Catalog, Azure Data Factory

Overview

A leading UK insurance provider needed to modernize its data platform to support real-time analytics and scale with growing data volumes.

The Challenge

The existing ETL system, built on SQL Server Integration Services, was struggling to keep pace with 10 million daily records. Pipeline failures were common, requiring manual intervention and delaying reporting. The business needed a solution that could scale, run reliably, and enforce data quality.

The Approach

We architected a Databricks-based lakehouse using Delta Live Tables for automated pipeline orchestration. The medallion architecture (bronze, silver, gold) provided clear separation of concerns and enabled incremental processing.
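In sketch form, the three layers map onto DLT table definitions like the following. This is a minimal illustration of the pattern, not the client's actual pipeline: the table names, landing path, and columns (policy_id, premium, event_date) are hypothetical, and `spark` is the session the DLT runtime provides.

    import dlt
    from pyspark.sql import functions as F

    # Bronze: raw files picked up incrementally by Auto Loader.
    @dlt.table(comment="Raw policy events ingested with Auto Loader")
    def bronze_policy_events():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/policy_events/")  # hypothetical landing path
        )

    # Silver: cleansed rows; the expectation drops records without a policy id.
    @dlt.table(comment="Cleansed and validated policy events")
    @dlt.expect_or_drop("valid_policy_id", "policy_id IS NOT NULL")
    def silver_policy_events():
        return dlt.read_stream("bronze_policy_events").withColumn(
            "ingested_at", F.current_timestamp()
        )

    # Gold: business-ready aggregate for reporting.
    @dlt.table(comment="Daily premium totals")
    def gold_daily_premiums():
        return (
            dlt.read("silver_policy_events")
            .groupBy(F.to_date("event_date").alias("day"))
            .agg(F.sum("premium").alias("total_premium"))
        )

Because each layer only declares its inputs, DLT derives the dependency graph and handles orchestration and incremental processing without hand-written scheduling code.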

Implementation

Key components included:

  • Bronze layer: Raw data ingestion using Auto Loader
  • Silver layer: Cleansed and validated data with DLT expectations
  • Gold layer: Business-ready aggregations for reporting
  • Unity Catalog: Enterprise-grade data governance (sketched below)
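For the governance layer, access was managed centrally in Unity Catalog. A minimal sketch of the pattern, with illustrative catalog, schema, and group names rather than the client's actual ones:

    # Create a governed location for gold tables and grant read-only
    # access to an analyst group. All names are hypothetical.
    spark.sql("CREATE CATALOG IF NOT EXISTS insurance")
    spark.sql("CREATE SCHEMA IF NOT EXISTS insurance.gold")
    spark.sql("GRANT USE CATALOG ON CATALOG insurance TO `reporting_analysts`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA insurance.gold TO `reporting_analysts`")
    spark.sql("GRANT SELECT ON SCHEMA insurance.gold TO `reporting_analysts`")

Granting SELECT at the schema level keeps permissions coarse-grained: new gold tables become readable to analysts without per-table grants.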

Results

  • 60% faster pipeline processing compared to legacy system
  • 40% cost reduction through optimized Spark configurations (see the sketch after this list)
  • Zero manual intervention for standard pipeline runs
  • Single source of truth for analytics across the organization
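On the cost side, the optimizations were of the kind below: adaptive execution and small-file compaction. This is an illustrative sketch of typical Databricks settings, not the engagement's actual configuration, whose values were workload-specific:

    # Illustrative cost-oriented Databricks Spark settings; tune per workload.
    spark.conf.set("spark.sql.adaptive.enabled", "true")    # adaptive query execution
    spark.conf.set("spark.sql.shuffle.partitions", "auto")  # let AQE size shuffles (Databricks)
    spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")  # fewer small files
    spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")    # compact after writes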

Key Lessons

  1. Start with a well-defined medallion architecture
  2. Invest in data quality expectations early (see the sketch after this list)
  3. Use Unity Catalog from day one for governance
  4. Document runbooks for common failure scenarios
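On lesson 2: DLT expectations come in three enforcement levels, and it pays to decide the right level per rule up front. A minimal sketch, with hypothetical rules and table names:

    import dlt

    @dlt.table
    @dlt.expect("non_negative_premium", "premium >= 0")               # log violations, keep rows
    @dlt.expect_or_drop("valid_policy_id", "policy_id IS NOT NULL")   # drop offending rows
    @dlt.expect_or_fail("known_product", "product_code IS NOT NULL")  # abort the update
    def silver_policies():
        # Hypothetical upstream table name.
        return dlt.read("bronze_policies")

Reserving expect_or_fail for violations that would corrupt downstream reporting keeps routine bad records from halting the whole pipeline.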