End-to-end medallion pipeline orchestrated in Azure Data Factory, refining raw data through bronze, silver, and gold layers. Lookup and Get Metadata activities drive metadata-driven ingestion, with Power BI delivering real-time insights on top of Synapse.
Hi, I'm Sparsh.
Data Engineer
Microsoft Certified Data Engineer building scalable Azure-native data platforms.
01 / Who I am
A storyteller of data.
4+ years delivering production data platforms across Databricks, Snowflake, ADF, and Azure SQL. I turn complex data challenges into simple, high-impact outcomes — from medallion lakehouses and SCD Type 2 pipelines to dbt-driven transformations.
Five years. Four chapters. One mission — make data make sense.
Microsoft Certified Data Engineer specialising in cloud-based ETL workflows, scalable data pipelines, and turning data into business insight. Here's how the story unfolded — pick a chapter.
“Building cross-asset analytics across 3M+ insurance policies — bike, van, car, home.”
Today, at Hastings Direct, I work on the Pricing & Analytics team — designing dbt transformation models, ADF pipelines, and CI/CD on Azure DevOps. I integrated multi-product insurance data across 50+ tables, optimised 5,000+ rating factors, and cut a critical Python pipeline from 2.5h to 1h. Awarded Best Newcomer for impact within the team.
- M.S., Big Data Science · Queen Mary University of London · Sept 2023 – Sept 2024 · 83%
- B.S., Computer Engineering · Bharati Vidyapeeth University, Pune · Aug 2016 – Oct 2020 · 85%
02 / Toolkit
Skills universe.
A constellation of tools I use to design, build and ship data products at scale.
03 / Career
Where I've shipped.
From SQL Server modernisation to medallion lakehouses and AI pipelines.
- London, UK · Jun 2025 – Present
Junior Data Engineer
Hastings Direct
- Designed ETL pipelines integrating multi-product insurance data (bike, van, car, home), extending 50+ tables for unified cross-asset analytics across 3M+ active policies.
- Built and maintained ADF pipelines, dbt models, and Azure DevOps CI/CD pipelines to automate workflows and enforce infrastructure best practices.
- Engineered 10+ SQL stored procedures and optimised 5,000+ rating factors — removed 100+ redundant datapoints, cutting storage cost by 20% and improving query performance by 15%.
- Optimised Python pipelines to meet SLA, reducing processing time by 60% (2.5h → 1h).
- Awarded Best Newcomer for outstanding contribution within the data engineering team.
ADF · dbt · Python · SQL · Azure DevOps
- London, UK · May 2024 – Apr 2025
Data & AI Engineer
Assentian Limited
- Automated ingestion from Azure SQL, REST APIs, and flat files into ADLS Gen2 via ADF, cutting manual mapping/validation by 40% and enabling real-time financial dashboards.
- Designed a Medallion Architecture on Databricks + ADF unifying 7+ sources, improving pipeline efficiency by 35% via agile iterations.
- Engineered a Common Data Model, SCD Type 2 workflows, and DQ rules — slashed report errors by 99% and accelerated CDC pipelines by 25%.
- Optimised Kimball star schemas, reducing BI query latency by 30% across 15+ KPI Power BI dashboards.
Databricks · ADF · PySpark · Power BI · Azure SQL
- India · Nov 2020 – Aug 2023
Data Engineer (Client: Aon Netherlands Insurance)
Infosys Limited
- Managed a 1.2 TB enterprise data warehouse on SQL Server and migrated it to Azure, reducing compute cost by 25% with scalable compute and storage strategies.
- Modernised 15+ ETL pipelines by replatforming SSIS to ADF, reducing latency by 20% via serverless trigger-based execution.
- Designed 50+ scalable models using Data Vault 2.0 (hubs/links/satellites) and Kimball star schemas — cut query time by 25% across 10+ executive dashboards.
- Resolved 90% of SQL Server bottlenecks via root-cause analysis, ensuring 99.9% accuracy in premium calculations across 2M+ policies.
SQL Server · SSIS · ADF · Data Vault 2.0 · T-SQL
04 / Selected work
Products & projects.
Each one shipped with measurable outcomes — speed, scale, or accuracy.
Computer-vision system detecting worksite activities in real time. YOLOv9 handles object detection while a CNN-LSTM classifies activity sequences — enabling safety compliance monitoring at 96% accuracy.
Multi-Product Insurance Analytics
Cross-asset analytics across 3M+ policies.
Unified bike, van, car and home insurance data into a cross-asset analytics layer. Built ADF + dbt pipelines, extended 50+ tables, and automated CI/CD via Azure DevOps to support agile delivery for the analytics team.
05 / Let's build
Get in touch.
Open to data engineering roles in the UK and remote.
Let's design what's next.
Whether it's a medallion lakehouse, a real-time CDC pipeline, or an AI agent — happy to chat about it.