Sparsh
Available for Data Engineering roles

Hi, I'm Sparsh.
Data Engineer

Microsoft Certified Data Engineer building scalable Azure-native data platforms.

01 / Who I am

A storyteller of data.

4+ years delivering production data platforms across Databricks, Snowflake, ADF, and Azure SQL. I turn complex data challenges into simple, high-impact outcomes — from medallion lakehouses and SCD Type 2 pipelines to dbt-driven transformations.

My journey

Five years. Four chapters. One mission — make data make sense.

Microsoft Certified Data Engineer specialising in cloud-based ETL workflows, scalable data pipelines, and turning data into business insight. Here's how the story unfolded — pick a chapter.

Chapter 04 · Scaling at Speed
Hastings Direct, London

Building cross-asset analytics across 3M+ insurance policies — bike, van, car, home.

Today, at Hastings Direct, I work on the Pricing & Analytics team — designing dbt transformation models, ADF pipelines, and CI/CD on Azure DevOps. I integrated multi-product insurance data across 50+ tables, optimised 5,000+ rating factors, and cut a critical Python pipeline from 2.5h to 1h. Awarded Best Newcomer for impact within the team.

  • 4+ years of experience
  • 30+ pipelines shipped
  • DP-203: Azure certified
  • Lakehouse: Databricks certified
Education
  • M.S., Big Data Science
    Queen Mary University of London
    Sept 2023 — Sept 2024 · 83%
  • B.S., Computer Engineering
    Bharati Vidyapeeth University, Pune
    Aug 2016 — Oct 2020 · 85%
Certifications
  • Microsoft Azure Data Engineer Associate (DP-203)
  • Databricks Lakehouse Fundamentals
  • SnowPro Platform Certification
  • HackerRank Advanced SQL Certification

02 / Toolkit

Skills universe.

A constellation of tools I use to design, build and ship data products at scale.

Languages
Cloud & Platforms
Data Engineering
Databases
AI / ML
DevOps & BI

03 / Career

Where I've shipped.

From SQL Server modernisation to medallion lakehouses and AI pipelines.

  • London, UK·Jun 2025 — Present

    Junior Data Engineer

    Hastings Direct
    • Designed ETL pipelines integrating multi-product insurance data (bike, van, car, home), extending 50+ tables for unified cross-asset analytics across 3M+ active policies.
    • Built and maintained ADF pipelines, dbt models, and DevOps CI/CD pipelines to automate workflows and enforce infra best practices.
    • Engineered 10+ SQL stored procedures and optimised 5,000+ rating factors, removing 100+ redundant data points, which cut storage cost by 20% and improved query performance by 15%.
    • Optimised Python pipelines to meet SLA, reducing processing time by 60% (2.5h → 1h).
    • Awarded Best Newcomer for outstanding contribution within the data engineering team.
    ADF · dbt · Python · SQL · Azure DevOps
  • London, UK·May 2024 — Apr 2025

    Data & AI Engineer

    Assentian Limited
    • Automated ingestion from Azure SQL, REST APIs, and flat files into ADLS Gen2 via ADF — cut manual mapping/validation by 40% and enabled real-time financial dashboards.
    • Designed a Medallion Architecture on Databricks + ADF unifying 7+ sources, improving pipeline efficiency by 35% via agile iterations.
    • Engineered a Common Data Model, SCD Type 2 workflows, and DQ rules — slashed report errors by 99% and accelerated CDC pipelines by 25%.
    • Optimised Kimball star schemas, reducing BI query latency by 30% across 15+ KPI Power BI dashboards.
    Databricks · ADF · PySpark · Power BI · Azure SQL
  • India·Nov 2020 — Aug 2023

    Data Engineer (Client: Aon Netherlands Insurance)

    Infosys Limited
    • Managed a 1.2 TB enterprise data warehouse on SQL Server and migrated it to Azure — reduced compute cost by 25% with scalable compute & storage strategies.
    • Modernised 15+ ETL pipelines by replatforming SSIS to ADF, reducing latency by 20% via serverless trigger-based execution.
    • Designed 50+ scalable models using Data Vault 2.0 (hubs/links/satellites) and Kimball star schemas — cut query time by 25% across 10+ executive dashboards.
    • Resolved 90% of SQL Server bottlenecks via root-cause analysis, ensuring 99.9% accuracy in premium calculations across 2M+ policies.
    SQL Server · SSIS · ADF · Data Vault 2.0 · T-SQL
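
For readers less familiar with the SCD Type 2 workflows mentioned under the Assentian role, here is a minimal sketch of the idea in plain Python. It is an illustration only, not the production implementation: the DimRow shape, the apply_scd2 helper, and the field names are all hypothetical. When a tracked attribute changes, the current row is closed (valid_to set, is_current cleared) and a new current row is opened, so history is preserved rather than overwritten.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical shape)."""
    key: str                          # business key, e.g. a customer id
    attrs: tuple                      # tracked attributes as sorted (name, value) pairs
    valid_from: date
    valid_to: Optional[date] = None   # None means the version is still open
    is_current: bool = True


def apply_scd2(dim, incoming, as_of):
    """Return a new dimension list after applying one batch of source records.

    dim      -- list of DimRow (existing dimension, history included)
    incoming -- dict mapping business key -> dict of tracked attributes
    as_of    -- effective date of this batch
    """
    result = []
    seen = set()
    for row in dim:
        new_attrs = incoming.get(row.key)
        if row.is_current and new_attrs is not None:
            seen.add(row.key)
            new_tuple = tuple(sorted(new_attrs.items()))
            if new_tuple != row.attrs:
                # Attribute change: close the old version and open a new one.
                result.append(replace(row, valid_to=as_of, is_current=False))
                result.append(DimRow(row.key, new_tuple, as_of))
                continue
        # Unchanged current rows and all historical rows are kept as-is.
        result.append(row)
    # Keys with no current row yet are inserted as brand-new versions.
    for key, attrs in incoming.items():
        if key not in seen:
            result.append(DimRow(key, tuple(sorted(attrs.items())), as_of))
    return result
```

Applying the same batch twice leaves the dimension unchanged, which is the property that makes reruns of such a load safe. In a lakehouse setting the same logic would more typically be expressed as a merge statement against a table rather than a Python loop.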

04 / Selected work

Products & projects.

Each one shipped with measurable outcomes — speed, scale, or accuracy.

Sources → Bronze → Silver → Gold

Automated Bronze → Gold Lakehouse

Medallion architecture in Azure for real-time BI.

End-to-end medallion pipeline orchestrated in Azure Data Factory, refining raw data through bronze, silver, and gold layers. Lookup and Get Metadata activities drive metadata-driven ingestion, with Power BI delivering real-time insights on top of Synapse.
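
To make the metadata-driven idea concrete, here is a small sketch in plain Python, offered as an illustration rather than the project's actual code. The SOURCES control list stands in for what a lookup against a control table might return, and the loop in run_pipeline plays the role of iterating over those rows. All source names, field names, and helper functions are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical control table: one row of metadata per source to ingest.
SOURCES = [
    {"name": "policies_csv", "key": "policy_id", "amount_field": "premium"},
    {"name": "claims_api", "key": "claim_id", "amount_field": "paid"},
]


def to_bronze(source, raw_rows):
    """Bronze: keep records as received, tagged with where they came from."""
    return [{**row, "_source": source["name"]} for row in raw_rows]


def to_silver(source, bronze_rows):
    """Silver: drop rows without a key, cast amounts, dedupe on key (last wins)."""
    cleaned = {}
    for row in bronze_rows:
        key = row.get(source["key"])
        if key is None:
            continue
        cleaned[key] = {
            "id": key,
            "amount": float(row[source["amount_field"]]),
            "_source": row["_source"],
        }
    return list(cleaned.values())


def to_gold(silver_rows):
    """Gold: a business-level aggregate, here the total amount per source."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["_source"]] += row["amount"]
    return dict(totals)


def run_pipeline(raw_by_source):
    """Drive every source through the layers using only the metadata above."""
    silver_all = []
    for source in SOURCES:
        bronze = to_bronze(source, raw_by_source.get(source["name"], []))
        silver_all.extend(to_silver(source, bronze))
    return to_gold(silver_all)
```

The point of the design is that onboarding a new source means adding a row to the control metadata, not writing a new pipeline.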

3 layers · 7+ sources unified · -30% BI latency
ADF · Synapse · Databricks · PySpark · Power BI · SQL · Python
Frames → YOLOv9 → CNN-LSTM → Alerts

AI-Driven Worksite Monitoring

YOLOv9 + CNN-LSTM for activity recognition.

Computer-vision system detecting worksite activities in real time. YOLOv9 handles object detection while a CNN-LSTM classifies activity sequences — enabling safety compliance monitoring at 96% accuracy.

96% accuracy · Real-time latency · Safety use case
YOLOv9 · CNN-LSTM · PyTorch · OpenCV · Python
Bike/Van/Car/Home → ADF → dbt → Analytics

Multi-Product Insurance Analytics

Cross-asset analytics across 3M+ policies.

Unified bike, van, car and home insurance data into a cross-asset analytics layer. Built ADF + dbt pipelines, extended 50+ tables, and automated CI/CD via Azure DevOps to support agile delivery for the analytics team.

3M+ policies · 50+ tables extended · -60% pipeline runtime (2.5h → 1h)
ADF · dbt · Azure DevOps · SQL · Python

05 / Let's build

Get in touch.

Open to data engineering roles in the UK and remote.

Let's design what's next.

Whether it's a medallion lakehouse, a real-time CDC pipeline, or an AI agent — happy to chat about it.