
Implementing Medallion Architecture using Databricks

9 Scenarios
3 Hours 55 Minutes
Intermediate
Industry
e-commerce
Skills
batch-etl
data-storage
data-wrangling
approach
data-understanding
data-modelling
data-quality
programming
code-versioning
git-version-control
problem-understanding
performance-tuning
cloud-management
Tools
databricks
azure
spark
sql
github
google-cloud
airflow

Learning Objectives

Design and implement an ETL/ELT pipeline using Delta Lake, following the Medallion Architecture.
Manage secure access and credentials in the ETL pipeline.
Automate deployment with CI/CD, ensuring seamless integration and testing.
Perform unit testing for data pipelines to ensure data accuracy and reliability.
Orchestrate the data pipeline using workflows.
Optimize ETL performance using Spark and Delta Lake best practices.
Design and implement incremental data loading across the Medallion Architecture.
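To make the bronze/silver/gold layering concrete, here is a minimal plain-Python sketch of the Medallion refinement idea on a hypothetical orders dataset. In the project itself each layer would be a Delta table transformed with PySpark; the record fields and values below are illustrative assumptions, not part of the course material.

```python
# Plain-Python sketch of the Medallion layers (illustrative data only).
# In the actual project, each layer would be a Delta table in Databricks.

# Bronze: raw records as ingested -- duplicates and missing values included.
bronze = [
    {"order_id": 1, "amount": "120.50", "country": "IN"},
    {"order_id": 1, "amount": "120.50", "country": "IN"},   # duplicate
    {"order_id": 2, "amount": None,     "country": "US"},   # missing value
    {"order_id": 3, "amount": "75.00",  "country": "US"},
]

def to_silver(records):
    """Silver: deduplicate by order_id, drop rows with missing amounts,
    and cast amount to a numeric type."""
    seen, silver = set(), []
    for r in records:
        if r["order_id"] in seen or r["amount"] is None:
            continue
        seen.add(r["order_id"])
        silver.append({**r, "amount": float(r["amount"])})
    return silver

def to_gold(records):
    """Gold: business-level aggregate -- total revenue per country."""
    totals = {}
    for r in records:
        totals[r["country"]] = totals.get(r["country"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
```

The same progression (raw → cleaned/conformed → business aggregates) is what the project implements at scale with PySpark and Delta Lake.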

Overview

GlobalMart, a rapidly growing e-commerce startup, faces several data management challenges that impact its ability to generate reliable insights and make timely business decisions. Key issues include:

  • Data Quality Issues: Inconsistent data formats, missing values, and duplicate records create inefficiencies in processing.
  • Slow Data Transformations: As data volume increases, sluggish transformation processes delay critical insights.
  • Lack of Streamlined Workflows: Inefficient processing, poor handling of invalid data, and the inability to load data incrementally disrupt operations and reduce overall efficiency. As data volume grows, processing only new or updated records is crucial for better resource utilization.
  • Security Risks: Improper handling of sensitive credentials and access keys poses potential security threats.
  • Unreliable Deployment & Testing: Without a structured deployment and testing framework, changes to data pipelines are error-prone, increasing operational overhead.
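The incremental-loading challenge above can be sketched with a simple watermark pattern: remember the latest timestamp processed so far and pick up only newer rows on the next run. This plain-Python version is illustrative; in the project, the equivalent logic would filter new source rows by a stored high-water mark and upsert them into a Delta table (for example with MERGE). The record ids, timestamps, and field names here are assumptions for the example.

```python
# Watermark-based incremental loading, sketched in plain Python.
# (Illustrative only -- in Databricks this maps to filtering source rows
# by a stored high-water mark and upserting into a Delta table.)

def incremental_load(target, source, last_watermark):
    """Upsert only source rows newer than the last processed timestamp.
    `target` is keyed by record id; returns the updated watermark."""
    new_rows = [r for r in source if r["updated_at"] > last_watermark]
    for r in new_rows:
        target[r["id"]] = r          # insert new row or overwrite (upsert)
    if new_rows:
        last_watermark = max(r["updated_at"] for r in new_rows)
    return last_watermark

target = {1: {"id": 1, "updated_at": "2024-01-01", "status": "shipped"}}
source = [
    {"id": 1, "updated_at": "2024-01-03", "status": "delivered"},  # changed
    {"id": 2, "updated_at": "2024-01-02", "status": "new"},        # new row
    {"id": 3, "updated_at": "2023-12-30", "status": "old"},        # already processed
]
watermark = incremental_load(target, source, "2024-01-01")
```

Because only the two rows newer than the watermark are touched, reprocessing cost stays proportional to the change volume rather than the full table size, which is exactly why incremental loading matters as GlobalMart's data grows.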

These issues have eroded trust in GlobalMart's data systems. In this project, you will implement the following architecture, which addresses each of the problems GlobalMart currently faces:

(Architecture diagram)

Prerequisites

  • Familiarity with Databricks, PySpark & Python
  • Knowledge of ETL/ELT Processes & Pipeline Management
  • Basic Knowledge of CI/CD Pipelines
  • Familiarity with Delta Lake
  • Familiarity with Incremental Loading