
Building an ETL Pipeline on GCP | Set 2

9 Scenarios
3 Hours 55 Minutes
Advanced
Industry
retail-and-cpg
e-commerce
Skills
cloud-management
data-understanding
data-storage
batch-etl
programming
data-wrangling
approach
data-modelling
data-quality
code-versioning
git-version-control
problem-understanding
performance-tuning
Tools
google-cloud
spark
sql
airflow
github
databricks

Learning Objectives

Design and implement an ETL/ELT pipeline using Dataproc on GCP, following the Medallion Architecture
Manage credentials and IAM securely
Automate deployment with CI/CD using GitHub and GitHub Actions
Perform unit testing for data pipelines
Orchestrate data pipelines efficiently using Cloud Composer
Optimize BigQuery
Run code quality checks using pre-commit hooks

Overview

Prerequisites

  • Understanding of Google Cloud Platform
  • Knowledge of ETL/ELT Processes & Pipeline Management
  • Familiarity with BigQuery, Dataproc, PySpark & Python
  • Basic Knowledge of CI/CD Pipelines
  • Experience with Airflow
  • Experience with GitHub/GitHub Actions