Industry
retail-and-cpg
e-commerce
Skills
cloud-management
data-understanding
data-storage
batch-etl
programming
data-wrangling
approach
data-modelling
data-quality
code-versioning
git-version-control
problem-understanding
performance-tuning
Tools
google-cloud
spark
sql
airflow
github
databricks
Learning Objectives
Design and implement an ETL/ELT pipeline using Dataproc in GCP, following the Medallion Architecture.
Manage Credentials and IAM Securely
Automate Deployment with CI/CD using GitHub/GitHub Actions
Perform Unit Testing for Data Pipelines
Orchestrate Data Pipelines Efficiently using Cloud Composer
Optimize BigQuery
Run Code Quality Checks using pre-commit
Overview
Prerequisites
- Understanding of Google Cloud Platform
- Knowledge of ETL/ELT Processes & Pipeline Management
- Familiarity with BigQuery, Dataproc, PySpark & Python
- Basic Knowledge of CI/CD Pipelines
- Experience with Airflow
- Experience with GitHub/GitHub Actions
