
Design and Implement Reliable ETL pipeline for WePlay Sports using Databricks

sports
azure
adls
pyspark
databricks
sql
partitioning
data-governance
data-wrangling
etl-pipeline
architecture-patterns
This project includes 6 activities.

Learning objective

  • Design and Implement Data Storage
  • Design and Implement Data Processing solution
  • Design and Implement Data Governance solution

Overview

This project helps validate your skills in designing and building reliable data pipeline solutions on the cloud.

Story

WePlay is a sports analytics company that provides data solutions to many clients, typically sports clubs around the world, and offers them data-driven strategies. It has recently started providing similar solutions to IPL teams, helping the team manager and team captain decide on a strategy for the upcoming season. These clubs would like to see what the solutions can do before committing to a full-scale implementation.


Congratulations! You and your team have been assigned to build a POC for WePlay. Demonstrating a quality outcome will help your firm win a very high-value contract from IPL clubs.

  • WePlay has gathered a large volume of data from past matches and made it available to you as a data dump.
  • WePlay has also recently started monitoring real-time match data across its network so that it can provide real-time insights that change the play on the go.
  • The structure and schema of the data vary across sources, so they must be investigated carefully.
  • WePlay management has asked you to build a reliable and scalable data solution and to demonstrate it as a POC (proof of concept) before moving on to a full-scale pipeline implementation.
  • WePlay has insisted on a cost-optimal solution with minimal technology overhead, making maximum use of a unified platform where possible.
  • They have also stressed that all data assets must be discoverable, easy to access, governed by fine-grained access control, and highly secure.