
Stream Processing through Databricks on Azure

4 Scenarios
12 Hours
Industry
e-commerce
Skills
stream-etl
data-understanding
data-storage
data-wrangling
approach
data-quality
Tools
azure
databricks
spark

Learning Objectives

Grasp the basics of real-time data ingestion and processing with streaming tools.
Build a data pipeline that processes and stores streaming data efficiently.
Manage stateful and stateless operations in a streaming environment.
Use output modes to manage how and when data is written during streaming.
Implement watermarking to handle late-arriving data in real-time streams.
Perform real-time joins between multiple data streams for deeper analysis.
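
To make the watermarking objective concrete, here is a minimal plain-Python sketch of the idea. It is not the Spark API (in Structured Streaming a watermark is declared on a streaming DataFrame with `withWatermark`), and it simplifies by updating the watermark per event rather than per micro-batch. The function name `apply_watermark`, the order IDs, and the 10-minute delay are all hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def apply_watermark(events, delay):
    """Split events into accepted and dropped using a simple watermark.

    events: iterable of (order_id, event_time) in arrival order.
    delay:  how far behind the latest seen event time an event may be
            and still be accepted.
    Simplification (assumption): the watermark advances after every
    event, whereas a real streaming engine advances it per micro-batch.
    """
    max_event_time = None
    accepted, dropped = [], []
    for order_id, event_time in events:
        # Track the latest event time observed so far.
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        # Anything older than this threshold counts as too late.
        watermark = max_event_time - delay
        if event_time >= watermark:
            accepted.append(order_id)
        else:
            dropped.append(order_id)
    return accepted, dropped


# Hypothetical order events, listed in the order they arrive.
base = datetime(2024, 1, 1, 12, 0, 0)
events = [
    ("o1", base),
    ("o2", base + timedelta(minutes=5)),
    ("o3", base + timedelta(minutes=2)),   # late, but within the delay
    ("o4", base + timedelta(minutes=20)),  # pushes the watermark to 12:10
    ("o5", base + timedelta(minutes=6)),   # older than the watermark
]

accepted, dropped = apply_watermark(events, delay=timedelta(minutes=10))
print("accepted:", accepted)
print("dropped:", dropped)
```

In this sketch, `o3` arrives out of order but is still accepted because it falls within the allowed delay, while `o5` is dropped because it is older than the watermark set by `o4`. A larger delay tolerates later data at the cost of holding state for longer.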

Overview

GlobalMart, an e-commerce startup, faces challenges with data inaccuracies, schema inconsistencies, and a lack of stakeholder trust in its data systems. What measures are necessary to address and resolve these issues?


GlobalMart is a startup revolutionizing the shopping experience for its customers, both in the retail landscape and the online marketplace. As GlobalMart continues to expand, it is increasingly relying on data-driven decision-making.

For GlobalMart to be data-driven, its stakeholders need to be provided with accurate and up-to-date data. Unfortunately, this has become a major challenge and bottleneck. A journey that started as a way to enhance operational efficiency and decision-making is now causing a lot of friction between stakeholders.

GlobalMart now faces the following challenges:

  • Data Silos and Absence of a Single Source of Truth
  • Data Inconsistency and Quality Issues
  • Lack of Access Control and Compliance Challenges
  • Complex and Time-Consuming Data Transformation Processes
  • Unclear Data Location and Origin Leading to Redundancy

These issues have led to a lack of trust in the data systems, rendering them useless. In this project, you will implement the following architecture, which addresses all the problems that GlobalMart currently faces in its data systems:

[Image: architecture diagram]

Prerequisites

  • Understanding the need for real-time data ingestion.
  • Familiarity with streaming data sources like Event Hubs and methods for ingesting data.
  • Basic knowledge of Apache Spark.