zBalachandar/Pizza-Sales-Data-Analytics---End-to-End-Azure-Data-Engineering-Production-level-project-01


Pizza-Sales-Data-Analytics---End-to-End-Azure-Data-Engineering-Production-level-project1

Banner

    🔧 Analyzing Pizza Sales Data 🔌

On-prem DB to Azure Cloud Pipeline with Data Factory, Lake Storage, Spark, Databricks, Synapse, PowerBI


Table of Contents

  1. Project Overview
  2. Key Insights
  3. Project Architecture
    3.1. Data Ingestion
    3.2. Data Transformation
    3.3. Data Loading
    3.4. Data Reporting
  4. Credits
  5. Contact

🔬 Project Overview

This is an end-to-end data engineering project on the Azure cloud. It covers data ingestion from an on-premise SQL Server to Azure Data Lake using Data Factory, transformation using Databricks and Spark, loading into Synapse, and reporting using Power BI.

💾 Dataset

Dataset link : https://drive.google.com/file/d/1i4aRieq_WDVJDGpqtZq8UW9CH8sCbaBd/view?pli=1

Business Requirement.

image

Project steps to follow:

In this project we are going to create an end-to-end data platform covering data ingestion, data transformation, data loading, and reporting.

The tools covered in this project are:

  1. SQL Server migration
  2. Azure Data Factory
  3. Azure Data Lake Storage Gen2
  4. Azure Databricks
  5. PySpark
  6. Spark SQL
  7. Microsoft Power BI

The use case for this project is building an end-to-end solution: ingest the tables from an on-premise SQL Server database using Azure Data Factory and store the data in Azure Data Lake. Azure Databricks then transforms the raw data into a clean form, and finally Microsoft Power BI is integrated with Azure Synapse Analytics to build an interactive dashboard.

image

🎯 Project Goals

  • Establish a connection between the on-premise SQL Server and the Azure cloud.
  • Ingest tables into the Azure Data Lake.
  • Apply data cleaning and transformation using Azure Databricks.
  • Utilize Azure Synapse Analytics for loading clean data.
  • Create interactive data visualizations and reports with Microsoft Power BI.

🕵️ Key Insights

  • 💸 Total Revenue by Product Category

  • 🌍 Sales by Pizza Name and Size

    • N°1: L-size pizzas generated 45% of total revenue.
    • N°2: M-size pizzas generated 30.49% of total revenue.
  • 🚻 Sales by Pizza Category

    • The Classic pizza category generated 26.91% of revenue.
    • The Chicken pizza category generated 23.96% of revenue.
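
Each percentage above is one group's share of total revenue. The arithmetic can be sketched roughly as follows; the field names `pizza_size` and `total_price` and the sample rows are assumptions made for illustration, not values taken from the dataset.

```python
from collections import defaultdict


def revenue_share(rows, key, amount="total_price"):
    """Return {group: percent of total revenue}, rounded to 2 decimals."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[amount]
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No revenue at all: avoid dividing by zero.
        return {}
    return {group: round(100 * value / grand_total, 2) for group, value in totals.items()}


# Made-up rows; the field names are hypothetical.
sample = [
    {"pizza_size": "L", "total_price": 45.0},
    {"pizza_size": "M", "total_price": 30.0},
    {"pizza_size": "S", "total_price": 25.0},
]
shares = revenue_share(sample, "pizza_size")
```

Passing a different `key`, such as a category field, would give the per-category shares in the same way.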


Project Architecture

You can find the detailed information on the diagram below:

image

📤 Data Ingestion

  • Connected the on-premise SQL Server with Azure using a self-hosted Microsoft Integration Runtime.

  • Set up the resource group with the required services (Storage Account, Data Factory, Databricks, Synapse Analytics).

  • Migrated the tables from the on-premise SQL Server to Azure Data Lake Storage Gen2.

⚙️ Data Transformation

  • Mounted Azure Blob Storage to Databricks to retrieve raw data from the Data Lake.
  • Used a Spark cluster in Azure Databricks to clean and refine the raw data and to compute aggregations.
  • Saved the cleaned data in Parquet format, which is optimized for further analysis.
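
The cleaning and aggregation steps above could be written as a Spark SQL query along the following lines. This is a sketch under assumptions, not the project's actual notebook code: the table name `raw_pizza_sales` and the columns `pizza_id`, `pizza_size`, and `total_price` are hypothetical, and the standard-library `sqlite3` module stands in for a Spark cluster so the query can be tried locally.

```python
import sqlite3

# Hypothetical table and column names. The query is meant to stick to
# SQL constructs that both Spark SQL and SQLite accept.
REVENUE_BY_SIZE = """
SELECT pizza_size, ROUND(SUM(total_price), 2) AS revenue
FROM (
    SELECT DISTINCT
        pizza_id,
        UPPER(TRIM(pizza_size)) AS pizza_size,
        total_price
    FROM raw_pizza_sales
    WHERE total_price IS NOT NULL
) AS cleaned
GROUP BY pizza_size
ORDER BY revenue DESC
"""


def revenue_by_size(rows):
    """Load rows into an in-memory table and run the query against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE raw_pizza_sales "
            "(pizza_id INTEGER, pizza_size TEXT, total_price REAL)"
        )
        conn.executemany("INSERT INTO raw_pizza_sales VALUES (?, ?, ?)", rows)
        return conn.execute(REVENUE_BY_SIZE).fetchall()
    finally:
        conn.close()


# Made-up rows: one exact duplicate, one missing price, inconsistent casing.
raw_rows = [
    (1, "L", 20.5),
    (1, "L", 20.5),
    (2, " l ", 18.0),
    (3, "M", 16.0),
    (4, "S", None),
]
result = revenue_by_size(raw_rows)
```

In a Databricks notebook, the same query string would instead be passed to `spark.sql(...)` after registering the raw data as a table or temporary view, and the resulting DataFrame could be saved with `.write.mode("overwrite").parquet(...)`.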

📥 Data Loading (only required for instant analytics)

  • Used Azure Synapse Analytics to load the refined data efficiently.
  • Created a SQL database and connected it to the data lake.
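
One common way to connect a Synapse SQL database to files in the data lake is a view over `OPENROWSET`. The helper below is a sketch that only builds such a statement as text. It assumes a Synapse serverless SQL pool, which the project does not confirm, and every name passed to it (view, storage account, container, folder) is a hypothetical placeholder.

```python
def parquet_view_sql(view_name, storage_account, container, folder):
    """Build T-SQL for a view over Parquet files in the data lake.

    Assumes a Synapse serverless SQL pool, where OPENROWSET can read
    Parquet files directly from storage. Arguments are interpolated as-is,
    so they must come from trusted configuration, not user input.
    """
    url = (
        f"https://{storage_account}.dfs.core.windows.net/"
        f"{container}/{folder}/*.parquet"
    )
    return (
        f"CREATE VIEW {view_name} AS\n"
        f"SELECT *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        f"    FORMAT = 'PARQUET'\n"
        f") AS result;"
    )


# Hypothetical names, for illustration only.
statement = parquet_view_sql("dbo.pizza_sales", "mystorageaccount", "gold", "pizza_sales")
```

The generated statement would be run once in the Synapse database; reporting tools can then query the view like an ordinary table.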

📊 Data Reporting

  • Connected Microsoft Power BI to Azure Synapse and used the database views to create interactive data visualizations.

PowerBI-dashboard

🛠️ Technologies Used

  • Data Source: SQL Server
  • Orchestration: Azure Data Factory
  • Ingestion: Azure Data Lake Gen2
  • Storage: Azure Synapse Analytics (if required)
  • Data Visualization: Power BI

📋 Credits

📨 Contact Me

LinkedIn • Gmail
