A comprehensive data engineering platform for e-commerce including:
- Synthetic data generation with Python Faker
- ETL pipeline orchestrated by Apache Airflow
- Structured data lake (Bronze/Silver/Gold layers)
- Interactive dashboards with Apache Superset
- Real-time monitoring via Grafana
| Component | Technologies |
|---|---|
| Orchestration | Apache Airflow, Docker |
| Storage | MinIO, PostgreSQL, Parquet |
| Transformation | Python, Polars |
| Visualization | Apache Superset |
| Monitoring | Prometheus, Grafana, cAdvisor |
OLTP Schema
Relational structure optimized for transactions
| Characteristic | Details |
|---|---|
| Type | Relational (PostgreSQL) |
| Tables | `users`, `addresses`, `categories`, `products`, `orders`, `order_items`, `payments`, `shipments`, `reviews`, `product_views` |
| Indexes | `idx_orders_user_id`, `idx_orders_billing_address_id`, `idx_orders_shipping_address_id`, `idx_addresses_user_id`, `idx_order_items_order_id`, `idx_order_items_product_id`, `idx_payments_order_id`, `idx_shipments_order_id`, `idx_reviews_user_id`, `idx_reviews_product_id`, `idx_product_views_user_id`, `idx_product_views_product_id` |
| Optimization | Normalization, referential integrity constraints |
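To make the "index the foreign keys" idea concrete, here is a minimal sketch of a trimmed-down subset of the OLTP schema. It assumes SQLite as an in-memory stand-in for PostgreSQL so it runs without a server; the table and index names come from the table above, while every column name is a hypothetical illustration, not the project's actual DDL.

```python
import sqlite3

# Hypothetical subset of the OLTP schema. Table/index names match the README;
# column names are illustrative assumptions. SQLite stands in for PostgreSQL.
DDL = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE
);
CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    city       TEXT,
    country    TEXT
);
CREATE TABLE orders (
    order_id            INTEGER PRIMARY KEY,
    user_id             INTEGER NOT NULL REFERENCES users(user_id),
    billing_address_id  INTEGER REFERENCES addresses(address_id),
    shipping_address_id INTEGER REFERENCES addresses(address_id),
    created_at          TEXT NOT NULL
);
-- Index the foreign-key columns that joins and lookups filter on.
CREATE INDEX idx_addresses_user_id ON addresses(user_id);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_billing_address_id ON orders(billing_address_id);
CREATE INDEX idx_orders_shipping_address_id ON orders(shipping_address_id);
"""


def create_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces REFERENCES constraints when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


if __name__ == "__main__":
    conn = create_schema()
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' "
        "AND name LIKE 'idx%' ORDER BY name")]
    print(names)
```

With the constraints enabled, inserting an order for a non-existent user raises `sqlite3.IntegrityError`, which is the referential-integrity guarantee the table lists; PostgreSQL enforces the same constraints by default.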
OLAP Schema
Star schema for business analysis
| Characteristic | Details |
|---|---|
| Type | Data Warehouse (PostgreSQL) |
| Schema | Star Schema |
| Tables | `Fact_Sales`, `Fact_User_Activity`, `Fact_Product_Performance`, `Fact_Payment_Analytics`, `Dim_Products`, `Dim_Time`, `Dim_Geography`, `Dim_User`, `Dim_Payment_Method` |
| Indexes | `idx_fact_sales_time`, `idx_fact_sales_product`, `idx_fact_user_geo`, `idx_fact_payment_method`, `idx_geography_country`, `idx_geography_city`, `idx_product_category`, `idx_user_registration` |
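A star schema pays off in queries that group by dimension attributes and sum the measures stored in the fact table. The sketch below shows that pattern on a tiny hypothetical slice of `Fact_Sales`, `Dim_Products` and `Dim_Time`, again using in-memory SQLite as a stand-in for PostgreSQL. Names are lowercase, which is how PostgreSQL stores unquoted identifiers such as `Fact_Sales`; all columns and rows are made up for illustration.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table keyed to two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_products (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_products(product_key),
    time_key    INTEGER REFERENCES dim_time(time_key),
    quantity    INTEGER,
    amount      REAL
);
CREATE INDEX idx_fact_sales_time ON fact_sales(time_key);
CREATE INDEX idx_fact_sales_product ON fact_sales(product_key);
INSERT INTO dim_products VALUES (1, 'Books'), (2, 'Electronics');
INSERT INTO dim_time VALUES (202401, 2024, 1), (202402, 2024, 2);
INSERT INTO fact_sales VALUES
    (1, 202401, 2, 30.0), (2, 202401, 1, 250.0), (1, 202402, 3, 45.0);
""")

# Typical star-schema query: join facts to dimensions, group on dimension
# attributes, aggregate the additive measures (amount, quantity).
REVENUE_BY_CATEGORY_MONTH = """
SELECT t.year, t.month, p.category,
       SUM(f.amount)   AS revenue,
       SUM(f.quantity) AS units
FROM fact_sales AS f
JOIN dim_time     AS t ON f.time_key = t.time_key
JOIN dim_products AS p ON f.product_key = p.product_key
GROUP BY t.year, t.month, p.category
ORDER BY t.year, t.month, p.category
"""

rows = conn.execute(REVENUE_BY_CATEGORY_MONTH).fetchall()
for row in rows:
    print(row)
```

The indexes on the fact table's foreign keys (`idx_fact_sales_time`, `idx_fact_sales_product`) are what let these joins and time-range filters stay fast as the fact table grows.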
Synthetic data generation workflow with Python Faker
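The project generates its data with Faker; the sketch below only illustrates the shape of such a generator. To stay dependency-free it substitutes Python's stdlib `random` for Faker, and every field name and value pool is a hypothetical assumption. The key design point it shows is seeding a local RNG so that each run produces the same dataset.

```python
import random
from datetime import date, timedelta

# Stand-in for a Faker-based generator (stdlib only). Field names and
# value pools are hypothetical.
FIRST_NAMES = ["Ada", "Kofi", "Lina", "Marc", "Yara"]
COUNTRIES = ["Benin", "France", "Canada", "Japan"]


def generate_users(n: int, seed: int = 42) -> list[dict]:
    """Return n reproducible fake user records."""
    rng = random.Random(seed)  # local RNG: leaves the global random state alone
    start = date(2023, 1, 1)
    users = []
    for user_id in range(1, n + 1):
        name = rng.choice(FIRST_NAMES)
        users.append({
            "user_id": user_id,
            "first_name": name,
            # Embedding user_id guarantees unique emails despite repeated names.
            "email": f"{name.lower()}.{user_id}@example.com",
            "country": rng.choice(COUNTRIES),
            "registered_on": start + timedelta(days=rng.randrange(365)),
        })
    return users
```

With Faker itself, the same structure would draw names, emails and addresses from a seeded `Faker()` instance instead of fixed pools.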
Complete data flow from source to dashboards
ETL task management with Apache Airflow
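Airflow decides what runs next from the dependency graph declared in a DAG: a task becomes runnable once all its upstream tasks have finished, and independent tasks can run in parallel. Since a real DAG file needs Airflow installed, this sketch illustrates the same scheduling idea with the stdlib `graphlib` module. The task names are hypothetical, not the project's actual task IDs.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on,
# mirroring Airflow's `upstream >> downstream` declarations.
pipeline = {
    "generate_data": set(),
    "extract_to_bronze": {"generate_data"},
    "transform_to_silver": {"extract_to_bronze"},
    "load_dim_products": {"transform_to_silver"},
    "load_dim_time": {"transform_to_silver"},
    "load_fact_sales": {"load_dim_products", "load_dim_time"},
}


def execution_batches(graph: dict) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave could run in parallel.

    prepare() raises CycleError if the graph is not a DAG, much as Airflow
    rejects cyclic DAGs.
    """
    ts = TopologicalSorter(graph)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # every task whose upstreams are done
        batches.append(ready)
        ts.done(*ready)                 # mark the wave as finished
    return batches


if __name__ == "__main__":
    for wave in execution_batches(pipeline):
        print(wave)
```

The two dimension loads land in the same wave, and the fact load waits for both, which is the ordering a star-schema load needs so fact rows can reference existing dimension keys.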
| Step | Tools | Output |
|---|---|---|
| Extraction | Faker, PostgreSQL | Bronze layer (MinIO) |
| Transformation | Polars, Python | Silver layer (Parquet) |
| Loading | SQL, dbt | Gold layer (PostgreSQL) |
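The Bronze/Silver/Gold steps can be sketched in plain Python to show what each layer is responsible for: Bronze keeps raw records as extracted, Silver deduplicates, validates and casts types, and Gold aggregates into business-level tables. The project does this with Polars and Parquet; the sketch below uses plain dicts instead, and the object-key layout, field names and sample rows are all hypothetical.

```python
from collections import defaultdict
from datetime import date

LAYERS = {"bronze", "silver", "gold"}


def object_key(layer: str, table: str, run_date: date, ext: str = "parquet") -> str:
    """Hypothetical MinIO key convention: <layer>/<table>/dt=<date>/<table>.<ext>."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"{layer}/{table}/dt={run_date.isoformat()}/{table}.{ext}"


def to_silver(raw_rows: list[dict]) -> list[dict]:
    """Bronze -> Silver: drop duplicate order_ids and rows missing an amount, cast types."""
    seen, clean = set(), []
    for row in raw_rows:
        if row["order_id"] in seen or row.get("amount") in (None, ""):
            continue
        seen.add(row["order_id"])
        clean.append({
            "order_id": int(row["order_id"]),
            "country": row["country"].strip().title(),  # normalize spelling
            "amount": float(row["amount"]),
        })
    return clean


def to_gold(silver_rows: list[dict]) -> dict[str, float]:
    """Silver -> Gold: revenue per country, sorted by country name."""
    revenue = defaultdict(float)
    for row in silver_rows:
        revenue[row["country"]] += row["amount"]
    return dict(sorted(revenue.items()))


raw = [
    {"order_id": "1", "country": " benin ", "amount": "19.99"},
    {"order_id": "1", "country": "benin", "amount": "19.99"},  # duplicate
    {"order_id": "2", "country": "France", "amount": ""},      # missing amount
    {"order_id": "3", "country": "france", "amount": "5.01"},
]
silver = to_silver(raw)
gold = to_gold(silver)
```

Keeping each layer as a separate, re-runnable step means a bug in a Silver rule can be fixed and replayed from the untouched Bronze data.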
Real-time business KPIs with Apache Superset
| Metric | Tool |
|---|---|
| Sales | Superset |
| Performance | Grafana |
| Logs | Prometheus |
Container and metrics monitoring
| Component | Function | Dashboard |
|---|---|---|
| cAdvisor | Docker container metrics | Grafana cAdvisor |
| Postgres-Exporter | PostgreSQL metrics | Grafana Postgres |
| StatsD-Exporter | Airflow metrics | Grafana Airflow |
| MinIO Server | MinIO metrics | Grafana MinIO |
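Airflow does not expose Prometheus metrics directly: it emits StatsD datagrams over UDP, and the StatsD exporter translates them into metrics Prometheus can scrape. The sketch below formats and sends one such datagram, which can help when checking that the exporter receives data. The target port 9125 is the StatsD exporter's usual default listening port and is an assumption about this setup, as is the host name.

```python
import socket

STATSD_TYPES = {"c", "g", "ms"}  # counter, gauge, timer (milliseconds)


def statsd_line(metric: str, value: float, metric_type: str = "c") -> bytes:
    """Format a plain StatsD datagram: <name>:<value>|<type>."""
    if metric_type not in STATSD_TYPES:
        raise ValueError(f"unsupported StatsD type: {metric_type}")
    return f"{metric}:{value}|{metric_type}".encode("ascii")


def send_metric(line: bytes, host: str = "localhost", port: int = 9125) -> None:
    """Send one datagram. Host and port are assumptions about the deployment.

    UDP is fire-and-forget: the sender gets no delivery confirmation.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line, (host, port))
```

For example, `send_metric(statsd_line("airflow.scheduler_heartbeat", 1))` sends a counter increment shaped like Airflow's scheduler heartbeat; after the exporter's mapping rules apply, it should become visible on the Grafana Airflow dashboard.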
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8 cores |
| RAM | 8GB | 16GB |
| Storage | 50GB SSD | 100GB NVMe |
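A quick preflight check against the minimum column can save a failed `make up`. This is a minimal sketch, assuming a POSIX host for the RAM check (it returns `None` elsewhere); it reads the host's resources, so on Docker Desktop the VM's configured limits may be lower than what it reports, and it cannot tell an SSD from a spinning disk.

```python
import os
import shutil

# Minimum requirements from the table above.
MIN_CPUS = 4
MIN_RAM_GB = 8
MIN_DISK_GB = 50


def total_ram_gb():
    """Physical RAM in GiB via POSIX sysconf; None where unsupported (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


def preflight(path: str = ".") -> dict:
    """Compare host CPU, RAM and free disk space at `path` with the minimums."""
    cpus = os.cpu_count() or 0
    ram = total_ram_gb()
    free_disk_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "cpu_ok": cpus >= MIN_CPUS,
        "ram_ok": None if ram is None else ram >= MIN_RAM_GB,
        "disk_ok": free_disk_gb >= MIN_DISK_GB,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check:<8} {'unknown' if ok is None else ('ok' if ok else 'below minimum')}")
```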
```bash
git clone https://github.com/abrahamkoloboe27/E-Commerce-Data-Pipeline-And-Dashboard-With-Apache-Superset e-commerce-pipeline
cd e-commerce-pipeline

make build                  # Build Docker images
make up                     # Start containers
make build-up               # Build and start containers
make down                   # Stop and remove containers
make down-volumes           # Remove containers and volumes
make down-volumes-build-up  # Remove containers and volumes, and build new images
```
| Service | URL | Credentials | Port |
|---|---|---|---|
| Airflow | http://localhost:8080 | admin/admin | 8080 |
| MinIO | http://localhost:9001 | minioadmin/minioadmin | 9001 |
| Superset | http://localhost:8088 | admin/admin | 8088 |
| Grafana | http://localhost:3000 | grafana/grafana | 3000 |
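Once the containers are up, a small script can confirm that each service answers before you open it in a browser. This is a sketch under stated assumptions: the health paths are each tool's commonly documented endpoints, and MinIO's health probe is assumed to be served on its S3 API port 9000 rather than on the 9001 console listed above. Adjust the URLs if this deployment maps ports differently.

```python
import http.client
import urllib.error
import urllib.request

# Assumed health endpoints; MinIO's probe uses the API port (9000), not the console.
HEALTH_URLS = {
    "Airflow": "http://localhost:8080/health",
    "MinIO": "http://localhost:9000/minio/health/live",
    "Superset": "http://localhost:8088/health",
    "Grafana": "http://localhost:3000/api/health",
}


def check_services(urls=HEALTH_URLS, timeout: float = 3.0) -> dict:
    """Return {service: True/False}; any connection or HTTP error counts as down."""
    status = {}
    for name, url in urls.items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                status[name] = resp.status == 200
        except (urllib.error.URLError, OSError, http.client.HTTPException):
            # Refused connection, timeout, 4xx/5xx (HTTPError), malformed reply.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:<10} {'up' if ok else 'down'}")
```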
| Feature | Technology | Benefit |
|---|---|---|
| Hierarchical Data Lake | MinIO + Parquet | Structured separation of raw and transformed data |
| Modular ETL | Airflow + Python | Reproducible workflows |
| Unified Monitoring | Grafana + Prometheus | 360° performance view |
License: MIT
Contact: abklb27@gmail.com
Author: Abraham Koloboe
✨ Made with passion for data engineering!