I am a data-first professional specializing in enterprise-grade cloud architectures, pipeline orchestration, and compute optimization. I build highly reliable reporting systems that accelerate downstream analytics while keeping infrastructure costs lean.
What I do:
- 🚀 Building Pipelines: Developed and maintained ETL pipelines with PySpark and Apache Airflow that reliably process 250+ GB of financial and transaction data daily while consistently meeting upstream and downstream SLAs.
- 🛡️ Ensuring Reliability: Reduced pipeline failures by 25% with rigorous Great Expectations data quality checks that safeguard data integrity for critical KYC/AML and merchant analytics reporting.
- 💡 Optimizing Compute: Tuned complex PySpark jobs and introduced efficient table partitioning, cutting daily batch processing time by 20% and lowering cluster compute costs.
- 🌱 Currently Exploring: Deepening my knowledge of Databricks Unity Catalog for data governance and experimenting with streaming architectures built on Apache Kafka.

Tech stack:
- Languages & Libraries: Python, Pandas, NumPy, SQL
- Big Data & Data Processing: Apache Spark, PySpark, Databricks
- Cloud Infrastructure: Azure Data Lake Storage (ADLS), Azure Data Factory (ADF), Azure Databricks, Databricks on AWS
- Visualization: Power BI, Advanced Excel
- Domain Knowledge: Fintech, Payments, BFSI, Merchant Analytics, KYC/AML

| Project | Tools | Description |
|---|---|---|
| 💳 Azure Card Transaction ETL Pipeline | ADF, ADLS, Databricks, PySpark | End-to-end ETL pipeline for Mastercard and Visa transaction data |
| 🏦 Customer Churn Analysis | SQL, Python, Power BI | Churn driver analysis for banking customers |
| 🛒 Merchant Analytics Dashboard | SQL, Python, Power BI | Payment gateway and settlement performance tracking |
| 📈 Banking KPI Dashboard | SQL, Power BI, Excel | Executive reporting on CASA and loan disbursement KPIs |