Production-grade, serverless AWS data pipeline that simulates autonomous-vehicle telemetry processing at fleet scale. This repository demonstrates end-to-end ingestion, distributed stream triage, columnar storage optimization with Apache Parquet, and data lakehouse partitioning, with Terraform Infrastructure as Code (IaC) modules planned as future work.
docker-compose autonomous-vehicles data-quality apache-parquet etl-pipeline pyarrow aws-data-pipeline data-contracts pandera data-lakehouse terraform-infrastructure telemetry-processing hive-partitioning python-data-engineering-etl-data-cleaning serverless-etl python-data-engineering defensive-data-engineering containerized-etl
Updated Jun 1, 2026 - Python