A European fintech was hemorrhaging $28,000 monthly on AWS for processing 2.3M transactions. Six months later, they were spending $9,800 for the same workload with better performance. Here's the engineering breakdown.
The problem: classic cloud cost spiral
The fintech ran 40 microservices across AWS with PCI DSS and GDPR requirements. Their architecture looked standard on paper, but the monthly bills told a different story.
Compute waste everywhere:
- 60 EC2 instances running 24/7
- CPU utilization: 23% peak, 8% overnight
- Only 30% reserved instances (paying on-demand for predictable workloads)
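As a rough sanity check on those numbers, the sketch below estimates how many of the 60 instances the peak load would actually need at a healthier utilization. It assumes identical instances with CPU as the binding resource, and it borrows the 70% target from the autoscaler setting shown later in the article; the function name `instances_needed` is hypothetical.

```shell
#!/bin/sh
# Rough right-sizing estimate.
# Assumptions: all instances are the same size and CPU is the limiting resource.

# instances_needed CURRENT_COUNT PEAK_UTIL_PCT TARGET_UTIL_PCT
instances_needed() {
  count=$1
  peak=$2
  target=$3
  # Ceiling of (count * peak / target) using integer arithmetic only.
  echo $(( (count * peak + target - 1) / target ))
}

# 60 instances peaking at 23% CPU, re-packed to peak at 70%.
instances_needed 60 23 70   # prints 20
```

Under these assumptions roughly a third of the fleet would cover the peak, before accounting for redundancy or memory-bound services.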
Storage bleeding money:
- 2.4TB of PostgreSQL logs per month with no retention policy
- 800GB application logs stored indefinitely
- 15TB of accumulated EBS snapshots
Network transfer costs:
- $3,200/month in cross-AZ microservices chatter
- NAT gateway charges for external API calls
The kicker? Their workloads were completely predictable. Payment processing peaked 9 AM to 6 PM weekdays. Fraud detection ran nightly batches. Customer onboarding spiked during monthly marketing campaigns.
The solution: sovereign open source stack
Instead of AWS optimization theater, we built a dedicated stack using:
- Proxmox: Virtualization and cluster management
- Ceph: Distributed storage with built-in redundancy
- OpenStack: Cloud APIs without vendor lock-in
- Kubernetes: Efficient resource sharing
Implementation highlights
Hardware foundation:
6 bare-metal servers in Frankfurt: 64 cores, 256GB RAM, 4TB NVMe each.
Smart Ceph storage tiering:
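# Hot transaction data on NVMe, replicated three ways
ceph osd pool create transactions 128 128 replicated
ceph osd pool set transactions size 3
# Cold analytics data with erasure coding (4 data chunks + 2 coding chunks).
# The profile has to exist before the pool that references it is created.
ceph osd erasure-code-profile set ec-profile k=4 m=2
ceph osd pool create analytics 64 64 erasure ec-profile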
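To see why the cold pool uses erasure coding, it helps to compare usable capacity per unit of raw disk. The sketch below applies the standard ratios (1/size for replication, k/(k+m) for erasure coding) to the cluster's 24 TB of raw NVMe (6 servers × 4 TB). It ignores Ceph's own overhead and near-full safety margins, treats each scheme as if it had the whole cluster to itself, and uses hypothetical function names.

```shell
#!/bin/sh
# Usable-capacity comparison for the two pool layouts.
# Integer division rounds down; inputs are whole terabytes.

# usable_replicated RAW_TB SIZE  ->  raw / size
usable_replicated() {
  echo $(( $1 / $2 ))
}

# usable_erasure RAW_TB K M  ->  raw * k / (k + m)
usable_erasure() {
  echo $(( $1 * $2 / ($2 + $3) ))
}

raw_tb=24   # 6 servers x 4 TB NVMe, from the hardware section

echo "replicated size=3: $(usable_replicated "$raw_tb" 3) TB usable"
echo "erasure k=4 m=2:   $(usable_erasure "$raw_tb" 4 2) TB usable"
```

Both layouts survive the loss of two copies or chunks, but erasure coding yields twice the usable space here. The trade-off is extra CPU work and slower recovery, which is why it suits cold data better than hot transaction data.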
Resource-aware Kubernetes scheduling:
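apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-api-hpa
spec:
  # scaleTargetRef is required; the Deployment name below is an assumption.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-api
  minReplicas: 2
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70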
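For intuition about how this autoscaler reacts, the sketch below reproduces the core scaling rule from the Kubernetes documentation: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the configured minimum and maximum. It leaves out the controller's tolerance band and stabilization windows, and the function name `desired_replicas` is hypothetical.

```shell
#!/bin/sh
# Simplified HorizontalPodAutoscaler replica calculation.
# Not modelled: the default tolerance band, stabilization windows, pod readiness.

# desired_replicas CURRENT_REPLICAS CURRENT_UTIL_PCT TARGET_UTIL_PCT MIN MAX
desired_replicas() {
  current=$1
  util=$2
  target=$3
  min=$4
  max=$5
  # Ceiling of (current * util / target) with integer arithmetic.
  want=$(( (current * util + target - 1) / target ))
  if [ "$want" -lt "$min" ]; then want=$min; fi
  if [ "$want" -gt "$max" ]; then want=$max; fi
  echo "$want"
}

# Values from the manifest: target 70%, min 2, max 12.
desired_replicas 4 90 70 2 12    # daytime load: scales up to 6
desired_replicas 4 10 70 2 12    # overnight: floors at minReplicas, 2
desired_replicas 10 95 70 2 12   # spike: capped at maxReplicas, 12
```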
Migration strategy:
We built the new stack in parallel, migrated non-critical services first, then moved payment processing during a 47-minute maintenance window using PostgreSQL logical replication.
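A logical-replication cutover of this kind usually comes down to three SQL steps: publish on the old database, subscribe from the new one, and confirm the lag is zero before switching traffic. The sketch below prints those statements so they can be reviewed or piped into `psql`. It is not the team's actual runbook; every name in it (`payments_pub`, `payments_sub`, `old-db.internal`, `replicator`, the `payments` database) is hypothetical.

```shell
#!/bin/sh
# Dry-run helpers that emit the SQL for a logical-replication cutover.
# All object names and hosts are placeholders, not taken from the article.

# Run on the old (source) database. Requires wal_level = logical.
publisher_sql() {
  cat <<'SQL'
CREATE PUBLICATION payments_pub FOR ALL TABLES;
SQL
}

# Run on the new (target) database. The schema must already exist there,
# because logical replication copies rows, not DDL.
subscriber_sql() {
  cat <<'SQL'
CREATE SUBSCRIPTION payments_sub
  CONNECTION 'host=old-db.internal dbname=payments user=replicator'
  PUBLICATION payments_pub;
SQL
}

# Run on the source just before cutover: bytes the subscriber has not confirmed.
lag_sql() {
  cat <<'SQL'
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots;
SQL
}

publisher_sql
subscriber_sql
lag_sql
```

In practice each function would be piped to the relevant server, for example `publisher_sql | psql "$SOURCE_DSN"`, where `SOURCE_DSN` is an assumed connection string. Sequences are not replicated, so they need to be synchronized separately before writes move to the new database.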
Results that matter
Performance improvements:
- API response times: 180ms → 95ms average
- Same 99.95% uptime SLA maintained
- Sub-200ms latency requirement met with headroom
Cost breakdown:
- Before: $28,000/month on AWS
- After: $9,800/month total (including $4,200 hardware and $3,200 managed services)
- 65% cost reduction
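The headline percentage follows directly from the two monthly totals, as the short sketch below shows. The annual figure is an extrapolation that assumes both bills stay flat and does not include one-off migration costs; the function names are hypothetical.

```shell
#!/bin/sh
# Cost-reduction arithmetic from the monthly totals (whole dollars).

# reduction_pct BEFORE AFTER  ->  whole-percent reduction (rounds down)
reduction_pct() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

# annual_savings BEFORE AFTER  ->  twelve times the monthly difference
annual_savings() {
  echo $(( ($1 - $2) * 12 ))
}

echo "reduction: $(reduction_pct 28000 9800)%"                 # reduction: 65%
echo "annual savings: \$$(annual_savings 28000 9800)"          # annual savings: $218400
```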
Operational wins:
- No vendor lock-in
- Full EU data residency
- Predictable monthly costs
- Better resource utilization (65% average, up from a 23% peak on AWS)
Key takeaways for engineers
- Audit first: Most "scaling" problems are resource waste problems
- Predictable workloads don't need cloud premium: If you can forecast it, you can right-size it
- Open source infrastructure scales: Proxmox + Ceph + K8s handles enterprise workloads
- Migration risk is manageable: Parallel builds beat big-bang deployments
The real lesson? Sometimes the best cloud optimization is leaving the cloud entirely.
Originally published on binadit.com