Once I had the basic flow working, I launched the Chaos Dashboard, clicked the "DynamoDB Error" experiment... and watched my app fall apart. Literally.
The Lambda function that writes to DynamoDB with PutItemAsync started throwing internal server errors like confetti.
What Went Wrong (And Why That’s Good)
This wasn’t a polished app. It had:
- No retries
- No fallback logic
- No graceful error handling
And I got to see all of that immediately once the chaos kicked in.
Takeaway:
If your app can’t tolerate even a brief hiccup from a cloud service, it’s not ready for prod. And without chaos testing, you might not know until it’s too late.
My Favorite Part: Testing in LocalStack > Prod Nightmares
What I loved about this:
- I didn’t need to spin up a real AWS environment
- I didn’t risk breaking anything critical
- I could toggle failure modes with a literal switch
I will be adding retries and proper logging soon. But even just running this gave me a new perspective on what "robust" actually means.
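For a sense of what "retries, fallback, and logging" could look like, here is a minimal sketch in Python. The app itself calls the .NET SDK's PutItemAsync, so this is an illustration, not the app's code. The names with_retries, TransientError, and flaky_put_item are hypothetical. flaky_put_item does not call AWS; it just fails twice to mimic the injected DynamoDB errors. The wrapper retries with capped exponential backoff plus jitter and logs each failure. If every attempt fails, it hands off to a fallback instead of crashing.

```python
import logging
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("meme-app")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error, e.g. an HTTP 500 from DynamoDB."""


def with_retries(
    operation: Callable[[], T],
    fallback: Callable[[Exception], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures with capped exponential backoff and full jitter.

    If every attempt fails, log the final error and return fallback(error)
    instead of letting the exception crash the caller.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as err:
            last_error = err
            if attempt == max_attempts:
                break
            # Full jitter: wait a random time between 0 and the capped exponential delay.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
            log.warning("attempt %d/%d failed (%s); retrying in %.2fs",
                        attempt, max_attempts, err, delay)
            sleep(delay)
    assert last_error is not None
    log.error("all %d attempts failed; using fallback", max_attempts)
    return fallback(last_error)


# Hypothetical flaky write: the first two calls fail, as if chaos were injected.
calls = {"n": 0}


def flaky_put_item() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TransientError("InternalServerError")
    return "saved"


# Skip real sleeping so the demo runs instantly.
result = with_retries(flaky_put_item, fallback=lambda err: "queued-for-later",
                      sleep=lambda s: None)
print(result, calls["n"])  # saved 3
```

AWS SDKs already ship configurable built-in retries, so in a real app you would probably tune those first. A wrapper like this would then only add app-level decisions such as the fallback, for example queuing a vote to save later.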
Turning This into a Talk
This experiment started as a blog post, but it’s shaping up to be something bigger. I’m putting together a lightning talk based on this journey: equal parts meme app, chaos engineering, and "what not to do."
My goal?
Take this talk on the road to show other devs that you don’t need to be Netflix to start building resilient systems. You just need a curious mind, a busted app, and some good local chaos.
Coming Soon: More Chaos. More Learning.
Next up, I want to test how my app handles:
- S3 outages (can users still vote on memes?)
- Lambda timeouts (will the frontend hang or recover?)
- Multi-service chaos (because AWS failures don’t happen in isolation)
Still building. Still breaking. Still blogging.
If you’ve dabbled in chaos engineering, or you’re curious about trying it, I’d love to hear about it!
Hit me up on Twitter or LinkedIn and let’s swap stories.
Until next week 👋🏾