Mykhaylo Filip


Major AWS Outage – October 2025

First of all, I'd like to mention that this is my first current-events post on my website. I hope anyone who reads it finds it interesting. :D

On October 20, 2025, Amazon Web Services (AWS) suffered a massive outage in its US‑EAST‑1 region that disrupted hundreds (if not thousands) of websites and services worldwide.

The root cause? A DNS resolution failure affecting AWS’s DynamoDB service, followed by a cascading chain of dependency failures that disrupted load balancing and compute capacity.

What happened

The issues began in the early hours of the morning (US time) with failures in DynamoDB’s DNS system.
Because many other AWS services depend on that region and on DynamoDB itself, a domino effect followed: EC2 instances couldn’t be launched, network load balancers failed, and many client services went offline.
Full recovery took more than 15 hours in many areas, a reminder that even large cloud providers are vulnerable.
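
To make the domino effect a bit more concrete, here is a minimal Python sketch. The dependency map is a simplified illustration based only on the services named above; it is not AWS's real internal dependency graph, and names such as `client-app` and `static-site` are made up for the example. The function walks the map to find everything that sits downstream of one failed component.

```python
from collections import deque

# Simplified, illustrative dependency map (NOT AWS's real internal graph).
# Each key depends on every service in its list.
DEPENDS_ON = {
    "dynamodb-dns": [],
    "dynamodb": ["dynamodb-dns"],
    "ec2-launch": ["dynamodb"],
    "network-load-balancer": ["ec2-launch"],
    "client-app": ["network-load-balancer", "dynamodb"],  # hypothetical app
    "static-site": [],                                     # hypothetical, no dependencies
}


def affected_by(failed, depends_on):
    """Return every service that directly or indirectly depends on `failed`."""
    # Invert the map: for each service, list the services that depend on it.
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(service)

    # Breadth-first walk outward from the failed component.
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents[current]:
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


if __name__ == "__main__":
    print(sorted(affected_by("dynamodb-dns", DEPENDS_ON)))
```

In this toy map, one failure at the bottom takes out every service above it, while the service with no dependencies is untouched. Drawing a map like this for your own systems is a quick way to spot hidden single points of failure.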

Why it matters

For someone working in cybersecurity or system administration (like me), the outage illustrates:

The risk of single‑region dependence: even if your system is in another region, if your control plane depends on US‑EAST‑1 you may still be affected.
The importance of multi‑region or multi‑cloud architectures, especially for critical services or infrastructure.
That failures don’t always come from “attackers”; sometimes infrastructure errors cause just as much damage.
I was actually doing the New Jersey Space Grand Challenge at the time, and some of the rooms/challenges wouldn’t work during the outage.
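
The multi-region point in the list above can be sketched at the client level. This is only a rough illustration, assuming a client that is allowed to fall back to a second region: it tries each regional endpoint in order and picks the first one whose name still resolves in DNS. The ordering of `CANDIDATES` and the `resolver` parameter (there so the function can be tested without a network) are assumptions for the example. Resolving a name does not prove the service behind it is healthy, and the fallback region only helps if your data is actually replicated there.

```python
import socket


def first_resolvable(hostnames, resolver=socket.getaddrinfo):
    """Return the first hostname that resolves in DNS, or None if none do."""
    for host in hostnames:
        try:
            resolver(host, 443)
        except socket.gaierror:
            # DNS lookup failed for this endpoint; try the next region.
            continue
        return host
    return None


# Regional endpoints in order of preference (an assumed ordering).
CANDIDATES = [
    "dynamodb.us-east-1.amazonaws.com",
    "dynamodb.us-west-2.amazonaws.com",
]

if __name__ == "__main__":
    print(first_resolvable(CANDIDATES))
```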

My takeaway

This event pushed me to revisit how I deploy lab systems and prepare backups. Things I’m doing:

Checking where my control services run (region, provider).
Ensuring snapshots or backups of critical lab VMs are usable if one region is disrupted.
Practicing resiliency and restoration work so I’m prepared when something fails, not just assuming “it will never happen.”
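
The checklist above could be turned into a small audit script. The sketch below is one possible approach, not a finished tool: the inventory entries, the field names (`region`, `backup_region`, `last_restore_test`), and the 90-day threshold are all made up for illustration. It flags systems whose backup lives in the same region as the system itself, systems with no backup at all, and systems whose restore has not been tested recently.

```python
from datetime import datetime, timedelta

# Hypothetical inventory of lab systems; every value here is made up.
INVENTORY = [
    {"name": "lab-dc01", "region": "us-east-1", "backup_region": "us-east-1",
     "last_restore_test": datetime(2025, 6, 1)},
    {"name": "lab-siem", "region": "us-east-1", "backup_region": "us-west-2",
     "last_restore_test": datetime(2025, 10, 1)},
    {"name": "lab-web", "region": "eu-west-1", "backup_region": None,
     "last_restore_test": None},
]


def audit(inventory, now, max_age=timedelta(days=90)):
    """Return (system name, finding) pairs for risky backup setups."""
    findings = []
    for item in inventory:
        # A backup in the same region fails together with the system.
        if item["backup_region"] is None:
            findings.append((item["name"], "no backup"))
        elif item["backup_region"] == item["region"]:
            findings.append((item["name"], "backup in same region"))

        # A backup that was never restored is only assumed to work.
        last_test = item["last_restore_test"]
        if last_test is None or now - last_test > max_age:
            findings.append((item["name"], "restore test missing or stale"))
    return findings


if __name__ == "__main__":
    for name, finding in audit(INVENTORY, now=datetime.now()):
        print(f"{name}: {finding}")
```

Passing `now` in explicitly keeps the check repeatable, which makes it easier to run on a schedule and compare results over time.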