1
AWS
Outage
2
Featured Speaker
Audil Khan
Technical Solutions Architect
3
Before We Begin...
• If you have any questions, please type them in the Questions window.
• If you have any audio problems, please message us in the chat for help.
• A recording of this presentation will be sent to you in a few days.
@ThousandEyes
4
Agenda
• About ThousandEyes
• Key Technical Concepts
• Outage Timelines and Details
• Demo – ThousandEyes Outage Analysis
• Lessons & Takeaways
• Q&A
5
Actionable Insight for Internet, Cloud, and SaaS
• Correlated Insights: Quickly isolate issues to app, network, or service
• Network Visibility: Overlay, hop-by-hop underlay, ISP performance, and BGP routing
• App Experience: SaaS, API, and internal app performance and user experience
6
See the Internet Like It’s Your Own Network
[Figure: Your Network, ISP, Cloud Provider; locations shown: Moscow, Russia; Paris, France; Chicago, IL]
• Visualize the link between network topologies and service delivery
• Rapidly isolate problem domain and owner
7
ThousandEyes Collective Intelligence
• 20K+ Vantage Points
• Billions of Daily Path Measurements
• Thousands of Digital Services
• 110+ Countries
• 1100+ Cities
8
ThousandEyes Internet Insights: App Outages
Top Business SaaS Apps: Dev Tools, Communication Tools, Human Resources, Social Networking, Finance, eCommerce, Sales & Marketing, Collaboration Tools
• Global View of SaaS App Availability
• Accelerated & Empowered IT Operations
• Data-driven Vendor Governance
Key Technical Concepts
10
Amazon Web Services – At a Glance
• Availability Zones
• Key Components
– EC2 – compute
– S3 – storage
– API Gateway
• Ecosystem
– 200+ services
• US-EAST-1: outsized interdependency
11
Application Programming Interface (API)
• Enables communication between disparate applications/systems
• Increased application complexity
• Interdependencies and domino effects
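The domino effect mentioned above can be sketched as a walk over a dependency map. This is a minimal illustration, not anything from the presentation: the service names and the `DEPENDS_ON` map are hypothetical, and the function simply finds every service that directly or transitively depends on a failed one.

```python
from collections import deque

# Hypothetical dependency map (not from the slides):
# each service lists the services it calls.
DEPENDS_ON = {
    "web-frontend": ["auth-api", "catalog-api"],
    "auth-api": ["user-db"],
    "catalog-api": ["search-api", "storage"],
    "search-api": ["storage"],
    "user-db": [],
    "storage": [],
}


def impacted_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first search outward from the failed service.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


print(impacted_by("storage", DEPENDS_ON))
print(impacted_by("user-db", DEPENDS_ON))
```

In this toy map, a failure in a single back-end service reaches the front end through two different paths, which is the kind of interdependency the slide warns about.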
12
Amazon API Gateway
• Gatekeeper for backend APIs in AWS
• Capable of processing hundreds of thousands of concurrent API calls
• AWS offers internal services to customers via API Gateway
Outage Timelines and Details
14
12/7 - Event Sequence as Observed by ThousandEyes
• 1532 UTC – Outage begins
• 1535 UTC – Server response failures
• 1640 UTC – AWS status page: first mention
• 1712 UTC – AWS API transaction times increase
• 0100 UTC – Return to normal
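Two intervals follow directly from the timestamps on this slide: how long the outage ran before the AWS status page first mentioned it, and how long it lasted overall. The sketch below is a rough calculation under two assumptions that are not stated on the slide: the year is taken to be 2021 (the slide gives only 12/7), and the 0100 UTC entry is taken to fall on the following day, 12/8.

```python
from datetime import datetime

YEAR = 2021  # assumption: the slide gives only the month and day

# Timestamps from the slide, in UTC.
outage_begins = datetime(YEAR, 12, 7, 15, 32)
status_page_first_mention = datetime(YEAR, 12, 7, 16, 40)
# Assumption: 0100 UTC is on the next calendar day (12/8).
return_to_normal = datetime(YEAR, 12, 8, 1, 0)


def minutes(delta):
    """Convert a timedelta to whole minutes."""
    return int(delta.total_seconds() // 60)


status_page_lag = minutes(status_page_first_mention - outage_begins)
total_duration = minutes(return_to_normal - outage_begins)

print(f"Status page lag: {status_page_lag} min")
print(f"Observed duration: {total_duration // 60} h {total_duration % 60} min")
```

Under those assumptions, the status page lagged the observed onset by a little over an hour, and the observed impact lasted roughly nine and a half hours.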
15
12/7 - Event Sequence from Amazon RCA
• 1530 UTC – Multiple services impacted due to congestion from automated activity
• 1533 UTC – EC2 API errors and increased latency
• 1728 UTC – Internal DNS remediation; issues still persist
• Ongoing network congestion remediation measures
• 2134 UTC – Significant alleviation of network congestion
• 2135 UTC – Container API begins to return to normal
• 2222 UTC – Network devices and AWS Console access "all clear"
• 2230 UTC – Route 53 APIs "all clear"
• 2240 UTC – EC2 "all clear"
• 0041 UTC – API Gateways recovered
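The RCA timeline shows services recovering at different times. As a rough illustration, the sketch below computes how long after the 1530 UTC onset each recovery milestone occurred. The labels are shortened from the slide, and the helper `minutes_since` is hypothetical; it assumes every event falls within 24 hours of the onset, so a clock time earlier than the onset is treated as the next day.

```python
from datetime import datetime, timedelta

ONSET = "1530"  # onset time (UTC) from the RCA slide

# Recovery milestones (UTC) from the RCA slide, labels shortened.
RECOVERY_EVENTS = [
    ("Container API", "2135"),
    ("AWS Console", "2222"),
    ("Route 53 APIs", "2230"),
    ("EC2", "2240"),
    ("API Gateway", "0041"),
]


def minutes_since(start_hhmm, event_hhmm):
    """Minutes from start to event, rolling past midnight when needed.

    Assumption: the event occurs less than 24 hours after the start.
    """
    start = datetime.strptime(start_hhmm, "%H%M")
    event = datetime.strptime(event_hhmm, "%H%M")
    if event < start:
        event += timedelta(days=1)
    return int((event - start).total_seconds() // 60)


recovery = {name: minutes_since(ONSET, t) for name, t in RECOVERY_EVENTS}
for name, mins in recovery.items():
    print(f"{name}: {mins // 60} h {mins % 60:02d} min after onset")
```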
16
12/10 - Event Sequence as Observed by ThousandEyes
• 1305 UTC – Outage begins
• Server response failures
• Brief clear, followed by resumption
• 1430 UTC – Return to normal
Demo
18
Lessons and Takeaways
• Understand your network and application interdependencies
– Front-end interfaces often depend on many back-end APIs
• How does your cloud provider work?
– Understand architecture and interdependencies
– Single AZ, multi-AZ, multi-cloud
– AWS ≠ Azure ≠ GCP
• Inform your Incident Response / Outage Management
– Specific guidance when issues take place
– Example: "We’re seeing 2x API response times, and it is impacting x, y, z across all zones"
• Independent visibility and verification are needed
– Don’t just depend on the status page!
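The kind of specific guidance in the example above can be derived from your own measurements rather than from a status page. The following is a minimal sketch, assuming the "2x" in the example refers to API response times relative to a normal baseline. The function names, the sample values, and the impacted service names are all hypothetical.

```python
from statistics import median


def slowdown_factor(baseline_ms, current_ms):
    """Ratio of the current median response time to the baseline median."""
    return median(current_ms) / median(baseline_ms)


def incident_summary(factor, impacted, threshold=2.0):
    """Build a short status message, or return None if below the threshold."""
    if factor < threshold:
        return None
    return (
        f"We're seeing {factor:.1f}x API response times, "
        f"impacting {', '.join(impacted)}"
    )


# Hypothetical response-time samples in milliseconds.
baseline = [110, 120, 130, 125, 115]  # normal conditions
current = [250, 240, 260, 230, 235]   # during the incident

factor = slowdown_factor(baseline, current)
message = incident_summary(factor, ["checkout", "login"])
print(message)
```

Using the median rather than the mean keeps a single slow outlier from triggering the message; the threshold of 2.0 is only a placeholder taken from the slide's example.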
19
Next Steps
• Subscribe! https://blog.thousandeyes.com
• Get a real-time view of the health of the Internet
https://thousandeyes.com/outages
• Sign up for a Free Trial:
https://www.thousandeyes.com/signup
• Request a demo:
https://www.thousandeyes.com/request-demo
Q&A
AWS Outage Analysis
