A widespread disruption at Amazon’s cloud computing service interrupted apps and websites around the world on Monday, affecting customers and businesses. Amazon Web Services confirmed that the outage, which began early in the morning, was resolved by Monday evening.
Amazon Web Services, known as AWS, is Amazon’s cloud services division, which sells digital services to businesses, government agencies, and other organizations. Shion Guha, a professor of human-centered data science at the University of Toronto, said AWS is a significant part of Amazon’s revenue, noting that its centralized server network and software offerings attract businesses.
Businesses use AWS for web hosting and for the computing power needed for data analysis. Amazon dominates the cloud services market with a share of more than 41%, ahead of competitors such as Google and Microsoft.
The outage on Monday morning affected popular social media platforms such as Snapchat and Reddit, games such as Fortnite, and financial services including Venmo and Chime. Food delivery apps, rideshare services, communication platforms such as Signal, and video streaming services such as Netflix and Disney+ were also affected.
Canadian users were also affected, because telecom firms and government agencies rely on AWS infrastructure. The outage disrupted services such as bill payments and license renewals. The incident drew more than 13 million reports globally on Downdetector, a significant number of them from Canada.
The root cause was traced to a data center cluster in northern Virginia, where a Domain Name System (DNS) issue affected AWS’s DynamoDB API. DNS translates domain names into the IP addresses that computers use to locate one another; when it is disrupted, services cannot find each other and therefore cannot communicate.
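For readers who want to see what a DNS lookup involves, here is a minimal Python sketch, not a reconstruction of anything AWS runs. It asks the system resolver for the IP addresses behind a hostname and returns an empty list when the lookup fails, which is roughly the situation a client is in during a DNS disruption. The function name `resolve` and its `lookup` parameter are hypothetical, introduced only for illustration.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted IP addresses for hostname, or [] if the DNS lookup fails.

    `lookup` defaults to the standard library resolver; it is a parameter
    only so the function can be exercised without network access.
    """
    try:
        # Each result is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address as a string.
        results = lookup(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be resolved, so the caller has no address to connect to.
        return []
    return sorted({entry[4][0] for entry in results})
```

Calling `resolve("dynamodb.us-east-1.amazonaws.com")` on a machine with working DNS would return one or more addresses for the regional DynamoDB endpoint. An empty list means the client cannot reach the service even if the servers behind it are running.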
This was not the first incident at AWS’s US-EAST-1 cluster; previous outages there have also affected internet services. After hours of disruption, most applications gradually resumed operation before AWS officially declared the issue resolved in the evening. Outages of this kind can stem from anything from physical problems such as wiring damage to software glitches or cyberattacks, so resolving them requires a meticulous process.
Cybersecurity experts such as Patrick Burgess said the outage was likely a technology fault rather than a cyberattack. They noted that AWS, Google, and Microsoft have established procedures for managing such incidents, which are typically resolved within hours.
