Amazon Web Services (AWS) suffered an outage in Northern Virginia that began yesterday, Monday, October 22, 2012, and continued into today. Reddit, GitHub, imgur, Pocket, HipChat, Coursera, Heroku, Minecraft, Pinterest, Foursquare, Airbnb and others are down or not operating at full capacity, with delays and connectivity issues.
What started as a small issue affecting some Amazon Elastic Compute Cloud (EC2) instances in Northern Virginia eventually became a full-blown outage of AWS.
Some services have gone down, come back up, and gone back down again in the past 36 hours.
Airbnb is an online service that matches people seeking vacation rentals and other short-term accommodations with hosts who have an unused space to rent, generally private parties that are not professional hoteliers.
Foursquare is a location-based social networking website for mobile devices.
Heroku is a cloud platform as a service (PaaS) supporting several programming languages. According to its Twitter account, its API (application programming interface) was disabled and then set to read-only mode.
Heroku currently reports elevated error rates for both the production service and the development service. Many smaller services are affected by the outage.
GitHub is a web-based hosting service (social coding) for software development projects that use the Git revision control system — a distributed revision control and source code management (SCM) system with an emphasis on speed.
HipChat is an application service provider, launched in January 2010, that allows users, for a fee, to create and participate in chat rooms and send one-on-one messages with SMS. HipChat is marketed as a “Business Collaboration” service.
Minecraft is a video game for PC, Android, and iOS.
Pinterest is a pinboard-style social photo collecting and sharing website.
Pocket (getpocket.com), previously known as Read It Later, is an application for managing a reading list of articles from the Internet. Pocket is available for iOS, Android, and BlackBerry OS.
Many other smaller or lesser-known websites that use Amazon Web Services were also down. Amazon’s RDS database instances and Elastic Beanstalk were also down in Northern Virginia.
Amazon AWS Message About the Outage
22nd Oct 11:03 AM PDT We are currently experiencing connectivity issues and degraded performance for a small number of RDS DB Instances in a single Availability Zone in the US-EAST-1 Region.
22nd Oct 11:45 AM PDT A number of Amazon RDS DB Instances in a single Availability Zone in the US-EAST-1 Region are experiencing connectivity issues or degraded performance. New instance create requests in the affected Availability Zone are experiencing elevated latencies. We are investigating the root cause.
22nd Oct 12:53 PM PDT We have recovered a number of affected RDS instances and are working on recovering the remaining impacted RDS DB instances in a single Availability Zone in the US-EAST-1 Region. New instance creation requests in the affected Availability Zone continue to experience elevated latencies.
22nd Oct 2:48 PM PDT We continue to work to resolve the issue affecting RDS instances in a single availability zone in the US-EAST-1 region. We have recovered the majority of the RDS DB instances in the affected AZ and are working to recover the remaining affected DB instances. RDS DB instances in other availability zones in the region are operating normally. Customers with automated backups turned on for an affected Database Instance do have the option of initiating a Point-in-Time Restore operation. This will launch a new Database Instance using a backup of the affected Database Instance from before the event.
To do this, follow these steps:
1) Log into the AWS Management console
2) Access the RDS tab, and select DB Instances on the left-side navigation
3) Select the affected database instance
4) Click on the “Restore to Point in Time” button
5) Select “Use Latest Restorable Time”
6) Select a DB instance class that is at least the same size as the original DB instance
7) Make sure “No Preference” is selected for Availability Zone
8) Launch the DB Instance and connect your application.
22nd Oct 3:51 PM PDT We are making steady progress in recovering the affected RDS instances in a single availability zone in the US-EAST-1 region. RDS instances in other availability zones in the region are operating normally. Customers with automated backups turned on for an affected Database Instance do have the option of initiating a Point-in-Time Restore operation as per our previous post.
22nd Oct 5:37 PM PDT We have recovered almost all of the affected Multi-AZ RDS instances and we are making good progress in recovering the remaining Multi-AZ and Single-AZ instances in the affected availability zone in the US-EAST-1 region. RDS instances in other availability zones in the region continue to operate normally.
22nd Oct 7:35 PM PDT We continue to make progress bringing the remaining RDS instances back on-line in the affected AZ. Customers can launch new database instances. RDS instances in other unaffected AZs in the region continue to operate normally.
22nd Oct 9:10 PM PDT We are continuing to bring the RDS instances back on-line in the affected AZ. As noted before, customers with impaired RDS instances do have the option to initiate a Point-in-Time Restore operation or launch new database instances. RDS instances in other unaffected AZs in the region continue to operate normally.
23rd Oct 12:09 AM PDT We continue to make steady progress in restoring connectivity to the remaining DB Instances in the affected AZ. As a reminder, customers with automated backups turned on for an affected DB Instance have the option of initiating a Point-in-Time Restore operation. This will launch a new DB Instance using a backup of the affected DB Instance from before the event. To do this, follow these steps:
1) Log into the AWS Management console
2) Access the RDS tab, and select DB Instances on the left-side navigation
3) Select the affected database instance
4) Click on the “Restore to Point in Time” button
5) Select “Use Latest Restorable Time”
6) Select a DB instance class that is at least the same size as the original DB instance
7) Make sure “No Preference” is selected for Availability Zone
8) Launch the DB Instance and connect your application.
23rd Oct 2:25 AM PDT Our recovery process to bring remaining RDS instances back on-line in the affected AZ is continuing at a steady pace. Customers can launch new database instances. As noted before, customers with impaired DB instances do have the option of initiating a Point in Time Restore operation.
23rd Oct 4:00 AM PDT We are making steady progress bringing the remaining RDS instances back on-line in the affected AZ. RDS instances in other unaffected AZs in the region continue to operate normally.
23rd Oct 6:37 AM PDT We continue to make progress towards restoring access to the remaining DB Instances in the affected Availability Zone. The service continues to operate normally for the rest of DB Instances in the Region.
23rd Oct 2:53 PM PDT The RDS service is now operating normally. Some Single-AZ database instances could not be restored and are being placed in a “failed” status. We are in the process of contacting customers who own these instances that could not be restored. Customers with automated backups turned on for an affected Single-AZ database instance can initiate a Point-in-Time Restore operation as per the instructions provided before in this post. This will launch a new database instance using a backup of the affected database instance from before the event. We will post back here with an update once we have details on the root cause analysis.
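The Point-in-Time Restore steps that Amazon lists for the console can also be scripted. The following is a minimal sketch and not part of Amazon's message: it assumes the boto3 Python SDK and its `restore_db_instance_to_point_in_time` call, and the function names `build_restore_request` and `restore_latest` and the instance identifiers are hypothetical. Leaving out `AvailabilityZone` is assumed here to match the console's “No Preference” choice, letting RDS pick the zone.

```python
"""Sketch: script the console's "Restore to Point in Time" steps for RDS."""
from typing import Any, Dict


def build_restore_request(source_id: str, target_id: str,
                          instance_class: str) -> Dict[str, Any]:
    """Build the arguments for an RDS point-in-time restore.

    Mirrors the console steps: latest restorable time, a chosen instance
    class, and no Availability Zone preference.
    """
    return {
        "SourceDBInstanceIdentifier": source_id,  # step 3: affected instance
        "TargetDBInstanceIdentifier": target_id,  # name for the new instance
        "UseLatestRestorableTime": True,          # step 5
        "DBInstanceClass": instance_class,        # step 6: caller picks the size
        # Step 7: AvailabilityZone is left out on purpose so that RDS
        # picks the zone (assumed equivalent to "No Preference").
    }


def restore_latest(source_id: str, target_id: str, instance_class: str,
                   region: str = "us-east-1") -> str:
    """Start the restore (step 8). Requires boto3 and AWS credentials."""
    import boto3  # imported here so the request builder works without it

    rds = boto3.client("rds", region_name=region)
    request = build_restore_request(source_id, target_id, instance_class)
    response = rds.restore_db_instance_to_point_in_time(**request)
    return response["DBInstance"]["DBInstanceStatus"]


if __name__ == "__main__":
    # Hypothetical identifiers; printing the request makes no AWS call.
    print(build_restore_request("mydb", "mydb-restored", "db.m1.large"))
```

Calling `restore_latest("mydb", "mydb-restored", "db.m1.large")` would launch a new instance alongside the impaired one, so the application still has to be pointed at the new instance's endpoint once it becomes available.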
See more on Amazon’s AWS status page …
status.aws.amazon.com