A Very Cloudy Day for Amazon

If Amazon.com wanted to pull a Mick and Keith and get everyone
off of its cloud, it probably could have found a better way to do it.

Yesterday, Amazon Web Services (AWS) suffered an outage that knocked client
sites off the web for hours. According to several reports, the outages began
around 5 a.m. EST and continued for hours afterward. As of 5 p.m., according
to a Bloomberg News report, Amazon claimed that customers in all but one time
zone were back online.

Among those taking a hit were Foursquare, Formspring.me, Quora,
Reddit and Salesforce.com. Another AWS customer, BigDoor Media, was not as
lucky.

At 11:40 p.m., BigDoor had this message posted on its home page: "We’re
still experiencing issues due to the current AWS outage. Our publisher account
site and API are recovering now, but apparently AWS thinks our corporate site
is too awesome for you to see right now." (On the plus side for visitors,
BigDoor had a YouTube video of a Honey Badger that was roughly equal parts
profane, visually disturbing and funny.)

Keith Smith, CEO of BigDoor, wrote on GeekWire, "Starting at 1:41
a.m. PST, Amazon’s updates read as if they were written by their attorneys
and accountants who were hedging against their stated SLA rather than being
written by a tech guy trying to help another tech guy. We aren’t just
sitting around waiting for systems to recover. We are actively moving instances
to areas within the AWS cloud that are actually functioning. If Amazon had
been more forthcoming with what they are experiencing, we would have been able
to restore our systems sooner."

Amazon’s issues raise questions about the
ability of cloud computing services to meet the needs of organizations both
large and small. While there is much to recommend cloud services, the AWS outage
highlights their vulnerabilities as well.

Vanessa Alvarez, an analyst at Forrester Research, told Bloomberg, "Customers
need to start asking tough questions and not assume everything will be taken
care of in the cloud, because it will not. They shouldn’t be counting
on a cloud service provider like Amazon to provide disaster recovery."

Michael Hussey, chief executive of the search company PeekYou, told Dow Jones
Newswires that the company uses 12 Amazon servers but does not have
applications that are "mission critical" on the cloud.

"The interesting thing about this outage is it took down massive sites,
but we’ve been seeing problems all year long at this East Coast facility," Mr.
Hussey said.

William Marler, a lawyer for the law firm Marler Clark, told Dow Jones, "It
is simply amazing how dependent we have all become on the web — that we do
not control."

Matthew McKenzie on AllBusiness.com wrote, "Even
a minor outage riles up the usual bunch of cloud-hating suspects. A major outage
like this one whips them into a mouth-foaming, rug-chewing frenzy. They think
this proves that cloud computing is unreliable and dangerous. As usual, they’re
wrong. … Life will go on, and cloud computing will still be the safest and
most efficient option, by far, for the vast majority of the businesses that
use it."

Discussion Questions

Will the Amazon Web Services outage cause companies to rethink using cloud computing services? What are the best uses of cloud services for companies in the retailing business?

10 Comments
Max Goldberg
13 years ago

Why all the hysteria? Amazon’s issues don’t raise questions about the ability of cloud computing services to meet the needs of organizations, as stated in the article. Rather, they point to the need for redundancy and backup systems. Cloud computing is safer, more reliable and more economical than building and maintaining your own systems.

Ryan Mathews
13 years ago

One storm does not a cloud break.

Look, all technology is subject to breakdowns either through system failure or operator error.

You don’t stop driving for life simply because your car failed once due to a faulty fuel pump.

Cloud computing is a tool, nothing more, nothing less and, on occasion, all tools break or fail.

Supermarket operators on the East Coast didn’t quit illuminating their stores or move their frozen food sections back to blocks of ice after the New York City Blackout several years ago and I doubt very much if retailers are going to desert cloud computing en masse after Amazon’s crash.

The problem isn’t with technology; rather, it’s with our primitive faith that the built can never, or should never, fail.

Paula Rosenblum
13 years ago

“The Cloud” (I really don’t like the term, but if you can’t lick ’em, join ’em) is not perfect. If I were still a CIO, I would have a VERY hard time trusting the cloud with all my data. Even in our small company, I have a multiple back-up scenario.

1) We use Dropbox for our corporate shared data. So it resides both in the cloud and on our PCs.
2) I have a Windows Home Server in my house. The data (and my own) also goes on there. Windows Home Server might well be the best home product MSFT ever made. You can completely recover and rebuild a PC in a couple of hours.

I still don’t think I could be persuaded to install network-centric POS (which is an old-fashioned description for POS over the cloud). I’d want each store to be an island if it needs to be. In fact, way back in the days of Y2K (man, where is time going, anyway?), I knew that, worst case, I could set my stores back to 1900 and keep ringing sales until I figured out what went wrong. Luckily, nothing really did go wrong, but that was my fall-back.

Steve Montgomery
13 years ago

Like Ms. Rosenblum, we have data residing in a shared data site and on our PCs. While I understand the appeal of the cloud, I have experienced enough internet outages that I want the ability to continue to work should we lose connectivity. We would also have a hard time recommending any client rely solely on being connected to the internet to be able to record sales or conduct business (with the obvious exception of internet-based companies).

Camille P. Schuster, Ph.D.
13 years ago

There is no system that will never fail. Success depends upon backups and redundancy, not just on the part of the cloud provider but also of the companies using the cloud and individual consumers. Backup and redundancy are not fun and we would all rather ignore the necessity. These kinds of problems emphasize how important it is to address the issue before there is a problem.

David Dorf
13 years ago

Considering it was back in July 2008 when the last similar failure occurred, Amazon is doing pretty well. Perfection cannot be expected, and anyone using a cloud service needs to understand the risks and mitigation. Failures like this one happen all the time; the difference here is that it happened to multiple companies at the same time. When there’s a car accident, it’s a tiny story; when a bus crashes, it’s on every channel.

Every retail CIO needs to think about how they could have been affected had they been using AWS:

Planning down for a day? – Hurts, but survivable.
Merchandising down for a day? – Perhaps that leads to some inventory issues next week.
All POS systems down for a day? – You’re fired.

Daryle Hier
13 years ago

Cloudy days, it should read. What’s interesting is that so many businesses still don’t have backup plans for instances like this. I had my wake-up call years ago and have had backups on backups: multiple redundancies. This problem does prove that no matter who you are (Amazon), situations like this will arise and cloud computing is no different. By the way, a day later (today), there are still companies out of commission, including social site Reddit. And everything is not “back online” in all time zones but one. Cloudy days.

Fabien Tiburce
13 years ago

Not only is Amazon doing well by industry standards, let’s remember the outage affected a single data centre. For CIOs and cloud practitioners, the solution to this predicament is obvious: don’t put your core systems and their backups (or redundant servers) in the same subnet and data centre! An act of God such as an earthquake or tornado can take down even the most protected data centre. As long as you load balance and your infrastructure is distributed across different subnets and data centres, individual failures should not affect you.

The cloud is safe but is no magic bullet. You have to build redundancy into your infrastructure, cloud or no cloud.

Gene Detroyer
13 years ago

There is no issue here. It was a hiccup in the system, and whatever caused the hiccup will be fixed. Funny, I was more concerned about the night my Netflix went down for several hours and I couldn’t watch the movie I wanted, but it didn’t make a blurb on CNN. This is a “sky is falling” issue.

Larry Negrich
13 years ago

Basic lessons for any retailer looking at almost any technology. Know your vendor and understand the SLA. Hire a qualified team or consultants who understand the limitations and ramifications of unforeseen issues that may arise with the technology. Make sure that redundancies are in place, varying by critical importance of operations being performed. Don’t put anything into production mode around the holidays (which this wasn’t). And hope for the best….
