Sooner or later, every website or service is going to experience an outage of some kind. It doesn’t matter if you’re running a modest blog with a few thousand daily visitors or a high-traffic web service with hundreds of thousands of visitors; no site is immune to downtime, whether planned or unscheduled.
Of course, when it comes to planned downtime, it’s easy to do the right thing by (a) scheduling it to occur when it will cause the least inconvenience to your users and (b) letting your users know in advance that the downtime will occur.
But what about the anomalous outage or slowdown? By definition, something like that is unscheduled and, unfortunately, usually occurs at the worst possible time. How should this be handled?
There’s a right way and a wrong way to handle just about everything, and in this case most of us are familiar with the wrong way: the site goes down or gets sluggish and no explanation or updates are available, leaving the user to wonder if it’s really the site or if, in fact, it could simply be a connection problem.
A great example of the right way to handle things occurred today, when Mailchimp began experiencing a crush of combined traffic from Valentine’s Day and President’s Day promotions: by their estimate, as much as 80 times last year’s volume. Instead of leaving customers in the dark, they demonstrated that you can have a problem without alienating your customers.
I first noticed the service was having problems when I checked my email in the morning. I run an automated RSS-to-Email campaign each day for one of my clients and I discovered that the email had not gone out. So, I logged on to my Mailchimp account and was greeted with the following:
Okay, that’s not good, but at least I know it isn’t anything I’ve done wrong, and obviously they’re working the issue. But how long is it going to take? Switching to Twitter, I found that regular and informative answers were provided:
This is fantastic. Instead of the customary “We’re experiencing problems. Please stand by,” I’m provided with a running commentary on Twitter and their blog about how they’re drilling into the issue, along with estimates for when things may be back to normal. I can even get these updates sent to me in real time via Twitter’s SMS capability. To top it off, Mailchimp even offers to make things right with their customers, presumably with a service credit of some kind.
So what are the lessons learned from Mailchimp’s example?
- Let people know up-front that there is a problem; don’t leave them in the dark, wondering what’s happening and whether it is their own fault.
- Provide frequent updates. If you know how long it’s going to take, say it. If you don’t, say that, too. Knowing bad news is better than not knowing anything at all.
- Offer a reasonable service credit of some kind to affected users.
I tip my hat to the Mailchimp people. They’ve handled today’s outage in an exemplary fashion. And I believe in rewarding good behavior: I have no intention of applying for whatever service credit they intend to offer.


