One of the biggest mistakes buyers make when outsourcing for the first time is assuming that the provider is promising the same thing their internal performance standards measure. Internal standards are usually tuned to “net” server uptime, whereas most of the time what buyers get is a “gross” site uptime that excludes pre-scheduled maintenance and force majeure.
We covered force majeure in a previous post. For a true 24×7 operation with strong uptime monitoring, making a reasonable estimate of maintenance time and putting boundaries around it is an important part of due diligence. Failure to understand this tidbit from the supplier playbook can result in the loss of 1%-2% of project revenue when online ordering or other mission-critical services are unavailable, which can amount to seven- or eight-figure losses for a top-30 online retailer. It can also affect standard business-hours operations, but in a much less severe and less quantifiable way.
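To make the arithmetic behind that range concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the 99.9% SLA figure, the weekly maintenance allowances, the revenue number, and the function names are all hypothetical, and it assumes the provider uses every hour of its excluded window and that revenue is spread evenly across the year (in practice, maintenance usually lands in off-peak hours, so real losses would tend to be lower).

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def effective_uptime(sla_uptime: float, maintenance_hours_per_week: float) -> float:
    """Fraction of the whole year the site is up when the SLA percentage
    only applies outside the excluded maintenance windows.

    Worst-case assumption: the provider uses its full weekly allowance.
    """
    maintenance_hours = maintenance_hours_per_week * 52
    measured_hours = HOURS_PER_YEAR - maintenance_hours
    return sla_uptime * measured_hours / HOURS_PER_YEAR


def revenue_at_risk(annual_revenue: float, uptime: float) -> float:
    """Revenue exposed to downtime, assuming revenue accrues evenly over time."""
    return annual_revenue * (1.0 - uptime)


if __name__ == "__main__":
    sla = 0.999                 # hypothetical contractual uptime
    revenue = 500_000_000       # hypothetical annual online revenue
    for hours in (0, 2, 4):     # hypothetical weekly maintenance allowances
        up = effective_uptime(sla, hours)
        print(f"{hours} h/week excluded: effective uptime {up:.4%}, "
              f"revenue at risk {revenue_at_risk(revenue, up):,.0f}")
```

Under these assumptions, an allowance of 2 to 4 hours per week works out to 104 to 208 hours per year, or roughly 1.2% to 2.4% of the year, before any unplanned downtime is counted at all.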
Maintenance is a frequently overlooked footnote in most hosting contracts. For a business-hours operation, the biggest problem is usually not too much maintenance but an overly restrictive schedule: for example, a window that requires paying overtime to internal IT staff, or a time that interferes with critical processes. These are not typically deal-breakers, but they are better settled up front during contract negotiation.
For a 24×7 operation, however, regular, required, and enforced maintenance downtime can be an unpleasant surprise. As the Uptime Institute notes, Tier III and IV data centers are supposed to be concurrently maintainable. Surprisingly few data centers are actually certified as Tier III or IV, but most providers we deal with do fall into Tier III, or close enough to make no difference to most customers, so the data center itself should not need maintenance that takes equipment offline. That doesn’t mean your supplier won’t find it more convenient to take your site down, and with a blank check written into the contract, it’s much more likely to do so. Similar considerations apply to patches, network equipment upgrades, etc.: equipment that has some redundancy can theoretically be maintained without taking sites offline, but doing so is more cumbersome and resource-intensive.
So if you question the exclusion of maintenance from your uptime metrics (or other SLAs), you can often save yourself a lot of headaches and potential revenue losses. Better yet, in a round-the-clock operating scenario you should consider multisourcing, with both geographic and supplier diversity that can limit the impact of maintenance that can’t be negotiated away.