What is an internet outage?

Mon, 6th Sep 2021


By Mike Hicks, Principal Solutions Architect, Cisco ThousandEyes

It's not always clear whether there are more outages today than there were years ago, or whether we simply notice them more. That may be due to the role digital platforms now play in our lives: when they glitch or go down, there are real repercussions for convenience and practicality.

We regularly see backend outages hit payment systems, with the impact felt by consumers at the point of sale. In a cashless society, when this happens, there is no other way to pay.

As cities, states and territories move in and out of lockdown, outages affecting home broadband are also a concern. Government research in Australia says most users can expect a minutes-long outage every three days, which it argues is ‘likely to have little material impact on end-user experience’.

There's no room for brands to deliver bad digital experiences in today's digital-first world. Ninety-two per cent of respondents to a recent survey said they expect digital services to have reliable, consistent performance, while 57% said a brand has only one shot to impress: if its digital service does not perform as expected, they won't bother using it again.

That level of expectation makes it imperative for businesses to learn quickly when something is afoot that might cause their users pain during a digital experience or interaction. It's critical to recognise and act on that pain to stem it, reduce friction and keep users onside.

The brand wears the blame, no matter the cause

Increasingly, that digital experience relies on a long chain of dependencies, any of which can severely degrade usability. Built to deliver best-of-breed functionality, apps increasingly rely on APIs and microservices architectures hosted in cloud environments operated by third parties. These apps are made up of several different parts stitched together to perform a specific task.

This distributed architecture increases the risk of outages, because the unavailability of any one component can break the entire orchestration of components.
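To put rough numbers on this: if every component in the chain must be up for the service to work, and failures are assumed to be independent, the availability of the whole chain is the product of the individual availabilities, so it is always lower than that of the weakest link. The minimal Python sketch below illustrates the arithmetic; the four-component chain and its availability figures are hypothetical, not measurements.

```python
from math import prod
from typing import Iterable


def serial_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent, so the chain is up only
    when all of its components are up at the same time.
    """
    return prod(availabilities)


# Hypothetical chain: front end, payment API, auth microservice, cloud region.
chain = [0.999, 0.999, 0.9995, 0.9999]
print(f"{serial_availability(chain):.4f}")  # prints 0.9974
```

Every link in this hypothetical chain is at 99.9% or better, yet the chain as a whole drops to roughly 99.74%, and each extra dependency pulls the figure down further.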

In addition, some industry sectors, such as banking and energy, are subject to regulated availability standards. Although designed to provide a degree of service assurance, these standards often focus only on the service itself and do not consider the entire service delivery chain.

This leads to a situation where an organisation's services are rendered inaccessible by an outage elsewhere in the chain, such as a third-party dependency going down, even though the core systems themselves are still available and running.

Such an incident may not count as downtime under a strict regulatory definition of the term. While that may help an organisation avoid regulatory trouble, it is unlikely to appease its customers, who do not distinguish between a service that is unavailable and one that is merely inaccessible.

Sixty per cent of survey respondents blame the application and brand when they encounter a problem with a digital service, irrespective of the issue. That is, the brand wears the blame for an outage publicly, even if a third-party service or component is the root cause.

Organisations can lessen the blowback if they quickly pinpoint the third-party cause, tell users that the problem lies with a third party, and confirm that they are working closely with its operator to resolve it.

The case for redundancy

Once confronted with an outage, the questions are: what will it cost, and what steps can the organisation take?

Redundancy or backup arrangements may help. For consumers, having a backup cellular internet service to turn to if their fixed service goes down may let them route around problems and stay connected.

Some internet providers bundle 4G backup with fixed-line broadband, recognising that continuity of service is a vital characteristic of a modern internet service. Network redundancy has traditionally worked for larger organisations too, enabling them to route around connectivity issues.
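The benefit of a backup link can be put in rough numbers. Assuming the fixed and cellular services fail independently, the connection is only down when both are down at once, so the combined availability is one minus the product of their unavailabilities. The figures in this minimal sketch are hypothetical, and in practice shared causes such as a local power cut mean real-world failures are rarely fully independent.

```python
from math import prod
from typing import Iterable


def parallel_availability(availabilities: Iterable[float]) -> float:
    """Availability of redundant paths where any single path is enough.

    Assumes independent failures: the service is down only when every
    path is down at the same time.
    """
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figures: fixed broadband at 99%, 4G backup at 98%.
print(f"{parallel_availability([0.99, 0.98]):.4f}")  # prints 0.9998
```

Under these assumptions, adding a backup that is individually less reliable than the primary link still lifts combined availability from 99% to about 99.98%.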

Beyond connectivity, redundancy may be costly or impractical. While a company using a single content delivery network (CDN) may find it beneficial to spread its commitment across multiple CDN providers, duplicate DDoS mitigation services may prove unaffordable. However, an automated recovery process may allow it to route around the problem and restore the service for its customers with limited impact.
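What an automated recovery step looks like depends entirely on the services involved. As a minimal sketch, assuming each provider can be checked by some health probe, the selection logic can be as simple as walking a priority-ordered list and using the first provider that passes. The hostnames and the stubbed health check below are hypothetical placeholders; a real check might probe an HTTP endpoint or consult monitoring data.

```python
from typing import Callable, Optional, Sequence


def select_provider(providers: Sequence[str],
                    is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first provider, in priority order, whose health check passes.

    Returns None when no provider is healthy, so the caller can escalate.
    """
    for provider in providers:
        try:
            if is_healthy(provider):
                return provider
        except Exception:
            # Treat an erroring health check as unhealthy and try the next one.
            continue
    return None


# Hypothetical CDN hostnames, in order of preference.
PROVIDERS = ["cdn-a.example.com", "cdn-b.example.com"]
down = {"cdn-a.example.com"}  # simulate an outage at the primary


def stub_check(host: str) -> bool:
    """Stand-in health check: healthy unless listed as down."""
    return host not in down


print(select_provider(PROVIDERS, stub_check))  # prints cdn-b.example.com
```

Passing the health check in as a parameter keeps the failover policy (priority order) separate from how health is measured, so the same logic could sit behind DNS changes, load balancer configuration or application-level routing.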

Front-foot communication and regular updates, underpinned by backend visibility, are the best way for an affected organisation or provider to soften the impact.

A word on degradation

So far, we have dealt with outages as distinct windows of service unavailability. However, degraded service or user experience can also be a source of pain. It's not an ‘outage’ in the purest sense, but it is still an experience best avoided.

Some optimisation of the underlying architecture or infrastructure might be possible in this instance, such as additional burst capacity in peak periods, load balancing or code changes. Having visibility into how the experience is performing is crucial to understanding how to improve it.
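Visibility into degradation usually means comparing current measurements against a known baseline. The minimal sketch below flags degradation when the 95th-percentile response time of recent samples exceeds a baseline by a tolerance factor. The percentile, the 1.5x tolerance and the sample values are all hypothetical and would need tuning against real traffic.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample, with pct in (0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def is_degraded(recent_ms: Sequence[float], baseline_p95_ms: float,
                tolerance: float = 1.5, pct: float = 95.0) -> bool:
    """True when recent latency at `pct` exceeds baseline * tolerance."""
    return percentile(recent_ms, pct) > baseline_p95_ms * tolerance


# Hypothetical response times in milliseconds, against a 200 ms baseline p95.
normal = [150, 160, 170, 180, 190, 200, 210, 220, 230, 240]
slow = [180, 190, 210, 220, 650, 700, 720, 180, 200, 690]
print(is_degraded(normal, 200))  # prints False
print(is_degraded(slow, 200))    # prints True
```

A percentile is used rather than an average because degradation often affects only a slice of users. With just ten samples, as here, the 95th percentile is effectively the maximum, so a real check would use a larger window.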

As digital transformation projects advance, there will undoubtedly be increased reliance on the internet as the primary delivery mechanism serving employees and customers alike. In that landscape, the business best placed to ensure maximum availability will win, no matter where an outage or degradation occurs.