Last week, the digital world experienced the equivalent of a power outage. A software bug at a major distributed edge computing platform brought down its Domain Name System (DNS) service, the Internet's mechanism for mapping domain names (e.g., united.com) to IP addresses. The consequences were costly. So what is the real cost of downtime?
When the DNS service of a globally recognized edge platform fails, it causes downtime for a huge number of international enterprises. All of these organizations rely on continuous, uninterrupted uptime for their customers. Edge DNS services also help keep these enterprises secure by protecting against cyberattacks such as distributed denial-of-service (DDoS) attacks.
How Downtime Can Impact the Bottom Line
When these types of outages happen, users can lose access to critically important assets, such as financial and personal data, and even government entities can be put at serious risk. From a business perspective, companies lose credibility with their audience, which can mean losing customers and revenue to better-equipped competitors.
One of the main goals of engineering and product teams is to develop applications that experience as little downtime as possible. Perhaps the most crucial element of app and web connectivity is the Application Programming Interface (API), the backbone of modern applications. What DNS is to edge platforms and uptime, APIs are to the connections between the critical pieces of software that keep online services running.
Causes of API Downtime
API downtime, or when your APIs fail to do their job, is the biggest threat to site uptime. The functioning of the APIs in your system depends on many technical factors and requires tools to ensure their health. We believe that the two key culprits of API downtime are poor performance and ineffective security. Let's dive in:
Effective API performance requires both availability and low latency. If an API consistently makes numerous database queries instead of taking advantage of a cache, for example, it introduces avoidable and detrimental inefficiencies into your infrastructure. In addition, these excessive queries can overwhelm your server with requests, a problem that is exacerbated during traffic spikes and can lead to outages.
For some organizations, managing a large load of queries is unavoidable. Simply taking advantage of a cache system may not be enough, and even a system supported by perfectly designed APIs can suffer. That's when engineers should start considering ways to increase the speed and reliability of their connectivity to improve their overall performance and avoid critical mistakes.
Supporting your APIs with a scalable and performant API gateway that provides high availability can increase uptime overall. In short, it is critical to make sure that the heavy traffic your site will experience is routed and supported in a speedy and efficient way.
The bottom line? Performance needs to be optimized, observed and managed consistently so that your system does not fail just when your customers are eager to make a buying decision.
Security is one of the most important elements of API design. Without it, a site can suffer not only downtime but also privacy breaches, which can ruin any organization's reputation. The challenge is to govern APIs with policies that protect the system's security while still providing ease of use and consumption for developers.
The requirements for ease of use and consumption are the ones most vulnerable to bad actors who aim to carry out attacks like DDoS, which lead to downtime and lost revenue.
Some of the biggest vulnerabilities that can plague APIs are data exposure caused by ineffective authorization and authentication, and a lack of request restrictions. An API may be designed without considering the sensitivity of the object properties and data it exposes, and injection attacks can give bad actors access to sensitive data, creating a serious security breach.
A lack of restrictions on the amount of resources a user can request through an API (that is, no rate limiting) can also lead to server overload, site failure and even unauthorized access. For example, a person performing a brute-force attack on a system with this vulnerability gets unlimited password attempts, an obvious security risk.
These API implementation mistakes can lead to both security breaches and site downtime, as these high-priority security emergencies often require services to go offline to be fixed.
If APIs fail or even run too slowly, the consequences are just as bad as those of last week's outage. On top of perfecting API design, technology teams should be ever more focused on delivering top-notch connectivity, meaning fast, reliable and scalable systems that minimize site downtime.
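As an illustration of what rate limiting does, the sketch below caps login attempts with a fixed-window counter in Python. The class name `FixedWindowRateLimiter`, the limit of five attempts per minute and the client address are all assumptions chosen for the example. This is not how any particular gateway implements the feature.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allows at most `limit` requests per client in each time window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        # client id -> (window start time, requests seen in that window)
        self._counters: Dict[str, Tuple[float, int]] = {}

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        window_start, count = self._counters.get(client_id, (now, 0))
        if now - window_start >= self._window:
            window_start, count = now, 0  # window expired: start a new one
        if count >= self._limit:
            self._counters[client_id] = (window_start, count)
            return False  # over the cap: reject, e.g. with HTTP 429
        self._counters[client_id] = (window_start, count + 1)
        return True


# Assumed policy: five login attempts per client per minute.
limiter = FixedWindowRateLimiter(limit=5, window_seconds=60)
attempts = [limiter.allow("203.0.113.7") for _ in range(8)]
print(attempts)  # [True, True, True, True, True, False, False, False]
```

With a cap like this in place, a brute-force script gets five guesses per minute instead of thousands per second. A fixed window is the simplest variant; sliding-window and token-bucket schemes smooth out bursts at window boundaries.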
API gateways and broad service connectivity platforms are essential technology investments for this reason. With them, your team can deliver reliable, secure and performant connectivity across every API, service and application in your organization. They do this by acting as a reverse proxy that safely exposes your APIs and creating an ingress point to the services in your application for your internal team.
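The routing half of a reverse proxy can be sketched in a few lines of Python: the gateway is the single public entry point, and it maps each incoming path to an internal service. The route table, hostnames and the `resolve_upstream` function below are hypothetical, and a real gateway also handles TLS, retries, health checks and much more.

```python
from typing import Dict, Optional

# Hypothetical route table: public path prefix -> internal upstream service.
ROUTES: Dict[str, str] = {
    "/orders": "http://orders.internal:8080",
    "/orders/reports": "http://reporting.internal:8080",
    "/users": "http://users.internal:8080",
}


def resolve_upstream(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream URL for a request path, or None if no route matches.

    A route matches when the path equals the prefix or continues it after a
    slash, so "/ordersX" does not match "/orders". The longest match wins.
    """
    matches = [
        prefix for prefix in routes
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    if not matches:
        return None  # a gateway would typically answer 404 here
    best = max(matches, key=len)
    return routes[best] + path


print(resolve_upstream("/orders/17"))
# http://orders.internal:8080/orders/17
print(resolve_upstream("/orders/reports/2024"))
# http://reporting.internal:8080/orders/reports/2024
print(resolve_upstream("/admin"))
# None
```

Because clients only ever see the gateway's address, the internal services can be moved, scaled or replaced without changing anything on the client side.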
Standout solutions can achieve this regardless of where your application runs or how it is implemented. One of these is Kong, the most widely adopted API gateway in the world, with more than 257M downloads and over 1.7M instances running across organizations of every size and in every vertical.
The added benefit of API management solutions is scalability: teams want their API security to grow at pace with the size of their codebase and application. The API gateway’s flexible and extensible nature ensures its adaptability to the changes in functionality and architecture design that your application will inevitably go through as it grows.
Ensuring Effective API Security
The consequences of poor security are evident, especially their implications for site downtime. Any team prioritizing security should be aware of two tasks: access control management and traffic management. By simply configuring authentication plugins on the Kong API gateway, teams can eliminate unwanted exposure of their APIs and grant access only to select users with proper authorization within their organization.
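Conceptually, an authentication check at the gateway comes down to something like the Python sketch below: look for a credential on the request, compare it against known consumers, and reject anything else before it reaches your services. The header name `x-api-key`, the demo key and the consumer registry are assumptions made for illustration and do not reflect Kong's plugin configuration or internals.

```python
import hashlib
import hmac
from typing import Dict, Mapping, Optional


def _digest(key: str) -> str:
    """Hash a key so that raw keys never need to be stored."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Hypothetical registry: SHA-256 digest of an API key -> consumer name.
CONSUMERS: Dict[str, str] = {
    _digest("demo-key-123"): "billing-service",
}


def authenticate(headers: Mapping[str, str],
                 consumers: Mapping[str, str] = CONSUMERS,
                 header_name: str = "x-api-key") -> Optional[str]:
    """Return the consumer name for a valid key, or None to reject the request.

    Assumes header names have already been lower-cased by the caller.
    """
    presented = headers.get(header_name)
    if not presented:
        return None  # no credential: a gateway would answer 401
    presented_digest = _digest(presented)
    for stored_digest, consumer in consumers.items():
        # Constant-time comparison avoids leaking information through timing.
        if hmac.compare_digest(stored_digest, presented_digest):
            return consumer
    return None


print(authenticate({"x-api-key": "demo-key-123"}))  # billing-service
print(authenticate({"x-api-key": "wrong-key"}))     # None
print(authenticate({}))                             # None
```

Doing this once at the gateway, rather than separately in every service, is what keeps the policy consistent as the number of APIs grows.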
Bots are often used to execute automated attacks that can flood a system with traffic. Mitigating these bots with rate limiting plugins can reduce the risk of DDoS attacks by capping how many times a user can perform a given action on your site. Avoiding these traffic management pitfalls can reduce the risk of site failure and downtime.