Operating APIs and microservices at scale relies on the right processes and company culture. APIOps is the name for this overall approach. Done right, it lets organizations accelerate digital business. It combines modern automation tooling with best-practice engineering principles to ensure that the organization delivers with continuous quality as well as speed.
In this blog series, we explore the triggers for adopting APIOps, and its impact, through the story of "Acme."
Acme is a large bank with a sprawling tech landscape. The company has been around for several decades, so they have a lot of legacy systems and tools, as well as multiple siloed engineering teams. They've built up many APIs and services of differing standards, but no one has a single view of them all.
Acme's Current Efforts for Digital Transformation
As part of Acme's digital transformation, they are migrating most of their workloads to the cloud and Kubernetes and adopting a more consistent API- and microservices-driven approach. The teams are constantly trying to balance these ongoing modernization efforts with the increasing demand for innovation, which Acme needs in order to stay relevant as a business.
In the Mortgages team, they're refactoring their legacy code as they migrate it to the cloud, and they've just identified the next API they need to build. Emily has just finished designing that API, and she is reviewing the spec with her team. They all agree the spec looks great, so as per their normal process, she sends it off to the API Platform team for review and moves on to her next task.
The API Platform team owns Acme's API platform, as well as the overall architecture. They host and manage the platform on behalf of the rest of Acme, with the goal of raising the overall engineering standards across the organization. They need to make sure that whatever they accept for deployment into the platform meets the standards of everything else already there.
To do this, a group of them meet once a week to go through all the new APIs that have been submitted and check them for standards.
Sadly, in this case, Emily's spec is not approved.
It turns out there's a whole set of standards that Emily just doesn't know about. They're probably documented somewhere and might be updated from time to time, but they're not very well communicated, and definitely not in a developer-friendly way. You can't really blame her for getting something wrong, especially when she's not given any help to get things right.
So a week after she submits it for review, the Platform team rejects Emily's spec, and it gets pushed back down to her.
This is pretty embarrassing for Emily. She's getting called out in front of her peers for not doing a good enough job and told to redo her work. And this is really damaging to her morale.
Emily's Project Is Delayed
This is also a huge waste of everyone's time. Emily's going to have to repeat the same work unnecessarily, and the Platform team is doing these reviews manually - at scheduled intervals - so there are several days wasted even just waiting for that review.
And this is just the first iteration. How many more cycles is it going to take before the spec is good enough? How much is that going to delay the eventual go-live?
Manual Efforts Cannot Scale
It's not just Emily and the Mortgages team who suffer here. Acme is following best practice and using a single API platform for global discovery and re-use across the business. This means that as adoption grows, the Platform team needs to onboard and support more and more teams across the organization, with more and more APIs coming in for review on top of all the other work they have to do. Their backlog just keeps growing.
The Platform team ends up being stretched very thin, so rather than spending enough time fully reviewing every API, they end up having to prioritize and rush things to make sure that everything's done in time.
Remaining Compliant Is Hard
Compliance is a nightmare - with different APIs built by different teams in Acme, it's just too much to manually validate that every single one of them complies with local regulations. Costs keep rising.
So they hire as many test engineers as they can justify the budget for, and this helps a little bit. But as well as getting expensive, it's not long before they're also overwhelmed and stretched too thin as the API adoption grows.
Instability Challenges Ops
This means that things fall through the cracks, which isn't so good for the Operations team, which is responsible for maintaining the overall IT estate.
Enough has fallen through the cracks that there are a lot of bugs and errors in production. Nothing's guaranteed to be consistent, and deployments are pretty painful - in fact, the Operations team refuses to deploy new code more than once a week because it causes so much instability.
These poor folks regularly get called in at weekends to fight whatever the latest fire is and try to minimize the impact to Acme's customers.
Inconsistency Prevents Reuse
Elsewhere in Acme, the Mobile team operates a little differently. The team here is very autonomous - they've been given a lot of freedom so that they can get customer-facing applications out as quickly as possible. The Mobile team is a newer team at Acme, and it was created with the sole purpose of building rich, digital experiences for Acme's customers as a reaction to the mobile-only banks that were threatening to displace them.
As usual, the team's rushing to get something out. They're about to release their latest Open Banking application, and this one's a big deal for Acme because it's the first time they're exposing actual API endpoints to customers, as well as releasing a mobile app.
Since the Mobile team doesn't follow the same governance processes as the others and has seen the delays getting APIs live elsewhere in Acme, they've decided to do things their own way and bypass the API Platform team altogether. But they were in such a hurry to go live on time that they just focused on the implementation code and missed some API best practices.
And this means their APIs are inconsistent. They're hard to find, hard to access, and hard to use - which puts consumers off, whether internal or external. Prospective users of this Open Banking application are much more likely to go to one of Acme's FinTech competitors that knows how to treat APIs as products, because that is what makes an API consumable.
Security Is a Growing Risk
Making matters worse is the fact that someone in the Mobile team forgot to secure one of his APIs when he published it. This then got exploited, and Acme detected a data breach affecting 15 million customer accounts.
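A gap like this is the kind of thing an automated check could flag before publication. The following is a minimal sketch, not Acme's actual tooling: it assumes the API is described by an OpenAPI-style document already loaded into a Python dictionary, and the function name `unsecured_operations` and the sample spec are hypothetical, made up for illustration.

```python
# Hypothetical sketch: list operations in an OpenAPI-style spec that
# declare no security requirement. Names and sample data are illustrative.

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def unsecured_operations(spec: dict) -> list[str]:
    """Return 'METHOD /path' for every operation that allows anonymous access."""
    # A top-level `security` list acts as the default for all operations.
    default_security = spec.get("security", [])
    findings = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as `parameters`
            # An operation-level `security` list overrides the default.
            security = operation.get("security", default_security)
            # An empty list, or a list containing an empty requirement
            # object, means the operation can be called anonymously.
            if not security or any(req == {} for req in security):
                findings.append(f"{method.upper()} {path}")
    return findings


# Assumed example spec, invented for this sketch.
example_spec = {
    "openapi": "3.0.3",
    "paths": {
        "/accounts": {
            "get": {"security": [{"oauth2": ["accounts:read"]}]},
            "post": {},
        },
        "/health": {"get": {"security": []}},
    },
}

print(unsecured_operations(example_spec))
```

Run as part of a review pipeline, a check along these lines would report `POST /accounts` and `GET /health` for the sample above, leaving a human to decide whether anonymous access is intended.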
The Painful Aftermath
Acme started off with all the right intentions - they need to modernize and API-ify the business so that they can continually innovate and delight their customers through differentiated digital experiences. But they've ended up in this situation where they're trading off between speed and quality. They aren't seeing many of the business benefits of APIs, and in fact, they're creating just as many problems. And this is because whilst they're modernizing the technology in their estate, they aren't modernizing their processes.
If Acme had just a couple of APIs, the laborious, manual processes they follow might be ok. But they just don't scale, and no matter how much Acme invests in technology or QA, they cannot effectively adopt APIs and microservices in a sustainable way.
This is where APIOps comes in.
Accelerating Digital Business With APIOps
APIOps is the automation of the full API lifecycle. It combines the DevOps philosophy of iterative design and continuous testing with the GitOps philosophy of automated, declarative deployments.
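To make "declarative deployments" a little more concrete, here is a minimal sketch of the underlying idea: the desired state of the platform is described as data (for example, files in a Git repository), and automation works out what must change to get there. The function name `plan_changes`, the API names, and the configuration shape are all assumptions made for illustration, not part of any particular product.

```python
# Hypothetical sketch of a declarative "plan" step: compare the desired
# state (e.g. loaded from Git) with the state currently on the platform.

def plan_changes(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (action, api_name) pairs needed to make `actual` match `desired`."""
    plan = []
    for name in sorted(desired):
        if name not in actual:
            plan.append(("create", name))   # declared but not yet deployed
        elif desired[name] != actual[name]:
            plan.append(("update", name))   # deployed config has drifted
    for name in sorted(actual):
        if name not in desired:
            plan.append(("delete", name))   # deployed but no longer declared
    return plan


# Assumed example data, invented for this sketch.
desired_state = {
    "accounts-api": {"version": "2.0.0"},
    "mortgages-api": {"version": "1.1.0"},
}
actual_state = {
    "legacy-api": {"version": "0.9.0"},
    "mortgages-api": {"version": "1.0.0"},
}

for action, name in plan_changes(desired_state, actual_state):
    print(action, name)
```

Because the plan is derived from declared state rather than typed in by hand, the same inputs always produce the same set of changes, which is what makes this style of deployment repeatable and reviewable.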
We know the API lifecycle; this is nothing new. Best practice indicates that we design an API before we build it. Then once it's deployed, we add governance and operational policies to manage it before making it discoverable to consumers in a portal. Then there are all the ongoing operations, and this lifecycle continues going round until we retire the API.
In the next post in this series, we will walk through what this lifecycle looks like when we follow APIOps to accelerate digital business. Where we previously saw manual, costly and error-prone activities at Acme, we will now see those processes automated.