Enabling Multi-Region for Kong Konnect Cloud
Since the initial launch of Kong Konnect Cloud, one common feature request has (unsurprisingly) been Multi-Region support. Many customers look for SaaS solutions that support a distributed service architecture. Even at its inception, our goal was to support more than a single region. Today, we’re happy to announce Multi-Region Support for Kong Konnect Cloud.
Multi-Region support allows you to manage your Konnect entities using a “single pane of glass”, regardless of where they reside. Coupling this feature with Custom Teams and Per-Entity Permissions, organizations can now restrict roles assigned for either a user or a team to a specific region.
Today, we’re going to take a look at how we enabled Multi-Region support for Kong Konnect Cloud.
The Lands Between
Before getting into some of the technical aspects of Konnect we first need to understand some of the constraints around entity data. Like many service providers, entities can either be scoped or unscoped. As an example, when viewing the `Runtime Manager` dashboard we see our region picker has both US (North America)
and EU (Europe)
.
However, on our organization dashboard, we see the region picker has been disabled and defaulted to Global
.
This is because teams are unscoped in nature whereas runtime groups are scoped to a specific region.
Ashes of War
In the initial planning for scaling the identity service, the first question we asked ourselves was, “How do we replicate the data across regions?”
Then an almost more important question was asked, “Should we?”
When looking at the performance and initial scale of authorization by itself, we realized that on average we were fielding roughly 750k requests in a given day between authorizing actions and updating access policies. Trying to field even a relatively small scale like this for requests between the US
and EU
would result in a 100ms latency hit we weren’t willing to pass on to our users.
Our identity service was comprised of functionality for both authentication and authorization. One key factor in replicating this data was data residency. However, authorization data doesn’t quite have the same requirements as identity data. In short, separating the two pieces of functionality would allow us to define data replication behavior for each independently.
Now that these two services were separated, the next hurdle we had to tackle was ensuring user actions, such as adding a user to a team or assigning a role to a user, not only generated the appropriate authorization policies but replicated those very policies across all of our supported regions.
Our solution to the age-old CAP problem was to leverage replication in order to maintain the closeness of internal services while maintaining felt performance.
Stonesword Keys
Coming up with a solution that involves distributed data undoubtedly comes with a choice between strong consistency or eventual consistency.
We elected to choose eventual consistency by leveraging active-active database replication. As soon as an action is taken against an entity that affects policy data, we store it in our database. The database layer then handles replicating that data to our other regions.
On average, replicating the data across Konnect regions has a latency of roughly 750ms. What this might look like for automated clients who immediately attempt to utilize a created role or access policy is going to be a Forbidden
response.
Large cloud providers get around eventual consistency in their automated clients with retries. Nominal workflows in the Konnect UI likely will not experience this given a user would need to navigate elsewhere in order to perform a given action, taking less time to replicate than it would to navigate.
Let’s look at this in practice. Say an admin adds a user to a new team that only has access to EU
resources. The identity service, currently in the US
, manages the correlation between Users and Teams and stores this reference in its database. At the same time, we send a change event to the authorization service. It then handles constructing the necessary policies and stores those policies in its database. Now, the user can then make the necessary calls in EU
as an authorized user.
Golden Order
Tackling Multi-Region support was an extremely engaging problem to solve. The teams learned a great deal as a part of this feature delivery. The learnings that came from Multi-Region support will likely drive improvements across the board for months to come.
We understand the future of APIs is a distributed one. If you haven’t tried the new features today, give them a whirl. We’re happy to take any feedback to improve the product.