Engineering
February 5, 2024
9 min read

Kong Konnect Data Plane Elasticity on Amazon EKS 1.29: Pod Autoscaling with VPA

Claudio Acquaviva
Principal Architect, Kong

In this series of posts, we will look closely at how Kong Konnect Data Planes can take advantage of Autoscalers running on Amazon Elastic Kubernetes Service (EKS) 1.29 to support the throughput demands API consumers impose on them at the lowest cost. The series comprises four parts, each dedicated to one of the autoscaling technologies described below.

Introduction

One of the primary drivers for companies to migrate their infrastructure to the cloud is elasticity: the ability to dynamically scale compute and container resources up and down according to their business needs. The "Overview of Amazon Web Services" whitepaper lists the advantages of cloud computing related to elasticity: trading fixed expense for variable expense and no more capacity guessing.

In summary, companies can reduce their costs with the "pay-as-you-go" model provided by cloud elasticity.

It's important to clarify some fundamental concepts. Given an application, there are basically two main options to support different workloads:

  • Vertical scalability (scale up/down): you are adding or subtracting hardware resources (memory, CPU, storage) to/from your application.
  • Horizontal scalability (scale out/in): you are adding or subtracting more nodes or servers and distributing the workload across them.

Elasticity goes beyond that. It provides the capacity to automatically provision and deprovision hardware resources based on demand. In other words, you will be able to vertically scale up/down or horizontally scale in/out your applications in an automatic way. This is also called "autoscaling."

Autoscaling and Kubernetes

Autoscaling is one of the most compelling features of the Kubernetes platform. Basically, it is available from two different perspectives:

1. Pod Autoscaling

Pod Autoscaling comes in two flavors: the Vertical Pod Autoscaler (VPA), which adjusts the CPU and memory requests of a Pod's containers, and the Horizontal Pod Autoscaler (HPA), which adds or removes Pod replicas based on demand. The following diagram shows both Vertical and Horizontal mechanisms. Note that they work with a single Node since they are Pod Autoscaling options.

2. Cluster Scalability

Also called Node Autoscaling, this adds and removes Kubernetes Nodes to and from your Cluster. It can be achieved with two different autoscalers:

  • Cluster Autoscaler (CAS) is a Kubernetes-native tool that adds or removes Nodes based on pending Pods and Node usage metrics. Cluster Autoscaler lacks the capacity to provision Nodes based on the specific requirements a Pod needs to run. It simply identifies an unscheduled Pod and replicates an existing Node from the NodeGroup with identical specifications. It relies on EC2 Auto Scaling Groups (ASGs).
  • Karpenter is an open source node provisioning project built for Kubernetes. Unlike CAS, Karpenter doesn't manage NodeGroups: it instantiates EC2 instances directly and adds them as regular, group-less Nodes. The main benefit is that it chooses the right Instance Type to support the required throughput. It can also consolidate multiple Nodes into larger ones.

The following diagram compares the two Cluster Scalability options. The main difference highlighted here is Karpenter's capacity to choose Instance Types other than the one originally created.

All these components are maintained by the Kubernetes Autoscaling Special Interest Group (SIG).

Konnect Data Plane and Autoscaling

Kong Konnect is an API lifecycle management platform delivered as a service. The Management Plane, also called the Control Plane, is hosted in the cloud by Kong, while the Gateway nodes, called Data Planes, are deployed in your environments.

The Control Plane enables customers to securely execute API management activities such as creating API routes, defining services, etc. Runtime environments connect with the management plane using mutual TLS (mTLS) authentication, receive the updates, and take customer-facing API traffic.

The diagram below illustrates the architecture:

This blog post describes basic Konnect Data Planes deployments on an Amazon EKS 1.29 Cluster, taking advantage of these 4 different autoscaling technologies.

Before we get started, it's important to make two critical comments:

  • The use cases we are going to run are not meant to be applied to production environments. The main purpose is to demonstrate how to leverage the Autoscaling capabilities we have available for our workloads and applications.
  • The Konnect Kong Gateway Data Plane deployments are not tuned to deliver the best performance they can get. For optimal Konnect Kong Gateway performance configuration, check the official documentation.

Amazon EKS Cluster Creation

The first thing to do is create the EKS Cluster along with a NodeGroup. For the basic autoscaling use cases we are going to run, I have chosen the t3.2xlarge Instance Type which provides plenty of space for CPU and memory allocation.

You can use eksctl CLI like this:
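
Here is a minimal sketch of the cluster creation. The cluster name and region are illustrative placeholders; adjust them to your environment:

    eksctl create cluster --name kong-autoscaling \
      --version 1.29 \
      --region us-west-1 \
      --without-nodegroup \
      --asg-access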

The --asg-access flag is not used for the VPA and HPA configurations, but for the Cluster Autoscaler settings we are going to explore later on.

Now, create the Managed NodeGroup for the Cluster with:
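
As a sketch, assuming the cluster created above and a NodeGroup named kong-dp-nodegroup (the name is a placeholder; it is referenced later by the nodeSelector configuration through the eks.amazonaws.com/nodegroup label EKS applies to managed NodeGroups):

    eksctl create nodegroup --cluster kong-autoscaling \
      --name kong-dp-nodegroup \
      --region us-west-1 \
      --node-type t3.2xlarge \
      --nodes 1 --nodes-min 1 --nodes-max 3 \
      --managed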

Metrics Server

Both VPA and HPA require the Kubernetes Metrics Server to work, so let's install it with:
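
The Metrics Server manifest published by the kubernetes-sigs project can be applied directly:

    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml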

Kong Konnect Control Plane Creation

The next step is to create a Konnect Control Plane (CP). We are going to use the Konnect REST Admin API to do so. Later on, we will spin up the Konnect Data Plane (DP) in our EKS Cluster.

The CP/DP communication is based on mTLS, so we need a Private Key and Digital Certificate pair. To make that easy, let's use openssl to issue them.
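
For example, a self-signed certificate valid for three years (the file names and subject are illustrative):

    openssl req -new -x509 -nodes -newkey rsa:2048 \
      -subj "/CN=kong-dp" \
      -keyout ./kong-dp.key \
      -out ./kong-dp.crt \
      -days 1095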

To use the Konnect REST Admin API, we need a Konnect PAT (Personal Access Token) in order to send requests to Konnect. Read the Konnect PAT documentation page to learn how to generate one.

Create a Konnect Control Plane with the following command. It configures the Pinned Mode for the CP and DP communication, meaning we are going to use the same Public Key for both the CP and the DP.
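
Here is a sketch of the request, assuming the us.api.konghq.com regional endpoint, a PAT exported as KONNECT_PAT, and an illustrative Control Plane name; the exact payload fields are an assumption, so check the Konnect API reference for your region:

    curl -X POST https://us.api.konghq.com/v2/control-planes \
      -H "Authorization: Bearer $KONNECT_PAT" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "vpa-demo-cp",
        "cluster_type": "CLUSTER_TYPE_CONTROL_PLANE",
        "auth_type": "pinned_client_certs"
      }'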

The following command should return the Konnect Control Plane Id:
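
For instance, listing the Control Planes and filtering by name with jq (the response shape with a data array is an assumption based on the Konnect v2 API):

    export CP_ID=$(curl -s https://us.api.konghq.com/v2/control-planes \
      -H "Authorization: Bearer $KONNECT_PAT" | \
      jq -r '.data[] | select(.name=="vpa-demo-cp") | .id')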

Get the CP's Endpoints with:
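
A sketch, assuming the endpoints are exposed under the Control Plane's config object:

    curl -s https://us.api.konghq.com/v2/control-planes/$CP_ID \
      -H "Authorization: Bearer $KONNECT_PAT" | \
      jq -r '.config.control_plane_endpoint, .config.telemetry_endpoint'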

Now we need to Pin the Digital Certificate. Use the CP Id in your request:
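
A sketch of the request, assuming the dp-client-certificates sub-resource; jq is used here only to JSON-escape the PEM file issued earlier:

    curl -X POST https://us.api.konghq.com/v2/control-planes/$CP_ID/dp-client-certificates \
      -H "Authorization: Bearer $KONNECT_PAT" \
      -H "Content-Type: application/json" \
      -d "{\"cert\": $(jq -sR . ./kong-dp.crt)}"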

Kong Service and Route

With the Konnect Control Plane defined, it's time to create a Kong Service and Route so the Data Plane has an application to consume. For this Autoscaling exploration, the Konnect Data Plane will consume an Upstream Service (based on the httpbin echo application) running on the same Cluster, which we are going to deploy later on. In fact, the Kong Service refers to the Kubernetes Service's FQDN.

The following command creates a Kong Service:
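
A sketch of the request, assuming the core-entities path and the httpbin Kubernetes Service (in an httpbin namespace) we are going to deploy later; the Kong Service name is an illustrative placeholder:

    curl -X POST https://us.api.konghq.com/v2/control-planes/$CP_ID/core-entities/services \
      -H "Authorization: Bearer $KONNECT_PAT" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "httpbin-service",
        "protocol": "http",
        "host": "httpbin.httpbin.svc.cluster.local",
        "port": 80
      }'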

The following command should return the Kong Service Id:
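
Again as a sketch, filtering the Service list by name:

    export SERVICE_ID=$(curl -s https://us.api.konghq.com/v2/control-planes/$CP_ID/core-entities/services \
      -H "Authorization: Bearer $KONNECT_PAT" | \
      jq -r '.data[] | select(.name=="httpbin-service") | .id')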

With the Kong Service Id, create a Kong Route:
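
A sketch, using the Kong Admin API convention of nesting Routes under their Service; the Route name and path are placeholders:

    curl -X POST https://us.api.konghq.com/v2/control-planes/$CP_ID/core-entities/services/$SERVICE_ID/routes \
      -H "Authorization: Bearer $KONNECT_PAT" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "httpbin-route",
        "paths": ["/httpbin"]
      }'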

Kong Konnect Data Plane deployment

We are ready to deploy the Konnect Data Plane (DP) in the EKS Cluster. Create a specific namespace and a secret with the Private Key and Digital Certificate pair.
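
For example, assuming a kong namespace and the key/certificate files issued earlier (the secret name is referenced by the values.yaml below):

    kubectl create namespace kong

    kubectl create secret tls kong-cluster-cert -n kong \
      --cert=./kong-dp.crt \
      --key=./kong-dp.key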

Next, create the values.yaml we are going to use to deploy the Konnect Data Plane. Use the Control Plane endpoints you got when you created it.
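
Here is a sketch of the values.yaml for the kong/kong Helm chart, assuming the endpoints retrieved earlier (shown as placeholders), the kong-cluster-cert secret created above, and the NodeGroup label EKS applies to managed NodeGroups:

    image:
      repository: kong/kong-gateway
      tag: "3.5"

    # mounts the mTLS pair under /etc/secrets/kong-cluster-cert
    secretVolumes:
    - kong-cluster-cert

    admin:
      enabled: false

    env:
      role: data_plane
      database: "off"
      konnect_mode: "on"
      vitals: "off"
      cluster_mtls: pki
      # hostnames only, without the https:// prefix
      cluster_control_plane: <control-plane-endpoint>:443
      cluster_server_name: <control-plane-endpoint>
      cluster_telemetry_endpoint: <telemetry-endpoint>:443
      cluster_telemetry_server_name: <telemetry-endpoint>
      cluster_cert: /etc/secrets/kong-cluster-cert/tls.crt
      cluster_cert_key: /etc/secrets/kong-cluster-cert/tls.key
      lua_ssl_trusted_certificate: system

    ingressController:
      enabled: false

    # resource requests and limits used later by VPA
    resources:
      requests:
        cpu: 200m
        memory: 300Mi
      limits:
        cpu: 200m
        memory: 300Mi

    # expose the proxy with an AWS Network Load Balancer
    proxy:
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb

    # pin the Data Plane Pods to the NodeGroup created earlier
    nodeSelector:
      eks.amazonaws.com/nodegroup: kong-dp-nodegroup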

Note we are going to expose the Data Plane with a Network Load Balancer, which provides the best performance for it. In addition, with the nodeSelector config we are explicitly referring to the NodeGroup we created previously.

Again, keep in mind this blog post describes a simple environment for autoscaling tests. That's the reason we are using nodeSelector for Pod isolation. In a production environment, as a best practice, it should be used only as an exception.

From the VPA perspective, the first autoscaling technology we are going to explore, the resources section is the main configuration. It sets the CPU and memory requests and limits for the Data Plane.

Deploy the Konnect Data Plane with the Helm Charts:
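
For example, with a Helm release named kong in the kong namespace (the release name determines the Kubernetes object names used below):

    helm repo add kong https://charts.konghq.com
    helm repo update

    helm install kong kong/kong -n kong --values ./values.yaml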

You should see the Data Plane running:
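
For example, by listing the Pods in the kong namespace:

    kubectl get pods -n kong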

For a basic consumption test send a request to the Data Plane. Get the NLB DNS name with:
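
Assuming the kong release name used above, the proxy Service is named kong-kong-proxy; the NLB hostname and a test request look like this (the /httpbin path matches the Route created earlier):

    kubectl get service kong-kong-proxy -n kong \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'

    curl http://<nlb-dns-name>/httpbin/get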

You can also check the Data Plane's Pod resources:
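
For example, reading the requests and limits straight from the Pod spec:

    kubectl get pod -n kong -o jsonpath='{.items[0].spec.containers[0].resources}'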

VPA

VPA Architecture

The diagram below was taken from the VPA's project repository. VPA is fully documented in a specific vertical-pod-autoscaler directory. Please check the repo to learn more about VPA.

VPA consists of three main components:

  • Recommender monitors the current and past resource consumption and, based on it, provides recommended values for the containers' CPU and memory requests.
  • Updater checks which of the managed Pods have correct resources set and, if not, kills them so that they can be recreated by their controllers with the updated requests.
  • Admission Plugin sets the correct resource requests on new Pods (either just created or recreated by their controller due to Updater's activity).

VPA is configured with a new CRD ("Custom Resource Definition") object called VerticalPodAutoscaler. It allows you to specify which Pods should be vertically autoscaled, as well as if and how the resource recommendations are applied.

VPA Installation

VPA is not available in Kubernetes by default, so we need to install it manually. To install VPA in your EKS Cluster, clone the repo and run the vpa-up.sh script.
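
The script lives in the vertical-pod-autoscaler directory of the autoscaler repository:

    git clone https://github.com/kubernetes/autoscaler.git
    cd autoscaler/vertical-pod-autoscaler
    ./hack/vpa-up.sh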

Along with typical Kubernetes objects, such as CRDs, the process installs three new Deployments in the kube-system namespace: the Recommender, the Updater, and the Admission Controller.

You should see three new Pods running, one per VPA component:
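
For example:

    kubectl get pods -n kube-system | grep vpa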

Upstream and Load Generation Nodes

To have better control over the Kubernetes resources, we are going to deploy each component (Load Generator, Konnect Data Plane, and Upstream Service) in a specific NodeGroup.

Create a new NodeGroup and install the Upstream Service

To create the NodeGroup for the Upstream Service, run the following command. The c5.2xlarge Instance Type has enough resources to run the service, ensuring it doesn't become a bottleneck.
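
A sketch, reusing the placeholder cluster name and region from before:

    eksctl create nodegroup --cluster kong-autoscaling \
      --name upstream-nodegroup \
      --region us-west-1 \
      --node-type c5.2xlarge \
      --nodes 1 \
      --managed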

Now, install the Upstream Application. Note we are deploying 5 replicas of the application to provide better performance. We also refer to the new NodeGroup with the nodeSelector configuration.
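
A sketch of the Upstream deployment, assuming the kennethreitz/httpbin image, an httpbin namespace, and the EKS-managed NodeGroup label (all illustrative, but consistent with the Kong Service created earlier):

    apiVersion: v1
    kind: Namespace
    metadata:
      name: httpbin
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: httpbin
      namespace: httpbin
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: httpbin
      template:
        metadata:
          labels:
            app: httpbin
        spec:
          # pin the Upstream Pods to the dedicated NodeGroup
          nodeSelector:
            eks.amazonaws.com/nodegroup: upstream-nodegroup
          containers:
          - name: httpbin
            image: kennethreitz/httpbin
            ports:
            - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: httpbin
      namespace: httpbin
    spec:
      selector:
        app: httpbin
      ports:
      - port: 80
        targetPort: 80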

You can check the application and the Service exposing it:
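
For example:

    kubectl get pods -n httpbin
    kubectl get service -n httpbin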

Consume the Data Plane with the LoadGenerator

As you can see, the NodeGroup creation for the Load Generator is quite similar to the Upstream Service's:
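
Again as a sketch, with the same placeholder cluster name and region:

    eksctl create nodegroup --cluster kong-autoscaling \
      --name loadgen-nodegroup \
      --region us-west-1 \
      --node-type c5.2xlarge \
      --nodes 1 \
      --managed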

Now, let's consume the Data Plane. The Load Generator deployment is based on the Fortio load testing tool. Fortio is particularly interesting because it can run at a fixed queries-per-second (QPS) rate. For example, here is a Fortio deployment that sends requests to the Data Plane. Fortio is configured to sustain a 1000 QPS rate across 800 parallel connections for 3 minutes.

Also, note Fortio is consuming the Data Plane Kubernetes Service's FQDN:
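
Here is a sketch of such a Pod, assuming the kong-kong-proxy Service name from the Helm release above and the /httpbin Route path (deleting this Pod later stops the test):

    apiVersion: v1
    kind: Pod
    metadata:
      name: fortio
    spec:
      restartPolicy: Never
      # pin the Load Generator to its dedicated NodeGroup
      nodeSelector:
        eks.amazonaws.com/nodegroup: loadgen-nodegroup
      containers:
      - name: fortio
        image: fortio/fortio
        # fortio load -qps 1000 -c 800 -t 3m <Data Plane Service FQDN>
        args:
        - load
        - -qps
        - "1000"
        - -c
        - "800"
        - -t
        - "3m"
        - http://kong-kong-proxy.kong.svc.cluster.local:80/httpbin/get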

You can check Fortio's output with:
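
For example:

    kubectl logs -f fortio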

Eventually, Fortio presents a summary of the load test. The main data points are:

  • The P99 latency: for example, # target 99% 1.81244
  • The number of requests sent along with the QPS: All done 52007 calls (plus 800 warmup) 2775.391 ms avg, 287.6 qps

You can stop Fortio by simply deleting its Pod:
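
For example:

    kubectl delete pod fortio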

As you can see, the single Konnect Data Plane instance was not able to handle the 1000 QPS rate requested. The main reason is that its Pod is restricted to consume up to 0.2 CPU units and 300Mi of memory, as set in the resources section of the Data Plane deployment declaration.

That's a nice opportunity to put VPA to work and see what it recommends.

VPA Policy

The VPA policy is described using the VerticalPodAutoscaler CRD. For a basic test, declare a policy like this:
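
Here is a sketch of the policy, assuming the Data Plane Deployment is named kong-kong (the name produced by the Helm release above) and using illustrative resource bounds:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: kong-dp-vpa
      namespace: kong
    spec:
      # the workload VPA watches and updates
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: kong-kong
      updatePolicy:
        updateMode: "Auto"
        # allow eviction even with a single live replica (not for production)
        minReplicas: 1
      resourcePolicy:
        containerPolicies:
        - containerName: "*"
          controlledResources: ["cpu", "memory"]
          # range of resources VPA is allowed to recommend
          minAllowed:
            cpu: 200m
            memory: 300Mi
          maxAllowed:
            cpu: 2000m
            memory: 2Gi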

A couple of comments here:

  • The updateMode, configured as Auto, requests VPA to assign resource requests on Pod creation as well as update them on existing Pods using the preferred update mechanism. Check the documentation to learn more.
  • The default behavior of the VPA Updater is to allow Pod eviction only if there are at least 2 live replicas, in order to avoid temporary total unavailability of a workload under VPA in Auto mode. This has been changed with the minReplicas configuration (of course, this is not recommended for production environments).
  • minAllowed and maxAllowed configs set the range of resources (CPU and memory) to be considered.

Consume the Data Plane again

Let's run the same load test now and see what happens.

If you check the Updater with, for example, kubectl logs -f vpa-updater-884d4d7d9-mrdd6, you should see messages saying it has evicted and recreated the Pod with new resource configurations:

This is due to the new recommendations provided by the Recommender. For example, kubectl logs -f vpa-recommender-597b7c765d-vljms, should show messages like these:

As usual, Fortio should report the test results. In my case, I got the following:

  • The P99 latency: target 99% 1.02964
  • The number of requests sent along with the QPS: All done 180000 calls (plus 800 warmup) 650.192 ms avg, 993.4 qps

As we can see, the DP was now able to handle the QPS we requested, with a much better latency.

Let's check the VPA policy now:
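
For example:

    kubectl get vpa -n kong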

You can get a more detailed view with the following command. The output shows that VPA generated some recommendations: Target is the actual recommendation, while the Upper and Lower bounds delimit the range the recommendation can fall within.
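
Using the policy name declared above:

    kubectl describe vpa kong-dp-vpa -n kong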

Most importantly, we should be able to see the VPA effects on the Data Plane Pod. Note that the VPA recommendation keeps the proportion between the resource requests and limits.
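
To confirm, check the Pod's resources again, the same way as before:

    kubectl get pod -n kong -o jsonpath='{.items[0].spec.containers[0].resources}'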

In general, VPA is not expected to provide fast actions. In fact, it is more useful for identifying general trends. For quick reactions, the Horizontal Pod Autoscaler (HPA) is a better approach. Check the next post in this series to see the Konnect Data Plane working alongside HPA.