Kong Konnect Data Plane Elasticity on Amazon EKS 1.29: Pod Autoscaling with VPA
Claudio Acquaviva
Principal Architect, Kong
In this series of posts, we will look closely at how Kong Konnect Data Planes can take advantage of Autoscalers running on Amazon Elastic Kubernetes Service (EKS) 1.29 to support the throughput demands API consumers impose on them at the lowest cost. The series comprises four parts:
Part 1: Pod Autoscaling with VPA (Vertical Pod Autoscaler), this post
Part 2: Pod Autoscaling with HPA (Horizontal Pod Autoscaler)
Part 3: Node Autoscaling with Cluster Autoscaler
Part 4: Node Autoscaling with Karpenter
One of the primary drivers for companies to migrate their infrastructure to the cloud is elasticity: the ability to dynamically scale compute and container resources up and down according to their business needs. The "Overview of Amazon Web Services" whitepaper outlines the advantages of cloud computing related to elasticity: variable expense and no more capacity guessing.
In summary, companies can reduce their costs with the "pay-as-you-go" model provided by cloud elasticity.
It's important to clarify some fundamental concepts. To support different workloads, an application basically has two main scaling options:
Vertical scalability (scale up/down): you are adding or subtracting hardware resources (memory, CPU, storage) to/from your application.
Horizontal scalability (scale out/in): you are adding or subtracting more nodes or servers and distributing the workload across them.
Elasticity goes beyond that. It provides the capacity to automatically provision and deprovision hardware resources based on demand. In other words, you will be able to vertically scale up/down or horizontally scale in/out your applications in an automatic way. This is also called "autoscaling."
Autoscaling and Kubernetes
Autoscaling is one of the most compelling features of the Kubernetes platform. Basically, it is available from two different perspectives:
1. Pod Autoscaling
Pod Vertical Scalability with VPA (Vertical Pod Autoscaler): it increases and decreases the CPU and memory allocation for your Pods.
Pod Horizontal Scalability with HPA (Horizontal Pod Autoscaler): it adds and subtracts the number of Pods of a given Kubernetes Deployment.
The following diagram shows both the Vertical and Horizontal mechanisms. Note that they work within a single Node, since they are Pod Autoscaling options.
2. Cluster Scalability
Also called Node Autoscaling, this adds and subtracts Kubernetes Nodes to/from your Cluster. It can be achieved with two different autoscalers:
Cluster Autoscaler (CAS) is a Kubernetes-native tool that adds or removes Nodes based on pending Pods and Node usage metrics. Cluster Autoscaler lacks the capacity to provision Nodes based on the requirements a Pod requests to run: it simply identifies an unschedulable Pod and replicates an existing Node from the NodeGroup with identical specifications. It relies on EC2 Auto Scaling Groups (ASGs).
Karpenter is an open source node provisioning project built for Kubernetes. Unlike CAS, Karpenter doesn't manage NodeGroups: it instantiates EC2 instances directly and adds them as regular, group-less Nodes. The main benefit is that it chooses the right Instance Type to support the required throughput. It can also consolidate multiple Nodes into larger ones.
The following diagram compares the two Cluster Scalability options. The main difference highlighted here is Karpenter's capacity to choose Instance Types other than the one originally created.
Kong Konnect is an API lifecycle management platform delivered as a service. The Management Plane, also called the Control Plane, is hosted in the cloud by Kong, while the Gateway nodes, called Data Planes, are deployed in your environments.
The Control Plane enables customers to securely execute API management activities such as creating API routes, defining services, etc. Runtime environments connect with the Management Plane using mutual Transport Layer Security (mTLS), receive configuration updates, and take customer-facing API traffic.
The diagram below illustrates the architecture:
This blog post describes basic Konnect Data Planes deployments on an Amazon EKS 1.29 Cluster, taking advantage of these 4 different autoscaling technologies.
Before we get started, it's important to make two critical comments:
The use cases we are going to run are not meant to be applied to production environments. The main purpose is to demonstrate how to leverage the Autoscaling capabilities we have available for our workloads and applications.
The Konnect Kong Gateway Data Plane deployments are not tuned to deliver the best performance they can achieve. For optimal Kong Gateway performance configuration, check the official documentation.
Amazon EKS Cluster Creation
The first thing to do is create the EKS Cluster along with a NodeGroup. For the basic autoscaling use cases we are going to run, I have chosen the t3.2xlarge Instance Type which provides plenty of space for CPU and memory allocation.
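As a reference, a cluster like this could be created with eksctl. The cluster name, NodeGroup name, region, and Node count below are illustrative assumptions, not the exact values used in this post:

```shell
# Illustrative eksctl command; names, region, and sizes are assumptions
eksctl create cluster --name kong-autoscaling \
  --region us-east-1 \
  --version 1.29 \
  --nodegroup-name kong-nodegroup \
  --node-type t3.2xlarge \
  --nodes 1
```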
The next step is to create a Konnect Control Plane (CP). We are going to use the Konnect REST Admin API to do so. Later on, we will spin up the Konnect Data Plane (DP) in our EKS Cluster.
The CP/DP communication is based on mTLS, so we need a Private Key and Digital Certificate pair. To make that easy, let's use openssl to issue them.
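For example, a self-signed pair can be issued like this. The CN is an arbitrary choice; the file names match the Kubernetes secret created later in this post:

```shell
# Issue a self-signed certificate and private key (no passphrase),
# valid for 3 years, for the CP/DP mTLS handshake
openssl req -new -x509 -nodes -newkey rsa:2048 \
  -subj "/CN=kongdp/C=US" \
  -keyout ./kongcp1.key \
  -out ./kongcp1.crt \
  -days 1095
```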
To use the Konnect Rest Admin API, we need a Konnect PAT (Personal Access Token) in order to send requests to Konnect. Read the Konnect PAT documentation page to learn how to generate one.
Create a Konnect Control Plane with the following command. It configures the Pinned Mode for CP/DP communication, meaning the same certificate is pinned on both the CP and DP sides.
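As a sketch, the calls could look like the ones below, using the Konnect control-planes API. The region in the URL, the Control Plane name, the $PAT variable, and the <cp_id> placeholder are assumptions; check the Konnect API reference for the exact shapes:

```shell
# Create a Control Plane in Pinned Mode (pinned client certificates)
curl -sX POST https://us.api.konghq.com/v2/control-planes \
  -H "Authorization: Bearer $PAT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "cp1",
    "cluster_type": "CLUSTER_TYPE_CONTROL_PLANE",
    "auth_type": "pinned_client_certs"
  }'

# Pin the Data Plane certificate to the new Control Plane,
# using the "id" returned by the previous call
curl -sX POST https://us.api.konghq.com/v2/control-planes/<cp_id>/dp-client-certificates \
  -H "Authorization: Bearer $PAT" \
  -H "Content-Type: application/json" \
  -d "{\"cert\": \"$(awk '{printf "%s\\n", $0}' ./kongcp1.crt)\"}"
```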
With the Konnect Control Plane defined, it's time to create a Kong Service and Route so the Data Plane has an application to consume. For this Autoscaling exploration, the Konnect Data Plane will consume an Upstream Service (based on the httpbin echo application) running on the same Cluster, which we are going to deploy later on. In fact, the Kong Service refers to the Kubernetes Service's FQDN.
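As a sketch, the Service and Route could be created through the Control Plane's core-entities endpoints. The entity names, the <cp_id> placeholder, and the Upstream FQDN below are assumptions:

```shell
# Create a Kong Service pointing at the Upstream's Kubernetes Service FQDN
curl -sX POST https://us.api.konghq.com/v2/control-planes/<cp_id>/core-entities/services \
  -H "Authorization: Bearer $PAT" \
  -H "Content-Type: application/json" \
  -d '{"name": "httpbin-service", "url": "http://httpbin.httpbin.svc.cluster.local"}'

# Create a Route exposing the Service on "/"
curl -sX POST https://us.api.konghq.com/v2/control-planes/<cp_id>/core-entities/services/httpbin-service/routes \
  -H "Authorization: Bearer $PAT" \
  -H "Content-Type: application/json" \
  -d '{"name": "httpbin-route", "paths": ["/"]}'
```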
We are ready to deploy the Konnect Data Plane (DP) in the EKS Cluster. Create a specific namespace and a secret with the Private Key and Digital Certificate pair.
kubectl create namespace kong
kubectl create secret tls kong-cluster-cert -n kong --cert=./kongcp1.crt --key=./kongcp1.key
Next, create the values.yaml we are going to use to deploy the Konnect Data Plane. Use the Control Plane endpoints you got when you created it.
Note we are going to expose the Data Plane with a Network Load Balancer (NLB), which provides the best performance for it. In addition, with the nodeSelector configuration we are explicitly referring to the NodeGroup we created previously.
Again, keep in mind this blog post describes a simple environment for autoscaling tests; that's the reason we are using nodeSelector for Pod isolation. For a production environment, this should be the exception, not the rule.
From the VPA perspective, the first autoscaling technology we are going to explore, the resources section is the main configuration. It sets the CPU and memory requests and limits for the Data Plane.
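A values.yaml along these lines might look like the sketch below. The Control Plane endpoints, image tag, NodeGroup label, and NLB annotations are assumptions; the resource figures match the ones inspected later in this post:

```yaml
# Sketch only: endpoints, tag, and labels are assumptions
image:
  repository: kong/kong-gateway
  tag: "3.6"

secretVolumes:
  - kong-cluster-cert

admin:
  enabled: false

env:
  role: data_plane
  database: "off"
  cluster_mtls: pki
  cluster_control_plane: <your_cp_endpoint>:443
  cluster_server_name: <your_cp_endpoint>
  cluster_telemetry_endpoint: <your_telemetry_endpoint>:443
  cluster_telemetry_server_name: <your_telemetry_endpoint>
  cluster_cert: /etc/secrets/kong-cluster-cert/tls.crt
  cluster_cert_key: /etc/secrets/kong-cluster-cert/tls.key

proxy:
  type: LoadBalancer
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

nodeSelector:
  eks.amazonaws.com/nodegroup: kong-nodegroup

resources:
  requests:
    cpu: 100m
    memory: 200Mi
  limits:
    cpu: 200m
    memory: 300Mi
```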
Deploy the Konnect Data Plane with the Helm Charts:
helm repo add kong https://charts.konghq.com
helm repo update
helm install kong kong/kong -n kong --values ./values.yaml
You should see the Data Plane running:
% kubectl get pod -n kong
For a basic consumption test, send a request to the Data Plane. Get the NLB DNS name with:
% kubectl get service -n kong -o json | jq -r ".items[].status.loadBalancer.ingress[].hostname"
<your_nlb_dnsname>
http <your_nlb_dnsname>
You can also check the Data Plane's Pod resources:
% kubectl get pod -n kong -o json | jq ".items[].spec.containers[].resources"
{
  "limits": {
    "cpu": "200m",
    "memory": "300Mi"
  },
  "requests": {
    "cpu": "100m",
    "memory": "200Mi"
  }
}
The VPA comprises three main components:
Recommender: monitors the current and past resource consumption and, based on it, provides recommended values for the containers' CPU and memory requests.
Updater: checks which of the managed Pods have correct resources set and, if not, evicts them so that they can be recreated by their controllers with the updated requests.
Admission Plugin: sets the correct resource requests on new Pods (either just created or recreated by their controller due to the Updater's activity).
VPA is configured with a new CRD ("Custom Resource Definition") object called VerticalPodAutoscaler. It allows you to specify which Pods should be vertically autoscaled, as well as if and how the resource recommendations are applied.
VPA Installation
VPA is not available in Kubernetes by default, so we need to install it manually. To install VPA in your EKS Cluster, clone the kubernetes/autoscaler repo and run the vpa-up.sh script.
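This is the installation path documented in the autoscaler repository; note that vpa-up.sh acts on whichever cluster your current kubectl context points at:

```shell
# Clone the Kubernetes autoscaler repository and install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
```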
Along with typical Kubernetes objects, such as CRDs, the process installs three new Deployments in the kube-system namespace: the Recommender, the Updater, and a new Admission Controller.
To have better control over the Kubernetes resources, we are going to deploy each component (Load Generator, Konnect Data Plane, and Upstream Service) in a specific NodeGroup.
Create a new NodeGroup and install the Upstream Service
To create the NodeGroup for the Upstream Service run the following command. The c5.2xlarge Instance Type has enough resources to run the service, ensuring it doesn't become a bottleneck.
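The command could look like the following; the cluster and NodeGroup names, region, and Node count are illustrative assumptions:

```shell
# Illustrative: add a dedicated NodeGroup for the Upstream Service
eksctl create nodegroup --cluster kong-autoscaling \
  --region us-east-1 \
  --name upstream-nodegroup \
  --node-type c5.2xlarge \
  --nodes 1
```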
Now, install the Upstream Application. Note we are deploying 5 replicas of the application for better performance. We also refer to the new NodeGroup with the nodeSelector configuration.
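A minimal manifest along those lines might look like this; the namespace, labels, and image are assumptions:

```yaml
# Sketch: httpbin Upstream with 5 replicas, pinned to its own NodeGroup
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
  namespace: httpbin
spec:
  replicas: 5
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: upstream-nodegroup
      containers:
        - name: httpbin
          image: kennethreitz/httpbin
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  namespace: httpbin
spec:
  selector:
    app: httpbin
  ports:
    - port: 80
      targetPort: 80
```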
Now, let's consume the Data Plane. The Load Generator deployment is based on the Fortio load testing tool. Fortio is particularly interesting because it can run at a fixed Queries per Second (QPS) rate. For example, here is a Fortio deployment that sends requests to the Data Plane. Fortio is configured to keep a 1000 QPS rate across 800 parallel connections for 3 minutes.
Also, note Fortio is consuming the Data Plane Kubernetes Service's FQDN:
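A minimal Fortio Pod along those lines might look like this. The Pod name, the NodeGroup label, and the proxy Service FQDN (derived from the Helm release and namespace used above) are assumptions:

```yaml
# Sketch: 1000 QPS, 800 connections, for 3 minutes, against the DP Service FQDN
apiVersion: v1
kind: Pod
metadata:
  name: fortio
spec:
  nodeSelector:
    # assumes a dedicated Load Generator NodeGroup exists
    eks.amazonaws.com/nodegroup: loadgen-nodegroup
  containers:
    - name: fortio
      image: fortio/fortio
      args: ["load", "-qps", "1000", "-c", "800", "-t", "3m",
             "http://kong-kong-proxy.kong.svc.cluster.local:80"]
```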
Eventually, Fortio presents a summary of the load test. The main data points are:
The P99 latency: for example, # target 99% 1.81244
The number of requests sent along with the QPS: All done 52007 calls (plus 800 warmup) 2775.391 ms avg, 287.6 qps
You can stop Fortio by simply deleting its Pod:
kubectl delete pod fortio
As you can see, the single Konnect Data Plane instance was not able to handle the requested 1000 QPS rate. The main reason is that its Pod is restricted to consuming up to 0.2 units of CPU and 300Mi of memory, as set in the resources section of the Data Plane deployment declaration.
That's a nice opportunity to put VPA to work and see what it recommends.
VPA Policy
The VPA policy is described using the VerticalPodAutoscaler CRD. For a basic test, declare a policy like this:
The updateMode, configured as Auto, requests VPA to assign resources on Pod creation as well as to update them on existing Pods using the eviction mechanism. Check the documentation to learn more.
The default behavior of the VPA Updater is to allow Pod eviction only if there are at least 2 live replicas, in order to avoid temporary total unavailability of a workload under VPA in Auto mode. This default has been overridden here with the minReplicas configuration (of course, this is not recommended for production environments).
minAllowed and maxAllowed configs set the range of resources (CPU and memory) to be considered.
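Putting those settings together, the policy could be sketched as below. The target Deployment name and the bounds are assumptions informed by the resource figures used in this post:

```yaml
# Sketch of the VerticalPodAutoscaler policy described above
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: kong-vpa
  namespace: kong
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kong-kong
  updatePolicy:
    updateMode: Auto
    # allow eviction even with a single replica (not for production)
    minReplicas: 1
  resourcePolicy:
    containerPolicies:
      - containerName: proxy
        minAllowed:
          cpu: 200m
          memory: 300Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```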
Consume the Data Plane again
Let's run the same load test now and see what happens.
If you check the Updater logs with, for example, kubectl logs -f vpa-updater-884d4d7d9-mrdd6, you should see messages saying it has evicted the Pod so it can be recreated with new resource configurations:
I0124 20:13:47.953035       1 pods_eviction_restriction.go:219] overriding minReplicas from global 2 to per-VPA 1 for VPA kong/kong-vpa
I0124 20:13:47.953082       1 update_priority_calculator.go:143] pod accepted for update kong/kong-kong-84964cbc8d-6sbc6 with priority 1.8132136917114257
I0124 20:13:47.953099       1 updater.go:215] evicting pod kong-kong-84964cbc8d-6sbc6
I0124 20:13:47.990898       1 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kong", Name:"kong-kong-84964cbc8d-6sbc6", UID:"04baab83-9d75-4f14-8887-cb79c88e8cfd", APIVersion:"v1", ResourceVersion:"2350", FieldPath:""}): type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.
I0124 20:14:06.345672       1 reflector.go:559] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.LimitRange total 7 items received
This is due to the new recommendations provided by the Recommender. For example, kubectl logs -f vpa-recommender-597b7c765d-vljms should show messages like these:
I0124 20:12:49.047281       1 recommender.go:155] Recommender Run
I0124 20:12:49.047315       1 cluster_feeder.go:317] Start selecting the vpaCRDs.
I0124 20:12:49.047320       1 cluster_feeder.go:352] Fetched 1 VPAs.
I0124 20:12:49.047384       1 cluster_feeder.go:362] Using selector app.kubernetes.io/component=app,app.kubernetes.io/instance=kong,app.kubernetes.io/name=kong for VPA kong/kong-vpa
I0124 20:12:49.054707       1 metrics_client.go:74] 15 podMetrics retrieved for all namespaces
I0124 20:12:49.054820       1 cluster_feeder.go:440] ClusterSpec fed with #36 ContainerUsageSamples for #18 containers. Dropped #0 samples.
I0124 20:12:49.054835       1 recommender.go:165] ClusterState is tracking 15 PodStates and 1 VPAs
I0124 20:12:49.101647       1 checkpoint_writer.go:114] Saved VPA kong/kong-vpa checkpoint for proxy
I0124 20:12:49.101676       1 recommender.go:175] ClusterState is tracking 11 aggregated container states
As usual, Fortio should report the test results. In my case, I got the following:
The P99 latency: target 99% 1.02964
The number of requests sent along with the QPS: All done 180000 calls (plus 800 warmup) 650.192 ms avg, 993.4 qps
As we can see, the DP was now able to handle the requested QPS with a much better latency.
Let's check the VPA policy now:
% kubectl get vpa kong-vpa -n kong
NAME       MODE   CPU    MEM         PROVIDED   AGE
kong-vpa   Auto   548m   548861636   True       6m17s
You can get a more detailed view with the following command. The output shows that VPA has generated recommendations: Target is the actual recommendation, while the Upper and Lower Bounds delimit the range the recommendation was computed within.
% kubectl get vpa kong-vpa -n kong -o json | jq ".status.recommendation"
{
  "containerRecommendations": [
    {
      "containerName": "proxy",
      "lowerBound": { "cpu": "200m", "memory": "300Mi" },
      "target": { "cpu": "548m", "memory": "548861636" },
      "uncappedTarget": { "cpu": "548m", "memory": "548861636" },
      "upperBound": { "cpu": "2", "memory": "2Gi" }
    }
  ]
}
Most importantly, we should be able to see the VPA effects on the Data Plane Pod. Note that the VPA recommendation keeps the proportionality between the resource requests and limits.
% kubectl get pod -n kong -o json | jq ".items[].spec.containers[].resources"
{
  "limits": { "cpu": "1096m", "memory": "823292454" },
  "requests": { "cpu": "548m", "memory": "548861636" }
}
In general, VPA is not expected to provide fast reactions. In fact, it is more useful for capturing general trends. For quick reactions, the Horizontal Pod Autoscaler (HPA) is a better approach. Check the next post in this series to see the Konnect Data Plane working alongside HPA.