# Kong Konnect Data Plane Elasticity on Amazon EKS 1.29: Pod Autoscaling with VPA
Claudio Acquaviva
Principal Architect, Kong
In this series of posts, we will look closely at how Kong Konnect Data Planes can take advantage of the Autoscalers available on Amazon Elastic Kubernetes Service (EKS) 1.29 to support the throughput API consumers demand at the lowest cost. The series comprises four parts:
- Part 1 (this post): Pod Autoscaling with the Vertical Pod Autoscaler (VPA)
- Part 2: Pod Autoscaling with the Horizontal Pod Autoscaler (HPA)
- Part 3: Node Autoscaling with Cluster Autoscaler
- Part 4: Node Autoscaling with Karpenter
In summary, companies can reduce their costs with the "pay-as-you-go" model provided by cloud elasticity.
It's important to clarify some fundamental concepts first. To support different workloads, an application basically has two main scaling options:
- Vertical scalability (scale up/down): you are adding or subtracting hardware resources (memory, CPU, storage) to/from your application.
- Horizontal scalability (scale out/in): you are adding or subtracting more nodes or servers and distributing the workload across them.
Elasticity goes beyond that. It provides the capacity to automatically provision and deprovision hardware resources based on demand. In other words, you will be able to vertically scale up/down or horizontally scale in/out your applications in an automatic way. This is also called "autoscaling."
### Autoscaling and Kubernetes
Autoscaling is one of the most compelling features of the Kubernetes platform. Basically, it is available from two different perspectives:
#### 1. Pod Scalability
Also called Pod Autoscaling, this adjusts individual Pods, either vertically with the Vertical Pod Autoscaler (VPA), which resizes a Pod's CPU and memory requests and limits, or horizontally with the Horizontal Pod Autoscaler (HPA), which changes the number of Pod replicas. The following diagram shows both the Vertical and Horizontal mechanisms. Note that the diagram depicts a single Node, since these are Pod Autoscaling options.
#### 2. Cluster Scalability
Also called Node Autoscaling, this adds and removes Kubernetes Nodes from your Cluster. It can be achieved with two different autoscalers:
- [Cluster Autoscaler (CAS)](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler): a Kubernetes-native tool that adds or removes Nodes based on pending Pods and Node usage metrics. Cluster Autoscaler lacks the capacity to provision Nodes tailored to the resources a Pod requests: it simply identifies an unschedulable Pod and replicates an existing Node from the NodeGroup with identical specifications. It relies on EC2 Auto Scaling Groups (ASGs).
- [Karpenter](https://karpenter.sh/): an open source node provisioning project built for Kubernetes. Unlike CAS, Karpenter doesn't manage NodeGroups; it instantiates EC2 instances directly and adds them as regular, group-less Nodes. The main benefit is that it chooses the right Instance Type to support the required throughput. It can also consolidate multiple Nodes into larger ones.
The following diagram compares the two Cluster Scalability options. The main difference highlighted here is Karpenter's capacity to choose Instance Types other than the one originally used.
The Control Plane enables customers to securely execute API management activities such as creating API routes, defining services, etc. Runtime environments connect to the management plane using mutual TLS (mTLS) authentication, receive configuration updates, and take customer-facing API traffic.
Before we get started, it's important to make two critical comments:
- The use cases we are going to run are not meant to be applied for production environments. The main purpose is to demonstrate how to leverage the Autoscaling capabilities we have available for our workloads and applications.
The CP/DP communication is based on mTLS so we need a Private Key and Digital Certification pair. To make that easy, let's use `openssl` to issue them.
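As a sketch, the pair can be generated as a self-signed certificate like this (the subject fields, file names, and validity period are arbitrary choices for illustration, not Konnect requirements):

```shell
# Issue a private key and a self-signed digital certificate for the
# Data Plane to present to the Konnect Control Plane over mTLS.
# Subject and 3-year validity below are arbitrary example values.
openssl req -new -x509 -nodes \
  -newkey rsa:2048 \
  -subj "/CN=kongdp/C=US" \
  -keyout ./kongcp1.key \
  -out ./kongcp1.crt \
  -days 1095
```

Remember that the certificate also has to be registered with your Konnect Control Plane so it can authenticate the Data Plane.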
We are ready to deploy the Konnect Data Plane (DP) in the EKS Cluster. Create a specific namespace and a secret with the Private Key and Digital Certificate pair.
```
kubectl create namespace kong

kubectl create secret tls kong-cluster-cert -n kong --cert=./kongcp1.crt --key=./kongcp1.key
```
Next, create the `values.yaml` we are going to use to deploy the Konnect Data Plane. Use the Control Plane endpoints you got when you created it.
Note we are going to expose the Data Plane with a Network Load Balancer, which provides the best performance for this use case. In addition, the `nodeSelector` configuration explicitly pins the Pods to the NodeGroup we created previously.
Again, keep in mind this blog post describes a simple environment for autoscaling tests. That's the reason we are using `nodeSelector` for Pod isolation. As a best practice, in production environments it should be used only as an exception.
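A minimal `values.yaml` sketch is shown below. The Control Plane and telemetry endpoints, image tag, and NodeGroup label are placeholders you must replace with your own values; the resource figures match the ones we inspect later in this post:

```yaml
image:
  repository: kong/kong-gateway
  tag: "3.5"                # placeholder: use your Konnect-supported version

secretVolumes:
- kong-cluster-cert         # the mTLS pair created earlier

admin:
  enabled: false

env:
  role: data_plane
  database: "off"
  cluster_mtls: pki
  cluster_control_plane: <your_cp_endpoint>:443
  cluster_server_name: <your_cp_endpoint>
  cluster_telemetry_endpoint: <your_telemetry_endpoint>:443
  cluster_telemetry_server_name: <your_telemetry_endpoint>
  cluster_cert: /etc/secrets/kong-cluster-cert/tls.crt
  cluster_cert_key: /etc/secrets/kong-cluster-cert/tls.key
  lua_ssl_trusted_certificate: system

ingressController:
  enabled: false

resources:
  requests:
    cpu: 100m
    memory: 200Mi
  limits:
    cpu: 200m
    memory: 300Mi

proxy:
  type: LoadBalancer
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb

nodeSelector:
  nodegroup: kong           # assumption: label applied to the Data Plane NodeGroup
```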
Deploy the Konnect Data Plane with the Helm Charts:
```shell
helm repo add kong https://charts.konghq.com
helm repo update

helm install kong kong/kong -n kong --values ./values.yaml
```
You should see the Data Plane running:
```
% kubectl get pod -n kong
```
For a basic consumption test, send a request to the Data Plane. Get the NLB DNS name with:

```
% kubectl get service -n kong -o json | jq -r ".items[].status.loadBalancer.ingress[].hostname"
<your_nlb_dnsname>
```

Then send a request to it:

```
http <your_nlb_dnsname>
```
You can also check the Data Plane's Pod resources:
```
% kubectl get pod -n kong -o json | jq ".items[].spec.containers[].resources"
{
  "limits": {
    "cpu": "200m",
    "memory": "300Mi"
  },
  "requests": {
    "cpu": "100m",
    "memory": "200Mi"
  }
}
```
VPA is configured with a new Custom Resource Definition (CRD) object called `VerticalPodAutoscaler`. It lets you specify which Pods should be vertically autoscaled, as well as whether and how the resource recommendations are applied.
### VPA Installation
VPA is not available in Kubernetes by default, so we need to install it manually. To install VPA in your EKS Cluster clone the repo and run the vpa-up.sh script.
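A sketch of the installation steps, assuming `kubectl` is already pointing at your EKS Cluster:

```shell
# Clone the Kubernetes autoscaler repository, which hosts the VPA project
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install the VPA components (CRDs, Recommender, Updater, Admission Controller)
# into the cluster currently targeted by kubectl
./hack/vpa-up.sh
```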
Along with typical Kubernetes objects such as CRDs, the process installs three new Deployments in the `kube-system` namespace: the Recommender, the Updater, and a new Admission Controller.
To have better control over the Kubernetes resources we are going to deploy each component (Load Generator, Konnect Data Plane and Upstream Service) in a specific NodeGroup.
### Create a new NodeGroup and install the Upstream Service
To create the NodeGroup for the Upstream Service run the following command. The c5.2xlarge Instance Type has enough resources to run the service, ensuring it doesn't become a bottleneck.
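As a sketch, the NodeGroup could be created with `eksctl` like this (the cluster name, NodeGroup name, and label are assumptions for illustration):

```shell
# Create a dedicated NodeGroup for the Upstream Service, labeled so the
# Deployment's nodeSelector can target it
eksctl create nodegroup \
  --cluster <your_cluster_name> \
  --name upstream \
  --node-type c5.2xlarge \
  --nodes 1 \
  --node-labels "nodegroup=upstream"
```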
Now, install the Upstream Application. Note we are deploying 5 replicas of the application to provide better performance. We also refer to the new NodeGroup with the `nodeSelector` configuration.
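A sketch of such a Deployment, assuming a simple echo server stands in for the actual Upstream Service and that the NodeGroup carries a `nodegroup=upstream` label:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: upstream
spec:
  replicas: 5                 # multiple replicas so the upstream is not a bottleneck
  selector:
    matchLabels:
      app: upstream
  template:
    metadata:
      labels:
        app: upstream
    spec:
      nodeSelector:
        nodegroup: upstream   # assumption: label applied to the new NodeGroup
      containers:
      - name: upstream
        image: hashicorp/http-echo
        args: ["-listen=:8080", "-text=hello"]
        ports:
        - containerPort: 8080
```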
Now, let's consume the Data Plane. The Load Generator deployment is based on the [Fortio](https://fortio.org/) load testing tool. Fortio is particularly interesting because it can run at a fixed Queries Per Second (QPS) rate. For example, here is a Fortio deployment that sends requests to the Data Plane: Fortio is configured to keep a 1000 QPS rate across 800 parallel connections for 3 minutes.
Also, note Fortio is consuming the Data Plane Kubernetes Service's FQDN:
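A sketch of such a Fortio Pod, assuming the default Kong proxy Service FQDN, a `/route1` route exposed by the Data Plane, and a `nodegroup=fortio` label on the Load Generator NodeGroup (all assumptions for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fortio
spec:
  nodeSelector:
    nodegroup: fortio          # assumption: label of the Load Generator NodeGroup
  containers:
  - name: fortio
    image: fortio/fortio
    args:                      # 1000 QPS across 800 connections for 3 minutes
    - load
    - -qps
    - "1000"
    - -c
    - "800"
    - -t
    - 3m
    - http://kong-kong-proxy.kong.svc.cluster.local:80/route1
```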
Eventually, Fortio presents a summary of the load test. The key data points are:
- The P99 latency: for example, `# target 99% 1.81244`
- The number of requests sent along with the QPS: `All done 52007 calls (plus 800 warmup) 2775.391 ms avg, 287.6 qps`
You can stop Fortio by simply deleting its Pod:

```
kubectl delete pod fortio
```
As you can see, the single Konnect Data Plane instance was not able to handle the requested 1000 QPS rate. The main reason is that its Pod is restricted to consuming at most 0.2 CPU units and 300Mi of memory, as set in the `resources` section of the Data Plane deployment declaration.
That's a nice opportunity to put VPA to work and see what it recommends.
### VPA Policy
The VPA policy is described using the `VerticalPodAutoscaler` CRD. For a basic test, declare a policy like this:
- The default behavior of the VPA Updater is to allow Pod eviction only if there are at least 2 live replicas, in order to avoid temporary total unavailability of a workload under VPA in `Auto` mode. We override this with the `minReplicas` configuration (of course, this is not recommended for production environments).
- `minAllowed` and `maxAllowed` configs set the range of resources (CPU and memory) to be considered.
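Putting those settings together, a policy consistent with the recommendations shown later in this post might look like this (the `targetRef` Deployment name follows the Helm release used above):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: kong-vpa
  namespace: kong
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kong-kong          # Deployment created by the "kong" Helm release
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 1           # allow eviction of the single replica (tests only!)
  resourcePolicy:
    containerPolicies:
    - containerName: proxy
      minAllowed:
        cpu: 200m
        memory: 300Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi
```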
### Consume the Data Plane again
Let's run the same load test now and see what happens.
If you check the Updater with, for example, `kubectl logs -f vpa-updater-884d4d7d9-mrdd6`, you should see messages saying it has evicted and recreated the Pod with new resource configurations:
```
I0124 20:13:47.953035       1 pods_eviction_restriction.go:219] overriding minReplicas from global 2 to per-VPA 1 for VPA kong/kong-vpa
I0124 20:13:47.953082       1 update_priority_calculator.go:143] pod accepted for update kong/kong-kong-84964cbc8d-6sbc6 with priority 1.8132136917114257
I0124 20:13:47.953099       1 updater.go:215] evicting pod kong-kong-84964cbc8d-6sbc6
I0124 20:13:47.990898       1 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kong", Name:"kong-kong-84964cbc8d-6sbc6", UID:"04baab83-9d75-4f14-8887-cb79c88e8cfd", APIVersion:"v1", ResourceVersion:"2350", FieldPath:""}): type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.
I0124 20:14:06.345672       1 reflector.go:559] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.LimitRange total 7 items received
```
This happens because of the new recommendations produced by the Recommender. Its logs, fetched with, for example, `kubectl logs -f vpa-recommender-597b7c765d-vljms`, should show messages like these:
```
I0124 20:12:49.047281       1 recommender.go:155] Recommender Run
I0124 20:12:49.047315       1 cluster_feeder.go:317] Start selecting the vpaCRDs.
I0124 20:12:49.047320       1 cluster_feeder.go:352] Fetched 1 VPAs.
I0124 20:12:49.047384       1 cluster_feeder.go:362] Using selector app.kubernetes.io/component=app,app.kubernetes.io/instance=kong,app.kubernetes.io/name=kong for VPA kong/kong-vpa
I0124 20:12:49.054707       1 metrics_client.go:74] 15 podMetrics retrieved for all namespaces
I0124 20:12:49.054820       1 cluster_feeder.go:440] ClusterSpec fed with #36 ContainerUsageSamples for #18 containers. Dropped #0 samples.
I0124 20:12:49.054835       1 recommender.go:165] ClusterState is tracking 15 PodStates and 1 VPAs
I0124 20:12:49.101647       1 checkpoint_writer.go:114] Saved VPA kong/kong-vpa checkpoint for proxy
I0124 20:12:49.101676       1 recommender.go:175] ClusterState is tracking 11 aggregated container states
```
As usual, Fortio should report the test results. In my case, I got the following:
- The P99 latency: `target 99% 1.02964`
- The number of requests sent along with the QPS: `All done 180000 calls (plus 800 warmup) 650.192 ms avg, 993.4 qps`
As we can see, the DP was now able to handle the QPS we requested, with much better latency.
Let's check the VPA policy now:
```
% kubectl get vpa kong-vpa -n kong
NAME       MODE   CPU    MEM         PROVIDED   AGE
kong-vpa   Auto   548m   548861636   True       6m17s
```
You can get a more detailed view with the following command. The output shows the recommendations VPA generated: `target` is the actual recommendation, while `lowerBound` and `upperBound` delimit the range VPA considers reasonable.
```
% kubectl get vpa kong-vpa -n kong -o json | jq ".status.recommendation"
{
  "containerRecommendations": [
    {
      "containerName": "proxy",
      "lowerBound": {
        "cpu": "200m",
        "memory": "300Mi"
      },
      "target": {
        "cpu": "548m",
        "memory": "548861636"
      },
      "uncappedTarget": {
        "cpu": "548m",
        "memory": "548861636"
      },
      "upperBound": {
        "cpu": "2",
        "memory": "2Gi"
      }
    }
  ]
}
```
Most importantly, we should be able to see the VPA effects on the Data Plane Pod. Note that the VPA recommendation keeps the original proportion between the resource requests and limits.
```
% kubectl get pod -n kong -o json | jq ".items[].spec.containers[].resources"
{
  "limits": {
    "cpu": "1096m",
    "memory": "823292454"
  },
  "requests": {
    "cpu": "548m",
    "memory": "548861636"
  }
}
```
In general, VPA is not expected to provide fast reactions. In fact, it is more useful for capturing general trends. For quick reactions, the Horizontal Pod Autoscaler (HPA) is a better approach. Check the next post in this series to see the Konnect Data Plane working alongside HPA.