At its core, Kong’s AI Gateway provides a universal API that enables platform teams to centrally secure and govern traffic to LLMs, AI agents, and MCP servers. And as AI adoption in your organization skyrockets, so do AI usage costs. With Kong, teams can keep those costs in check with several techniques, including token rate limiting per consumer, caching responses to redundant prompts, and automatically routing each request to the best model for the prompt.
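To make this concrete, a consumer-facing request through an AI gateway typically looks like a standard OpenAI-style chat completion call, with the gateway's URL and a consumer credential swapped in so policies like per-consumer token rate limiting can be applied. Here's a minimal sketch; the host, route path, `apikey` header, and model name are illustrative placeholders, not Kong-specific defaults:

```ts
// Sketch: a chat completion request sent through an AI gateway's
// OpenAI-compatible endpoint. The host, route path, and "apikey" header are
// placeholders; your gateway's routes and auth setup will differ.
const GATEWAY_URL = "https://ai-gateway.example.com/openai/chat/completions";

async function askThroughGateway(prompt: string): Promise<string> {
  const res = await fetch(GATEWAY_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Consumer credential, so the gateway can attribute usage and
      // enforce per-consumer token rate limits
      apikey: process.env.GATEWAY_API_KEY ?? "",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // the gateway may route or override this per its config
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`Gateway returned ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```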
In the latest 3.11 release, Kong continues to help customers deliver production-ready GenAI projects with comprehensive support for multi-modal and agentic use cases.
From playground to production: The real test for AI gateways
Experimenting with AI is easy; safely and efficiently rolling AI projects out to production is a far greater challenge. Many new AI gateways can handle basic GenAI use cases, but most have never been tested under the demanding, high-throughput conditions enterprises require. And once your organization begins working with natural language processing (NLP), embeddings, and unstructured data like images, performance isn't optional: it's critical.
Kong stands apart by extending its battle-tested API infrastructure to power modern AI workloads.
Kong AI Gateway is built on the same highly performant runtime — Kong Gateway — that already supports mission-critical APIs across the world’s largest organizations.
In this blog post, we'll share performance benchmark results comparing Kong AI Gateway to newer offerings like Portkey and LiteLLM. We'll walk through the test setup, the execution, and what the data reveals about each gateway's performance at scale.
Benchmark architecture
The benchmark tests were executed in AWS. The server infrastructure ran on an Amazon Elastic Kubernetes Service (EKS) cluster running Kubernetes 1.32. To maintain tighter control over the AI Gateways and remove the variables introduced by real LLM infrastructure, such as latency and throughput, we mocked an LLM with WireMock, exposing OpenAI-compatible endpoints. WireMock is an open-source tool for simulating API responses.
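For illustration, a stub like the one below can be registered through WireMock's admin API (`POST /__admin/mappings`) so the mock answers OpenAI-style chat completion calls with a canned payload. This is a sketch only; the host and the exact stubs and response bodies used in the benchmark aren't shown here.

```ts
// Sketch: register a WireMock stub that answers OpenAI-style chat completion
// requests with a canned payload. The host and response body are illustrative.
const WIREMOCK_ADMIN = "http://wiremock.example.internal:8080/__admin/mappings";

const stub = {
  request: { method: "POST", urlPath: "/v1/chat/completions" },
  response: {
    status: 200,
    headers: { "Content-Type": "application/json" },
    jsonBody: {
      id: "chatcmpl-mock-1",
      object: "chat.completion",
      model: "gpt-4o-mini",
      choices: [
        {
          index: 0,
          message: { role: "assistant", content: "This is a mocked completion." },
          finish_reason: "stop",
        },
      ],
      usage: { prompt_tokens: 10, completion_tokens: 7, total_tokens: 17 },
    },
  },
};

// Register the stub with the running WireMock instance.
await fetch(WIREMOCK_ADMIN, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(stub),
});
```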
The AI Gateways were exposed to consumers through a Network Load Balancer (NLB) to protect them from external interference. Similarly, the mocked LLM was exposed through its own NLB. To avoid competing for the same hardware resources, the AI Gateways and WireMock each ran on their own EKS nodes based on the c5.4xlarge instance type, with 16 vCPUs and 32 GiB of memory.
Lastly, K6 served as the load generator, running on an EC2 instance deployed in the same VPC as the EKS cluster.
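A K6 script for this kind of test is typically only a few lines. The sketch below shows the general shape; the URL, credential, and load profile (virtual users and duration) are placeholders and do not reflect the benchmark's actual parameters.

```ts
// Sketch of a K6 load script hitting a gateway's OpenAI-compatible endpoint.
// The URL, credential, and load profile are placeholders.
import http from "k6/http";
import { check } from "k6";

export const options = {
  vus: 100,        // concurrent virtual users (placeholder)
  duration: "5m",  // test length (placeholder)
};

const GATEWAY_URL = "https://ai-gateway.example.com/openai/chat/completions";

export default function () {
  const payload = JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello from the load test" }],
  });

  const res = http.post(GATEWAY_URL, payload, {
    headers: {
      "Content-Type": "application/json",
      apikey: __ENV.GATEWAY_API_KEY, // consumer credential
    },
  });

  check(res, { "status is 200": (r) => r.status === 200 });
}
```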
Here's a diagram illustrating the benchmark architecture: