Kong AI Gateway supports a wide range of use cases to help accelerate the adoption and rollout of new AI applications into production.
Support for the existing OpenAI SDK
Kong AI Gateway provides one API to access all of the LLMs it supports. To accomplish this, we've standardized on the OpenAI API specification, which helps developers onboard quickly by giving them an API they're already familiar with.
In this new release, we're making it even easier to build AI agents and applications with Kong AI Gateway by natively supporting the OpenAI SDK client library. You can start using LLMs behind the AI Gateway simply by redirecting your requests to a URL that points at an AI Gateway route.
If you have existing business logic written with the OpenAI SDK, you can reuse it to consume every LLM supported by Kong AI Gateway without altering your code, since the API is 100% compatible.
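As an illustrative sketch, pointing the OpenAI Python SDK at the gateway looks like this; the route URL and model name are placeholders for your own deployment:

```python
from openai import OpenAI

# Point the SDK at a Kong AI Gateway route instead of api.openai.com.
# The base_url below is a hypothetical route; substitute your own.
client = OpenAI(
    base_url="http://localhost:8000/openai",
    api_key="none",  # upstream credentials are typically managed by the gateway
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any model exposed by your gateway route
    messages=[{"role": "user", "content": "Hello from behind the gateway!"}],
)
print(response.choices[0].message.content)
```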
Introducing AI streaming support (OSS + Enterprise)
Kong AI Gateway now natively supports streaming in the “ai-proxy” plugin for every supported LLM provider. This unlocks more real-time experiences: clients no longer have to wait for the LLM to finish processing the full response before it is sent back.
The response is now sent token by token in HTTP response chunks using server-sent events (SSE). The capability can be enabled in the plugin configuration by setting the following property of “ai-proxy”:
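Shown here as a sketch in declarative configuration form; the `response_streaming` property accepts `allow` (clients may opt in per request) or `always` (force streaming), with the rest of your existing ai-proxy configuration elided:

```yaml
plugins:
  - name: ai-proxy
    config:
      # "allow" lets clients opt in to streaming per request;
      # "always" forces streaming for every response.
      response_streaming: allow
      # ... existing provider, model, and auth configuration ...
```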
Clients can then request streaming by making requests like the following:
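For instance, reusing the OpenAI SDK client from the earlier sketch, setting `stream=True` opts the request in to streaming:

```python
# Reuses the `client` from the earlier sketch; stream=True opts in to SSE.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)

# Each chunk arrives as soon as the LLM emits the next token(s).
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```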
With this capability, Kong AI Gateway users can create more compelling and interactive AI experiences.
New plugin: AI token rate limiting advanced (Enterprise)
We're introducing a new enterprise-only AI capability to rate-limit the usage of any LLM by the number of request tokens. By enabling the new “ai-rate-limiting-advanced” plugin, customers can better manage AI spend across the board by specifying different consumption levels for different teams in the organization. For self-hosted LLM providers, customers can better scale their AI infrastructure as AI traffic increases across their applications.
Kong already provides API rate-limiting capabilities that limit based on the number of requests sent to an API. The new ai-rate-limiting-advanced plugin instead focuses on the number of AI tokens requested, regardless of how many raw HTTP requests carry them. Customers who want to rate-limit both raw requests and AI tokens can combine the ai-rate-limiting-advanced plugin with the standard Kong rate-limiting plugin.
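As a rough sketch of what a per-provider token limit could look like in declarative configuration (the field names and shapes below are assumptions rather than the authoritative schema; consult the plugin reference for the exact fields):

```yaml
plugins:
  - name: ai-rate-limiting-advanced
    config:
      llm_providers:          # assumed field: one entry per provider to limit
        - name: openai
          limit:
            - 10000           # assumed: tokens permitted per window
          window_size:
            - 3600            # assumed: window length in seconds (one hour)
```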
The ai-rate-limiting-advanced plugin is the only AI-focused rate-limiting plugin available today.