Kong AI Gateway Goes GA, New Enterprise Capabilities Added
More easily manage AI spend, build AI agents and chatbots, get real-time AI responses, and ensure content safety
We're introducing several new Kong AI Gateway capabilities in Kong Gateway 3.7 and Kong Gateway Enterprise 3.7, including enterprise-only and OSS improvements. Read on for a full rundown of the new AI-focused features.
AI Gateway becomes a GA capability (OSS + Enterprise)
With the new Kong Gateway 3.7 release, we're promoting Kong AI Gateway to GA status.
Starting today, AI developers can focus on building AI-specific use cases, like LLM RAG chatbots or AI integrations, without having to build the underlying infrastructure that establishes a secure and observable lifecycle for AI applications in production. This is fully supported by Kong at scale on both Kong Konnect and Kong Gateway Enterprise.
Kong AI Gateway can also be provisioned entirely in the cloud as a dedicated SaaS service with Kong’s new Konnect Dedicated Cloud Gateways offering.
Kong AI Gateway supports a wide range of use cases to help accelerate the adoption and rollout of new AI applications into production.
Support for the existing OpenAI SDK
Kong AI Gateway provides one API to access all of the LLMs it supports. To accomplish this, we've standardized on the OpenAI API specification, which helps developers onboard more quickly because they're working with an API they already know.
In this new release, we're making it even easier to build AI agents and applications with Kong AI Gateway by natively supporting the OpenAI SDK client library. You can start using LLMs behind the gateway simply by redirecting your requests to a URL that points to an AI Gateway route.
If you have existing business logic written with the OpenAI SDK, you can reuse it to consume every LLM supported by Kong AI Gateway without changing your code, since the interface is fully compatible.
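For example, here's a minimal sketch of pointing the official OpenAI Python SDK at an AI Gateway route; the gateway URL, route path, model name, and API key below are placeholders for illustration, not values from this release:

```python
from openai import OpenAI

# Point the standard OpenAI client at a Kong AI Gateway route instead of
# api.openai.com. base_url and api_key are placeholders; credentials can also
# be injected by the gateway itself.
client = OpenAI(
    base_url="http://localhost:8000/my-ai-route",
    api_key="anything",
)

# The same chat-completions call now flows through Kong AI Gateway, which
# translates it to whichever LLM provider the route is configured for.
response = client.chat.completions.create(
    model="gpt-4o",  # model name as exposed by the configured ai-proxy plugin
    messages=[{"role": "user", "content": "Summarize what an API gateway does."}],
)
print(response.choices[0].message.content)
```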
Introducing AI streaming support (OSS + Enterprise)
Kong AI Gateway now natively supports streaming in the "ai-proxy" plugin for every supported LLM provider. This unlocks more real-time experiences, because clients no longer have to wait for the LLM to finish generating the full response before anything is sent back.
The response is now sent token by token in HTTP response chunks using server-sent events (SSE). The capability can be enabled by setting the following property in the "ai-proxy" plugin configuration:
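Here is a rough sketch of enabling it on an existing route through the Kong Admin API; the property name (assumed here to be response_streaming) and its accepted values should be verified against the ai-proxy reference for your version:

```python
import requests

# Hypothetical sketch: enable response streaming on the ai-proxy plugin for a
# route via the Kong Admin API. "response_streaming" and its values are
# assumptions for illustration; check the ai-proxy plugin docs.
admin_url = "http://localhost:8001"

resp = requests.post(
    f"{admin_url}/routes/my-ai-route/plugins",
    json={
        "name": "ai-proxy",
        "config": {
            "route_type": "llm/v1/chat",
            "response_streaming": "allow",  # assumed values: allow / deny / always
            "model": {"provider": "openai", "name": "gpt-4o"},
            "auth": {
                "header_name": "Authorization",
                "header_value": "Bearer <OPENAI_API_KEY>",
            },
        },
    },
    timeout=10,
)
resp.raise_for_status()
```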
Clients can then request streaming by making requests like the following:
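As a minimal sketch with the OpenAI Python SDK already pointed at the gateway (URLs and keys are placeholders, as above):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/my-ai-route", api_key="anything")

# stream=True asks the gateway to forward the provider's tokens as SSE chunks
# instead of buffering the full completion.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about gateways."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```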
With this capability, Kong AI Gateway users can create more compelling and interactive AI experiences.
New plugin: AI token rate limiting advanced (Enterprise)
We're introducing a new enterprise-only AI capability to rate-limit the usage of any LLM by the number of request tokens. By enabling the new "ai-rate-limiting-advanced" plugin, customers can better manage AI spend across the board by specifying different consumption levels for different teams in the organization. For self-hosted LLMs, customers can also scale their AI infrastructure more predictably as AI traffic grows across applications.
Kong already provides API rate-limiting capabilities that limit based on the number of requests sent to an API. The new ai-rate-limiting-advanced plugin instead focuses on the number of AI tokens requested, regardless of how many raw HTTP requests carry them. Customers who want to rate-limit both raw requests and AI tokens can combine ai-rate-limiting-advanced with the standard Kong rate-limiting plugin.
The ai-rate-limiting-advanced plugin is the only rate-limiting plugin available today that is purpose-built for AI traffic.
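As an illustration, a token-based limit could be attached to a route through the Kong Admin API roughly as follows; the config structure below (llm_providers, limit, window_size) is an assumption for the sketch, so consult the plugin reference for the real schema:

```python
import requests

# Hypothetical sketch: enable token-based rate limiting on a route via the
# Kong Admin API. Field names are assumed for illustration only.
admin_url = "http://localhost:8001"

resp = requests.post(
    f"{admin_url}/routes/my-ai-route/plugins",
    json={
        "name": "ai-rate-limiting-advanced",
        "config": {
            "llm_providers": [
                {
                    "name": "openai",
                    "limit": 10000,       # tokens allowed per window (assumed field)
                    "window_size": 3600,  # window length in seconds (assumed field)
                }
            ],
        },
    },
    timeout=10,
)
resp.raise_for_status()
```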
New plugin: AI Azure Content Safety (Enterprise)
The new enterprise plugin "ai-azure-content-safety" lets customers seamlessly integrate with the Azure AI Content Safety service to validate prompts sent through the AI Gateway to every supported LLM (not only Azure AI).
For example, the customer may want to detect and filter out all violence, hate, sexual, and self-harm content across all prompts sent to any LLM provider in Kong AI Gateway using Azure’s native services.
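As an illustrative sketch only, attaching the plugin to a route via the Kong Admin API might look roughly like this; the field names (content_safety_url, categories, rejection_level) are assumptions, so check the plugin reference for the actual schema:

```python
import requests

# Hypothetical sketch: screen prompts with Azure AI Content Safety before they
# reach any LLM behind the gateway. Config fields are assumed for illustration.
admin_url = "http://localhost:8001"

resp = requests.post(
    f"{admin_url}/routes/my-ai-route/plugins",
    json={
        "name": "ai-azure-content-safety",
        "config": {
            "content_safety_url": "https://<your-resource>.cognitiveservices.azure.com",
            "content_safety_key": "<AZURE_CONTENT_SAFETY_KEY>",
            "categories": [
                {"name": "Hate", "rejection_level": 2},
                {"name": "Violence", "rejection_level": 2},
                {"name": "Sexual", "rejection_level": 2},
                {"name": "SelfHarm", "rejection_level": 2},
            ],
        },
    },
    timeout=10,
)
resp.raise_for_status()
```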
Dynamic URL-sourced LLM model in ai-proxy (OSS + Enterprise)
It's now possible to select the requested model dynamically from the URL path requested by the client, in addition to the existing option of hard-coding the model name in the plugin configuration. This makes it easier to scale Kong AI Gateway across teams that want to experiment with a wide variety of models, without having to pre-configure each one in the "ai-proxy" plugin.
By letting "ai-proxy" determine the LLM route from the URL requested by the client, you can apply the "ai-proxy" plugin once and support every model available from the underlying AI provider, with the model name parsed from the requested URL path.
This capability can be configured with the new “config.route_source” configuration parameter in “ai-proxy”.
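To sketch what this enables on the client side (the /ai/openai/<model> path pattern below is an illustrative assumption, not a documented convention), a caller could pick the model purely through the URL it requests:

```python
from openai import OpenAI

# Illustrative only: assume the AI Gateway route is configured (via
# config.route_source) to read the model name from the request path,
# e.g. /ai/openai/<model>. The path layout here is an assumption.
client = OpenAI(base_url="http://localhost:8000/ai/openai/gpt-4o", api_key="anything")

response = client.chat.completions.create(
    model="gpt-4o",  # kept for SDK compatibility; the gateway derives it from the URL
    messages=[{"role": "user", "content": "Which model am I talking to?"}],
)
print(response.choices[0].message.content)
```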
Support for Anthropic Claude 2.1 Messages API (OSS + Enterprise)
Kong AI Gateway provides one API interface to consume models across both cloud and self-hosted providers. We've expanded this unified interface to also support the Anthropic Claude 2.1 Messages API, which is typically used to build chatbot or virtual assistant applications and manages the conversational exchanges between a user and an Anthropic Claude model (the assistant).
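To sketch how this could look in practice, a route might be pointed at Claude 2.1 while clients keep using the same OpenAI-style chat interface; the config fields below follow the ai-proxy pattern used earlier and are assumptions to be checked against the plugin docs:

```python
import requests

# Hypothetical sketch: configure ai-proxy so the unified chat route is served
# by Anthropic's Claude 2.1 Messages API. Field names are assumed.
admin_url = "http://localhost:8001"

resp = requests.post(
    f"{admin_url}/routes/my-claude-route/plugins",
    json={
        "name": "ai-proxy",
        "config": {
            "route_type": "llm/v1/chat",
            "model": {
                "provider": "anthropic",
                "name": "claude-2.1",
                "options": {"anthropic_version": "2023-06-01"},  # assumed option
            },
            "auth": {
                "header_name": "x-api-key",
                "header_value": "<ANTHROPIC_API_KEY>",
            },
        },
    },
    timeout=10,
)
resp.raise_for_status()
```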
Kong AI Gateway will continuously add support for more LLMs and models based on user demand.
Updated AI analytics format (OSS + Enterprise)
With Kong AI Gateway going into GA, we've updated our analytics logging format for all AI requests processed by Kong.
With this new logging format, we can now measure consumption across every model that has been requested by “ai-proxy,” “ai-request-transformer,” and “ai-response-transformer.”
This new analytics log format replaces the old one.
Get started today with Kong AI Gateway
Get started today with Kong AI Gateway and accelerate the rollout of AI applications in production in a secure, observable, and scalable way.