# Govern the Full AI Data Path with Kong AI Gateway 3.14
Greg Peranich
Staff Product Manager, Kong
The shift from single-model AI features to multi-agent pipelines is no longer a future concern — it's running in production today. MCP has become the de facto protocol for tool-calling, agent-to-agent (A2A) communication patterns are proliferating, and enterprise teams are wiring together complex AI workflows that span multiple providers, services, and agents. Every hop in that data path is an opportunity for something to go wrong.
The challenge is governance. When auth logic, rate limits, and tool permissions live scattered across application code, platform teams lose visibility and control. Policy becomes inconsistent. Audits become painful. And every new model integration requires another round of code changes.
Kong AI Gateway 3.14 is built around closing that gap at the gateway layer — governing the full AI data path, from the first model request to the last agent hop, without requiring application teams to change their code. In this release: native A2A traffic management, token exchange and scope-based tool filtering, enhanced rate limiting and guardrails, and support for Databricks, DeepSeek, and vLLM.
## A2A Support: Route and govern agent-to-agent traffic
Agent-to-agent communication is the next frontier of AI infrastructure. As teams decompose monolithic AI workflows into specialized agents — a research agent, a booking agent, a summarization agent — the calls between those agents become as important to govern as the calls from end users.
With 3.14, Kong AI Gateway adds native support for A2A communication patterns. Kong sits in the path of agent-to-agent traffic, applying the same authentication, rate limiting, and observability policies it applies to client-to-agent calls. This positions Kong as a central hub for all AI traffic in your environment, regardless of whether that traffic originates from a human or another agent.
This release also ships structured logging for A2A calls — capturing payloads and statistics on every interaction and surfacing them through Kong's standard log plugins. Platform teams finally have the visibility to debug multi-agent pipelines, identify bottlenecks, and prove compliance without instrumenting every agent individually.
```yaml
# KongAir: Route A2A traffic from orchestrator to a specialized booking agent
services:
  - name: booking-agent
    url: http://booking-agent.internal
    routes:
      - name: a2a-booking
        paths: ["/agents/booking"]

plugins:
  - name: ai-a2a-proxy
    service: booking-agent
    config:
      logging:
        log_statistics: true
        log_payloads: true
        max_payload_size: 1048576
        max_request_body_size: 1048576
```
## Token Exchange and Scope-based Tool Filtering: Zero-trust for agentic workflows
Governing what agents can do — and on whose behalf — is one of the hardest problems in enterprise AI. 3.14 ships two features that close this gap together.
**Token Exchange** (RFC 8693) allows Kong to intercept an incoming token and exchange it for a new one with different scopes, audience, or delegation claims before forwarding to a downstream service. This unlocks token downscoping (agents carry only the minimum permissions they need), audience restriction (tokens bound to specific services), and on-behalf-of delegation (preserving the original human's identity across agent hops). Supported in both OpenID Connect and MCP OAuth2 flows.
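On the wire, an RFC 8693 exchange is a form-encoded POST to the authorization server's token endpoint. A minimal sketch of the request body Kong would send (token value is a placeholder; parameter names come from the RFC, not from Kong's implementation):

```python
from urllib.parse import urlencode

# The form-encoded token-exchange request, per RFC 8693 section 2.1
exchange_request = urlencode({
    "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
    "subject_token": "<incoming access token>",
    "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
    "audience": "https://flights.kongair.internal",  # bind the new token to one service
    "scope": "flights:read",                         # downscope to the minimum needed
})
print(exchange_request.split("&")[0])
# grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Atoken-exchange
```

The response carries a fresh access token with the narrowed audience and scopes, which Kong forwards in place of the original.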
**Scope-based Tool Filtering** extends the MCP tool access control introduced in 3.13. Where 3.13 required consumer group mapping, 3.14 lets you restrict tool access natively using OAuth2 scopes from the incoming token — no consumer group management required. An agent carrying a `flights:read` scope gets read-only tools. An agent with `flights:write` gets more. The policy lives at the gateway, not in your agent code.
```yaml
# KongAir: Token exchange and scope-based tool ACL for the flights service
plugins:
  - name: ai-mcp-oauth2
    service: kongair-flights-api
    config:
      resource: "https://flights.kongair.internal"
      authorization_servers:
        - "https://auth.kongair.internal"
      passthrough_credentials: true  # required when token_exchange is enabled
      token_exchange:
        enabled: true
        token_endpoint: "https://auth.kongair.internal/oauth/token"
        client_id: "kong-gateway"
        client_secret: "{vault://kv/kong/client-secret}"
        request:
          audience:
            - "https://flights.kongair.internal"
          scopes:
            - "flights:read"  # downscoped from orchestrator's broader permissions
  - name: ai-mcp-proxy
    service: kongair-flights-api
    config:
      mode: conversion-listener
      acl_attribute_type: oauth_access_token
      access_token_claim_field: scope
      tools:
        - name: search_flights
          description: Search available flights between two cities
          method: GET
          path: /search
          acl:
            allow: ["flights:read", "flights:write"]
        - name: book_flight
          description: Book a new flight reservation
          method: POST
          path: /book
          acl:
            allow: ["flights:write"]
```
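The filtering itself reduces to a set intersection between the scopes carried in the token and each tool's allow list. A minimal illustration of that rule (hypothetical helper, not Kong's internal code):

```python
def allowed_tools(token_scope: str, tools: list) -> list:
    """Return the tool names a caller may use, given the space-delimited
    OAuth2 scope string from its access token (RFC 6749 format)."""
    scopes = set(token_scope.split())
    return [
        tool["name"]
        for tool in tools
        # a tool is exposed if the token holds at least one allowed scope
        if scopes & set(tool["acl"]["allow"])
    ]

tools = [
    {"name": "search_flights", "acl": {"allow": ["flights:read", "flights:write"]}},
    {"name": "book_flight",    "acl": {"allow": ["flights:write"]}},
]

print(allowed_tools("flights:read", tools))                # ['search_flights']
print(allowed_tools("flights:read flights:write", tools))  # ['search_flights', 'book_flight']
```

A read-only agent never even sees `book_flight` in its tool list, which is the point: least privilege is enforced before the agent can attempt the call.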
## JWK-based MCP Token Validation: Auth for every authorization server
Token introspection works well when your authorization server supports it — but not all do. Some providers publish JWKs and expect consumers to validate tokens locally. Until now, that left a gap for teams whose authorization server doesn't expose an introspection endpoint.
3.14 adds JWK-based token validation to the MCP OAuth2 plugin. Kong fetches the public keys from the authorization server's JWKs endpoint, caches them, and validates incoming MCP tokens locally on every request — no round-trip to the auth server required. This covers the full auth model: teams using introspection-capable servers can continue as before; teams using JWK-only servers now have a first-class path.
```yaml
# KongAir: Validate MCP tokens locally using Entra ID's published JWK Set
plugins:
  - name: ai-mcp-oauth2
    service: kongair-flights-api
    config:
      resource: "api://kongair-flights"
      authorization_servers:
        - "https://login.microsoftonline.com/kongair.onmicrosoft.com/v2.0"
      scopes: []
      jwks_endpoint: "https://login.microsoftonline.com/kongair.onmicrosoft.com/discovery/v2.0/keys"
      jwks_ttl: 3600
```
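To make the "no round-trip" point concrete, here is a self-contained sketch of local token validation. It uses an HMAC-signed JWT for brevity; a real JWKS flow fetches RSA/EC public keys from the endpoint above and verifies the signature asymmetrically. Everything here is illustrative, not Kong's implementation:

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(payload: dict, key: bytes) -> str:
    """Mint a compact HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(key, header + b"." + body, hashlib.sha256).digest())
    return b".".join([header, body, sig]).decode()

def validate_locally(token: str, key: bytes, audience: str) -> dict:
    """Verify signature and claims with no call to the auth server."""
    header, body, sig = token.encode().split(b".")
    expected = b64url(hmac.new(key, header + b"." + body, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body + b"=" * (-len(body) % 4)))
    if claims["aud"] != audience or claims["exp"] < time.time():
        raise ValueError("bad audience or expired")
    return claims

key = b"shared-secret"
token = sign({"aud": "api://kongair-flights", "exp": time.time() + 3600,
              "scope": "flights:read"}, key)
print(validate_locally(token, key, "api://kongair-flights")["scope"])  # flights:read
```

Because the signing keys are cached for `jwks_ttl` seconds, the per-request cost is a signature check and a claims comparison, with no network dependency on the authorization server.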
## Route by Model in Body: Dynamic model routing without client changes
Platform teams have wanted this for a while: route AI traffic to different upstream providers based on the model field in the request body — without requiring clients to know anything about backend topology.
3.14 adds body-based model routing to `ai-proxy-advanced` via a `model_alias` field on each target. When a request arrives, Kong inspects the `model` field in the body, matches it against the `model_alias` configured on each target, and routes to the right provider. The client never needs to know which provider sits behind the alias.
This decouples the model naming convention your teams use from the provider-specific model IDs your infrastructure runs. A client requests `powerful`, Kong routes to GPT-4o on your Azure enterprise tenant. A client requests `cheap`, Kong routes to a self-hosted Llama instance via vLLM. Different providers, different rate limits, same API surface.
```yaml
# KongAir: Route by model alias to different providers — clients use friendly names
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: priority
      targets:
        - route_type: llm/v1/chat
          model:
            provider: azure
            name: gpt-4o
          model_alias: powerful  # client sends "model": "powerful"
          options:
            azure_instance: kongair-openai
            azure_deployment_id: gpt-4o-prod
            azure_api_version: "2024-02-01"
          auth:
            header_name: Authorization
            header_value: "{vault://env/azure-openai-key}"
          logging:
            log_statistics: true
            log_payloads: false
        - route_type: llm/v1/chat
          model:
            provider: openai
            name: gpt-4o-mini
          model_alias: fast  # client sends "model": "fast"
          auth:
            header_name: Authorization
            header_value: "{vault://env/openai-key}"
          logging:
            log_statistics: true
            log_payloads: false
        - route_type: llm/v1/chat
          model:
            provider: vllm
            name: llama-3-70b-instruct
          model_alias: cheap  # client sends "model": "cheap"
          options:
            upstream_url: http://vllm.internal/v1
          auth:
            header_name: Authorization
            header_value: "{vault://env/vllm-key}"
          logging:
            log_statistics: true
            log_payloads: false
```
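The matching behavior amounts to: parse the request body, look up the alias in the target table, rewrite the alias to the provider's real model ID, and forward. A rough sketch of that idea (hypothetical helper names, not the plugin's actual code):

```python
import json

# alias -> (provider, real model name), mirroring the targets above
TARGETS = {
    "powerful": ("azure",  "gpt-4o"),
    "fast":     ("openai", "gpt-4o-mini"),
    "cheap":    ("vllm",   "llama-3-70b-instruct"),
}

def route(request_body: str):
    """Pick an upstream from the 'model' field in the chat request body."""
    body = json.loads(request_body)
    alias = body.get("model")
    if alias not in TARGETS:
        raise LookupError(f"no target configured for model {alias!r}")
    provider, real_name = TARGETS[alias]
    body["model"] = real_name  # rewrite alias to the provider-specific model ID
    return provider, json.dumps(body)

provider, upstream_body = route('{"model": "fast", "messages": []}')
print(provider)  # openai
```

Swapping a backing model is then a one-line gateway config change: repoint the alias, and every client keeps sending the same friendly name.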
## Rate Limiting Enhancements: Precision token budgets at scale
`ai-rate-limiting-advanced` now supports global rate limits that aggregate across all providers and models, plus granular per-model caps. Previously, you could rate limit by provider. Now you can express policies like: "this consumer group gets 1M tokens per day globally, with no more than 200K against GPT-4o." That's the level of precision enterprise cost and capacity management actually requires.
```yaml
# KongAir: 1M daily token budget for premium users, with a 200K GPT-4o sub-cap
plugins:
  - name: ai-rate-limiting-advanced
    config:
      identifier: consumer-group
      strategy: redis
      sync_rate: 10
      policies:
        # Global daily budget: 1M tokens/day for the premium consumer group
        - match:
            - type: consumer_group
              values: ["premium"]
          limits:
            - limit: 1000000
              window_size: 86400
              tokens_count_strategy: total_tokens
        # GPT-4o sub-cap: no more than 200K/day against this model
        - match:
            - type: consumer_group
              values: ["premium"]
            - type: model
              values: ["gpt-4o"]
          limits:
            - limit: 200000
              window_size: 86400
              tokens_count_strategy: total_tokens
      redis:
        host: redis.internal
        port: 6379
```
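The accounting behind such a policy is a pair of windowed counters, one global and one per model, both charged on every request; a request is rejected if either budget would overflow. A toy fixed-window illustration (in-memory and single-node; the plugin uses Redis to synchronize counters across gateway nodes):

```python
import time
from collections import defaultdict

class TokenBudget:
    """Fixed-window token budgets: a global cap plus per-model sub-caps."""

    def __init__(self, global_limit, model_limits, window=86400):
        self.global_limit = global_limit
        self.model_limits = model_limits      # e.g. {"gpt-4o": 200_000}
        self.window = window                  # seconds per window (86400 = daily)
        self.counters = defaultdict(int)      # (key, window index) -> tokens used

    def _key(self, name):
        return (name, int(time.time()) // self.window)

    def charge(self, model, tokens):
        """Record usage and return True if both budgets allow, else False."""
        g, m = self._key("*"), self._key(model)
        if self.counters[g] + tokens > self.global_limit:
            return False                      # global daily budget exhausted
        cap = self.model_limits.get(model)
        if cap is not None and self.counters[m] + tokens > cap:
            return False                      # per-model sub-cap exhausted
        self.counters[g] += tokens
        self.counters[m] += tokens
        return True

budget = TokenBudget(global_limit=1_000_000, model_limits={"gpt-4o": 200_000})
print(budget.charge("gpt-4o", 150_000))       # True: within both budgets
print(budget.charge("gpt-4o", 100_000))       # False: would exceed the 200K sub-cap
print(budget.charge("gpt-4o-mini", 100_000))  # True: only the global cap applies
```

The key property is that the expensive model has its own ceiling while cheaper models draw freely from the shared budget.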
## Guardrail Enhancements: Consistent safety across every model
We've standardized guardrail functionality and analytics across all supported plugins, so your observability dashboards show consistent metrics regardless of provider. We're also launching a **custom guardrails integration** — a configuration-driven approach to connect any third-party guardrails API to Kong. Define your endpoint, request mapping, and response evaluation logic; Kong handles the invocation on every request.
```yaml
# KongAir: Input/output content safety via self-hosted NVIDIA NeMo Guardrails
plugins:
  - name: ai-custom-guardrail
    service: kongair-flights-api
    config:
      guarding_mode: BOTH
      text_source: last_message
      timeout: 5000
      stop_on_error: true
      request:
        url: "http://nemo-guardrails.internal/v1/guardrail/checks"
        headers:
          Content-Type: "application/json"
        body:
          model: "meta/llama3-70b-instruct"
          messages: "$(nemo_messages)"
          guardrails: '{"config_id":"kongair-safety"}'
      response:
        block: "$(check_nemo.block)"
        block_message: "$(check_nemo.block_message)"
      functions:
        nemo_messages: |
          return function(content)
            return {{ role = "user", content = content }}
          end
        check_nemo: |
          return function(resp)
            return {
              block = resp.status == "blocked",
              block_message = resp.status == "blocked"
                and "Request blocked by KongAir content policy"
                or "ok",
            }
          end
      metrics:
        block_reason: "$(check_nemo.block_message)"
```
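Conceptually, the hook is: map the message into the guardrail API's request shape, call the endpoint, evaluate its reply into a block decision, and either reject or forward. A generic sketch of that flow (the fake endpoint and helper names are illustrative stand-ins, not Kong or NeMo APIs):

```python
def apply_guardrail(content, call_api, evaluate):
    """Generic custom-guardrail hook: call_api posts the mapped body to the
    configured endpoint; evaluate turns its reply into a block decision."""
    verdict = evaluate(call_api({"messages": [{"role": "user", "content": content}]}))
    if verdict["block"]:
        return {"status": 403, "body": verdict["block_message"]}
    return None  # None means: forward the request upstream unchanged

def fake_guardrail_api(body):
    """Stand-in for the self-hosted guardrail endpoint in the config above."""
    text = body["messages"][-1]["content"]
    return {"status": "blocked" if "credit card" in text else "allowed"}

def check(resp):
    """Mirrors the check_nemo Lua function: map the reply to a verdict."""
    blocked = resp["status"] == "blocked"
    return {"block": blocked,
            "block_message": "Request blocked by KongAir content policy"
                             if blocked else "ok"}

print(apply_guardrail("What's my credit card number?", fake_guardrail_api, check))
print(apply_guardrail("Book me a flight to Lisbon", fake_guardrail_api, check))
```

With `guarding_mode: BOTH`, the same check runs on the user's input and on the model's output, so one policy covers both directions.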
## New providers and vector search
**Databricks**: Route and govern traffic to Mosaic AI-hosted models alongside the rest of your AI infrastructure — a natural fit for enterprises already standardized on Databricks for data and ML workloads.
**DeepSeek**: Access highly cost-efficient, high-performance open-weight models through the same gateway layer. A straightforward option for teams looking to reduce AI spend without sacrificing capability.
**vLLM**: Organizations self-hosting open-source models with vLLM can now apply Kong's full policy stack — traffic management, observability, auth — to their on-prem models alongside cloud-hosted ones.
**Valkey Vector Search**: Kong now supports the Valkey vector search API, handling the request/response transformation required for semantic search operations. This extends Kong's governance to the retrieval layer — a key building block for RAG pipelines and context-augmented agents.
All three LLM providers are supported through the standard `ai-proxy` and `ai-proxy-advanced` plugin configuration.