# Govern the Full AI Data Path with Kong AI Gateway 3.14
Greg Peranich
Staff Product Manager, Kong
The shift from single-model AI features to multi-agent pipelines is no longer a future concern — it's running in production today. MCP has become the de facto protocol for tool-calling, agent-to-agent (A2A) communication patterns are proliferating, and enterprise teams are wiring together complex AI workflows that span multiple providers, services, and agents. Every hop in that data path is an opportunity for something to go wrong.
The challenge is governance. When auth logic, rate limits, and tool permissions live scattered across application code, platform teams lose visibility and control. Policy becomes inconsistent. Audits become painful. And every new model integration requires another round of code changes.
Kong AI Gateway 3.14 is built around closing that gap at the gateway layer — governing the full AI data path, from the first model request to the last agent hop, without requiring application teams to change their code. In this release: native A2A traffic management, token exchange and scope-based tool filtering, enhanced rate limiting and guardrails, and support for Databricks, DeepSeek, and vLLM.
## A2A Support: Route and govern agent-to-agent traffic
Agent-to-agent communication is the next frontier of AI infrastructure. As teams decompose monolithic AI workflows into specialized agents — a research agent, a booking agent, a summarization agent — the calls between those agents become as important to govern as the calls from end users.
With 3.14, Kong AI Gateway adds native support for A2A communication patterns. Kong sits in the path of agent-to-agent traffic, applying the same authentication, rate limiting, and observability policies it applies to client-to-agent calls. This positions Kong as a central hub for all AI traffic in your environment, regardless of whether that traffic originates from a human or another agent.
This release also ships structured logging for A2A calls — capturing payloads and statistics on every interaction and surfacing them through Kong's standard log plugins. Platform teams finally have the visibility to debug multi-agent pipelines, identify bottlenecks, and prove compliance without instrumenting every agent individually.
```yaml
# KongAir: Route A2A traffic from orchestrator to a specialized booking agent
services:
  - name: booking-agent
    url: http://booking-agent.internal
    routes:
      - name: a2a-booking
        paths: ["/agents/booking"]

plugins:
  - name: ai-a2a-proxy
    service: booking-agent
    config:
      logging:
        log_statistics: true
        log_payloads: true
        max_payload_size: 1048576
        max_request_body_size: 1048576
```
## Token Exchange and Scope-based Tool Filtering: Zero-trust for agentic workflows
Governing what agents can do — and on whose behalf — is one of the hardest problems in enterprise AI. 3.14 ships two features that close this gap together.
**Token Exchange** (RFC 8693) allows Kong to intercept an incoming token and exchange it for a new one with different scopes, audience, or delegation claims before forwarding to a downstream service. This unlocks token downscoping (agents carry only the minimum permissions they need), audience restriction (tokens bound to specific services), and on-behalf-of delegation (preserving the original human's identity across agent hops). Supported in both OpenID Connect and MCP OAuth2 flows.
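On the wire, an RFC 8693 exchange is a form-encoded POST to the authorization server's token endpoint. A minimal sketch of the request body Kong would send (token value is a placeholder; parameter names come from the RFC, not from Kong's implementation):

```python
from urllib.parse import urlencode

# The form-encoded token-exchange request, per RFC 8693 section 2.1
exchange_request = urlencode({
    "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
    "subject_token": "<incoming access token>",
    "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
    "audience": "https://flights.kongair.internal",  # bind the new token to one service
    "scope": "flights:read",                         # downscope to the minimum needed
})
print(exchange_request.split("&")[0])
# grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Atoken-exchange
```

The response carries a fresh access token with the narrowed audience and scopes, which Kong forwards in place of the original.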
**Scope-based Tool Filtering** extends the MCP tool access control introduced in 3.13. Where 3.13 required consumer group mapping, 3.14 lets you restrict tool access natively using OAuth2 scopes from the incoming token — no consumer group management required. An agent carrying a `flights:read` scope gets read-only tools. An agent with `flights:write` gets more. The policy lives at the gateway, not in your agent code.
```yaml
# KongAir: Token exchange and scope-based tool ACL for the flights service
plugins:
  - name: ai-mcp-oauth2
    service: kongair-flights-api
    config:
      resource: "https://flights.kongair.internal"
      authorization_servers:
        - "https://auth.kongair.internal"
      passthrough_credentials: true  # required when token_exchange is enabled
      token_exchange:
        enabled: true
        token_endpoint: "https://auth.kongair.internal/oauth/token"
        client_id: "kong-gateway"
        client_secret: "{vault://kv/kong/client-secret}"
        request:
          audience:
            - "https://flights.kongair.internal"
          scopes:
            - "flights:read"  # downscoped from orchestrator's broader permissions
  - name: ai-mcp-proxy
    service: kongair-flights-api
    config:
      mode: conversion-listener
      acl_attribute_type: oauth_access_token
      access_token_claim_field: scope
      tools:
        - name: search_flights
          description: Search available flights between two cities
          method: GET
          path: /search
          acl:
            allow: ["flights:read", "flights:write"]
        - name: book_flight
          description: Book a new flight reservation
          method: POST
          path: /book
          acl:
            allow: ["flights:write"]
```
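The filtering itself reduces to a set intersection between the scopes carried in the token and each tool's allow list. A minimal illustration of that rule (hypothetical helper, not Kong's internal code):

```python
def allowed_tools(token_scope: str, tools: list) -> list:
    """Return the tool names a caller may use, given the space-delimited
    OAuth2 scope string from its access token (RFC 6749 format)."""
    scopes = set(token_scope.split())
    return [
        tool["name"]
        for tool in tools
        # a tool is exposed if the token holds at least one allowed scope
        if scopes & set(tool["acl"]["allow"])
    ]

tools = [
    {"name": "search_flights", "acl": {"allow": ["flights:read", "flights:write"]}},
    {"name": "book_flight",    "acl": {"allow": ["flights:write"]}},
]

print(allowed_tools("flights:read", tools))                # ['search_flights']
print(allowed_tools("flights:read flights:write", tools))  # ['search_flights', 'book_flight']
```

A read-only agent never even sees `book_flight` in its tool list, which is the point: least privilege is enforced before the agent can attempt the call.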
## JWK-based MCP Token Validation: Auth for every authorization server
Token introspection works well when your authorization server supports it — but not all do. Some providers publish JWKs and expect consumers to validate tokens locally. Until now, that left a gap for teams whose authorization server doesn't expose an introspection endpoint.
3.14 adds JWK-based token validation to the MCP OAuth2 plugin. Kong fetches the public keys from the authorization server's JWKs endpoint, caches them, and validates incoming MCP tokens locally on every request — no round-trip to the auth server required. This covers the full auth model: teams using introspection-capable servers can continue as before; teams using JWK-only servers now have a first-class path.
```yaml
# KongAir: Validate MCP tokens locally using Entra ID's published JWK Set
plugins:
  - name: ai-mcp-oauth2
    service: kongair-flights-api
    config:
      resource: "api://kongair-flights"
      authorization_servers:
        - "https://login.microsoftonline.com/kongair.onmicrosoft.com/v2.0"
      scopes: []
      jwks_endpoint: "https://login.microsoftonline.com/kongair.onmicrosoft.com/discovery/v2.0/keys"
      jwks_ttl: 3600
```
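To make the "no round-trip" point concrete, here is a self-contained sketch of local token validation. It uses an HMAC-signed JWT for brevity; a real JWKS flow fetches RSA/EC public keys from the endpoint above and verifies the signature asymmetrically. Everything here is illustrative, not Kong's implementation:

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(payload: dict, key: bytes) -> str:
    """Mint a compact HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(key, header + b"." + body, hashlib.sha256).digest())
    return b".".join([header, body, sig]).decode()

def validate_locally(token: str, key: bytes, audience: str) -> dict:
    """Verify signature and claims with no call to the auth server."""
    header, body, sig = token.encode().split(b".")
    expected = b64url(hmac.new(key, header + b"." + body, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body + b"=" * (-len(body) % 4)))
    if claims["aud"] != audience or claims["exp"] < time.time():
        raise ValueError("bad audience or expired")
    return claims

key = b"shared-secret"
token = sign({"aud": "api://kongair-flights", "exp": time.time() + 3600,
              "scope": "flights:read"}, key)
print(validate_locally(token, key, "api://kongair-flights")["scope"])  # flights:read
```

Because the signing keys are cached for `jwks_ttl` seconds, the per-request cost is a signature check and a claims comparison, with no network dependency on the authorization server.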
## Route by Model in Body: Dynamic model routing without client changes
Platform teams have wanted this for a while: route AI traffic to different upstream providers based on the model field in the request body — without requiring clients to know anything about backend topology.
3.14 adds body-based model routing to `ai-proxy-advanced` via a `model_alias` field on each target. When a request arrives, Kong inspects the `model` field in the body, matches it against the `model_alias` configured on each target, and routes to the right provider. The client never needs to know which provider sits behind the alias.
This decouples the model naming convention your teams use from the provider-specific model IDs your infrastructure runs. A client requests `powerful`, Kong routes to GPT-4o on your Azure enterprise tenant. A client requests `cheap`, Kong routes to a self-hosted Llama instance via vLLM. Different providers, different rate limits, same API surface.
```yaml
# KongAir: Route by model alias to different providers — clients use friendly names
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: priority
      targets:
        - route_type: llm/v1/chat
          model:
            provider: azure
            name: gpt-4o
          model_alias: powerful  # client sends "model": "powerful"
          options:
            azure_instance: kongair-openai
            azure_deployment_id: gpt-4o-prod
            azure_api_version: "2024-02-01"
          auth:
            header_name: Authorization
            header_value: "{vault://env/azure-openai-key}"
          logging:
            log_statistics: true
            log_payloads: false
        - route_type: llm/v1/chat
          model:
            provider: openai
            name: gpt-4o-mini
          model_alias: fast  # client sends "model": "fast"
          auth:
            header_name: Authorization
            header_value: "{vault://env/openai-key}"
          logging:
            log_statistics: true
            log_payloads: false
        - route_type: llm/v1/chat
          model:
            provider: vllm
            name: llama-3-70b-instruct
          model_alias: cheap  # client sends "model": "cheap"
          options:
            upstream_url: http://vllm.internal/v1
          auth:
            header_name: Authorization
            header_value: "{vault://env/vllm-key}"
          logging:
            log_statistics: true
            log_payloads: false
```
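The matching behavior amounts to: parse the request body, look up the alias in the target table, rewrite the alias to the provider's real model ID, and forward. A rough sketch of that idea (hypothetical helper names, not the plugin's actual code):

```python
import json

# alias -> (provider, real model name), mirroring the targets above
TARGETS = {
    "powerful": ("azure",  "gpt-4o"),
    "fast":     ("openai", "gpt-4o-mini"),
    "cheap":    ("vllm",   "llama-3-70b-instruct"),
}

def route(request_body: str):
    """Pick an upstream from the 'model' field in the chat request body."""
    body = json.loads(request_body)
    alias = body.get("model")
    if alias not in TARGETS:
        raise LookupError(f"no target configured for model {alias!r}")
    provider, real_name = TARGETS[alias]
    body["model"] = real_name  # rewrite alias to the provider-specific model ID
    return provider, json.dumps(body)

provider, upstream_body = route('{"model": "fast", "messages": []}')
print(provider)  # openai
```

Swapping a backing model is then a one-line gateway config change: repoint the alias, and every client keeps sending the same friendly name.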
## Rate Limiting Enhancements: Precision token budgets at scale
`ai-rate-limiting-advanced` now supports global rate limits that aggregate across all providers and models, plus granular per-model caps. Previously, you could rate limit by provider. Now you can express policies like: "this consumer group gets 1M tokens per day globally, with no more than 200K against GPT-4o." That's the level of precision enterprise cost and capacity management actually requires.
```yaml
# KongAir: 1M daily token budget for premium users, with a 200K GPT-4o sub-cap
plugins:
  - name: ai-rate-limiting-advanced
    config:
      identifier: consumer-group
      strategy: redis
      sync_rate: 10
      policies:
        # Global daily budget: 1M tokens/day for the premium consumer group
        - match:
            - type: consumer_group
              values: ["premium"]
          limits:
            - limit: 1000000
              window_size: 86400
              tokens_count_strategy: total_tokens
        # GPT-4o sub-cap: no more than 200K/day against this model
        - match:
            - type: consumer_group
              values: ["premium"]
            - type: model
              values: ["gpt-4o"]
          limits:
            - limit: 200000
              window_size: 86400
              tokens_count_strategy: total_tokens
      redis:
        host: redis.internal
        port: 6379
```
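The accounting behind such a policy is a pair of windowed counters, one global and one per model, both charged on every request; a request is rejected if either budget would overflow. A toy fixed-window illustration (in-memory and single-node; the plugin uses Redis to synchronize counters across gateway nodes):

```python
import time
from collections import defaultdict

class TokenBudget:
    """Fixed-window token budgets: a global cap plus per-model sub-caps."""

    def __init__(self, global_limit, model_limits, window=86400):
        self.global_limit = global_limit
        self.model_limits = model_limits      # e.g. {"gpt-4o": 200_000}
        self.window = window                  # seconds per window (86400 = daily)
        self.counters = defaultdict(int)      # (key, window index) -> tokens used

    def _key(self, name):
        return (name, int(time.time()) // self.window)

    def charge(self, model, tokens):
        """Record usage and return True if both budgets allow, else False."""
        g, m = self._key("*"), self._key(model)
        if self.counters[g] + tokens > self.global_limit:
            return False                      # global daily budget exhausted
        cap = self.model_limits.get(model)
        if cap is not None and self.counters[m] + tokens > cap:
            return False                      # per-model sub-cap exhausted
        self.counters[g] += tokens
        self.counters[m] += tokens
        return True

budget = TokenBudget(global_limit=1_000_000, model_limits={"gpt-4o": 200_000})
print(budget.charge("gpt-4o", 150_000))       # True: within both budgets
print(budget.charge("gpt-4o", 100_000))       # False: would exceed the 200K sub-cap
print(budget.charge("gpt-4o-mini", 100_000))  # True: only the global cap applies
```

The key property is that the expensive model has its own ceiling while cheaper models draw freely from the shared budget.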
## Guardrail Enhancements: Consistent safety across every model
We've standardized guardrail functionality and analytics across all supported plugins, so your observability dashboards show consistent metrics regardless of provider. We're also launching a **custom guardrails integration** — a configuration-driven approach to connect any third-party guardrails API to Kong. Define your endpoint, request mapping, and response evaluation logic; Kong handles the invocation on every request.
```yaml
# KongAir: Input/output content safety via self-hosted NVIDIA NeMo Guardrails
plugins:
  - name: ai-custom-guardrail
    service: kongair-flights-api
    config:
      guarding_mode: BOTH
      text_source: last_message
      timeout: 5000
      stop_on_error: true
      request:
        url: "http://nemo-guardrails.internal/v1/guardrail/checks"
        headers:
          Content-Type: "application/json"
        body:
          model: "meta/llama3-70b-instruct"
          messages: "$(nemo_messages)"
          guardrails: '{"config_id":"kongair-safety"}'
      response:
        block: "$(check_nemo.block)"
        block_message: "$(check_nemo.block_message)"
      functions:
        nemo_messages: |
          return function(content)
            return {{ role = "user", content = content }}
          end
        check_nemo: |
          return function(resp)
            return {
              block = resp.status == "blocked",
              block_message = resp.status == "blocked"
                and "Request blocked by KongAir content policy"
                or "ok",
            }
          end
      metrics:
        block_reason: "$(check_nemo.block_message)"
```
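Conceptually, the hook is: map the message into the guardrail API's request shape, call the endpoint, evaluate its reply into a block decision, and either reject or forward. A generic sketch of that flow (the fake endpoint and helper names are illustrative stand-ins, not Kong or NeMo APIs):

```python
def apply_guardrail(content, call_api, evaluate):
    """Generic custom-guardrail hook: call_api posts the mapped body to the
    configured endpoint; evaluate turns its reply into a block decision."""
    verdict = evaluate(call_api({"messages": [{"role": "user", "content": content}]}))
    if verdict["block"]:
        return {"status": 403, "body": verdict["block_message"]}
    return None  # None means: forward the request upstream unchanged

def fake_guardrail_api(body):
    """Stand-in for the self-hosted guardrail endpoint in the config above."""
    text = body["messages"][-1]["content"]
    return {"status": "blocked" if "credit card" in text else "allowed"}

def check(resp):
    """Mirrors the check_nemo Lua function: map the reply to a verdict."""
    blocked = resp["status"] == "blocked"
    return {"block": blocked,
            "block_message": "Request blocked by KongAir content policy"
                             if blocked else "ok"}

print(apply_guardrail("What's my credit card number?", fake_guardrail_api, check))
print(apply_guardrail("Book me a flight to Lisbon", fake_guardrail_api, check))
```

With `guarding_mode: BOTH`, the same check runs on the user's input and on the model's output, so one policy covers both directions.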
## New providers and vector search
**Databricks**: Route and govern traffic to Mosaic AI-hosted models alongside the rest of your AI infrastructure — a natural fit for enterprises already standardized on Databricks for data and ML workloads.
**DeepSeek**: Access highly cost-efficient, high-performance open-weight models through the same gateway layer. A straightforward option for teams looking to reduce AI spend without sacrificing capability.
**vLLM**: Organizations self-hosting open-source models with vLLM can now apply Kong's full policy stack — traffic management, observability, auth — to their on-prem models alongside cloud-hosted ones.
**Valkey Vector Search**: Kong now supports the Valkey vector search API, handling the request/response transformation required for semantic search operations. This extends Kong's governance to the retrieval layer — a key building block for RAG pipelines and context-augmented agents.
All three LLM providers are supported through the standard `ai-proxy` and `ai-proxy-advanced` plugin configuration.