Product Releases
July 9, 2025
5 min read

Kong AI Gateway 3.11: Reduce Token Spend, Unlock Multimodal Innovation

Marco Palladino
CTO and Co-Founder of Kong

New Multimodal Capabilities, New AI Prompt Compression, Integration with AWS Bedrock Guardrails, and More

Today, I'm excited to announce one of our largest Kong AI Gateway releases (3.11), which ships with several new features critical to building modern, reliable AI agents in production. We strongly recommend updating to this version to get access to the latest and greatest that AI infrastructure has to offer.

The full change log can be found here.

Introducing 10+ GenAI capabilities, including multimodal endpoints

This release significantly expands the range of GenAI capabilities that Kong AI Gateway supports out of the box.

Azure | OpenAI

Batch, Assistants, and Files:

  • Batch enables efficient parallel execution of multiple LLM calls, reducing latency and cost at scale.
  • Assistants simplify orchestration of multistep AI workflows, enabling developers to build stateful, tool-augmented agents with memory.
  • Files provide persistent storage for documents and context, allowing richer, more informed interactions with LLMs across sessions.
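To illustrate what Batch buys you, here is a minimal client-side sketch of the same idea: fanning multiple prompts out in parallel. Note that `call_llm` is a hypothetical stand-in for a request through the gateway; a Batch endpoint performs this fan-out server-side.

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a request through the AI Gateway;
    # a real implementation would POST the prompt to an LLM route.
    return f"response to: {prompt}"

def run_batch(prompts: list[str], max_workers: int = 8) -> list[str]:
    # Fan the prompts out in parallel while preserving input order,
    # which is the latency win a Batch endpoint provides server-side.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_llm, prompts))

results = run_batch(["summarize A", "summarize B", "summarize C"])
```

Running all calls concurrently means the wall-clock time for N prompts approaches that of the slowest single call, rather than the sum of all of them.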
Azure | OpenAI

Audio Transcription, Translation, and Speech API:

  • Speech-to-text: Transcribe audio input to text for call summarization, voice agents, and meeting analysis.
  • Real-time translation: Convert spoken input across languages, enabling multilingual voice interfaces.
  • Text-to-speech: Synthesize natural-sounding audio from LLM responses to power voice-based agents.
Azure | OpenAI | Gemini | AWS Bedrock

Image Generation and Edits API:

  • Image generation: Generate images from text prompts for creative, marketing, and design applications.
  • Image editing: Modify existing images using instructions and masks, useful for dynamic content workflows.
  • Multimodal agents: Equip agents with visual input/output capabilities to enhance UX and task range.
Azure | OpenAI

Realtime API:

  • Streaming completions: Stream token-by-token output for fast, interactive user experiences.
  • Low latency: Reduce time-to-first-token and improve perceived responsiveness in chat UIs.
  • Analytics: Monitor streaming behavior and performance metrics.
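As a rough sketch of what a streaming client does with token-by-token output, the snippet below parses OpenAI-style server-sent events, where each line is `data: <json>` carrying a token delta and the stream ends with `data: [DONE]`. The sample `stream` list is a fabricated stand-in for a real HTTP response body.

```python
import json

def parse_sse_chunks(raw_lines):
    # Minimal parser for OpenAI-style server-sent events: each line is
    # "data: <json>" with a token delta, terminated by "data: [DONE]".
    for line in raw_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Fabricated example stream; a real one arrives incrementally over HTTP.
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(parse_sse_chunks(stream))
```

Rendering each delta as it arrives, instead of waiting for the full completion, is what drives down perceived time-to-first-token in chat UIs.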
Azure | OpenAI

Responses API: Enhanced response introspection

  • Response metadata: Access logprobs, function calls, and tool usage for each LLM output.
  • Debugging and evaluation: Enable advanced observability and response-level quality checks.
  • Control and tuning: Use metadata to build reranking, retries, or hybrid generation strategies.
AWS Bedrock | Cohere

Rerank API:

  • Contextual reranking: Improve relevance of retrieved documents and results in RAG pipelines.
  • Flexible inputs: Send any list of candidates to be re-ordered based on prompt context.
  • Improved accuracy: Boost final LLM response quality through better grounding.
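The reranking step can be pictured with a toy scorer. The sketch below orders candidates by word overlap with the query purely for illustration; a real Rerank API scores each candidate with a trained cross-encoder model, which is far more accurate.

```python
def rerank(query: str, candidates: list[str]) -> list[str]:
    # Toy relevance score: word overlap with the query. A real Rerank
    # API uses a trained relevance model instead of set intersection.
    q = set(query.lower().split())
    def score(doc: str) -> int:
        return len(q & set(doc.lower().split()))
    return sorted(candidates, key=score, reverse=True)

docs = [
    "pricing for token usage",
    "kong gateway release notes",
    "token pricing tiers explained",
]
ranked = rerank("token pricing", docs)
```

In a RAG pipeline, the top few reranked documents are what get injected into the prompt, so better ordering here directly improves grounding.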
AWS Bedrock

AWS Bedrock Agent APIs:

  • Converse / ConverseStream: Execute step-by-step agent plans with or without streaming for advanced orchestration.
  • RetrieveAndGenerate: Combine retrieval with generation in one API call for simplified RAG.
  • RetrieveAndGenerateStream: Stream RAG results for real-time agent experiences.
Hugging Face

Generate and Generate_Stream API:

  • Generate: Use open-source models for text generation across tasks and industries.
  • Generate Stream: Stream text outputs in real-time for chat and live inference use cases.
  • Open model ecosystem: Leverage the flexibility of Hugging Face’s vast library of models.
Azure | OpenAI | Gemini | AWS Bedrock | Mistral | Cohere

Embeddings API:

  • Text-to-embedding conversion: Transform text into vector representations for semantic search, clustering, recommendations, and RAG.
  • Multivendor support: Use OpenAI, Azure, Cohere, Mistral, Gemini, and Bedrock embeddings with a unified interface, including all OpenAI-compatible models.
  • Analytics: Track token usage, similarity scoring, and latency metrics for observability.
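Under the hood, semantic search over embeddings reduces to comparing vectors, most commonly with cosine similarity. The vectors below are tiny fabricated examples; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Embeddings place semantically similar text near each other in
    # vector space; cosine similarity is the usual closeness measure.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Tiny illustrative vectors; real embeddings are much higher-dimensional.
v_cat = [1.0, 0.9, 0.1]
v_kitten = [0.9, 1.0, 0.2]
v_invoice = [0.1, 0.0, 1.0]
closer = cosine_similarity(v_cat, v_kitten) > cosine_similarity(v_cat, v_invoice)
```

The same comparison powers semantic search, clustering, and retrieval for RAG: embed the query, then return the stored vectors with the highest similarity.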

Introducing a new prompt compression plugin

With generative AI applications becoming more pervasive, the volume of requests to LLMs increases, and costs rise in proportion. As with any cost to our business, we must look for efficiency savings. LLM costs are typically based on token usage — the longer the prompt, the more tokens are consumed per request. Prompts will often contain padding or redundant words or phrases that can be removed or shortened while retaining the semantic intent of the request.

In our demo, prompt compression effectively halved the token count; you can control the level of compression or set a target token count. Our testing has shown that this approach can achieve up to a 5x cost reduction while retaining 80% of the original prompt's intended semantic meaning.

Take a look at the docs for more examples.

In real-world usage, prompts are much larger and are made even more so by automatic context injection — whether that be system prompts or injecting Retrieval Augmented Generation (RAG) context. This additional context can also be compressed. In fact, our testing has shown that compressing the context while retaining the original prompt fidelity can provide an optimal balance between cost reduction and intent retention.
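To make the idea concrete, here is a deliberately crude sketch of prompt compression: dropping filler words that rarely change the semantic intent. This is not how the plugin works internally; production compressors use learned models rather than a stopword list.

```python
FILLER = {"please", "kindly", "basically", "very", "just", "really"}

def compress_prompt(prompt: str) -> str:
    # Crude illustration only: drop filler words that rarely affect
    # intent. Real compressors use learned models, not a word list.
    kept = [w for w in prompt.split() if w.lower().strip(",.") not in FILLER]
    return " ".join(kept)

before = "Please just summarize this really very long report, basically in three bullet points"
after = compress_prompt(before)
ratio = len(after.split()) / len(before.split())
```

Even this naive pass trims the token count meaningfully while leaving the instruction intact, which is the trade-off a real compressor tunes far more precisely.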

This complements other cost-saving measures already available in Kong, such as Semantic Caching, which avoids hitting the LLM service when a similar request has already been answered, and AI Rate Limiting, which can set time-based token or cost limits per application, team, or user.
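The core idea behind token-based rate limiting can be sketched as a per-consumer budget. This is an illustrative model of the concept, not the plugin's actual implementation; a real limiter also resets budgets per time window and persists counters across gateway nodes.

```python
class TokenBudgetLimiter:
    """Illustrative sketch: each consumer gets a token budget per window."""

    def __init__(self, limit_tokens: int):
        self.limit = limit_tokens
        self.used: dict[str, int] = {}

    def allow(self, consumer: str, tokens: int) -> bool:
        # Reject the call if it would push this consumer over budget.
        spent = self.used.get(consumer, 0)
        if spent + tokens > self.limit:
            return False
        self.used[consumer] = spent + tokens
        return True

limiter = TokenBudgetLimiter(limit_tokens=1000)
ok_first = limiter.allow("team-a", 800)   # within budget
ok_second = limiter.allow("team-a", 500)  # would exceed the 1000 limit
```

Budgeting on tokens (or estimated cost) rather than request count matters for LLM traffic, since a single large prompt can cost as much as hundreds of small ones.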

Introducing AWS Bedrock Guardrails support

It is well understood that generative AI applications can sometimes produce unpredictable outputs, and confidence in an application can quickly be eroded by a few missteps. You need to be able to keep your AI-driven applications “on topic”, block profanity and other undesirable language, redact personally identifiable information, and reduce hallucinations. You need guardrails.

Today, with Kong AI Gateway, you can already implement policies that can redact PII data with our built-in PII Sanitizer and Semantic Prompt Guard plugins. We also support policies that enable you to use Azure AI Content Safety to reach out to Azure’s managed guardrails service.
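To show the shape of PII redaction, here is a toy sketch using two regex patterns. A production sanitizer like the one in Kong AI Gateway covers many more categories (names, addresses, credentials) and goes well beyond simple regexes.

```python
import re

# Toy redaction patterns for illustration only; a production PII
# sanitizer handles many more categories and formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    # Replace each match with a labeled placeholder so downstream
    # LLM calls never see the raw sensitive value.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

safe = redact("Contact jane@example.com, SSN 123-45-6789")
```

Doing this at the gateway means every application behind it gets the same protection without code changes.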

Today, we're announcing support for AWS Bedrock Guardrails to help safeguard your AI applications from a wide range of both malicious and unintended consequences. You can find more examples in the docs.

As a product owner with Kong AI Gateway, you can continue to monitor applications and provide incremental improvements in quality, and react immediately by adjusting policies without any changes to your application code. Kong AI Gateway helps you keep risks in check and increase confidence in the rollout of AI-driven applications and innovation.

Visualize your AI traffic with the new AI Manager

We also recently introduced a new AI Manager in Konnect, enabling you to easily expose LLMs for consumption by your AI agents, and additionally govern, secure, and observe LLM traffic using a brand-new user interface straight from your browser. 

With AI Manager you can:

  • Manage AI policies via Konnect: Govern, secure, accelerate, and observe AI traffic in a self-managed — or fully managed — AI infrastructure that's easy to deploy.
  • Curate your LLM catalog: See what LLMs are available for consumption by AI agents and applications, with custom tiers of access and governance controls.
  • Visualize the agentic map: Observe at any given time what agents are consuming the LLMs you've decided to expose to the organization.
  • Observe LLM analytics: Measure token, cost, and request consumption with custom dashboards and insights for fine-grained understanding of your AI traffic.

Read more about the new AI Manager here.

Get started with Kong AI Gateway today

Ready to try out the new release of Kong AI Gateway? You can get started for FREE with Konnect Plus. If you already have a Konnect account, visit the official product page or dive straight into the demos and tutorials. 

Want to learn more about moving past the AI experimentation phase and into production-ready AI systems? Check out this webinar on how to drive real AI value with state-of-the-art AI infrastructure.
