In this section of the blog post, we're going to evolve the architecture one more time to add two new LLM infrastructures sitting behind the Gateway: Mistral and Anthropic, in addition to OpenAI.
Multi-LLM scenarios and use cases
In the main scenario, the Agent needs to communicate with multiple LLMs selectively, depending on its needs. Having the Kong AI Gateway intermediate the communication provides several benefits:
Decide which LLM to use based on cost, latency, reliability, and, above all, semantics (some LLMs are better at a specific topic, others at coding, etc.).
Route queries to the appropriate LLM(s).
Act based on the results.
Fallback and redundancy: If one LLM fails or is slow, use another.
Semantic Routing Architecture
Kong AI Gateway offers a range of semantic capabilities, including Semantic Caching and Semantic Prompt Guard. To implement the Multi-LLM Agent infrastructure, we're going to use the Semantic Routing capability provided by the AI Proxy Advanced plugin, which we've been using throughout this series of blog posts.
The AI Proxy Advanced plugin can implement various load-balancing policies, including distributing requests based on the semantic similarity between the incoming prompt and the description of each model. For example, consider that you have three models: the first trained on sports, the second on music, and the third on science. We want to route each request according to the topic of its prompt.
At configuration time (for example, when decK declarations are submitted to the Konnect Control Plane), the plugin calls the embedding model for each model description and stores the resulting embeddings in the vector database.
Then, for each incoming request, the plugin runs a VSS (Vector Similarity Search) query against the vector database to decide which LLM the request should be routed to.
Semantic Routing configuration and request processing times
Redis
To implement the Semantic Routing architecture, we're going to use the Redis Stack Helm chart to install Redis as our vector database.
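For reference, a minimal installation could look like the following; the chart repository URL and namespace are assumptions, so adjust them to your environment:

# Add the Redis Stack chart repository (URL assumed; check the project docs)
helm repo add redis-stack https://redis-stack.github.io/helm-redis-stack/
helm repo update

# Install Redis Stack in its own namespace
helm install redis-stack redis-stack/redis-stack -n redis --create-namespace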
As our embedding model, we're going to consume the “mxbai-embed-large:latest” model served locally by Ollama. Use the Ollama Helm chart to install it.
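A sketch of that installation, assuming the community “ollama-helm” chart; the value used to pre-pull the embedding model should be checked against the chart's documentation:

helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update

# Pre-pull the embedding model at startup (value name assumed)
helm install ollama ollama-helm/ollama -n ollama --create-namespace \
  --set 'ollama.models.pull[0]=mxbai-embed-large'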
In this final AI Agent Python script, we have two main changes:
We have replaced the tools with new functions.
“get_music_concert”: consumes the Event Registry service, looking for music concerts.
“get_traffic”: sends requests to the Tavily service for traffic information.
“get_weather”: remains the same, calling the public OpenWeather service.
We have replaced the LangGraph calls that built the graph with another pre-built LangGraph function, “create_react_agent”.
The pre-built “create_react_agent” function is very helpful for implementing the fundamental ReAct graph we previously created programmatically. That is, the agent is composed of:
A Node sending requests to the LLM
A “conditional_edge” associated with this Node, deciding how the Agent should proceed after receiving a response from the LLM.
A Node to call tools
In fact, if you print the graph again with the “graph.get_graph().draw_ascii()” function, you'll see the same graph structure we had in the previous version of the agent.
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_community.utilities.openweathermap import OpenWeatherMapAPIWrapper
import httpx
@tool
def get_weather(location: str):
    """Call to get the weather from a specific location."""
    print("starting get_weather function")
    openweathermap_url = kong_dp + "/openweathermap-route"
    result = httpx.get(openweathermap_url, params={"q": location})
    print("finishing get_weather function")
    return result.json()

@tool
def get_music_concert(location: str):
    """Call to get the events in a given location."""
    print("starting get_music_concert function")
    searchevent_url = kong_dp + "/searchevent-route"
    location = location.replace(" ", "_")
    data = {"query":{"$query":{"$and":[{"categoryUri":"dmoz/Arts/Music/Bands_and_Artists"},{"locationUri": f"http://en.wikipedia.org/wiki/{location}"}]},"$filter":{"forceMaxDataTimeWindow":"31"}},"resultType":"events","eventsSortBy":"date","eventImageCount":1,"storyImageCount":1}
    result = httpx.post(searchevent_url, json=data)
    print("finishing get_music_concert function")
    return result.json()["events"]["results"][0]["concepts"][0]["label"]["eng"]

@tool
def get_traffic(location: str):
    """Call to get the traffic situation of a given location."""
    print("starting get_traffic function")
    traffic_url = kong_dp + "/tavily-traffic-route"
    data = {"query": f"Generally, what is the worst time of day for car traffic in {location}","search_depth":"advanced"}
    result = httpx.post(traffic_url, json=data)
    print("finishing get_traffic function")
    return result.json()["results"][0]["content"]

tools = [get_weather, get_music_concert, get_traffic]

# All requests go through the Kong Data Plane
kong_dp = "http://127.0.0.1"
agent_url = kong_dp + "/agent-route"
client = ChatOpenAI(base_url=agent_url, model="", api_key="dummy", default_headers={"apikey":"123456"})

graph = create_react_agent(client, tools)
print(graph.get_graph().draw_ascii())

def print_stream(stream):
    for s in stream:
        message = s["messages"][-1]
        if isinstance(message, tuple):
            print(message)
        else:
            message.pretty_print()

inputs = {"messages":[("user","In my next vacation, I'm planning to visit the city where Jimi Hendrix was born? Is there any music concert to see? Also provide weather and traffic information about the city")]}
print_stream(graph.stream(inputs, stream_mode="values"))
For this execution, the AI Proxy Advanced Plugin will route the request to Mistral, since it's related to music.
decK Declaration
Below you can check the new decK declaration for the Semantic Routing use case. The AI Proxy Advanced plugin has the following sections configured:
embeddings: defines the embedding model the plugin calls to generate the embeddings for each LLM model description
vectordb: responsible for storing the embeddings and handling the VSS queries
targets: an entry for each LLM model. The most important setting is the description, which determines where the plugin routes each request.
In addition, the declaration applies the AI Prompt Decorator plugin, so the Gateway asks the LLM to convert temperatures to Celsius.
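As a reference, here's a trimmed sketch of what the plugin configuration could look like. The hostnames, model names, threshold, topic assignments, and auth headers below are illustrative assumptions; check the AI Proxy Advanced plugin documentation for the authoritative schema.

plugins:
- name: ai-proxy-advanced
  config:
    balancer:
      algorithm: semantic              # route by prompt/description similarity
    embeddings:
      model:
        provider: openai               # Ollama exposes an OpenAI-compatible API (assumed setup)
        name: mxbai-embed-large
        options:
          upstream_url: http://ollama.ollama.svc.cluster.local:11434/v1/embeddings
    vectordb:
      strategy: redis
      distance_metric: cosine
      threshold: 0.7                   # illustrative similarity threshold
      dimensions: 1024                 # embedding size of mxbai-embed-large
      redis:
        host: redis-stack.redis.svc.cluster.local
        port: 6379
    targets:
    - route_type: llm/v1/chat
      model:
        provider: mistral
        name: mistral-large-latest
      auth:
        header_name: Authorization
        header_value: Bearer <MISTRAL_API_KEY>
      description: "music, bands, artists and concerts"
    - route_type: llm/v1/chat
      model:
        provider: anthropic
        name: claude-3-5-sonnet-20241022
      auth:
        header_name: x-api-key
        header_value: <ANTHROPIC_API_KEY>
      description: "weather, forecast and temperature"
    - route_type: llm/v1/chat
      model:
        provider: openai
        name: gpt-4o
      auth:
        header_name: Authorization
        header_value: Bearer <OPENAI_API_KEY>
      description: "traffic, commuting and transportation"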
Grafana Dashboard based on the metrics generated by the Prometheus plugin
LangGraph Server
Now that we have our final version of the AI Agent, it's time to build a LangGraph Server based on it. There are multiple deployment options for running a LangGraph Server, but we're going to use our own Minikube cluster with the deployment option called Standalone Container.
The first step is to create the Docker image for the server. The code below removes the lines where we execute the graph. Another change is the Kong Data Plane address, which now refers to the Kubernetes Service FQDN.
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_community.utilities.openweathermap import OpenWeatherMapAPIWrapper
import httpx
@tool
def get_weather(location: str):
    """Call to get the weather from a specific location."""
    print("calling get_weather function")
    openweathermap_url = kong_dp + "/openweathermap-route"
    result = httpx.get(openweathermap_url, params={"q": location})
    return result.json()

@tool
def get_music_concerts(location: str):
    """Call to get the events in a given location."""
    print("calling get_music_concerts function")
    searchevent_url = kong_dp + "/searchevent-route"
    location = location.replace(" ", "_")
    data = {"query":{"$query":{"$and":[{"categoryUri":"dmoz/Arts/Music/Bands_and_Artists"},{"locationUri": f"http://en.wikipedia.org/wiki/{location}"}]},"$filter":{"forceMaxDataTimeWindow":"31"}},"resultType":"events","eventsSortBy":"date","eventImageCount":1,"storyImageCount":1}
    result = httpx.post(searchevent_url, json=data)
    return result.json()["events"]["results"][0]["concepts"][0]["label"]["eng"]

@tool
def get_traffic(location: str):
    """Call to get the traffic situation of a given location."""
    print("calling get_traffic function")
    traffic_url = kong_dp + "/tavily-traffic-route"
    data = {"query": f"Generally, what is the worst time of day for car traffic in {location}","search_depth":"advanced"}
    result = httpx.post(traffic_url, json=data)
    return result.json()["results"][0]["content"]

tools = [get_weather, get_music_concerts, get_traffic]

#kong_dp = "http://127.0.0.1"
kong_dp = "http://proxy1.kong"
agent_url = kong_dp + "/agent-route"
client = ChatOpenAI(base_url=agent_url, model="", api_key="dummy", default_headers={"apikey":"123456"})

graph = create_react_agent(client, tools)
langgraph.json
The Docker image requires a “langgraph.json” file with the dependencies and the name of the graph variable inside the code, in our case “graph”.
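A minimal sketch of that file, assuming the agent code is saved as “agent.py” in the project root:

{
  "dependencies": ["."],
  "graphs": {
    "agent": "./agent.py:graph"
  }
}

With the file in place, the LangGraph CLI can build the Docker image, for example with “langgraph build -t langgraph-agent:latest”.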
The “values.yaml” defines the service as “LoadBalancer” to make it externally available. Currently, only Postgres is supported as the LangGraph Server database, with Redis as the task queue. The file also specifies the Postgres resources for its Kubernetes deployment. Finally, LangGraph Server requires a LangSmith API key; LangSmith is the platform used to monitor your server. Log in to LangSmith and create your API key.
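An illustrative sketch of the “values.yaml”; every key name here is an assumption and should be validated against the chart version you're using:

# All key names below are illustrative; check the chart's default values.yaml
images:
  apiServerImage:
    repository: docker.io/<your-registry>/langgraph-agent   # image built with the LangGraph CLI
    tag: latest
apiServer:
  service:
    type: LoadBalancer            # expose the server outside the cluster
config:
  langsmithApiKey: <LANGSMITH_API_KEY>
postgres:
  statefulSet:
    resources:
      requests:
        cpu: "1"
        memory: 1Gi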
curl -s http://localhost:8090/runs/wait \
  --header 'Content-Type: application/json' \
  --data '{"assistant_id":"agent","input":{"messages":[{"role":"user","content":"In my next vacation, I'\''m planning to visit the city where Jimi Hendrix was born? Is there any music concert to see? Also provide weather and traffic information about the city."}]}}' | jq -r '.messages[5].content'
The expected response is:
In Seattle, it is currently overcast with a temperature of 69.8°F (20.4°C) and feels like 68.9°F (20.5°C). The city has a humidity of 80% and wind speed of 4.12 mph from the west. There is a music concert of Phish happening in the city, but be aware that the worst period of travel is generally Thursday afternoons, especially 4-6 pm.
Kong AI Gateway 3.11 and Support for New GenAI Models
With Kong AI Gateway 3.11, we'll be able to support other GenAI infrastructures besides LLMs, including video, image, and audio models. The following diagram lists the new modes supported:
Here's an example of a Kong Route declaration with the AI Proxy Advanced plugin enabled to protect OpenAI's text-to-image DALL·E 2 model.
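A minimal sketch, assuming an “image/v1/images/generations” route type and illustrative names:

routes:
- name: dalle2-route
  paths:
  - /dalle2-route
  plugins:
  - name: ai-proxy-advanced
    config:
      genai_category: image/generation            # new GenAI category parameter
      targets:
      - route_type: image/v1/images/generations   # assumed value for text-to-image
        model:
          provider: openai
          name: dall-e-2
        auth:
          header_name: Authorization
          header_value: Bearer <OPENAI_API_KEY>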
To support this, Kong AI Gateway 3.11 defines new configuration parameters, such as:
genai_category: configures the type of GenAI infrastructure the gateway protects. Besides image/generation, it supports, for example, text/generation and text/embeddings for regular LLM and embedding models, and audio/speech and audio/transcription for audio-based models implementing speech recognition, audio-to-text, etc.
route_type: this existing parameter has been extended to support new types, such as:
LLM: llm/v1/responses, llm/v1/assistants, llm/v1/files and llm/v1/batches
Audio: audio/v1/audio/speech, audio/v1/audio/transcriptions and audio/v1/audio/translations
Realtime: realtime/v1/realtime
Conclusion
This blog post has presented a basic AI Agent using Kong AI Gateway and LangGraph. Redis was used as the vector database, and a local Ollama instance provided the embedding model.
Behind the Gateway, we have three LLM infrastructures (OpenAI, Mistral, and Anthropic), and three external functions were used as tools by the AI Agent.
The Gateway was responsible for abstracting the LLM infrastructures and protecting the external functions with specific policies including Rate Limiting and API Keys.
You can discover all the features available on the Kong AI Gateway page.