In the last two parts of this series, we discussed How to Strengthen a ReAct AI Agent with Kong AI Gateway and How to Build a Single-LLM AI Agent with Kong AI Gateway and LangGraph. In this third and final part, we're going to evolve the AI Agent with multiple LLMs and Semantic Routing policies across them. In this blog post, we'll also explore new capabilities introduced in Kong AI Gateway 3.11 that support other GenAI infrastructures.
In this section of the blog post, we're going to evolve the architecture one more time to add two new LLM infrastructures sitting behind the Gateway: Mistral and Anthropic, in addition to OpenAI.
In the main scenario, the Agent needs to communicate with multiple LLMs selectively, depending on its needs. Having the Kong AI Gateway intermediate the communication provides several benefits:
Kong AI Gateway offers a range of semantic capabilities including Caching and Prompt Guard. To implement the Multi-LLM Agent infrastructure, we're going to use the Semantic Routing capability provided by the AI Proxy Advanced plugin we've been using for the entire series of blog posts.
The AI Proxy Advanced plugin can implement various load balancing policies, including distributing requests based on the semantic similarity between the prompts and the description of each model. For example, consider that you have three models: the first one has been trained on sports, the second on music, and the third on science. What we want to do is route each request accordingly, based on the topic of its prompt.
At configuration time (for example, when decK declarations are submitted to the Konnect Control Plane), the plugin calls the embedding model for each model description and stores the resulting embeddings in the vector database.
Then, for each incoming request, the plugin runs a VSS (Vector Similarity Search) query against the vector database to decide which LLM the request should be routed to.
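Conceptually, the decision looks something like the sketch below. This is only an illustration of the mechanism, not Kong's actual code: the embedding function is a toy stand-in for the real embedding model, and the topic-to-model mapping is illustrative (only the music-to-Mistral pairing comes from the example later in this post).

```python
# Conceptual sketch of the request-time routing decision (not Kong's implementation).
import math

def embed(text: str) -> list[float]:
    """Toy stand-in for the real embedding model served by Ollama."""
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

# Description embeddings are computed once, at configuration time.
model_descriptions = {
    "openai/gpt-4o": "sports results, matches and athletes",          # illustrative
    "mistral/mistral-large": "music, bands, albums and concerts",     # music -> Mistral
    "anthropic/claude-sonnet": "science, physics and chemistry",      # illustrative
}
index = {name: embed(desc) for name, desc in model_descriptions.items()}

def route(prompt: str) -> str:
    """Vector Similarity Search: pick the model whose description is closest to the prompt."""
    prompt_vec = embed(prompt)
    return max(index, key=lambda name: cosine(prompt_vec, index[name]))

print(route("Which band recorded the best live album ever?"))
```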

Semantic Routing configuration and request processing times
To implement the Semantic Routing architecture, we're going to use the Redis Stack Helm Charts to deploy Redis as our vector database.
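A minimal install could look like this (the repository URL, release name, and namespace are illustrative; check the Redis Stack chart documentation for your environment):

```
helm repo add redis-stack https://redis-stack.github.io/helm-redis-stack/
helm repo update
helm install redis-stack redis-stack/redis-stack -n redis --create-namespace
```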
As our Embedding model, we're going to consume the “mxbai-embed-large:latest” model handled locally by Ollama. Use the Ollama Helm Charts to install it.
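For example (chart, release, and namespace names are illustrative; the model can also be pre-pulled through the chart's values instead of an exec):

```
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update
helm install ollama ollama-helm/ollama -n ollama --create-namespace

# pull the embedding model inside the Ollama pod
kubectl exec -n ollama deploy/ollama -- ollama pull mxbai-embed-large:latest
```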
In this final AI Agent Python script, we have two main changes:
The pre-built function “create_react_agent” is very helpful to implement the fundamental ReAct graph that we created programmatically before. That is, the agent is composed of an agent node (the LLM call) and a tools node, connected by conditional edges that loop until the model stops requesting tool calls.
In fact, if you print the output of the graph with the “graph.get_graph().draw_ascii()” function again, you'll see the same graph structure we had in the previous version of the agent.
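Here's a minimal sketch of that part of the script, assuming (as in the earlier parts of the series) that the LLM client points at a Kong Data Plane route rather than at the provider directly. The addresses and the tool body are placeholders:

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def get_weather(city: str) -> str:
    """Return the current weather for a city (stub for illustration)."""
    return f"It is 77F and sunny in {city}."

# The LLM client targets the Kong Data Plane; the AI Proxy Advanced plugin decides
# whether OpenAI, Mistral, or Anthropic actually serves the request.
llm = ChatOpenAI(
    base_url="http://kong-dp.kong.svc.cluster.local:8000/agent",  # placeholder route
    api_key="dummy",  # real credentials are injected by the Gateway
    model="gpt-4o",
)

graph = create_react_agent(llm, tools=[get_weather])

# Prints the same agent/tools graph structure we created programmatically before
# (requires the grandalf package).
print(graph.get_graph().draw_ascii())
```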
For this execution, the AI Proxy Advanced Plugin will route the request to Mistral, since it's related to music.
Below you can check the new decK declaration for the Semantic Routing use case. The AI Proxy Advanced plugin has the following sections configured:
In addition, the declaration applies the AI Prompt Decorator plugin so the Gateway asks the LLM to convert temperatures to Celsius.
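As a reference, here's a hedged sketch of how such a declaration could look. The hosts, API keys, model names, and the topic descriptions for the OpenAI and Anthropic targets are illustrative (only the music-to-Mistral pairing comes from the execution shown above), and the field names should be checked against the AI Proxy Advanced plugin reference for your Gateway version:

```yaml
_format_version: "3.0"
plugins:
  - name: ai-proxy-advanced
    config:
      embeddings:
        model:
          provider: openai            # Ollama exposes an OpenAI-compatible API
          name: mxbai-embed-large
          options:
            upstream_url: http://ollama.ollama.svc.cluster.local:11434/v1/embeddings
      vectordb:
        strategy: redis
        dimensions: 1024              # mxbai-embed-large embedding size
        distance_metric: cosine
        redis:
          host: redis-stack.redis.svc.cluster.local
          port: 6379
      balancer:
        algorithm: semantic           # route by prompt/description similarity
      targets:
        - model:
            provider: openai
            name: gpt-4o
          route_type: llm/v1/chat
          description: "sports, matches, championships and athletes"
          auth:
            header_name: Authorization
            header_value: Bearer <OPENAI_API_KEY>
        - model:
            provider: mistral
            name: mistral-large-latest
          route_type: llm/v1/chat
          description: "music, bands, albums and concerts"
          auth:
            header_name: Authorization
            header_value: Bearer <MISTRAL_API_KEY>
        - model:
            provider: anthropic
            name: claude-3-5-sonnet-20241022
          route_type: llm/v1/chat
          description: "science, physics, chemistry and biology"
          auth:
            header_name: x-api-key
            header_value: <ANTHROPIC_API_KEY>
          # provider-specific options (e.g. anthropic_version) omitted for brevity
  - name: ai-prompt-decorator
    config:
      prompts:
        append:
          - role: system
            content: "Whenever you report a temperature, convert it to Celsius."
```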
Download and install the Grafana Dashboard available in the GitHub repository. It has two tiles:
The dashboard is entirely based on the metrics generated by the Prometheus plugin. The configuration is divided into two parts, sketched below: the AI Proxy Advanced plugin with its AI statistics logging parameters enabled, and the Prometheus plugin with the parameter that exposes the AI metrics.
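A minimal sketch of those two configurations (the exact placement of the logging block may vary across Gateway versions; check the plugin references):

```yaml
plugins:
  - name: ai-proxy-advanced
    config:
      targets:
        - model:
            provider: openai
            name: gpt-4o
          route_type: llm/v1/chat
          # emit AI statistics (token counts, latency, model) for this target;
          # in some versions this block sits at the plugin config level instead
          logging:
            log_statistics: true
            log_payloads: false
  - name: prometheus
    config:
      # export the AI metrics consumed by the Grafana dashboard
      ai_metrics: true
```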

Grafana Dashboard based on the metrics generated by the Prometheus plugin
Now that we have our final version of the AI Agent, it's time to build a LangGraph Server based on it. You have multiple deployment options to run your LangGraph Server, but we're going to use our own Minikube cluster with the deployment option called Standalone Container.
For details, you can refer to the links below:
The first step is to create the Docker image for the server. The code below removes the lines where we execute the graph locally. Another change is the Kong Data Plane address, which now refers to the Kubernetes Service FQDN.
The Docker image requires a “langgraph.json” file with the dependencies and the name of the graph variable inside the code, in our case “graph”.
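For reference, a minimal “langgraph.json” could look like this (the module path “./agent.py” and the graph key “agent” are placeholders for your actual file and graph name):

```json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./agent.py:graph"
  },
  "env": ".env"
}
```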
Create the image with the “langgraph” CLI, which requires Docker installed in your environment, and then push it to Docker Hub.
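A sketch of those commands (the image name is a placeholder; “langgraph build” and “langgraph dockerfile” are the two CLI options for producing the image):

```
# Build the image directly with the LangGraph CLI
langgraph build -t <your-dockerhub-user>/langgraph-agent:latest

# or: generate a Dockerfile and build it with Docker
langgraph dockerfile Dockerfile
docker build -t <your-dockerhub-user>/langgraph-agent:latest .

# Push it to Docker Hub
docker push <your-dockerhub-user>/langgraph-agent:latest
```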
Install your LangGraph Server using the Helm Chart available.
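Assuming the chart is published in the langchain-ai Helm repository (check the LangGraph self-hosted deployment docs for the current location), add the repository first:

```
helm repo add langchain https://langchain-ai.github.io/helm/
helm repo update
```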
The “values.yaml” file defines the service as “LoadBalancer” to make it externally available. Currently, only Postgres is supported as the database for LangGraph Server, with Redis as the task queue. The file specifies the Postgres resources for its Kubernetes deployment. Finally, LangGraph Server requires a LangSmith API Key; LangSmith is the platform used to monitor your server. Log in to LangSmith and create your API Key.
Deploy the LangGraph Server:
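A hedged sketch of the install command (the chart and release names are illustrative; verify them against the LangGraph self-hosted docs):

```
helm install langgraph-server langchain/langgraph-cloud --values values.yaml
```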
If you want to uninstall it, run:
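```
# release name as used in the install command above
helm uninstall langgraph-server
```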
Once the LangGraph Server is deployed, you can use its API to send requests to your graph.
Look for your assistants with:
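For example (the server address is a placeholder for your LoadBalancer endpoint):

```
curl -s -X POST http://<langgraph-server-address>/assistants/search \
  -H 'Content-Type: application/json' \
  -d '{"limit": 10}'
```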
The expected response is:
Use the assistant's name to invoke the graph:
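A minimal stateless run, assuming the assistant is registered under the name “agent” (the graph key from “langgraph.json”) and using an illustrative music-related question so the Gateway routes it to Mistral:

```
curl -s -X POST http://<langgraph-server-address>/runs/wait \
  -H 'Content-Type: application/json' \
  -d '{
        "assistant_id": "agent",
        "input": {"messages": [{"role": "user", "content": "Which band recorded the best-selling live album ever?"}]}
      }'
```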
The expected response is:
With Kong AI Gateway 3.11, we're able to support other GenAI infrastructures besides LLMs, including video, image, and audio models. The following diagram lists the new modes supported:

Here's an example of a Kong Route declaration with the AI Proxy Advanced plugin enabled to protect OpenAI's text-to-image DALL-E 2 model:
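A hedged sketch of how such a declaration could look; the service, route, and in particular the GenAI category parameter name are assumptions to be checked against the 3.11 plugin reference:

```yaml
_format_version: "3.0"
services:
  - name: image-service
    url: https://api.openai.com        # upstream is overridden by the plugin
    routes:
      - name: image-generation-route
        paths:
          - /image-generation
        plugins:
          - name: ai-proxy-advanced
            config:
              genai_category: image/generation   # assumption: new 3.11 category parameter
              targets:
                - route_type: image/v1/images/generations
                  model:
                    provider: openai
                    name: dall-e-2
                  auth:
                    header_name: Authorization
                    header_value: Bearer <OPENAI_API_KEY>
```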
In order to do it, Kong AI Gateway 3.11 defines new configuration parameters. One of them is a GenAI category parameter, set here to “image/generation”; it also supports, for example, “text/generation” and “text/embeddings” for regular LLMs and embedding models, and “audio/speech” and “audio/transcription” for audio-based models implementing speech recognition, audio-to-text, and so on. There are also new “route_type” values, including “llm/v1/responses”, “llm/v1/assistants”, “llm/v1/files” and “llm/v1/batches” for LLM-based APIs; “image/v1/images/generations” and “image/v1/images/edits” for image models; “audio/v1/audio/speech”, “audio/v1/audio/transcriptions” and “audio/v1/audio/translations” for audio models; and “realtime/v1/realtime” for realtime interactions.

This blog post has presented a basic AI Agent using Kong AI Gateway and LangGraph. Redis was used as the vector database, and a local Ollama deployment provided the Embedding Model.
Behind the Gateway, we had three LLM infrastructures (OpenAI, Mistral, and Anthropic), and three external functions were used as tools by the AI Agent.
The Gateway was responsible for abstracting the LLM infrastructures and protecting the external functions with specific policies including Rate Limiting and API Keys.
You can discover all the features available on the Kong AI Gateway page.


