How to Strengthen a ReACT AI Agent with Kong AI Gateway
This is part one of a series exploring how Kong AI Gateway can be used in AI Agent development with LangGraph. The series comprises three parts:
- Basic ReAct AI Agent with Kong AI Gateway
- Single LLM ReAct AI Agent with Kong AI Gateway and LangGraph
- Multi-LLM ReAct AI Agent and LangGraph Server
Introduction
To put it simply: ChatGPT is an AI agent.
As such, ChatGPT is a type of artificial intelligence system designed to perceive natural language input, process information, and respond accordingly. Anthropic's blog post, “Building Effective Agents”, has a nice and concise definition of AI Agents: “Agents are systems that leverage LLMs to autonomously coordinate their operations and tool interactions, exercising control over task execution.”
On the other hand, no single model has all the answers. A multi-GenAI Agent brings the best of multiple LLM, audio, video, and image models together. Instead of relying on one brain, this agent intelligently selects or coordinates among different language models, such as GPT-4, Claude, and Mistral, to deliver more accurate responses.
Ideally, we should isolate the AI Agent from the policies that drive the consumption of the models. Put another way, the AI Agent should rely on an external component that is intelligent enough to understand which and how many models are available and, most importantly, to decide which model should be called to address each incoming request. For example, let's say you have multiple available models, each one trained for specific topics like Mathematics, Classical Music, etc. Another scenario involves digesting a variety of inputs, including text, images, and audio. Moreover, models can be added to or removed from your environment along the way, making AI Agent development even more challenging.
This blog post explores how developers can leverage Kong AI Gateway to implement better and smarter AI agents using LangGraph. We are going to focus on LLM models, but keep in mind, Kong AI Gateway will support other kinds of models along the way, including audio, images, etc. Moreover, it's important to note that LangGraph is an extensive AI Agent framework with lots of other features and capabilities. Refer to its documentation portal to learn more about it.
Before we get started, one important note: the agents we are going to create and run are introductory and are not meant for production environments. The main purpose is to demonstrate how to leverage the LangGraph and Kong AI Gateway capabilities available to us when creating an AI Agent.
MCP (Model Context Protocol)
We have seen a lot of traction around MCP (Model Context Protocol), the new protocol proposed by Anthropic in November 2024. At the same time, Kong launched the new Kong MCP Server in April 2025. MCP contributes tremendously to AI Agent development. However, since this is an introductory blog post, we are going to exercise the inclusion of MCP in this architecture, integrating with Kong and LangGraph, in the next one.
Blog structure
This blog series is structured as follows:
Part I
- Kong AI Gateway introduction, implementation architecture and AI Agent fundamentals
- Kong version of a simple ReAct-based AI Agent with no frameworks
Part II
- LangGraph fundamentals and basic AI Agent
- Tools and function calling with Kong AI Gateway and observability layer
- Single LLM AI Agent with Kong AI Gateway and LangGraph
Part III
- Multi-LLM AI Agent
- LangGraph Server
Kong AI Gateway Introduction
In April 2025, Kong announced Kong Gateway 3.10 with the fifth version of the Kong AI Gateway capabilities to address AI-based use cases, including automated RAG, PII (Personally Identifiable Information) sanitization, and load balancing based on tokens and costs.
The diagram below represents the Kong AI Gateway capabilities:

Kong AI Gateway functional capabilities
Also, from the architecture perspective, in a nutshell, the Konnect Control Plane and Data Plane node topology remains the same.

The Kong AI Gateway sits between the GenAI applications we build and the LLMs we consume. By leveraging the same underlying core of Kong Gateway, we're reducing complexity in deploying the AI Gateway capabilities as well. And of course, it works on Konnect, Kubernetes, self-hosted, or across multiple clouds.
Implementation Architecture
The Agent implementation architecture should include components representing and responsible for the functional scope described above. The architecture comprises:
- Kong AI Gateway to abstract and protect:
- LLM models
- External functions used by the AI Agent
- Mistral, Anthropic, and OpenAI as LLM model infrastructures
- Redis as the Vector Database
- Ollama as the Embedding Model infrastructure
- Observability layer with Grafana, Loki, and Prometheus
Kong AI Gateway (implemented as a regular Konnect Data Plane Node), Redis, Ollama, and the Observability layer all run on a Minikube Kubernetes Cluster.

Multi-LLM ReAct AI Agent Reference Architecture
The artifacts used to implement the architecture are available at the following GitHub repo.
AI Agent Architecture, Reasoning Frameworks and Prompt Engineering
In September 2024, Google launched a white paper, called “Agents”, exploring the basics of AI Agents, including their architectures and components. The fundamental diagram, included in the paper, is a nice starting point to understand the main moving parts someone working with AI Agents should master.

AI Agent components
The diagram presents three main components of an agent:
- Model: the LLM model that will act as the centralized component responsible for guiding the agent’s decision-making processes.
- Tools: allow Foundation Models to interact with external data and services.
- Orchestration: defines a recurring loop in which the agent receives input, plans, reasons, and makes decisions as to the Agent's next action. Moreover, the Orchestration layer defines “Memories”, used for saving conversations.
An interesting perspective is to think of agents as LLMs enhanced with components and capabilities like tools, memory, and reasoning.
LangGraph documentation also provides a concise introduction to AI Agents and their main components.
Reasoning Frameworks
As you can see in the diagram, one of the main responsibilities of the Orchestration layer is controlling the reasoning process implemented by the Model. In fact, at the core of the layer, there's a loop responsible for building prompts, making decisions, calling external systems, tracking the steps that have been processed, etc., as it interacts with the Model. The Model, in turn, implements the reasoning process itself, following a reasoning framework.
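As a rough, illustrative sketch of that loop, the Orchestration layer essentially does something like the following. The model, tools, and memory objects and their methods are hypothetical placeholders standing in for the components above, not a real framework API:

# Illustrative sketch only: "model", "tools", and "memory" are hypothetical
# placeholders for the agent components above, not a real API.
def orchestrate(task, model, tools, memory, max_steps=10):
    prompt = f"Task: {task}\nThought:"                    # build the initial prompt from the task
    for _ in range(max_steps):
        step = model.reason(prompt)                       # the Model performs one reasoning step
        memory.append(step)                               # the Orchestration layer tracks processed steps
        if step.is_final_answer:
            return step.answer                            # stop once the Model produces a final answer
        observation = tools[step.action](step.input)      # call an external system (a tool)
        prompt += f"\nObservation: {observation}\nThought:"  # feed the result back to the Model

The simple agent we build later in this post follows the same shape, with the tool call step omitted.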
There are a few reasoning frameworks that LLM models usually support:
- Chain of Thought (CoT): based on step-by-step reasoning, it defines a linear sequence of thoughts to arrive at an answer. It does not use tools or interact with external components.
- Tree of Thought (ToT): extends CoT with multiple possible solutions, creating a tree-like structure and evaluating intermediate states to choose which branch to continue.
- ReAct (Reasoning and Action): also based on CoT, adding tools, external interaction, and observation. In addition, it's supported by Agent Frameworks like LangGraph. Due to all these factors, ReAct has been considered the best reasoning framework for implementing AI Agents.
The blog post written by the ReAct white paper authors includes a nice diagram comparing the ReAct Framework with others:

ReAct Framework comparison
Prompt Engineering
Prompt engineering is the practice of crafting inputs (prompts) to get the most useful, accurate, or creative responses from LLM models.
It's critical for AI Agents since it shapes how the agents should communicate, think, act, interact with tools, reason, and respond. So, the better your Prompt Engineering, the better the AI Agent you are going to get.
The same blog post also includes a nice comparison of four prompting methods used to respond to a HotpotQA question. HotpotQA is a question-answering dataset designed for multi-hop reasoning. You can learn more about it in the HotpotQA GitHub repository.

Prompt Engineering comparison
Simple ReAct-based AI Agent
With all these new elements in place, it's time to exercise them. The following code implements a simple ReAct-based AI Agent written in Python, including a reasoning loop only, without an Agent Framework or tools interacting with external environments. The AI Agent uses OpenAI's Chat Completions API to consume the GPT-4 LLM model.
As we have said, the prompt is critical to drive how the Agent should behave. As you can see, ours instructs the Agent with a ReAct pattern including the format it should follow as well as requesting to present the logical steps, with thoughts and observations, used to achieve the final answer.
import os
from openai import OpenAI

# Set your OpenAI key
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY")
)

# Initial ReAct-style prompt
def build_prompt(task: str) -> str:
    return f"""
You are an intelligent AI agent solving problems using the ReAct pattern.
Follow this format:
Thought: What are you thinking?
Action: The action you are taking
Observation: (result from imaginary action)
Thought: ...
FINAL_ANSWER: <your final answer>
Task: {task}
Each time you respond, you should only give **one logical step** in your reasoning process.
Do not jump to the final answer immediately.
Each step should be presented using the format above.
Begin.
Thought:
"""

# Reasoning loop with no tools
def run_simple_react_agent(task: str, max_steps: int = 10):
    prompt = build_prompt(task)
    for step in range(max_steps):
        print(f"\n--> Step {step + 1}")
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7
        )
        output = response.choices[0].message.content.strip()
        print(output)
        prompt += output + "\n"
        if "FINAL_ANSWER:" in output:
            print("\n Agent finished.")
            break
        # Continue loop by cueing next Thought
        prompt += "Thought:\n"
    else:
        print("\n Max steps reached without a final answer.")

# Run example
run_simple_react_agent("Taking the 'The Grapes of Wrath' novel by John Steinbeck, what are the main differences when comparing it to the movie?")
Before executing it, make sure you have installed the openai Python package (for example, with "pip install openai"). If you run the script, you should get a response like this:
--> Step 1
In order to compare the differences between 'The Grapes of Wrath' novel and its movie adaptation, I need to understand the plot, characters, themes, and narrative structure of both the novel and the movie.
Action:
I will look up summaries, reviews, and analyses of both the novel and the movie to identify key points of similarity and divergence.
Observation:
I have gathered information about the plot, characters, themes, and narrative structure of both 'The Grapes of Wrath' novel and its movie adaptation.
--> Step 2
Now that I have the necessary information, I need to compare the two mediums to identify differences. I'll start by comparing the plot of the novel and the movie.
Action:
I will compare the plot of the novel 'The Grapes of Wrath' to the plot of its movie adaptation.
--> Step 3
After comparing the plot of the novel and the movie, I noticed that the movie omits some parts of the novel's plot and simplifies others.
Action:
Next, I will compare the characters in the novel and the movie to identify any differences.
Observation:
The characters in the movie are generally faithful to the novel, but some have been downplayed or omitted entirely, potentially reducing the complexity and depth of the character relationships and development seen in the novel.
Thought:
Now, I'll compare the themes presented in the novel and the movie.
Action:
I will compare the themes of the novel 'The Grapes of Wrath' to the themes presented in its movie adaptation.
--> Step 4
The themes in the movie are generally consistent with those in the novel, but the film doesn't delve as deeply into some of the more complex and nuanced themes that the novel does.
Action:
Finally, I will compare the narrative structure of the novel and the movie to identify any differences.
--> Step 5
The narrative structure in the movie is more linear and simplified, whereas the novel uses intercalary chapters to add depth and context to the main story, showing a broader picture of the societal issues of the time.
Action:
Now, I will compile the differences I have identified into a coherent comparison.
FINAL_ANSWER:
The main differences between 'The Grapes of Wrath' novel and its movie adaptation are as follows:
1. Plot: The movie omits and simplifies parts of the novel's plot.
2. Characters: Some characters in the movie are downplayed or omitted, reducing the complexity and depth of the character relationships and development seen in the novel.
3. Themes: The movie doesn't delve as deeply into some of the more complex and nuanced themes that the novel does.
4. Narrative Structure: The movie has a more linear narrative, whereas the novel uses intercalary chapters to add depth and context to the main story.
Agent finished.
Kong Version
As our first exercise, let's inject a Kong Data Plane into our simple scenario. Here's the architecture illustrating it:

AI Agent with Kong Konnect Data Plane
OpenAI API support
Kong AI Gateway supports the OpenAI API specification as well as Bedrock and Gemini as incoming formats. The consumer can then send standard OpenAI requests to the Kong AI Gateway. As a basic example, consider this OpenAI request:
curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
When we add Kong AI Gateway, sitting in front of the LLM model, we're not just exposing it but also allowing the consumers to use the same mechanism — in this case, OpenAI APIs — to consume it. That leads to a very flexible and powerful capability when it comes to development processes. In other words, Kong AI Gateway normalizes the consumption of any LLM infrastructure, including Amazon Bedrock, Mistral, OpenAI, Cohere, etc.
As an exercise, the new request should look something like this. It has some minor differences:
- It sends a request to the Kong API Gateway Data Plane Node.
- It replaces the OpenAI endpoint with a Kong API Gateway route.
- The API Key is actually managed by the Kong API Gateway now.
- It does not refer to any model, since the model is resolved by the AI Gateway.
curl http://$DATA_PLANE_LB/bedrock-route \
-H "Content-Type: application/json" \
-H "apikey: $KONG_API_KEY" \
-d '{
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
Minikube installation
We are going to deploy our Data Plane in a Minikube Cluster over Docker or Podman. For example, you can start Podman with:
podman machine set --memory 8196
podman machine start
If you want to stop it run:
podman machine stop
Then you can install Minikube with:
minikube start --driver=podman --memory='no-limit'
To be able to consume the Kubernetes Load Balancer Services, in another terminal run:
minikube tunnel
Konnect subscription
Now you have to get access to Konnect. Click on the Registration link and present your credentials. Or, if you already have a Konnect subscription, log in to it.
Konnect PAT
In order to interact with Konnect using command lines, you need a Konnect Personal Access Token (PAT). To generate your PAT, go to the Konnect UI, click on your initials in the upper right corner of the Konnect home page, then select "Personal Access Tokens." Click on "+ Generate Token," name your PAT, set its expiration time, and be sure to copy and save it in an environment variable also named PAT (for example, export PAT=<your_pat_here>). Konnect won’t display your PAT again.
Kong Gateway Operator (KGO), Konnect Control Plane creation and Data Plane deployment
For Kubernetes deployments, Kong provides the Kong Gateway Operator (KGO), an Operator capable of managing all flavours of Kong installations, including Kong Ingress Controller and Kong Gateway Data Planes, for self-managed or Konnect-based deployments.
Our topology comprises a hybrid deployment where a Data Plane, running on Minikube, is connected to the Konnect Control Plane.
Start by adding the KGO Helm Charts to your environment:
helm repo add kong https://charts.konghq.com
helm repo update kong
Install the Operator with:
helm upgrade --install kgo kong/gateway-operator \
-n kong-system \
--create-namespace \
--set image.tag=1.6.1 \
--set kubernetes-configuration-crds.enabled=true \
--set env.ENABLE_CONTROLLER_KONNECT=true
You can check the Operator's logs with:
kubectl logs -f $(kubectl get pod -n kong-system -o json | jq -r '.items[].metadata | select(.name | startswith("kgo-gateway"))' | jq -r '.name') -n kong-system
And if you want to uninstall it run:
helm uninstall kgo -n kong-system
kubectl delete namespace kong-system
Konnect Control Plane creation
The first thing to do, in order to get your Konnect Control Plane defined, is to create a Kubernetes Secret with your PAT. KGO requires your secret to be labeled. The commands should look like these:
kubectl create namespace kong
kubectl delete secret konnect-pat -n kong
kubectl create secret generic konnect-pat -n kong --from-literal=token='kpat_K6Cgbx…..'
kubectl label secret konnect-pat -n kong "konghq.com/credential=konnect"
Then, the following declaration defines an Authentication Configuration, based on the Kubernetes Secret and referring to a Konnect API URL, and the actual Konnect Control Plane. Check the documentation to learn more about it.
cat <<EOF | kubectl apply -f -
kind: KonnectAPIAuthConfiguration
apiVersion: konnect.konghq.com/v1alpha1
metadata:
  name: konnect-api-auth-conf
  namespace: kong
spec:
  type: secretRef
  secretRef:
    name: konnect-pat
    namespace: kong
  serverURL: us.api.konghq.com
---
kind: KonnectGatewayControlPlane
apiVersion: konnect.konghq.com/v1alpha1
metadata:
  name: ai-gateway
  namespace: kong
spec:
  name: ai-gateway
  konnect:
    authRef:
      name: konnect-api-auth-conf
EOF
You should see your Control Plane listed in your Konnect Organization:

Kong Konnect Control Plane
Konnect Data Plane deployment
The next declaration instantiates a Data Plane connected to your Control Plane. It creates a KonnectExtension, asking KGO to manage the certificate and private key provisioning automatically, and the actual Data Plane. The Data Plane declaration specifies the Docker image, in our case 3.10, as well as how the Kubernetes Service, related to the Data Plane, should be created.
The Data Plane declaration also defines the "untrusted_lua_sandbox_requires" environment variable with the Lua modules used by the Observability layer, allowing them to be loaded inside the Data Plane's Lua sandbox.
cat <<EOF | kubectl apply -f -
kind: KonnectExtension
apiVersion: konnect.konghq.com/v1alpha1
metadata:
  name: konnect-config1
  namespace: kong
spec:
  clientAuth:
    certificateSecret:
      provisioning: Automatic
  konnect:
    controlPlane:
      ref:
        type: konnectNamespacedRef
        konnectNamespacedRef:
          name: ai-gateway
---
apiVersion: gateway-operator.konghq.com/v1beta1
kind: DataPlane
metadata:
  name: dataplane1
  namespace: kong
spec:
  extensions:
  - kind: KonnectExtension
    name: konnect-config1
    group: konnect.konghq.com
  deployment:
    podTemplateSpec:
      spec:
        containers:
        - name: proxy
          image: kong/kong-gateway:3.10
          env:
          - name: KONG_UNTRUSTED_LUA_SANDBOX_REQUIRES
            value: pl.stringio, ffi-zlib, cjson.safe
  network:
    services:
      ingress:
        name: proxy1
        type: LoadBalancer
EOF
You can check the Data Plane logs with:
kubectl logs -f $(kubectl get pod -n kong -o json | jq -r '.items[].metadata | select(.name | startswith("dataplane-"))' | jq -r '.name') -n kong
Also, you should see your first Data Plane listed and connected to the previously created Konnect Control Plane:

Kong Konnect Data Plane
Consume the Data Plane
If you check the new Kubernetes Service, since it's been created as "LoadBalancer" and we are running Minikube, you'll see that its external IP is defined as "127.0.0.1".
% kubectl get service proxy1 -n kong
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
proxy1 LoadBalancer 10.103.170.14 127.0.0.1 80:30350/TCP,443:30510/TCP 7m34s
You should also see that, in the terminal running Minikube's tunnel, the "proxy1" service has been exposed. You might need to type your admin password.
❗ The service/ingress proxy1 requires privileged ports to be exposed: [80 443]
🔑 sudo permission will be asked for it.
🏃 Starting tunnel for service proxy1.
If you send a request to it, you should get a response from the Data Plane:
% curl -i http://localhost
HTTP/1.1 404 Not Found
Date: Thu, 24 Apr 2025 14:07:26 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Length: 103
X-Kong-Response-Latency: 1
Server: kong/3.10.0.1-enterprise-edition
X-Kong-Request-Id: 2b83f976af69655067bb5a4320a4677f
{
"message":"no Route matched with those values",
"request_id":"2b83f976af69655067bb5a4320a4677f"
}
Finally, if you want to delete all components run:
kubectl delete dataplane dataplane1 -n kong
kubectl delete konnectextensions.konnect.konghq.com konnect-config1 -n kong
kubectl delete konnectgatewaycontrolplane ai-gateway -n kong
kubectl delete konnectapiauthconfiguration konnect-api-auth-conf -n kong
kubectl delete secret konnect-pat -n kong
kubectl delete namespace kong
Creating Kong Objects using decK
In this next step, you'll create the Kong Objects required to consume the OpenAI LLM model: Kong Service, Kong Route and Kong Plugin. You can continue using KGO and Kubernetes style declarations. However, we are going to exercise decK (declarations for Kong). With decK you can manage Kong Konnect configuration and create Kong Objects in a declarative way. Check the decK documentation to learn how to install it.
Once you have decK installed, you should ping Konnect to check if the connection is up using the same PAT you created previously. You can use the following command to do it:
% deck gateway ping --konnect-token $PAT
Successfully Konnected to the Kong organization!
decK declaration
Now, create a file named "kong_agent_simple.yaml" with the following decK declaration:
cat > kong_agent_simple.yaml << 'EOF'
_format_version: "3.0"
_info:
  select_tags:
  - agent
_konnect:
  control_plane_name: ai-gateway
services:
- name: agent-service
  host: localhost
  port: 32000
  routes:
  - name: agent-route1
    paths:
    - /agent-route
    plugins:
    - name: ai-proxy-advanced
      instance_name: "ai-proxy-advanced-openai-agent"
      enabled: true
      config:
        targets:
        - auth:
            header_name: "Authorization"
            header_value: "Bearer <your_OPENAI_API_KEY>"
          route_type: "llm/v1/chat"
          model:
            provider: "openai"
            name: "gpt-4"
EOF
The declaration defines multiple Kong Objects:
- Kong Gateway Service named "agent-service". The service doesn’t need to map to any real upstream URL. In fact, it can point somewhere empty, for example, http://localhost:32000. This is because the AI Proxy plugin, also configured in the declaration, overwrites the upstream URL. This requirement will be removed in a later Kong revision.
- Kong Route: the Gateway Service has a route defined with the "/agent-route" path. That's the route we're going to consume to reach OpenAI's GPT-4 LLM.
- AI Proxy Advanced Plugin. It's configured to consume OpenAI's “gpt-4” model. The “route_type” parameter, set as “llm/v1/chat”, refers to OpenAI's “https://api.openai.com/v1/chat/completions” endpoint. Kong recommends storing the API Keys as secrets in a Secret Manager like AWS Secrets Manager or HashiCorp Vault. The current configuration, including the OpenAI API Key in the declaration, is for lab environments only, not recommended for production. Please refer to the official AI Proxy Advanced Plugin documentation page to learn more about its configuration.
The declaration has been tagged as "agent" so you can manage its objects without impacting any other ones you might have created previously. Also, note the declaration is saying it should be applied to your "ai-gateway" Konnect Control Plane.
You can submit the declaration with the following decK command:
deck gateway sync --konnect-token $PAT kong_agent_simple.yaml
If you want to destroy all objects run:
deck gateway reset --konnect-control-plane-name "ai-gateway" --select-tag "agent" --konnect-token $PAT -f
AI Agent
Now, we're ready to get back to our Agent. There are some changes you have to apply to your code to create the Kong version of it:
- To redirect the requests to the Kong AI Gateway Data Plane, you have to change your OpenAI constructor to refer to the Kong Route.
- Note that the OpenAI constructor still requires an OpenAI API Key. However, since the real key is managed by Kong, you can use any value for it.
- Since the AI Proxy Advanced Plugin has been configured to consume the GPT-4 model, you can set the "model" parameter in the "chat.completions.create" call to an empty string.
- And, of course, you can delete the line where you import the "os" package.
Here's the new constructor:
…
client = OpenAI(
    base_url="http://localhost:80/agent-route",
    api_key="dummy"
)
…
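For reference, here's a minimal sketch of what the updated request looks like end to end. It assumes the Data Plane is exposed on localhost through the Minikube tunnel and the "/agent-route" Kong Route created with the decK declaration above; the dummy key and the empty model name only work because the AI Proxy Advanced Plugin supplies the real API Key and resolves the model.

from openai import OpenAI

# The client now points at the Kong Data Plane route instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:80/agent-route",  # Kong Route defined in kong_agent_simple.yaml
    api_key="dummy"                              # any value; the real OpenAI key is managed by Kong
)

response = client.chat.completions.create(
    model="",  # empty string: the AI Proxy Advanced Plugin resolves the model (gpt-4)
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7
)
print(response.choices[0].message.content)

In the agent code itself, only the constructor and the "model" argument change; the reasoning loop stays exactly the same.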
If you run your new code, you should get similar results, since Kong AI Gateway is not applying any policy to your requests yet. If you check your Data Plane, you'll see all the requests it has processed and routed to OpenAI's GPT-4 model.
Here’s the Kong Service Analytics tab:

Kong Konnect Analytics
As you can see, the Agent sent 5 requests to the Kong Data Plane (and therefore to GPT-4), one for each step it processed to respond to our question.
Moreover, the Konnect Control Plane provides extensive Analytics capabilities where you can check all requests processed by the Data Planes. Click on the "Analytics" menu option and then "Requests":

Kong Konnect processed requests
Note that you can introspect all processed requests, including all major data like HTTP method, Kong Route and Service, Upstream URI, etc. For performance and security reasons, the Data Plane does not report the request bodies to the Control Plane, although they might be interesting to check, especially for situations like AI Agents. In the next part of this series, we are going to configure the Data Plane to externalize both request and response bodies.
Kong AI Gateway Plugins
Before we jump to the next section, it's important to keep in mind that Kong AI Gateway provides an extensive list of AI-related plugins to implement specific policies related to:
- Prompt Engineering
- Semantic Processing
- Rate Limiting based on tokens
- Request and Response transformations based on LLM queries
- PII (Personally Identifiable Information) sanitization
As we stated before, this blog series will leverage Semantic Routing capabilities to manage multiple LLMs sitting behind, and protected by, the Kong AI Gateway.
That concludes part one of the series. In part two, we're going to explore the fundamentals of LangGraph to create an AI Agent including Kong AI Gateway, Tools, and Function Calling.