# RAG Application with Kong AI Gateway, AWS Bedrock, Redis and LangChain
Claudio Acquaviva
Principal Architect, Kong
For the last couple of years, Retrieval-Augmented Generation (RAG) architectures have become a rising trend for AI-based applications. Generally speaking, RAG offers a solution to some of the limitations of traditional generative AI models, such as limited accuracy and hallucinations, allowing companies to create more contextually relevant AI applications. The combination of retrieval-based systems like RAG and generative models such as Large Language Models (LLMs) will likely become more prevalent as companies seek to leverage their data in new AI applications.
This blog post describes a RAG implementation using LangChain as the application orchestrator, driving Amazon Bedrock, protected by Kong AI Gateway, and Redis as the vector database.
The Kong AI Gateway provides AI infrastructure for every team building new GenAI applications. It combines the Kong API Gateway features with a set of AI-focused plugins that help ensure data security and privacy, response quality and accuracy, and data and infra readiness for AI, while also providing out-of-the-box observability for LLM traffic and the ability to manage access tiers to the LLMs. This approach allows applications to adopt LLM infrastructures within their existing API traffic flow without having to rebuild common requirements that all AI applications need when they go into production.
The diagram below represents the architecture:
*The Kong AI Gateway sits in between the GenAI applications we build and the LLMs we consume.*
### RAG Applications and LLM Frameworks
Increasingly, customers are leveraging their AI infrastructures to build LLM-based applications. One of the main processes used for such applications is Retrieval-Augmented Generation, or RAG, for short.
Fundamentally, LLMs are trained, over a long period of time, on massive public datasets. Although that makes for a powerful solution, it lacks context. That's where RAG comes in: [as described in its original paper](https://arxiv.org/abs/2005.11401), it enhances the capabilities of LLMs by incorporating internal, domain-specific customer data, optimizing the LLM output with a knowledge base outside the data sources used for the LLM training. That augmentation happens at query time, meaning it's done without retraining the model.
The picture describes the process:
Basically, the RAG application comprises two main processes:
- **Data Preparation**: External data sources, including all sorts of documents, are chunked and submitted to an Embedding Model which converts them into embeddings. The embeddings are stored in a Vector Database.
- **Query time**: A consumer wants to send a query to the actual Prompt/Chat LLM model. However, the query should be enhanced with relevant data, taken from the Vector Database, and not available in the LLM model. The following steps are then performed:
- The consumer builds a Prompt.
- The RAG application converts the Prompt into an embedding, calling the Embedding Model.
- Leveraging Semantic Search, the RAG application matches the Prompt Embedding against the Vector Database to find the most relevant information.
- The Vector Database returns relevant data as a response to the search query.
- The RAG application sends a query to the Prompt/Chat LLM Model combining the Prompt with the Relevant Data returned by the Vector Database.
- The LLM Model returns a response.
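The query-time steps above can be sketched with a toy example. Note that `embed()` and the in-memory dict are illustrative stand-ins for a real Embedding Model and Vector Database, not a real API:

```python
# Toy sketch of the two RAG phases; embed() stands in for a real
# Embedding Model and the dict below stands in for a Vector Database.

def embed(text: str) -> set[str]:
    # Real systems return dense vectors; a bag of words is enough to sketch the idea.
    return set(text.lower().split())

# Data Preparation: chunk documents, embed each chunk, store in the "vector database"
chunks = ["Kong AI Gateway protects LLM traffic", "Redis stores embeddings"]
vector_db = {chunk: embed(chunk) for chunk in chunks}

# Query time: embed the prompt, retrieve the most similar chunk, augment the prompt
prompt = "What protects LLM traffic?"
query_emb = embed(prompt)
relevant = max(vector_db, key=lambda c: len(vector_db[c] & query_emb))
augmented_prompt = f"Context: {relevant}\n\nQuestion: {prompt}"
print(augmented_prompt)
```

The augmented prompt, combining the retrieved context with the original question, is what gets sent to the Prompt/Chat LLM Model.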
As you can see, there are multiple components in a typical RAG application implementation. You might see a need for a new one responsible for orchestrating the components. That's the main responsibility of emerging LLM frameworks like [LangChain](https://www.langchain.com/langchain).
## Kong AI Gateway and Amazon Bedrock
Let's start by discussing the benefits Kong AI Gateway brings to Amazon Bedrock for an LLM-based application. As we stated before, Kong AI Gateway leverages the existing Kong API Gateway extensibility model to provide specific AI-based plugins. Here are some of the capabilities provided by Kong AI Gateway and its plugins:
- AI Proxy and AI Proxy Advanced plugins: the Multi-LLM capability allows the AI Gateway to abstract Amazon Bedrock (and other LLM providers as well), load balancing across models based on several policies including latency, model usage, semantics, etc. These plugins extract LLM observability metrics (like number of requests, latencies, and errors for each LLM provider) and the number of incoming prompt tokens and outgoing response tokens. All this is in addition to hundreds of metrics already provided by Kong Gateway on the underlying API requests and responses. Finally, Kong AI Gateway leverages the observability capabilities provided by Konnect to track Amazon Bedrock usage out of the box, as well as generate reports based on monitoring data.
- Prompt Engineering:
- AI Prompt Template plugin, responsible for pre-configuring AI prompts to users
- AI Prompt Decorator plugin, which injects messages at the start or end of a caller's chat history.
- AI Prompt Guard plugin lets you configure a series of PCRE-compatible regular expressions to allow or block specific prompts, words, or phrases, giving you more control over the LLM services managed by Amazon Bedrock.
- AI Semantic Prompt Guard plugin provides configurable semantic (or pattern-matching) prompt protection.
- AI Semantic Cache plugin caches responses based on a semantic similarity threshold, to improve performance (and therefore end-user experience) and cost.
- With the AI Rate Limiting Advanced plugin, you can tailor per-user or per-model policies based on the tokens returned by the LLM provider under Amazon Bedrock management, or craft a custom function to count the tokens for requests.
- AI Request Transformer and AI Response Transformer plugins seamlessly integrate with the LLM on Amazon Bedrock, enabling introspection and transformation of the request's body before proxying it to the Upstream Service and prior to forwarding the response to the client.
Besides, the Kong AI Gateway use cases can combine policies implemented by hundreds of Kong Gateway plugins, such as:
- Authentication and Authorization: OIDC, mTLS, API Key, LDAP, SAML, Open Policy Agent (OPA)
- Traffic Control: Request Validator and Size Limiting, WebSocket support, Route by Header, etc.
- Observability: OpenTelemetry (OTel), Prometheus, Zipkin, etc.
Also, from the architecture perspective, in a nutshell, the Konnect Control Plane and Data Plane nodes topology remains the same.
*By leveraging the same underlying core of Kong Gateway, we're reducing complexity in deploying the AI Gateway capabilities as well. And of course, it works on Konnect, Kubernetes, self-hosted, or across multiple clouds.*
The RAG implementation architecture should include components representing and responsible for the functional scope described above. The architecture comprises:
- Redis as the Vector Database
- Amazon Bedrock supporting both Prompt/Chat and Embeddings Models
- Kong AI Gateway to abstract and protect Amazon Bedrock as well as implement LLM-based use cases.
Kong AI Gateway, implemented as a regular Konnect Data Plane Node, and Redis run on an Amazon EKS 1.31 Cluster.
The architecture can be analyzed from both perspectives: Data Preparation and actual Query time. Let's take a look at the Data Preparation time.
### Data Preparation time
The Data Preparation time is quite simple: the Document Loader component chunks the source data and calls Amazon Bedrock, asking the Embedding Model to convert the chunks into embeddings. The embeddings are then stored in the Redis Vector Database. Here's a diagram describing the flow:
For the purpose of this blog post, we're going to use the "amazon.titan-embed-text-v1" Embedding Model.
### Query time
Logically speaking we can break the Query time into some main steps:
- The AI Application builds a Prompt and, using the same "amazon.titan-embed-text-v1" Embedding Model, generates embeddings for it.
- The AI Application takes the embeddings and semantically searches the Vector Database. The database returns relevant data related to the context.
- The AI Application sends the prompt, augmented with the retrieved data received from the Vector Database, to Kong AI Gateway.
- Kong AI Gateway, using the AI Proxy (or AI Proxy Advanced) plugin, proxies the request to Amazon Bedrock and returns the response to the AI Application, enforcing any policies previously enabled.
## Kong AI Gateway and Amazon Bedrock Integration and APIs
Let's take a look at the specific integration point between Kong AI Gateway and Amazon Bedrock. To get a better understanding, here's the architecture cut isolating both:
The consumer can be any RESTful-based component; in our case it will be a LangChain application.
As you can see, there are two important topics here:
- OpenAI API specification
- Amazon Bedrock Converse API and EKS Pod Identity
When we add Kong AI Gateway, sitting in front of Amazon Bedrock, we're not just exposing it but also allowing the consumers to use the same mechanism (in this case, OpenAI APIs) to consume it. That leads to a very flexible and powerful capability when it comes to development processes. In other words, Kong AI Gateway normalizes the consumption of any LLM infrastructure, including Amazon Bedrock, Mistral, OpenAI, Cohere, etc.
As an exercise, the new request should be something like this. The request has some minor differences:
- It sends a request to the Kong API Gateway Data Plane Node.
- It replaces the OpenAI endpoint with a Kong API Gateway route.
- The API Key is actually managed by the Kong API Gateway now.
- We're using an Amazon Bedrock Model, meta.llama3-70b-instruct-v1:0.
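Putting those differences together, a request through the Gateway might look like this. Note that the route path and API key reflect the decK configuration applied later in this post, and `$DATA_PLANE_LB` holds the Data Plane's load balancer address:

```shell
curl -s -X POST http://$DATA_PLANE_LB/bedrock-route \
  -H "Content-Type: application/json" \
  -H "apikey: 123456" \
  -d '{"messages":[{"role":"user","content":"Tell me about John Coltrane"}],"model":"meta.llama3-70b-instruct-v1:0"}'
```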
For example, here's a request to Amazon Bedrock using the AWS CLI. Use "aws configure" first to set your Access and Secret Keys, as well as the AWS region you want to use, then run the command:

```shell
aws bedrock-runtime converse \
  --model-id amazon.titan-text-express-v1 \
  --messages '[{"role":"user","content":[{"text":"Tell me about John Coltrane"}]}]'
```
It's reasonable to consume the Bedrock service with local CLI commands. However, for Amazon EKS deployments, the recommended approach is [EKS Pod Identity](https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html), instead of simple long-term credentials like AWS Access and Secret Keys. In a nutshell, EKS Pod Identity allows the Data Plane Pod's container to use the AWS SDK and send API requests to AWS services using AWS Identity and Access Management (IAM) permissions.
Amazon EKS Pod Identity associations provide the ability to manage credentials for your applications, similar to the way that Amazon EC2 instance profiles provide credentials to Amazon EC2 instances.
As another best practice, we recommend that the Private Key and Digital Certificate pair used for Konnect Control Plane and Data Plane connectivity be stored in AWS Secrets Manager. The Data Plane deployment then refers to those secrets at installation time.
## Amazon Elastic Kubernetes Service (EKS) installation and preparation
Now, we're ready to get started with our Kong AI Gateway deployment. As the installation architecture defines, it will be running on an EKS Cluster.
The cluster we use is an EKS Cluster, version 1.31, with a single node based on the g6.xlarge instance type, powered by NVIDIA GPUs. That is particularly interesting if you're planning to deploy and run LLMs locally in the EKS Cluster.
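A cluster matching that description can be created with an eksctl command along these lines (the cluster name and region are placeholders; adjust them to your environment):

```shell
eksctl create cluster --name kong-ai-gateway \
  --version 1.31 \
  --region us-east-2 \
  --node-type g6.xlarge \
  --nodes 1
```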
### Cluster preparation
After the installation, we should prepare the Cluster to receive the other components:
Please refer to the official documentation to learn more about the components and their installation processes.
## Redis installation
As defined in the implementation architecture, Redis plays the Vector Database role. It's going to be deployed and running in a specific EKS namespace and exposed with an NLB.
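One minimal way to get such a deployment is sketched below. Note these commands are illustrative: the LangChain Redis vector store relies on the RediSearch module, hence the `redis/redis-stack-server` image, and the LoadBalancer Service is fulfilled by an NLB when the AWS Load Balancer Controller is configured for the cluster:

```shell
kubectl create namespace redis
# redis-stack-server ships the RediSearch module required for vector search
kubectl create deployment redis --image=redis/redis-stack-server:latest -n redis
# Expose Redis outside the cluster through a LoadBalancer Service
kubectl expose deployment redis --name redis --port 6379 --target-port 6379 --type LoadBalancer -n redis
```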
## Kong AI Gateway installation
The second component to be installed is the Kong AI Gateway. The process can be divided into two steps:
- Pod Identity configuration
- Kong Data Plane deployment
### Pod Identity configuration
In this first step, we configure EKS Pod Identity describing which AWS Services the Data Plane Pods should be allowed to access. In our case, we need to consume Amazon Bedrock and AWS Secrets Manager.
Also, we're going to use AWS Secrets Manager to store the Private Key and Digital Certificate pair the Konnect Control Plane and Data Plane use to communicate.
Considering all this, let's create the IAM policy with the following request:
```shell
aws iam create-policy \
  --policy-name bedrock-policy \
  --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["bedrock:InvokeModel","bedrock:InvokeModelWithResponseStream","secretsmanager:ListSecrets","secretsmanager:GetSecretValue"],"Resource":"*"}]}'
```
#### Pod Identity Association
Pod Identity takes a Kubernetes Service Account to manage the permissions. So create the Kubernetes namespace for the Kong Data Plane deployment and a simple Service Account inside of it.
```shell
kubectl create namespace kong
kubectl create sa kaigateway-podid-sa -n kong
```
Now we're ready to create the Pod Identity Association, again with an eksctl command:
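An association matching the namespace, Service Account, and IAM policy created above could look like this (the cluster name, region, and account id are placeholders):

```shell
eksctl create podidentityassociation \
  --cluster kong-ai-gateway \
  --region us-east-2 \
  --namespace kong \
  --service-account-name kaigateway-podid-sa \
  --permission-policy-arns arn:aws:iam::<account-id>:policy/bedrock-policy
```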
#### Create a Konnect Control Plane
Any Konnect subscription has a "default" Control Plane defined. You can proceed using it or optionally create a new one. The following instructions are based on a new Control Plane. You can create a new one by clicking on the "Gateway Manager" menu option. Then click on "New Control Plane" and choose the Kong Gateway option. Inside the "Kong Gateway" page type "AI Gateway" for the name and leave the "Self-Managed Hybrid Instances" option set.
#### Create a Konnect Vault for AWS Secrets Manager
Now, inside the new "AI Gateway" Control Plane, click on the "Vaults" menu option and create a Vault referring to AWS Secrets Manager, where the Private Key and Digital Certificate pair is going to be stored. Inside the "New Vault" page, define "Ohio (us-east-2)" for Region and "aws-secrets" for Prefix. Click on "Save."
#### Private Key and Digital Certificate pair creation
Still inside the "AI Gateway" Control Plane, click on the "Data Plane Nodes" menu option and the "New Data Plane Node" button.
Choose Kubernetes as your platform. Click on "Generate certificate", copy and save the Digital Certificate and Private Key as tls.crt and tls.key files as described in the instructions. Also copy and save the configuration parameters in a values.yaml file.
#### Store the Private Key and Digital Certificate pair in AWS Secrets Manager
Store the pair in AWS Secrets Manager with the following commands:
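The commands below are a sketch: the secret names are illustrative and must match the Vault references you later use in values.yaml, and the tls.crt and tls.key files are the ones saved from the Konnect wizard:

```shell
aws secretsmanager create-secret --name kong-cluster-cert \
  --secret-string file://tls.crt --region us-east-2
aws secretsmanager create-secret --name kong-cluster-cert-key \
  --secret-string file://tls.key --region us-east-2
```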
Finally, the last step will deploy the Data Plane. Take the values.yaml file and change it, adding the AWS Secrets Manager references and the Kubernetes Service Account used by the Pod Identity Association. Replace the cluster_* endpoints and server names with yours:
- Since the Private Key and Digital Certificate pair has been stored in AWS Secrets, remove the "secretVolumes" section.
- Replace the "cluster_cert" and "cluster_cert_key" fields with Konnect Vault references and AWS Secret Manager secrets.
- Add the proxy annotation to request a public NLB for the Data Plane.
- Add the deployment section referring to the Kubernetes Service Account that has been used to create the Pod Identity Association.
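A values.yaml reflecting the changes listed above could look like the sketch below. The Control Plane endpoints are placeholders from the Konnect wizard, the secret names are the ones assumed in the Secrets Manager step, and the exact Vault reference format and chart keys should be checked against your Gateway and chart versions:

```yaml
image:
  repository: kong/kong-gateway
  tag: "3.8"
# secretVolumes removed: the pair now comes from AWS Secrets Manager
secretVolumes: []
env:
  role: data_plane
  database: "off"
  konnect_mode: "on"
  cluster_mtls: pki
  cluster_control_plane: <your-cp-endpoint>.us.cp0.konghq.com:443
  cluster_server_name: <your-cp-endpoint>.us.cp0.konghq.com
  cluster_telemetry_endpoint: <your-cp-endpoint>.us.tp0.konghq.com:443
  cluster_telemetry_server_name: <your-cp-endpoint>.us.tp0.konghq.com
  # Konnect Vault references resolving to the AWS Secrets Manager secrets
  cluster_cert: "{vault://aws-secrets/kong-cluster-cert}"
  cluster_cert_key: "{vault://aws-secrets/kong-cluster-cert-key}"
proxy:
  annotations:
    # Request a public NLB for the Data Plane
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
deployment:
  serviceAccount:
    create: false
    # Service Account used by the Pod Identity Association
    name: kaigateway-podid-sa
```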
Use the Helm command to deploy the Data Plane:
```shell
helm repo add kong https://charts.konghq.com
helm repo update
helm install kong kong/kong -n kong --values ./values.yaml
```
### Checking the Data Plane
Use the Load Balancer created during the deployment:
```shell
DATA_PLANE_LB=$(kubectl get service -n kong kong-kong-proxy --output=jsonpath='{.status.loadBalancer.ingress[0].hostname}')
```
You should get a response like this:
```shell
% curl -i $DATA_PLANE_LB
HTTP/1.1 404 Not Found
Date: Thu, 03 Oct 2024 12:36:53 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Content-Length: 103
X-Kong-Response-Latency: 0
Server: kong/3.8.0.0-enterprise-edition
X-Kong-Request-Id: c655f65a2d1fef7c7ed0550e3f7dc553

{"message":"no Route matched with those values","request_id":"c655f65a2d1fef7c7ed0550e3f7dc553"}
```
Now we can define the Kong Objects necessary to expose and control Bedrock, including Kong Gateway Service, Routes, and Plugins.
## decK
With [decK](https://docs.konghq.com/deck/latest/) (declarations for Kong) you can manage Kong Konnect configuration and create Kong Objects in a declarative way. decK state files describe the configuration of Kong API Gateway. State files encapsulate the complete configuration of Kong in a declarative format, including services, routes, plugins, consumers, and other entities that define how requests are processed and routed through Kong. Please check the decK documentation to learn how to [install](https://docs.konghq.com/deck/latest/installation/) it.
- Kong Route: the Gateway Service has a route defined with the "/bedrock-route" path. That's the route we're going to consume to reach out to Bedrock.
- Kong Route Plugins: the Kong Route has some plugins configured. Note that only the AI Proxy and Key Auth plugins are enabled. The other ones are configured but disabled.
- Kong Consumer: the declaration also defines a Kong Consumer, named "user1", with an API Key based credential. It allows the Kong AI Gateway to map incoming requests carrying the API Key header "apikey" (required by the Key Auth plugin) with the credential "123456" to the Kong Consumer "user1". Once the Kong Consumer is mapped, the Kong AI Gateway can apply policies, defined with other plugins, to this specific Kong Consumer.
The declaration has been tagged as "rag-bedrock" so you can manage its objects without impacting any other ones you might have created previously. Also, note the declaration is saying it should be applied to the "AI Gateway" Konnect Control Plane.
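For reference, a declaration matching that description could be sketched as follows. The plugin config fields are illustrative and abbreviated (check the plugin documentation for your Gateway version), and the Gateway Service URL is essentially a placeholder because the AI Proxy plugin overrides the upstream target:

```yaml
_format_version: "3.0"
_info:
  select_tags:
  - rag-bedrock
_konnect:
  control_plane_name: AI Gateway
services:
- name: bedrock-rag-service
  url: http://localhost:32000
  routes:
  - name: bedrock-rag-route1
    paths:
    - /bedrock-route
    plugins:
    - name: key-auth
      enabled: true
      config:
        key_names:
        - apikey
    - name: ai-proxy
      enabled: true
      config:
        route_type: llm/v1/chat
        model:
          provider: bedrock
          name: meta.llama3-70b-instruct-v1:0
          options:
            bedrock:
              aws_region: us-east-1
    - name: ai-prompt-decorator
      enabled: false
      config:
        prompts:
          prepend:
          - role: system
            content: Always answer in Brazilian Portuguese.
consumers:
- username: user1
  keyauth_credentials:
  - key: "123456"
  plugins:
  - name: ai-rate-limiting-advanced
    enabled: false
    config:
      llm_providers:
      - name: bedrock
        limit: 1000
        window_size: 60
```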
You can submit the declaration with the following decK command:
```shell
$ deck gateway sync --konnect-token $PAT kong_bedrock.yaml
creating consumer user1
creating service bedrock-rag-service
creating key-auth 123456 for consumer user1
creating route bedrock-rag-route1
creating plugin key-auth for route bedrock-rag-route1
creating plugin ai-prompt-decorator for route bedrock-rag-route1
creating plugin ai-proxy for route bedrock-rag-route1
creating plugin ai-rate-limiting-advanced for consumer user1
Summary: Created: 8 Updated: 0 Deleted: 0
```
### Consume the Kong AI Gateway Route
You can now send a request consuming the Kong Route. Note we have injected the API Key with the expected name, "apikey", and with a value that corresponds to the Kong Consumer, "123456". If you change the value or the API Key name, or omit it altogether, you get a 401 (Unauthorized) error.
```shell
curl -s -X POST http://$DATA_PLANE_LB/bedrock-route \
  -H "Content-Type: application/json" \
  -H "apikey: 123456" \
  -d '{"messages":[{"role":"user","content":"What is nihilism?"}],"model":"meta.llama3-70b-instruct-v1:0"}' | jq
```
To enable the other plugins, you can change the declaration to "enabled: true" or use the Konnect UI.
## LangChain
With all the components we need for our RAG application in place, it's time for the most exciting part of this blog post: the RAG application.
The minimalist app we're going to present focuses on the main steps of the RAG processes. It's not intended to be used for anything other than learning purposes and maybe lab environments.
Basically, the application comprises two Python scripts, using [LangChain](https://www.langchain.com/langchain), one of the main frameworks available today to write AI applications including LLMs and RAG. The scripts are the following:
- **rag_redis_bedrock_langchain_embeddings.py**: it implements the Data Preparation process with Redis and Bedrock's "amazon.titan-embed-text-v1" Embedding Model.
- **rag_redis_bedrock_langchain_kong_ai_gateway.py**: responsible for the Query time implementation combining Kong AI Gateway, Redis, and Bedrock.
From the RAG perspective, the application is based on an interview Marco Palladino, Kong's co-founder and CTO, gave discussing Kong AI Gateway and our current technology environment.
After the execution, you can check the Redis database using its dashboard or with the following command. You should see 52 keys:
```shell
% kubectl exec $(kubectl get pod -n redis -o json | jq -r '.items[].metadata.name') -n redis -- redis-cli info keyspace
# Keyspace
db0:keys=52,expires=0,avg_ttl=0,subexpiry=0
```
#### Python script code
Here's the code of the first Python script:
```python
import requests
#from langchain_redis import RedisConfig, RedisVectorStore
from langchain_community.vectorstores.redis import Redis
from langchain_aws import BedrockEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.document_loaders import DirectoryLoader

# Download transcript text and split it into chunks
# https://softwareengineeringdaily.com/2024/06/20/its-apis-all-the-way-down-with-marco-palladino
url = "http://softwareengineeringdaily.com/wp-content/uploads/2024/06/SED1683-Kong.txt"
headers = {'User-Agent': 'LangChain'}
res = requests.get(url, headers=headers)

with open("ragdoc.txt", "w") as f:
    f.write(res.text)

loader = DirectoryLoader(".", "ragdoc.txt", loader_cls=TextLoader)
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)

# Generate embeddings based on the text chunks using Bedrock's Embedding Model and store them in Redis
redis_url = "redis://a7c63f0c9a740483987398a6c0b6e444-967754848.us-east-2.elb.amazonaws.com:6379"
embeddings = BedrockEmbeddings(region_name="us-east-1", model_id="amazon.titan-embed-text-v1")
index_name = "docs-bedrock"

vectorstore = Redis(
    redis_url=redis_url,
    embedding=embeddings,
    index_name=index_name,
)
vectorstore.drop_index(redis_url=redis_url, index_name=index_name, delete_documents=True)
vectorstore.add_documents(documents=chunks)
```
### LangChain Chain
Before exploring the second Python script, responsible for the actual RAG Query, it's important to get a better understanding of what a LangChain Chain is. Here's a simple example including Kong AI Gateway and Amazon Bedrock. Note that the code does not implement RAG; it's restricted to exploring the Chain concept. The next section will evolve the example to include RAG.
The code requires the installation of the "langchain_openai" package.
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

kong_dp = "http://k8s-kong-kongkong-cf4df736b2-f8ba17426cc444cc.elb.us-east-2.amazonaws.com"
bedrock_rag_url = kong_dp + "/bedrock-route"

prompt = ChatPromptTemplate.from_template("tell me what you think about {topic}")
llm = ChatOpenAI(base_url=bedrock_rag_url, model_name="meta.llama3-70b-instruct-v1:0", temperature=0, api_key="dummy", default_headers={"apikey": "123456"})

chain = prompt | llm

print("\nanswer: ", chain.invoke({"topic": "Ozempic"}).content)
print("\nanswer: ", chain.invoke({"topic": "Fantastic Negrito"}).content)
```
The most important line of the code is the creation of the LangChain Chain: `chain = prompt | llm`.
Note that the ChatOpenAI class refers to the Kong AI Gateway Route. We're asking the Gateway to consume the Bedrock model meta.llama3-70b-instruct-v1:0. Besides, since the Route has the Key Auth plugin enabled, we've added the API Key using the default_headers parameter. Finally, the standard and required api_key parameter should be passed, but it's totally ignored by the Gateway.
### RAG Query
Now, let's evolve the script to add RAG. This time, LangChain Chain will be updated to include the Vector Database.
First, remember that, in a RAG application, the Query should combine a Prompt and contextual and relevant data, related to the Prompt, coming from the Vector Database. Something like this:
The new Chain, shown in the full script below, fundamentally includes two new steps: one before and another after the original prompt and llm steps.
So let's review all of them. However, before jumping to the Chain, we have to describe a bit how it is going to hit the Redis Vector Database to get the Relevant Data coming from it.
#### retriever
Building on the reference to the Redis index we created in the first Python script, we define the retriever:
```python
retriever = vectorstore.as_retriever(
    search_type="similarity_distance_threshold",
    search_kwargs={"distance_threshold": 0.6},
)
print("Redis query: \n" + str(retriever.invoke(input="what did Marco say about Ingress Controller?")[0]))
```
Once we have the RAG Application done, it'd be interesting to start enabling the AI plugins we have configured.
### Kong AI Prompt Decorator plugin
For example, besides the Key Auth plugin, you can enable the AI Prompt Decorator, configured with decK, to see the response in Brazilian Portuguese.
You can use decK again if you will, or go to the Konnect UI and look for the Kong Route's Plugins page. Use the toggle button to enable the plugin.
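A minimal configuration for the AI Prompt Decorator plugin could look like this in a decK declaration (the message text is just an example; check the plugin schema for your Gateway version):

```yaml
plugins:
- name: ai-prompt-decorator
  route: bedrock-rag-route1
  enabled: true
  config:
    prompts:
      # Inject a system message at the start of the caller's chat history
      prepend:
      - role: system
        content: Always answer in Brazilian Portuguese.
```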
### Kong AI Rate Limiting Advanced plugin
You can also enable the AI Rate Limiting Advanced plugin we have created for the user. Notice that, unlike the AI Prompt Decorator plugin, this plugin has been applied to the existing Kong Consumer created by the decK declaration. If you create another Kong Consumer (and a different Credential), the AI Rate Limiting plugin will not be enforced.
Also note that the AI Rate Limiting Advanced defines policies based on the number of tokens returned by the LLM provider, not the number of requests as we usually have for regular rate limiting plugins.
### Python script code
Here's the code of the second Python script:
```python
from langchain_community.vectorstores.redis import Redis
from langchain_aws import BedrockEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.output_parsers import StrOutputParser

kong_dp = "http://k8s-kong-kongkong-af48daf85f-3e01263abaeeb50a.elb.us-east-2.amazonaws.com"
bedrock_rag_url = kong_dp + "/bedrock-route"
redis_url = "redis://a7c63f0c9a740483987398a6c0b6e444-967754848.us-east-2.elb.amazonaws.com:6379"

embeddings = BedrockEmbeddings(region_name="us-east-1", model_id="amazon.titan-embed-text-v1")
index_name = "docs-bedrock"

# Step 1: Get Redis Index
vectorstore = Redis(
    redis_url=redis_url,
    embedding=embeddings,
    index_name=index_name,
)

# Step 2: Retrieve & Augment
# Once you have the vector database reference, you can define it as the retriever component,
# which fetches the additional context based on the semantic similarity between the user query and the embedded chunks.
retriever = vectorstore.as_retriever(
    search_type="similarity_distance_threshold",
    search_kwargs={"distance_threshold": 0.6},
)

# Next, to augment the prompt with the additional context, you need to prepare a prompt template.
# The prompt can be easily customized from a prompt template, as shown below.
template = """
Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Step 3: Generate
# Finally, you can build a chain for the RAG pipeline, chaining together the retriever, the prompt template and the LLM.
# Once the RAG chain is defined, you can invoke it.
llm = ChatOpenAI(base_url=bedrock_rag_url, model_name="meta.llama3-70b-instruct-v1:0", temperature=0, api_key="dummy", default_headers={"apikey": "123456"})

retriever_prompt_setup = RunnableParallel({"context": retriever, "question": RunnablePassthrough()})

rag_chain = (
    retriever_prompt_setup
    | prompt
    | llm
    | StrOutputParser()
)

query = "What did Marco say about AI Gateway?"
print("\nRAG query: " + query)
print("\nRAG answer: ", rag_chain.invoke(query))

query = "Is there any reference to Kubernetes in the context?"
print("\nRAG query: " + query)
print("\nRAG answer: ", rag_chain.invoke(query))

query = "What did MP say about Chopin?"
print("\nRAG query: " + query)
print("\nRAG answer: ", rag_chain.invoke(query))
```
## Conclusion
This blog post has presented a basic RAG Application using Kong AI Gateway, LangChain, Amazon Bedrock, and Redis as the Vector Database. It's totally feasible to implement advanced RAG apps with query transformation, multiple data sources, multiple retrieval stages, RAG Agents, etc. Moreover, Kong AI Gateway provides other plugins to enrich the relationship with the LLM providers, including Semantic Cache, Semantic Routing, Request and Response Transformation, etc.
Also, it's important to keep in mind that, just like we did with the Key Auth plugins, we can continue combining other API Gateway plugins to your AI-based use cases like using the OIDC plugin to secure your Foundation Models with AWS Cognito, using the Prometheus plugin to monitor your AI Gateway with Amazon Managed Prometheus and Grafana, and so on.
Finally, the architectural flexibility provided natively by Konnect and Kong API Gateway allows us to deploy the Data Planes on a variety of platforms including AWS EC2 VMs, Amazon ECS, and Kong Dedicated Cloud Gateway, Kong's SaaS service for the Data Planes running in AWS.