[Engineering](/blog/engineering)
April 29, 2025
14 min read

# Semantic Processing and Vector Similarity Search with Kong and Redis

Claudio Acquaviva
Principal Architect, Kong

Kong has supported Redis since its early versions, and the integration between Kong Gateway and Redis is a powerful combination for enhancing API management. We can summarize the integration points and use cases of Kong and Redis into three main groups:

  - **Kong Gateway:** Kong integrates with Redis via plugins that enable it to leverage Redis' capabilities for enhanced API functionality, including caching, rate limiting, and session management.
  - **Kong AI Gateway:** Starting with Kong Gateway 3.6, several new AI-based plugins leverage Redis Vector Databases to implement AI use cases, like Rate Limiting policies based on LLM tokens, Semantic Caching, Semantic Prompt Guards, and Semantic Routing.
  - **RAG and Agent Applications:** Kong AI Gateway and Redis can collaborate for AI-based applications using frameworks like LangChain and LangGraph.

For all use cases, Kong supports multiple flavors of Redis deployments, including [Redis Community Edition](https://redis.io/docs/latest/get-started/) (including Redis Cluster for horizontal scalability or Redis Sentinel for high availability), [Redis Software](https://redis.io/software/) (which provides enterprise capabilities often needed for production workloads), and [Redis Cloud](https://redis.io/cloud/) (available on AWS, GCP, and as Azure Managed Redis in Azure).

This blog post focuses on how Kong and Redis can be used to address Semantic Processing use cases like Similarity Search and Semantic Routing across multiple LLM environments.

## Kong AI Gateway Reference Architecture

To get started, let's take a look at a high-level reference architecture of the Kong AI Gateway. The Kong Gateway Data Plane, responsible for handling incoming traffic, can be configured with two types of Kong plugins:

### Kong Gateway plugins

One of the main capabilities provided by Kong Gateway is extensibility. An extensive list of plugins allows you to implement specific policies to protect and control the APIs deployed in the Gateway. The plugins offload critical and complex processing usually implemented by backend services and applications. With the Gateway and its plugins in place, the backend services can focus on business logic only, leading to a faster application development process. Each plugin is responsible for specific functionality, including:

  - Authentication/authorization: to implement security mechanisms such as Basic Authentication, LDAP, Mutual TLS (mTLS), API Key, OPA (Open Policy Agent) based access control policies, etc.
  - Event streaming: to integrate with Kafka-based data/event streaming infrastructures.
  - Log processing: to externalize all requests processed by the Gateway to third-party infrastructures.
  - Analytics and monitoring: to provide metrics to external systems, including OpenTelemetry-based systems and Prometheus.
  - Traffic control: to implement canary releases, mocking endpoints, routing policies based on request headers, etc.
  - Transformations: to transform requests before routing them to the upstreams and to transform responses before returning them to the consumers.
  - WebSockets: for IoT projects, where MQTT over WebSockets connections are extensively used, Kong provides WebSockets Size Limit and WebSockets Validator plugins to control the events sent by the devices.

Also, Kong Gateway provides plugins that implement several integration points with Redis:

  - [Proxy Caching Advanced](https://docs.konghq.com/hub/kong-inc/proxy-cache-advanced/) and [GraphQL Proxy Caching Advanced](https://docs.konghq.com/hub/kong-inc/graphql-proxy-cache-advanced/) - cache and serve requested responses.
  - [Rate Limiting](https://docs.konghq.com/hub/kong-inc/rate-limiting/), [Rate Limiting Advanced](https://docs.konghq.com/hub/kong-inc/rate-limiting-advanced/), [Service Protection](https://docs.konghq.com/hub/kong-inc/service-protection/), and [GraphQL Rate Limiting Advanced](https://docs.konghq.com/hub/kong-inc/graphql-rate-limiting-advanced/) - limit how many HTTP requests can be made in a given time frame.
  - [OpenID Connect](https://docs.konghq.com/hub/kong-inc/openid-connect/) and [Upstream OAuth](https://docs.konghq.com/hub/kong-inc/upstream-oauth/) - for session storage and token caching.
  - [ACME](https://docs.konghq.com/hub/kong-inc/acme/) - stores digital certificates.

### Kong AI Gateway plugins

On the other hand, Kong AI Gateway leverages the existing Kong Gateway extensibility model to provide AI-specific plugins, built to protect and manage LLM infrastructures:

  - [AI Proxy](https://docs.konghq.com/hub/kong-inc/ai-proxy/) and [AI Proxy Advanced](https://docs.konghq.com/hub/kong-inc/ai-proxy-advanced/) plugins: the Multi-LLM capability allows the AI Gateway to abstract and load balance multiple LLM models based on policies including latency, model usage, semantics, etc.
  - Prompt Engineering:
    - [AI Prompt Template](https://docs.konghq.com/hub/kong-inc/ai-prompt-template/) plugin, responsible for pre-configuring AI prompts for users.
    - [AI Prompt Decorator](https://docs.konghq.com/hub/kong-inc/ai-prompt-decorator/) plugin, which injects messages at the start or end of a caller's chat history.
    - [AI Prompt Guard](https://docs.konghq.com/hub/kong-inc/ai-prompt-guard/) plugin, which lets you configure a series of PCRE-compatible regular expressions to allow or block specific prompts, words, or phrases, giving you more control over an LLM service.
  - [AI Semantic Prompt Guard](https://docs.konghq.com/hub/kong-inc/ai-semantic-prompt-guard/) plugin, which provides configurable semantic (rather than purely pattern-matching) prompt protection.
  - [AI Semantic Cache](https://docs.konghq.com/hub/kong-inc/ai-semantic-cache/) plugin, which caches responses based on a similarity threshold to improve performance (and therefore end-user experience) and reduce cost.
  - [AI Rate Limiting Advanced](https://docs.konghq.com/hub/kong-inc/ai-rate-limiting-advanced/) plugin, with which you can tailor per-user or per-model policies based on the tokens returned by the LLM provider, or craft a custom function to count the tokens for requests.
  - [AI Request Transformer](https://docs.konghq.com/hub/kong-inc/ai-request-transformer/) and [AI Response Transformer](https://docs.konghq.com/hub/kong-inc/ai-response-transformer/) plugins, which integrate with an LLM to introspect and transform the request body before proxying it to the upstream service, and the response before forwarding it to the client.
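As an illustration of the Multi-LLM abstraction, clients typically send the AI Proxy plugins an OpenAI-style chat payload, and the gateway translates it for whichever provider the route targets. The sketch below only builds such a payload; the route URL in the comment is a made-up example, not a real endpoint:

```python
import json

# A minimal OpenAI-style chat-completion body, as the AI Proxy plugins expect.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who is Joseph Conrad?"},
    ]
}
body = json.dumps(payload)

# e.g., POST this body to a gateway route such as
# https://my-kong-gateway.example.com/openai/chat (hypothetical)
print(body)
```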

By leveraging the same underlying core of Kong Gateway, and combining both categories of plugins, we can implement powerful policies and reduce complexity in deploying the AI Gateway capabilities as well.

The first use case we are going to focus on is Semantic Caching, where the AI Gateway plugin integrates with Redis to perform Similarity Search. Then, we are going to explore how the AI Proxy Advanced plugin can take advantage of Redis to implement Semantic Routing across multiple LLM models.

Note that the AI Rate Limiting Advanced and AI Semantic Prompt Guard plugins are two other examples where the AI Gateway and Redis work together.

Before diving into the first use case, let's highlight and summarize the main concepts Kong AI Gateway and Redis rely on.

### Embeddings

Embeddings (aka Vectors or Vector Embeddings) are numerical representations of unstructured data like text, images, etc. In an LLM context, the dimensionality of an Embedding refers to the number of characteristics captured in the vector representation of a given sentence: the more dimensions an Embedding has, the more semantic nuance it can capture, at the cost of more storage and compute.

There are multiple ML-based embedding methods used in NLP like:

  - [One-hot](https://en.wikipedia.org/wiki/One-hot)
  - [word2vec](https://code.google.com/archive/p/word2vec/)
  - TF-IDF (Term frequency-inverse document frequency)
  - [GloVe](https://nlp.stanford.edu/projects/glove/) (Global Vectors for Word Representation)
  - [BERT](https://github.com/google-research/bert) (Bidirectional Encoder Representations from Transformers)

Here's an example of a Python script using the [Sentence Transformers](https://sbert.net/) module (aka SBERT, or Sentence-BERT, maintained by [Hugging Face](https://huggingface.co/)) and the [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) Embedding Model to encode a simple sentence into an embedding:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import truncate_embeddings

model = SentenceTransformer('all-mpnet-base-v2', cache_folder="./")
embeddings = model.encode("Who is Joseph Conrad?")
embeddings = truncate_embeddings(embeddings, 3)
print(embeddings.size)
print(embeddings)
```

The “all-mpnet-base-v2” Embedding Model encodes sentences to a 768-dimensional Vector. As an experiment, we have truncated the vector to 3 dimensions only.

The output should look like this:

```
3
[ 0.06030013 -0.00782523  0.01018228]
```

### Vector Database

A Vector Database stores and searches Vector Embeddings. Vector Databases are essential for AI-based applications that work with images, text, etc., providing Vector Stores, Vector Indexes, and, more importantly, algorithms to implement Vector Similarity Searches.

Redis provides nice introductions to [Vector Embeddings](https://redis.io/glossary/vector-embeddings/) and [Vector Databases](https://redis.io/blog/vector-databases-101/). Redis Query Engine is a built-in capability within Redis that provides vector search functionality (as well as other types of search, such as full-text, numeric, etc.) and is known for [industry-leading performance](https://redis.io/blog/benchmarking-results-for-vector-databases/). Redis is built for speed and delivers sub-millisecond latency, leveraging in-memory data structures and advanced optimizations to power real-time applications at scale. This is critical for gateway use cases, where the deployment sits in the "hot path" of LLM queries.

In addition, Redis can be deployed as enterprise software and/or as a cloud service, adding several enterprise capabilities, including:

  - **Scalability**: Redis can easily scale horizontally, handle dynamic workloads, and manage massive datasets across distributed architectures.
  - **High availability and persistence**: Redis supports high availability with built-in support for multi-AZ deployments, seamless failover, data persistence through backups, and active-active architectures, enabling robust disaster recovery and consistent application performance.
  - **Flexibility**: Redis natively supports multiple data structures such as JSON, Hash, Strings, Streams, and more to suit diverse application needs.
  - **Broad ecosystem**: As one of the world's most popular databases, Redis has a rich ecosystem of client libraries (such as [redisvl](https://www.redisvl.com/index.html) for GenAI use cases), [developer tools](https://redis.io/docs/latest/develop/tools/), and integrations.

### Vector Similarity Search

With similarity search we can find, in a typically unstructured dataset, items similar (or dissimilar) to a certain presented item. For example, given a picture of a cell phone, try to find similar ones considering its shape, color, etc. Or, given two pictures, check the similarity score between them.

In our NLP context, we are interested in detecting when the prompts applications send to the LLM are semantically similar. For example, the two sentences "Who is Joseph Conrad?" and "Tell me more about Joseph Conrad" should, semantically speaking, have a high similarity score.

We can extend our Python script to try that out:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-mpnet-base-v2', cache_folder="./")

sentences = [
    "Who is Joseph Conrad?",
    "Tell me more about Joseph Conrad.",
    "Living is easy with eyes closed.",
]

embeddings = model.encode(sentences)
print(embeddings.shape)

similarities = model.similarity(embeddings, embeddings)
print(similarities)
```

The output should look like this:

```
(3, 768)
tensor([[1.0000, 0.8600, 0.0628],
        [0.8600, 1.0000, 0.1377],
        [0.0628, 0.1377, 1.0000]])
```

The "shape" shows 3 embeddings of 768 dimensions each. The code cross-checks the similarity of all embeddings: the more similar they are, the higher the score. Notice that a "1.0000" score is returned, as expected, when an embedding is checked against itself.

The [`similarity`](https://sbert.net/docs/sentence_transformer/usage/semantic_textual_similarity.html) method returns a [Tensor](https://pytorch.org/docs/stable/tensors.html#torch.Tensor) object, which is implemented by [PyTorch](https://pytorch.org/), the ML library used by Sentence Transformers.

There are several techniques for similarity calculation including distance or angle between the vectors. The most common methods are:

  - [Euclidean Distance](https://redis.io/learn/howtos/solutions/vector/getting-started-vector#euclidean-distance-l2-norm): based on the linear distance between two points.
  - [Cosine Similarity](https://redis.io/learn/howtos/solutions/vector/getting-started-vector#cosine-similarity) (used by default by the `similarity` method): based on the angle between two vectors.
  - [Dot Product](https://redis.io/learn/howtos/solutions/vector/getting-started-vector#inner-product): based on the product of the two vectors.
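As a quick illustration, the three metrics can be computed directly with NumPy. The vectors below are toy data, not real embeddings:

```python
import numpy as np

# Toy vectors for illustration (not real embeddings)
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the magnitude

euclidean = np.linalg.norm(a - b)                                # linear distance
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # angle-based
dot = np.dot(a, b)                                               # raw inner product

print(euclidean, cosine, dot)  # cosine is exactly 1.0 here
```

Because `b` points in the same direction as `a`, the cosine similarity is 1.0 even though the Euclidean distance is non-zero, which is one reason angle-based metrics are a common choice for comparing embeddings of different magnitudes.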

In a Vector Database context, [Vector Similarity Search (VSS)](https://redis.io/learn/howtos/solutions/vector/getting-started-vector#what-is-vector-similarity) is the process of finding vectors in the database that are similar to a given query vector.

## RediSearch and Redis VSS

Back in 2022, Redis launched [RediSearch](https://redis.io/docs/latest/develop/interact/search-and-query/administration/overview/) [2.4](https://redis.io/docs/latest/operate/oss_and_stack/stack-with-enterprise/release-notes/redisearch/redisearch-2.4-release-notes/), a text search engine built on top of the Redis data store, introducing Redis VSS (Vector Similarity Search).

To get a better understanding of how Redis VSS works, consider the following Python script, which implements a basic similarity search. Make sure you have set the "OPENAI_API_KEY" and "REDIS_LB" environment variables before running the script.

```python
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
import numpy as np
import openai
import os

# Get environment variables
openai.api_key = os.getenv("OPENAI_API_KEY")
host = os.getenv("REDIS_LB")

# Create a Redis index for the Vector Embeddings
client = redis.Redis(host=host, port=6379)

try:
    client.ft('index1').dropindex(delete_documents=True)
except Exception:
    print("index does not exist")

schema = (
    TextField("name"),
    TextField("description"),
    VectorField(
        "vector",
        "FLAT",
        {
            "TYPE": "FLOAT32",
            "DIM": 1536,
            "DISTANCE_METRIC": "COSINE",
        }
    ),
)

definition = IndexDefinition(prefix=["vectors:"], index_type=IndexType.HASH)
res = client.ft("index1").create_index(fields=schema, definition=definition)


# Step 1: call OpenAI to generate embeddings for the reference text and store them in Redis
name = "vector1"
content = "Who is Joseph Conrad?"
redis_key = f"vectors:{name}"

res = openai.embeddings.create(input=content, model="text-embedding-3-small").data[0].embedding

embeddings = np.array(res, dtype=np.float32).tobytes()

pipe = client.pipeline()
pipe.hset(redis_key, mapping={
    "name": name,
    "description": content,
    "vector": embeddings
})
res = pipe.execute()


# Step 2: perform Vector Range queries with 2 new texts and get the distance (similarity) score
query = (
    Query("@vector:[VECTOR_RANGE $radius $vec]=>{$yield_distance_as: distance_score}")
     .return_fields("id", "distance_score")
     .dialect(2)
)

# Text #1
content = "Tell me more about Joseph Conrad"
res = openai.embeddings.create(input=content, model="text-embedding-3-small").data[0].embedding
new_embeddings = np.array(res, dtype=np.float32).tobytes()

query_params = {
    "radius": 1,
    "vec": new_embeddings
}
res = client.ft("index1").search(query, query_params).docs
print(res)

# Text #2
content = "Living is easy with eyes closed"
res = openai.embeddings.create(input=content, model="text-embedding-3-small").data[0].embedding
new_embeddings = np.array(res, dtype=np.float32).tobytes()

query_params = {
    "radius": 1,
    "vec": new_embeddings
}
res = client.ft("index1").search(query, query_params).docs
print(res)
```

Initially, the script creates an index to receive the embeddings returned by OpenAI. We are using the “text-embedding-3-small” OpenAI model, which has 1536 dimensions, so the index has a VectorField defined to support those dimensions.

Next, the script has two steps:

  - Store the embeddings of a reference text, generated by the OpenAI Embedding Model.
  - Perform Vector Range queries passing two new texts, to check their similarity with the original one.

Here's a diagram representing the steps:

The code assumes you have a Redis environment available. Please check the [Redis Products documentation](https://redis.io/docs/latest/operate/) to learn more about it. It also assumes you have two environment variables defined: the OpenAI API Key and the Load Balancer address where Redis is available.

The script was coded using two main libraries:

  - The "Python client for Redis" ([redis-py](https://redis.io/docs/latest/develop/clients/redis-py/)) library to make Redis calls.
  - The [OpenAI Python API](https://github.com/openai/openai-python) library to interact with OpenAI.

While executing the code, you can monitor Redis with, for example, [`redis-cli monitor`](https://redis.io/docs/latest/operate/rs/references/cli-utilities/redis-cli/). The code line `res = client.ft("index1").search(query, query_params).docs` should log a message like this one:

```
"FT.SEARCH" "index1" "@vector:[VECTOR_RANGE $radius $vec]=>{$YIELD_DISTANCE_AS: score}" "RETURN" "2" "id" "score" "DIALECT" "2" "LIMIT" "0" "10" "params" "4" "radius" "1" "vec" "\xcb9\x9c<\xf8T\x18=\xaa\xd4\xb5\xbcB\xc0.=\xb5………."
```

Let's examine the command. The [`.ft("index1")`](https://redis-py.readthedocs.io/en/stable/commands.html#redis.commands.cluster.RedisClusterCommands.ft) method call gives us access to the [Redis Search Commands](https://redis-py.readthedocs.io/en/stable/redismodules.html#redisearch-commands), and the [`.search(query, query_params)`](https://redis-py.readthedocs.io/en/stable/redismodules.html#redis.commands.search.commands.SearchCommands.search) call sends an actual search query using the `FT.SEARCH` Redis command.

The `FT.SEARCH` command receives the parameters defined in both the `query` and `query_params` objects. The `query` parameter, defined using the [Query](https://redis.io/docs/latest/develop/interact/search-and-query/query/) object, specifies the actual command as well as the return fields and dialect.

```python
query = (
    Query("@vector:[VECTOR_RANGE $radius $vec]=>{$yield_distance_as: distance_score}")
     .return_fields("id", "distance_score")
     .dialect(2)
)
```

We want to return the distance (similarity) score, so we must yield it via the `$yield_distance_as` attribute.

Query Dialects enable enhancements to the query API, allowing the introduction of new features while maintaining compatibility with existing applications. For Vector Queries like ours, the Query Dialect should be set to a value equal to or greater than 2. Please check the [Query Dialect documentation](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/dialects/) page to learn more about it.

On the other hand, the `query_params` object defines extra parameters, including the radius and the embedding to be considered for the search.

```python
query_params = {
    "radius": 1,
    "vec": new_embeddings
}
```

The final `FT.SEARCH` command also includes parameters to define the offset and number of results. Check the [documentation](https://redis.io/docs/latest/commands/ft.search/) to learn more about it.

In fact, the `FT.SEARCH` command sent by the script is just one example of the [Vector Search](https://redis.io/docs/latest/develop/interact/search-and-query/query/vector-search/) capabilities supported by Redis. Basically, Redis supports two main types of searches:

  - [KNN Vector Search](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/vectors/#knn-vector-search): this [algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) finds the top "k" nearest neighbors to a query vector.
  - [Vector Range Query](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/vectors/#vector-range-queries): the query type the script uses, which filters the index based on a radius parameter. The radius defines the maximum semantic distance between the two vectors: the input query vector and an indexed vector.
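To make the distinction concrete, here's a small, self-contained sketch of what the two search types conceptually compute. It uses NumPy and a hypothetical in-memory "index" of toy 4-dimensional vectors, not Redis itself:

```python
import numpy as np

# Hypothetical in-memory "index" of stored vectors (toy data, 4 dimensions)
index = {
    "vectors:v1": np.array([1.0, 0.0, 0.0, 0.0]),
    "vectors:v2": np.array([0.9, 0.1, 0.0, 0.0]),
    "vectors:v3": np.array([0.0, 1.0, 0.0, 0.0]),
}
query = np.array([1.0, 0.0, 0.0, 0.0])

def cosine_distance(a, b):
    # Same metric as the index schema's DISTANCE_METRIC = COSINE
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

distances = {key: cosine_distance(query, vec) for key, vec in index.items()}

# KNN: the k nearest vectors, however far away they are
knn = sorted(distances, key=distances.get)[:2]

# Range query: every vector whose distance is within the radius
radius = 0.5
in_range = [key for key, d in distances.items() if d <= radius]

print(knn)       # ['vectors:v1', 'vectors:v2']
print(in_range)  # ['vectors:v1', 'vectors:v2']
```

With `radius = 0.5`, `vectors:v3` (distance 1.0 from the query) is excluded from the range result; KNN, in contrast, would always return exactly k results no matter how distant they are.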

Our script's intent is to examine the distance between the two vectors, not to implement any filter. That's why it sets the Vector Range Query with `"radius": 1`.

After running the script, its output should look like this:

```
[Document {'id': 'vectors:vector1', 'payload': None, 'distance_score': '0.123970687389'}]
[Document {'id': 'vectors:vector1', 'payload': None, 'distance_score': '0.903066933155'}]
```

This means that, as expected, the stored embedding of the reference text "Who is Joseph Conrad?" is closer to the first new text, "Tell me more about Joseph Conrad", than to the second, "Living is easy with eyes closed".

Now that we have an introductory view of how to implement Vector Similarity Searches with Redis, let's examine the Kong AI Gateway Semantic Cache Plugin, which is responsible for implementing semantic caching. We'll see that it performs searches very similar to the ones we ran with the Python script.

## Kong AI Semantic Cache Plugin

To get started, we can analyze the caching flow from two different perspectives:

- Request #1: we don't have any data cached yet.
- Request #2: Kong AI Gateway has already stored some data in the Redis Vector Database.

Here's a diagram illustrating the scenarios:

### Konnect Data Plane deployment

Before exploring how the Kong AI Gateway Semantic Cache Plugin and Redis work together, we have to deploy a Konnect Data Plane (based on Kong Gateway). Please refer to the [Konnect documentation](https://docs.konghq.com/konnect/) to register and spin up your first Data Plane.

### Kong Gateway Objects creation

Next, we need to create the Kong Gateway objects (Gateway Services, Routes, and Plugins) that implement the use case. There are several ways to do that, including the Konnect RESTful APIs and the Konnect GUI. With [decK](https://docs.konghq.com/deck/latest/) (declarations for Kong), we can manage Kong Konnect configuration and create Kong objects declaratively. Please check the [decK documentation](https://docs.konghq.com/deck/latest/guides/getting-started/) to learn how to [use it with Konnect](https://docs.konghq.com/deck/latest/guides/konnect/).

### AI Proxy and AI Semantic Cache Plugins

Here's the decK declaration we are going to submit to Konnect to implement the Semantic Cache use case:

```yaml
_format_version: "3.0"
_info:
  select_tags:
  - semantic-cache
_konnect:
  control_plane_name: default
services:
- name: service1
  host: localhost
  port: 32000
  routes:
  - name: route1
    paths:
    - /openai-route
    plugins:
    - name: ai-proxy
      instance_name: ai-proxy-openai-route
      enabled: true
      config:
        auth:
          header_name: Authorization
          header_value: Bearer <your_OPENAI_APIKEY>
        route_type: llm/v1/chat
        model:
          provider: openai
          name: gpt-4
          options:
            max_tokens: 512
            temperature: 1.0
    - name: ai-semantic-cache
      instance_name: ai-semantic-cache-openai
      enabled: true
      config:
        embeddings:
          auth:
            header_name: Authorization
            header_value: Bearer <your_OPENAI_APIKEY>
          model:
            provider: openai
            name: text-embedding-3-small
            options:
              upstream_url: https://api.openai.com/v1/embeddings
        vectordb:
          dimensions: 1536
          distance_metric: cosine
          strategy: redis
          threshold: 0.2
          redis:
            host: redis-stack.redis.svc.cluster.local
            port: 6379
```

The declaration creates the following Kong objects in the "default" Konnect Control Plane:

- Kong Gateway Service "service1". It's a placeholder service; the AI Proxy Plugin configured next determines the actual upstream destination.
- Kong Route "route1" with the path "/openai-route". This is the route that exposes the LLM model.
- AI Proxy Plugin. It's configured to consume OpenAI's "gpt-4" model. The "route_type" parameter, set to "llm/v1/chat", refers to OpenAI's "https://api.openai.com/v1/chat/completions" endpoint. Kong recommends storing API keys as secrets in a secret manager like AWS Secrets Manager or HashiCorp Vault; keeping the OpenAI API key in the declaration, as done here, is for lab environments only and is not recommended for production. Please refer to the official [AI Proxy Plugin documentation page](https://docs.konghq.com/hub/kong-inc/ai-proxy/) to learn more about its configuration.
- AI Semantic Cache Plugin. It has two main settings:
  - `embeddings`: consumes the "text-embedding-3-small" embedding model, the same model we used in our Python script.
  - `vectordb`: refers to an existing Redis infrastructure used to store the embeddings and process the VSS requests. Again, it's configured with the same dimensions and distance metric we used before. The threshold is translated into the "radius" parameter of the VSS queries sent by the Kong Gateway Data Plane.

After submitting the decK declaration to Konnect, you should see the new Objects using the Konnect UI:

### Request #1

With the new Kong Objects in place, the Kong Data Plane is refreshed with them, and we are ready to start sending requests to it. Here's the first one with the same content we used in the Python script:

```shell
curl -i -X POST \
  --url $DATA_PLANE_LB/openai-route \
  --header 'Content-Type: application/json' \
  --data '{
   "messages": [
     {
       "role": "user",
       "content": "Who is Joseph Conrad?"
     }
   ]
 }'
```

You should get a response like the one below, meaning the Gateway successfully routed the request to OpenAI, which returned an actual completion. From a semantic caching perspective, the most important headers are:

- `X-Cache-Status: Miss`, telling us the Gateway wasn't able to find any data in the cache to satisfy the request.
- `X-Kong-Upstream-Latency` and `X-Kong-Proxy-Latency`, showing the latency times.
```
HTTP/1.1 200 OK
Content-Type: application/json
Connection: keep-alive
X-Cache-Status: Miss
x-ratelimit-limit-requests: 10000
CF-RAY: 8fce86cde915eae2-ORD
x-ratelimit-limit-tokens: 10000
x-ratelimit-remaining-requests: 9999
x-ratelimit-remaining-tokens: 9481
x-ratelimit-reset-requests: 8.64s
x-ratelimit-reset-tokens: 3.114s
access-control-expose-headers: X-Request-ID
x-request-id: req_29afd8838136a2f7793d6c129430b341
X-Content-Type-Options: nosniff
openai-organization: user-4qzstwunaw6d1dhwnga5bc5q
Date: Sat, 04 Jan 2025 22:05:00 GMT
alt-svc: h3=":443"; ma=86400
openai-processing-ms: 10002
openai-version: 2020-10-01
CF-Cache-Status: DYNAMIC
strict-transport-security: max-age=31536000; includeSubDomains; preload
Server: cloudflare
Content-Length: 1456
X-Kong-LLM-Model: openai/gpt-4
X-Kong-Upstream-Latency: 10097
X-Kong-Proxy-Latency: 471
Via: 1.1 kong/3.9.0.0-enterprise-edition
X-Kong-Request-Id: 36f6b41df3b74f78f586ae327af27075

{
  "id": "chatcmpl-Am6YEtvUquPHdHdcI59eZC3UfOUVz",
  "object": "chat.completion",
  "created": 1736028290,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Joseph Conrad was a Polish-British writer regarded as one of the greatest novelists to write in the English language. He was born on December 3, 1857, and died on August 3, 1924. Though he did not speak English fluently until his twenties, he was a master prose stylist who brought a non-English sensibility into English literature.\n\nConrad wrote stories and novels, many with a nautical setting, that depict trials of the human spirit in the midst of what he saw as an impassive, inscrutable universe. His notable works include \"Heart of Darkness\", \"Lord Jim\", and \"Nostromo\". Conrad's writing often presents a deep, pessimistic view of the world and deals with the theme of the clash of cultures and moral ambiguity.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 163,
    "total_tokens": 175,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "system_fingerprint": null
}
```

#### Redis Introspection

Kong Gateway creates a new index. You can check it with `redis-cli ft._list`. The index should have a name like `idx:vss_kong_semantic_cache:511efd84-117b-4c89-87cb-f92f9b74a6c0:openai-gpt-4`.

And `redis-cli ft.search idx:vss_kong_semantic_cache:511efd84-117b-4c89-87cb-f92f9b74a6c0:openai-gpt-4 "*" return 1 -` should return the ID of OpenAI's response. Something like:

```
1
kong_semantic_cache:511efd84-117b-4c89-87cb-f92f9b74a6c0:openai-gpt-4:fcdf7d8995a227392f839b4530f8d8c3055748b96275fa9558523619172fd2a8
```

The following `json.get` command should return the actual response received from OpenAI:

```shell
redis-cli json.get kong_semantic_cache:511efd84-117b-4c89-87cb-f92f9b74a6c0:openai-gpt-4:fcdf7d8995a227392f839b4530f8d8c3055748b96275fa9558523619172fd2a8 | jq '.payload.choices[].message.content'
```

More importantly, `redis-cli monitor` tells us all the commands the plugin sent to Redis to implement the cache. The main ones are:

1. `"FT.INFO"` to check if the index exists.

2. Since the index doesn't exist yet, an `FT.CREATE` command creates it. Notice the index definition is similar to the one we used in the Python script:

```
"FT.CREATE" "idx:vss_kong_semantic_cache:511efd84-117b-4c89-87cb-f92f9b74a6c0:openai-gpt-4" "ON" "JSON" "PREFIX" "1" "kong_semantic_cache:511efd84-117b-4c89-87cb-f92f9b74a6c0:openai-gpt-4:" "SCORE" "1.0" "SCHEMA" "$.vector" "AS" "vector" "VECTOR" "FLAT" "6" "TYPE" "FLOAT32" "DIM" "1536" "DISTANCE_METRIC" "COSINE"
```

3. A VSS query to check whether an existing key satisfies the request. Again, notice the command is essentially the same one we used in our Python script, and the "range" parameter reflects the "threshold" configuration from the decK declaration:

```
"FT.SEARCH" "idx:vss_kong_semantic_cache:511efd84-117b-4c89-87cb-f92f9b74a6c0:openai-gpt-4" "@vector:[VECTOR_RANGE $range $query_vector]=>{$YIELD_DISTANCE_AS: vector_score}" "SORTBY" "vector_score" "DIALECT" "2" "LIMIT" "0" "4" "PARAMS" "4" "query_vector" "\x981\x87<bE\xe4<b\xa3\..........\xbc" "range" "0.2"
```

4. Since the index has just been created, the VSS couldn't find any key, so the plugin sends a `JSON.SET` command to add a new key with the embeddings received from OpenAI.

5. An `EXPIRE` command to set the expiration time of the key, 300 seconds by default.
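The steps above can be sketched as a tiny in-memory simulation (pure Python, no Redis; the vectors and the LLM call are stubbed) that mirrors the plugin's hit/miss logic:

```python
import math, time

THRESHOLD = 0.2  # plays the role of vectordb.threshold / the $range parameter

def distance(a, b):
    # Cosine distance, as with DISTANCE_METRIC COSINE.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

store = {}  # key -> (vector, payload, expires_at); stands in for the Redis index

def handle(prompt_vec, call_llm):
    now = time.time()
    # Range search: any live key within THRESHOLD of the prompt is a hit.
    for key, (vec, payload, exp) in store.items():
        if exp > now and distance(prompt_vec, vec) <= THRESHOLD:
            return "Hit", payload
    payload = call_llm()  # the real plugin proxies the request to OpenAI here
    store[len(store)] = (prompt_vec, payload, now + 300)  # JSON.SET + 300s EXPIRE
    return "Miss", payload

status1, _ = handle([0.9, 0.1, 0.2], lambda: "Conrad was a novelist…")
status2, _ = handle([0.88, 0.12, 0.18], lambda: "unused")
print(status1, status2)  # Miss Hit
```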

You can check the new index key using the Redis dashboard:

### Request #2

If we send another request with similar content, the Gateway should return the same response, this time served from the cache, as indicated by the `X-Cache-Status: Hit` header. The response also carries cache-specific headers: `X-Cache-Key` and `X-Cache-Ttl`.

The response should be returned faster, since the Gateway didn't have to route the request to OpenAI.

```shell
curl -i -X POST \
  --url $DATA_PLANE_LB/openai-route \
  --header 'Content-Type: application/json' \
  --data '{
   "messages": [
     {
       "role": "user",
       "content": "Tell me more about Joseph Conrad"
     }
   ]
 }'
```

```
HTTP/1.1 200 OK
Date: Sun, 05 Jan 2025 14:28:59 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
X-Cache-Status: Hit
Age: 0
X-Cache-Key: kong_semantic_cache:511efd84-117b-4c89-87cb-f92f9b74a6c0:openai-gpt-4:fcdf7d8995a227392f839b4530f8d8c3055748b96275fa9558523619172fd2a8
X-Cache-Ttl: 288
Content-Length: 1020
X-Kong-Response-Latency: 221
Server: kong/3.9.0.0-enterprise-edition
X-Kong-Request-Id: eef1373a3a688a68f088a52f72318315

{"object":"chat.completion","system_fingerprint":null,"id":"fcdf7d8995a22…….
```
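Comparing the latency headers of the two sample responses gives a rough sense of the benefit. A back-of-the-envelope calculation with the values shown above (your numbers will vary):

```python
# Milliseconds, taken from the two sample responses in this post.
miss_latency = 10097 + 471  # X-Kong-Upstream-Latency + X-Kong-Proxy-Latency (Request #1)
hit_latency = 221           # X-Kong-Response-Latency (Request #2, served from cache)

print(f"cache hit is ~{miss_latency // hit_latency}x faster")  # cache hit is ~47x faster
```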

If you send another request with non-similar content, the plugin creates a new index key. For example:

```shell
curl -i -X POST \
  --url $DATA_PLANE_LB/openai-route \
  --header 'Content-Type: application/json' \
  --data '{
   "messages": [
     {
       "role": "user",
       "content": "Living is easy with eyes closed"
     }
   ]
 }'
```

Check the index keys again with:

```shell
redis-cli --scan
"kong_semantic_cache:511efd84-117b-4c89-87cb-f92f9b74a6c0:openai-gpt-4:fcdf7d8995a227392f839b4530f8d8c3055748b96275fa9558523619172fd2a8"
"kong_semantic_cache:511efd84-117b-4c89-87cb-f92f9b74a6c0:openai-gpt-4:22fbee1a1a45147167f29cc53183d0d2eef618c973e4284ad0179970209cf131"
```

## Kong AI Proxy Advanced Plugin and Semantic Routing

Kong AI Gateway provides several semantic-based capabilities besides caching. A powerful one is Semantic Routing, which lets the Gateway decide the best model to handle a given request. For example, you might have models trained on specific topics, like mathematics or classical music, so it'd be interesting to route requests depending on the content presented. By analyzing the content of the request, the plugin can match it to the model known to perform best in similar contexts. This enhances the flexibility and efficiency of model selection, especially when dealing with a diverse range of AI providers and models.

In fact, Semantic Routing is one of the load balancing algorithms supported by the [AI Proxy Advanced Plugin](https://docs.konghq.com/hub/kong-inc/ai-proxy-advanced/). The other supported algorithms are:

- Round-robin
- Weight-based
- Lowest-usage
- Lowest-latency
- [Consistent-hashing](https://docs.konghq.com/gateway/latest/how-kong-works/load-balancing/#consistent-hashing) (sticky sessions on a given header value)

For the purpose of this blog post we are going to explore the Semantic Routing algorithm.

The diagram below shows how the AI Proxy Advanced Plugin works:

- At configuration time, the plugin sends the description defined for each target to an embeddings model. The embeddings returned are stored in the Redis Vector Database.
- At request processing time, the plugin takes the request content and sends a VSS query to the Redis Vector Database. Depending on the similarity score, the plugin routes the request to the most appropriate LLM model sitting behind the Gateway.
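The routing decision can be illustrated with a toy, pure-Python sketch: pick the target whose description embedding is closest to the request embedding. The two-dimensional vectors are made up; the plugin obtains real ones from the embeddings model.

```python
import math

def distance(a, b):
    # Cosine distance between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

targets = {
    "gpt-4":       [0.9, 0.1],  # "mathematics, algebra, calculus, trigonometry"
    "gpt-4o-mini": [0.1, 0.9],  # "piano, orchestra, liszt, classical music"
}

def route(request_vec):
    # Send the request to the target with the smallest semantic distance.
    return min(targets, key=lambda name: distance(request_vec, targets[name]))

print(route([0.2, 0.8]))  # music-like prompt -> gpt-4o-mini
print(route([0.8, 0.1]))  # math-like prompt  -> gpt-4
```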

Here's the new decK declaration:

```yaml
_format_version: "3.0"
_info:
  select_tags:
  - semantic-routing
_konnect:
  control_plane_name: default
services:
- name: service1
  host: localhost
  port: 32000
  routes:
  - name: route1
    paths:
    - /openai-route
    plugins:
    - name: ai-proxy-advanced
      instance_name: ai-proxy-openai-route
      enabled: true
      config:
        balancer:
          algorithm: semantic
        embeddings:
          auth:
            header_name: Authorization
            header_value: Bearer <your_OPENAI_APIKEY>
          model:
            provider: openai
            name: text-embedding-3-small
            options:
              upstream_url: "https://api.openai.com/v1/embeddings"
        vectordb:
          dimensions: 1536
          distance_metric: cosine
          strategy: redis
          threshold: 0.8
          redis:
            host: redis-stack.redis.svc.cluster.local
            port: 6379
        targets:
        - model:
            provider: openai
            name: gpt-4
          route_type: "llm/v1/chat"
          auth:
            header_name: Authorization
            header_value: Bearer <your_OPENAI_APIKEY>
          description: "mathematics, algebra, calculus, trigonometry"
        - model:
            provider: openai
            name: gpt-4o-mini
          route_type: "llm/v1/chat"
          auth:
            header_name: Authorization
            header_value: Bearer <your_OPENAI_APIKEY>
          description: "piano, orchestra, liszt, classical music"
```

The main configuration sections are:

- `balancer` with `algorithm: semantic`, telling the load balancer to use Semantic Routing.
- `embeddings` with the settings needed to reach the embeddings model. The observations made previously about the API key still apply here.
- `vectordb` with the Redis host and index configuration, as well as the threshold that drives the VSS query.
- `targets`: each one represents an LLM model. Note that the `description` parameter configures the load balancing algorithm according to the topic the model has been trained for.

As you can see, for convenience's sake, the configuration uses OpenAI models for both the embeddings and the targets; just for this exploration, the targets are OpenAI's `gpt-4` and `gpt-4o-mini` models.

After submitting the decK declaration to the Konnect Control Plane, the Redis Vector Database should have a new index defined and a key created for each target. We can then start sending requests to the Gateway. The first two requests have content related to classical music, so the response should come from the related model, `gpt-4o-mini-2024-07-18`.

```shell
% curl -s -X POST \
  --url $DATA_PLANE_LB/openai-route \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "Who wrote the Hungarian Rhapsodies piano pieces?"
      }
    ]
  }' | jq '.model'
"gpt-4o-mini-2024-07-18"
```

```shell
% curl -s -X POST \
  --url $DATA_PLANE_LB/openai-route \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "Tell me a contemporary pianist of Chopin"
      }
    ]
  }' | jq '.model'
"gpt-4o-mini-2024-07-18"
```

Now, the next request is related to Mathematics, therefore the response comes from the other model, `gpt-4-0613`.

```shell
% curl -s -X POST \
  --url $DATA_PLANE_LB/openai-route \
  --header 'Content-Type: application/json' \
  --data '{
     "messages": [
       {
         "role": "user",
         "content": "Tell me about Fermat'\''s last theorem"
       }
     ]
   }' | jq '.model'
"gpt-4-0613"
```

## Conclusion

Kong has historically supported Redis to implement a variety of critical policies and use cases. The most recent collaborations, implemented by the Kong AI Gateway, focus on Semantic Processing where Redis Vector Similarity Search capabilities play an important role.

This blog post explored two main semantic-based use cases: Semantic Caching and Semantic Routing. Check Kong's and Redis’ documentation pages to learn more about the extensive list of API and AI Gateway use cases you can implement using both technologies.

**Topics**: [AI](/blog/tag/ai), [AI Gateway](/blog/tag/ai-gateway), [Plugins](/blog/tag/plugins)

Claudio Acquaviva
Principal Architect, Kong
