The integration of next-generation AI workloads — from large language models (LLMs) to speech-to-text and text-to-speech — demands a powerful, secure, and scalable infrastructure. This is especially true for building advanced AI voice agents, which rely on seamless, natural, and highly efficient user interaction.
This blog post explores how the combined power of Kong AI Gateway and Cerebras AI infrastructure creates an unparalleled foundation for deploying and managing these AI agents at scale. We'll detail a reference architecture that leverages Kong AI Gateway for secure traffic control, policy enforcement, and observability, while utilizing Cerebras' high-performance compute and LLM optimization to orchestrate STT, LLM (like the Qwen-3-32B model), and TTS models in a cohesive, production-ready solution.
Kong Konnect and Cerebras
Kong Gateway is an API gateway and a core component of the Kong Konnect platform. Built on a plugin-based extensibility model, it centralizes essential functions such as proxying, routing, load balancing, and health checking, efficiently managing both microservices and traditional API traffic.
One of Kong Gateway’s greatest strengths lies in its extensible plugin ecosystem, which allows seamless integration of diverse policies and functionalities — including Authentication and Authorization, Rate Limiting, Proxy Caching, Request and Response Transformation, and Traffic Control.
Kong AI Gateway extends Konnect capabilities to the world of generative AI, including LLMs, and provides a unified way to connect applications with a variety of GenAI infrastructures, including video, image, and audio models.
Besides abstracting the complexity of interacting with diverse GenAI infrastructure through a single and standardized interface, it provides features such as Prompt Engineering, Semantic Processing, RAG, and MCP support. This makes it a key component for organizations building AI agents, enabling developers to experiment with models, optimize costs, and ensure compliance across all AI traffic.
Cerebras provides a cutting-edge computing platform for AI workloads. At the heart of its innovation is the Wafer-Scale Engine (WSE), a powerful processor designed to accelerate deep learning training and inference by orders of magnitude compared to traditional GPU or CPU clusters.
Beyond the hardware, Cerebras offers a complete AI supercomputing solution powered by the Cerebras Software Platform (CSoft), a platform that integrates seamlessly with existing AI frameworks, like PyTorch and TensorFlow. Cerebras also offers the Cerebras Cloud, enabling users to access its powerful AI compute infrastructure as a service, including multiple GenAI Models.
AI voice agents
An AI voice agent offers a seamless, natural, and highly efficient way for users to interact with digital systems through conversation. Typically, AI voice agents rely on speech-to-text (STT) models to convert spoken language into text, while text-to-speech (TTS) models transform the agent’s textual responses back into speech.
Integrating Cerebras LLMs with STT and TTS models through Kong AI Gateway enables the orchestration of AI voice agents. Developers can route audio streams to an STT model for transcription, pass the resulting text through a Cerebras language model for understanding or generation, and send the output to a TTS model for natural speech synthesis, with the entire flow governed and monitored by Kong AI Gateway.
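The orchestration above can be sketched as a simple three-stage turn. In the sketch below, each stage is injected as a callable so the STT → LLM → TTS ordering is explicit; the stub stages stand in for HTTP calls to Kong AI Gateway Routes and are purely illustrative, not the actual models.

```python
# Minimal sketch of one voice-agent turn. The stub stages below stand in
# for HTTP calls to Kong AI Gateway Routes; they are illustrative only.

def voice_turn(audio_bytes, stt, llm, tts):
    """Transcribe the user's audio, generate a reply, synthesize speech."""
    transcript = stt(audio_bytes)   # speech-to-text stage
    reply_text = llm(transcript)    # language model stage (e.g., Cerebras LLM)
    return tts(reply_text)          # text-to-speech stage

# Stub stages for demonstration:
stt = lambda audio: "what is the weather today?"
llm = lambda text: "It is sunny."
tts = lambda text: ("audio", text)

print(voice_turn(b"\x00\x01", stt, llm, tts))  # → ('audio', 'It is sunny.')
```

In the real agent, each callable would issue a request to its own Kong Route, so policies like authentication, rate limiting, and semantic caching apply uniformly to every stage.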
The following diagram depicts a reference architecture of the AI voice agent:
As the diagram shows, Kong AI Gateway abstracts all GenAI models, including the Cerebras LLM as well as the STT and TTS models. Combined with its extensive list of AI-based capabilities, such as Prompt Decorator and Semantic Caching, Kong AI Gateway provides an infrastructure that is easy to use and monitor, making it ideal for AI agent development.
Kong AI Gateway and Cerebras at work
It's easier to see an AI voice agent consuming Kong AI Gateway, a Cerebras LLM, and STT/TTS models than to read about it. The following video demonstrates a simple AI voice agent built on the reference architecture:
The AI agent was written with LiveKit, which provides the infrastructure for capturing, transmitting, and managing bi-directional audio streams between users and the AI agent.
From the AI agent's perspective, all GenAI models are abstracted by Kong AI Gateway. Each model is exposed to the AI agent through a specific Kong AI Gateway Route.
Here's the snippet of the AI voice agent defining the AgentSession and referencing the Kong AI Gateway Routes:
The DATA_PLANE_URL variable is where the Kong AI Gateway Data Plane is located. The AI agent sends requests to the Gateway using the Routes defined for each model.
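To make the routing concrete, the sketch below builds one endpoint per model from DATA_PLANE_URL. The route paths shown are hypothetical examples, not the actual Route names from the deployment; only the pattern (one Route per model on the same Data Plane) matters.

```python
import os

# DATA_PLANE_URL points at the Kong AI Gateway Data Plane.
DATA_PLANE_URL = os.getenv("DATA_PLANE_URL", "http://localhost:8000")

# Hypothetical route paths; the real names come from the Kong configuration.
ROUTE_PATHS = {
    "stt": "/audio-stt",
    "llm": "/cerebras-llm",
    "tts": "/audio-tts",
}

# Each model is reached through its own Route on the same Data Plane:
endpoints = {name: f"{DATA_PLANE_URL}{path}" for name, path in ROUTE_PATHS.items()}
print(endpoints["llm"])  # e.g. http://localhost:8000/cerebras-llm
```

The agent then targets these per-model endpoints instead of calling Cerebras or the STT/TTS engines directly, so credentials and policies stay on the gateway side.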
At the same time, here's the Kong AI Gateway declaration defining the Gateway Services exposed by the Kong Routes:
Again, for each GenAI model, there's a configuration describing how the AI Gateway integrates with it. Note that the Gateway handles the Cerebras API key, providing a more secure environment for agent development.
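A declarative configuration along these lines could express the LLM's Gateway Service and Route. The service name, route path, and model details below are illustrative assumptions, not the actual configuration from the demo; the key point is that the Cerebras API key lives in the gateway's AI Proxy plugin config, never in the agent.

```yaml
# Illustrative decK-style declarative config; names and paths are assumptions.
_format_version: "3.0"
services:
  - name: cerebras-llm-service
    url: https://api.cerebras.ai
    routes:
      - name: cerebras-llm-route
        paths:
          - /cerebras-llm
    plugins:
      - name: ai-proxy
        config:
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer <CEREBRAS_API_KEY>  # kept on the gateway, not the agent
          model:
            provider: openai
            name: qwen-3-32b
```

Similar Service/Route pairs would front the STT and TTS engines, giving every model the same policy surface.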
The STT and TTS models referenced by the AI agent are deployed on the Speaches.AI engine and exposed through specific URLs. The LLM, as expected, is fully managed by Cerebras Cloud.
Observability
Both Cerebras and Konnect provide observability capabilities. For example, here's a Cerebras screenshot showing the AI voice agent's consumption of the Qwen-3-32B model.
Similarly, Konnect provides ready-to-use dashboards and explorer capabilities to monitor how the models are being consumed:
Conclusion
The integration of Kong AI Gateway with Cerebras AI infrastructure creates a powerful foundation for deploying and managing next-generation AI workloads at scale. Kong AI Gateway provides a secure, high-performance entry point for managing APIs and AI model endpoints, ensuring efficient traffic control, observability, and policy enforcement across diverse environments. Combined with Cerebras’ large-scale compute capabilities and LLM optimization, this architecture enables seamless orchestration of advanced AI services such as speech-to-text (STT), text-to-speech (TTS), and language understanding models.
Contact sales@konghq.com and sales@cerebras.com if you have questions or need support.
Our next blog post will describe in detail how to configure both Kong AI Gateway and Cerebras. Register for both Kong Konnect and Cerebras to get a trial and start experimenting with both technologies.