# AI Voice Agents with Kong AI Gateway and Cerebras
Claudio Acquaviva
Principal Architect, Kong
The integration of next-generation AI workloads — from large language models (LLMs) to speech-to-text and text-to-speech — demands a powerful, secure, and scalable infrastructure. This is especially true for building advanced AI voice agents, which rely on seamless, natural, and highly efficient user interaction.
This blog post explores how the combined power of Kong AI Gateway and Cerebras AI infrastructure creates an unparalleled foundation for deploying and managing these AI agents at scale. We'll detail a reference architecture that leverages Kong AI Gateway for secure traffic control, policy enforcement, and observability, while utilizing Cerebras' high-performance compute and LLM optimization to orchestrate STT, LLM (like the Qwen-3-32B model), and TTS models in a cohesive, production-ready solution.
One of Kong Gateway’s greatest strengths lies in its extensible plugin ecosystem, which allows seamless integration of diverse policies and functionalities — including Authentication and Authorization, Rate Limiting, Proxy Caching, Request and Response Transformation, and Traffic Control.
Kong AI Gateway extends Konnect to the world of generative AI, providing a unified way to connect applications to LLMs and other GenAI infrastructure, including video, image, and audio models.
Besides abstracting the complexity of interacting with diverse GenAI infrastructure through a single and standardized interface, it provides features such as Prompt Engineering, Semantic Processing, RAG, and MCP support. This makes it a key component for organizations building AI agents, enabling developers to experiment with models, optimize costs, and ensure compliance across all AI traffic.
**Cerebras** provides a cutting-edge computing platform for AI workloads. At the heart of its innovation is the Wafer-Scale Engine (WSE), a powerful processor designed to accelerate deep learning training and inference by orders of magnitude compared to traditional GPU or CPU clusters.
An AI voice agent offers a seamless, natural, and highly efficient way for users to interact with digital systems through conversation. Typically, AI voice agents rely on speech-to-text (STT) models to convert spoken language into text while text-to-speech (TTS) models transform the agent’s textual responses back into speech.
The integration of Cerebras LLM models, STT and TTS models with Kong AI Gateway enables orchestration of AI voice agents. Developers can route audio streams to STT models for transcription, pass the resulting text through a Cerebras language model for understanding or generation, and send the output to TTS for natural speech synthesis — all governed and monitored by [Kong AI Gateway](https://konghq.com/products/kong-ai-gateway).
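That flow can be sketched as a single conversational turn. In the sketch below, stub functions stand in for the three model calls; in the real architecture each one is an HTTP request to a Kong AI Gateway Route:

```python
# Minimal sketch of one voice-agent turn. The stt/llm/tts stubs stand in
# for the Gateway-routed model calls (Speaches.AI STT/TTS, Cerebras LLM).

def stt(audio: bytes) -> str:
    # Placeholder: a real agent POSTs the audio to the Gateway's STT Route.
    return audio.decode("utf-8")

def llm(prompt: str) -> str:
    # Placeholder: a real agent calls the Cerebras LLM Route.
    return f"echo: {prompt}"

def tts(text: str) -> bytes:
    # Placeholder: a real agent calls the Gateway's TTS Route.
    return text.encode("utf-8")

def handle_turn(audio_in: bytes) -> bytes:
    """One conversational turn: speech -> text -> response -> speech."""
    transcript = stt(audio_in)
    reply = llm(transcript)
    return tts(reply)
```

The orchestration logic is the same regardless of which engines sit behind the Routes; only the Gateway configuration changes when a model is swapped.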
The following diagram depicts a reference architecture of the AI voice agent:
As the diagram shows, Kong AI Gateway abstracts all the GenAI models: the Cerebras LLM as well as the STT and TTS models. Combined with its extensive list of AI capabilities, such as Prompt Decorator and Semantic Caching, Kong AI Gateway provides an easy-to-use, easy-to-monitor infrastructure that's ideal for AI agent development.
## Kong AI Gateway and Cerebras at work
It's easier to see an AI voice agent consuming Kong AI Gateway, Cerebras LLM, and STT/TTS models in action than to read about it. The following video demonstrates a simple AI voice agent following the reference architecture:
The AI agent was written with [LiveKit](https://livekit.io/), which provides the infrastructure for capturing, transmitting, and managing bi-directional audio streams between users and the AI agent.
From the AI agent's perspective, all GenAI models are abstracted by Kong AI Gateway. Each model is exposed to the AI agent through a specific Kong AI Gateway Route.
Here's the snippet of the AI voice agent defining the agent session and referencing the Kong AI Gateway Routes:
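As a rough sketch of that setup (the environment-variable name, route paths, and model identifiers below are illustrative assumptions, not the exact values from the demo), the agent derives one OpenAI-compatible base URL per Kong Route and hands those to its session:

```python
import os

# Location of the Kong AI Gateway Data Plane (assumed environment variable).
DATA_PLANE_URL = os.getenv("DATA_PLANE_URL", "http://localhost:8000")

def route(path: str) -> str:
    """Build the URL of the Kong Route that exposes one GenAI model."""
    return f"{DATA_PLANE_URL.rstrip('/')}/{path.strip('/')}"

# One Kong Route per model: the agent session (e.g. LiveKit's
# OpenAI-compatible STT/LLM/TTS plugins) points at these URLs instead of
# the upstream providers. Model names are illustrative assumptions.
session_config = {
    "stt": {"base_url": route("stt"), "model": "Systran/faster-whisper-small"},
    "llm": {"base_url": route("llm"), "model": "qwen-3-32b"},
    "tts": {"base_url": route("tts"), "model": "kokoro"},
}
```

Because every model is reached through the same Data Plane, swapping an engine or rotating a credential is a Gateway-side change that never touches the agent code.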
The `DATA_PLANE_URL` variable is where the Kong AI Gateway Data Plane is located. The AI agent sends requests to the Gateway using the Routes defined for each model.
At the same time, here's the Kong AI Gateway declaration defining the Gateway Services exposed by the Kong Routes:
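A minimal decK-style sketch of such a declaration might look like the following (the service names, route paths, and `ai-proxy` settings here are assumptions for illustration, not the post's actual configuration; Cerebras is reached through its OpenAI-compatible API):

```yaml
_format_version: "3.0"
services:
  # Cerebras LLM behind Kong's AI Proxy plugin; the Gateway injects the API key.
  - name: cerebras-llm
    url: https://api.cerebras.ai
    routes:
      - name: llm-route
        paths: ["/llm"]
    plugins:
      - name: ai-proxy
        config:
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer <CEREBRAS_API_KEY>
          model:
            provider: openai   # Cerebras exposes an OpenAI-compatible API
            name: qwen-3-32b
  # STT and TTS models served by the Speaches.AI engine (hypothetical host).
  - name: speaches-stt
    url: http://speaches.internal:8000
    routes:
      - name: stt-route
        paths: ["/stt"]
  - name: speaches-tts
    url: http://speaches.internal:8000
    routes:
      - name: tts-route
        paths: ["/tts"]
```

With a declaration of this shape, rotating the Cerebras API key is a Gateway-side operation; the agent never sees the credential.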
Again, for each GenAI model there's a configuration describing how the AI Gateway integrates with it. Note that the Gateway takes care of the Cerebras API key, providing a more secure environment for agent development.
The STT and TTS models referenced by the AI agent are deployed in the [Speaches.AI](http://speaches.ai) engine and exposed through specific URLs. The LLM model, as expected, is fully managed by Cerebras Cloud.
## Observability
Both Cerebras and Konnect provide observability capabilities. For example, here's a Cerebras screenshot showing the AI voice agent's consumption of the Qwen-3-32B model.
Similarly, Konnect provides ready-to-use dashboards and explorer capabilities to monitor how the models are being consumed:
## Conclusion
The integration of Kong AI Gateway with Cerebras AI infrastructure creates a powerful foundation for deploying and managing next-generation AI workloads at scale. Kong AI Gateway provides a secure, high-performance entry point for managing APIs and AI model endpoints, ensuring efficient traffic control, observability, and policy enforcement across diverse environments. Combined with Cerebras’ large-scale compute capabilities and LLM optimization, this architecture enables seamless orchestration of advanced AI services such as speech-to-text (STT), text-to-speech (TTS), and language understanding models.
Contact sales@konghq.com and sales@cerebras.com if you have questions or need support.