The integration of next-generation AI workloads — from large language models (LLMs) to speech-to-text and text-to-speech — demands a powerful, secure, and scalable infrastructure. This is especially true for building advanced AI voice agents, which rely on seamless, natural, and highly efficient user interaction.
This blog post explores how the combined power of Kong AI Gateway and Cerebras AI infrastructure creates an unparalleled foundation for deploying and managing these AI agents at scale. We'll detail a reference architecture that leverages Kong AI Gateway for secure traffic control, policy enforcement, and observability, while utilizing Cerebras' high-performance compute and LLM optimization to orchestrate STT, LLM (like the Qwen-3-32B model), and TTS models in a cohesive, production-ready solution.
Kong Konnect and Cerebras
Kong Gateway is an API gateway and a core component of the Kong Konnect platform. Built on a plugin-based extensibility model, it centralizes essential functions such as proxying, routing, load balancing, and health checking, efficiently managing both microservices and traditional API traffic.
One of Kong Gateway’s greatest strengths lies in its extensible plugin ecosystem, which allows seamless integration of diverse policies and functionalities — including Authentication and Authorization, Rate Limiting, Proxy Caching, Request and Response Transformation, and Traffic Control.
Kong AI Gateway extends Konnect capabilities to the world of generative AI, including LLMs, and provides a unified way to connect applications with a variety of GenAI infrastructures, including video, image, and audio models.
Besides abstracting the complexity of interacting with diverse GenAI infrastructure through a single and standardized interface, it provides features such as Prompt Engineering, Semantic Processing, RAG, and MCP support. This makes it a key component for organizations building AI agents, enabling developers to experiment with models, optimize costs, and ensure compliance across all AI traffic.
Cerebras provides a cutting-edge computing platform for AI workloads. At the heart of its innovation is the Wafer-Scale Engine (WSE), a powerful processor designed to accelerate deep learning training and inference by orders of magnitude compared to traditional GPU or CPU clusters.
Beyond the hardware, Cerebras offers a complete AI supercomputing solution powered by the Cerebras Software Platform (CSoft), a platform that integrates seamlessly with existing AI frameworks, like PyTorch and TensorFlow. Cerebras also offers the Cerebras Cloud, enabling users to access its powerful AI compute infrastructure as a service, including multiple GenAI Models.
AI voice agents
An AI voice agent offers a seamless, natural, and highly efficient way for users to interact with digital systems through conversation. Typically, AI voice agents rely on speech-to-text (STT) models to convert spoken language into text, while text-to-speech (TTS) models transform the agent’s textual responses back into speech.
Integrating Cerebras LLMs with STT and TTS models through Kong AI Gateway enables the orchestration of AI voice agents. Developers can route audio streams to an STT model for transcription, pass the resulting text through a Cerebras language model for understanding or generation, and send the output to a TTS model for natural speech synthesis, with the entire flow governed and monitored by Kong AI Gateway.
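The orchestration above can be sketched as a simple three-stage turn. In the sketch below, each stage is injected as a callable so the STT → LLM → TTS ordering is explicit; the stub stages stand in for HTTP calls to Kong AI Gateway Routes and are purely illustrative, not the actual models.

```python
# Minimal sketch of one voice-agent turn. The stub stages below stand in
# for HTTP calls to Kong AI Gateway Routes; they are illustrative only.

def voice_turn(audio_bytes, stt, llm, tts):
    """Transcribe the user's audio, generate a reply, synthesize speech."""
    transcript = stt(audio_bytes)   # speech-to-text stage
    reply_text = llm(transcript)    # language model stage (e.g., Cerebras LLM)
    return tts(reply_text)          # text-to-speech stage

# Stub stages for demonstration:
stt = lambda audio: "what is the weather today?"
llm = lambda text: "It is sunny."
tts = lambda text: ("audio", text)

print(voice_turn(b"\x00\x01", stt, llm, tts))  # → ('audio', 'It is sunny.')
```

In the real agent, each callable would issue a request to its own Kong Route, so policies like authentication, rate limiting, and semantic caching apply uniformly to every stage.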
The following diagram depicts a reference architecture of the AI voice agent:
As the diagram shows, Kong AI Gateway abstracts all GenAI models, including the Cerebras LLM as well as the STT and TTS models. Combined with its extensive list of AI-based capabilities, such as Prompt Decorator and Semantic Caching, Kong AI Gateway provides an infrastructure that is easy to use and monitor, making it ideal for AI agent development.
Kong AI Gateway and Cerebras at work
It's easier to see an AI voice agent consuming Kong AI Gateway, a Cerebras LLM, and STT/TTS models than to read about it. The following video demonstrates a simple AI voice agent built on the reference architecture:
The AI agent was written with LiveKit, which provides the infrastructure for capturing, transmitting, and managing bi-directional audio streams between users and the AI agent.
From the AI agent's perspective, all GenAI models are abstracted by Kong AI Gateway. Each model is exposed to the AI agent through a specific Kong AI Gateway Route.
Here's the snippet of the AI voice agent defining the AgentSession and referencing the Kong AI Gateway Routes:
The DATA_PLANE_URL variable is where the Kong AI Gateway Data Plane is located. The AI agent sends requests to the Gateway using the Routes defined for each model.
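To make the routing concrete, the sketch below builds one endpoint per model from DATA_PLANE_URL. The route paths shown are hypothetical examples, not the actual Route names from the deployment; only the pattern (one Route per model on the same Data Plane) matters.

```python
import os

# DATA_PLANE_URL points at the Kong AI Gateway Data Plane.
DATA_PLANE_URL = os.getenv("DATA_PLANE_URL", "http://localhost:8000")

# Hypothetical route paths; the real names come from the Kong configuration.
ROUTE_PATHS = {
    "stt": "/audio-stt",
    "llm": "/cerebras-llm",
    "tts": "/audio-tts",
}

# Each model is reached through its own Route on the same Data Plane:
endpoints = {name: f"{DATA_PLANE_URL}{path}" for name, path in ROUTE_PATHS.items()}
print(endpoints["llm"])  # e.g. http://localhost:8000/cerebras-llm
```

The agent then targets these per-model endpoints instead of calling Cerebras or the STT/TTS engines directly, so credentials and policies stay on the gateway side.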
At the same time, here's the Kong AI Gateway declaration defining the Gateway Services exposed by the Kong Routes:
Again, for each GenAI model, there's a configuration describing how the AI Gateway integrates with it. Note that the Gateway handles the Cerebras API key, providing a more secure environment for agent development.
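A declarative configuration along these lines could express the LLM's Gateway Service and Route. The service name, route path, and model details below are illustrative assumptions, not the actual configuration from the demo; the key point is that the Cerebras API key lives in the gateway's AI Proxy plugin config, never in the agent.

```yaml
# Illustrative decK-style declarative config; names and paths are assumptions.
_format_version: "3.0"
services:
  - name: cerebras-llm-service
    url: https://api.cerebras.ai
    routes:
      - name: cerebras-llm-route
        paths:
          - /cerebras-llm
    plugins:
      - name: ai-proxy
        config:
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer <CEREBRAS_API_KEY>  # kept on the gateway, not the agent
          model:
            provider: openai
            name: qwen-3-32b
```

Similar Service/Route pairs would front the STT and TTS engines, giving every model the same policy surface.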
The STT and TTS models referenced by the AI agent are deployed on the Speaches.AI engine and exposed through specific URLs. The LLM, as expected, is fully managed by Cerebras Cloud.
Observability
Both Cerebras and Konnect provide observability capabilities. For example, here's a Cerebras screenshot showing the AI voice agent's consumption of the Qwen-3-32B model.
Similarly, Konnect provides ready-to-use dashboards and explorer capabilities to monitor how the models are being consumed:
Conclusion
The integration of Kong AI Gateway with Cerebras AI infrastructure creates a powerful foundation for deploying and managing next-generation AI workloads at scale. Kong AI Gateway provides a secure, high-performance entry point for managing APIs and AI model endpoints, ensuring efficient traffic control, observability, and policy enforcement across diverse environments. Combined with Cerebras’ large-scale compute capabilities and LLM optimization, this architecture enables seamless orchestration of advanced AI services such as speech-to-text (STT), text-to-speech (TTS), and language understanding models.
Contact sales@konghq.com and sales@cerebras.com if you have questions or need support.
Our next blog post will describe in detail how to configure both Kong AI Gateway and Cerebras. Register for both Kong Konnect and Cerebras to get a trial and start experimenting with both technologies.