Product Releases
September 11, 2024
4 min read

Introducing the Insomnia AI Runner: Accelerate and secure GenAI traffic to one or more LLMs

Marco Palladino
CTO and Co-Founder

Today with the release of Insomnia 10, we are quite stoked to also announce a brand new offering in Insomnia, the AI Runner, a managed SaaS service that provides developers with the ability to accelerate and secure LLM traffic for their applications. This capability is the first of a new class of developer infrastructure products that will complement Insomnia’s existing developer tooling capabilities for APIs.

The AI Runner enables developers to accelerate LLM traffic by up to 20x with semantic caching while also securing LLM traffic with out-of-the-box AI guardrails. You can also use the AI Runner to consume multiple LLMs with a single OpenAI-compatible interface. By doing so, you can build faster user experiences powered by AI that are more secure and easier to build, and it only takes a few seconds to use.

All Insomnia users can get started with the AI Runner for free.

Security and acceleration in one line of code

With the Insomnia AI Runner, you can create as many “AI Runners” as you need to accelerate and secure your LLM traffic. The Insomnia AI Runner sits in the execution path of your LLM traffic for GenAI, and it accelerates all LLM traffic with semantic caching while also securing your traffic with guardrails that you can apply in one click.

You can create as many AI Runners as you need - each one with their own configuration.

You can create as many “AI Runners” as you need, and each one will provision a URL that you can use in your applications by simply changing one line of code to point to the new URL.

Migrating to the AI Runner is extremely easy, simply point your line of code to it.

By doing so, it becomes extremely easy to migrate existing applications written in vanilla GenAI integrations, or via frameworks like LangChain and others.

Accelerate AI with semantic caching

The AI Runner is able to understand the intent and meaning of the prompts you are sending through it. If it finds two similar prompts, it will return a copy of the cached content instead of making an upstream request to the LLM you are consuming, even when similar prompts are using different words.

With semantic caching, the Insomnia AI Runner can accelerate all GenAI traffic significantly. In the chart above, the lower the value, the lower the latency.


To understand the nuances between two different prompts, the AI Runner gives you the ability to set a similarity threshold to determine if cached content should be returned or not. A stronger similarity threshold will result in more cache hits and higher performance, but it can also result in prompts with wide variances being interpreted as having the same meaning. On the other hand, a lower threshold will understand more nuances between the prompts, but it will return a lower hit ratio.

You can easily configure the AI Runner’s similarity threshold.


Additionally, you can configure the caching time to live (TTL) for each AI Runner, as well as store credentials for your LLM within the AI Runner itself. This makes it so that you don’t need to update your applications when you want to modify your credentials, as it will be applied on the fly by the AI Runner.

Secure AI with out-of-the-box guardrails

It is crucial to ensure that AI traffic follows specific guidelines for improving security, reducing mishandling of sensitive customer information, and returning better responses.

As such, the AI Runner ships with AI guardrails out of the box. This makes it easier to protect your LLM traffic against security attacks while ensuring that personal and sensitive data is not returned by the LLMs.

Out-of-the-box AI guardrails are available and ready to use for your AI traffic.


By allowing you to select exactly which guardrails you want to apply for each AI Runner, Insomnia makes it easier to create secure AI experiences, with less coding.

In the future, we will allow you to easily create your own guardrails, too.

Built for developers, powered by Konnect

Under the hood, the new AI Runner is powered by a subset of features provided by Kong’s AI Gateway technology. It runs on the enterprise infrastructure provided by Kong Konnect, which is currently powering hundreds of enterprise organizations across the world, including those operating in highly regulated industries. 

The Insomnia AI Runner is powered by Kong AI Gateway, running on Kong Konnect.

It is entirely possible to self-host your own version of the AI Runner by deploying Kong’s AI Gateway directly (you can contact sales to learn more) and - by doing so - gain access to even more AI features that are currently unavailable in the Insomnia AI Runner.

Get started for free

You can get started for free with AI Runner today.