What is MCP? Diving Deep into the Future of Remote AI Context
The hype for Anthropic’s Model Context Protocol (MCP) has reached a boiling point. Everyone (including Kong) is releasing something around MCP to ensure they aren't seen as falling behind in the ever-changing AI landscape.
However, in this mad dash, there remains confusion around MCP and what this standard actually enables. Some see MCP as a total game-changer, and some see it as little more than a thin and unnecessary wrapper.
As usual, the truth lies somewhere in between. Let’s dive into the details.
What is Model Context Protocol (MCP)?
MCP, Model Context Protocol, was released by Anthropic back in November 2024 to minimal fanfare. The protocol was billed as a universal, open standard detailing how AI assistants can gain context — external data, tools, and services — to help accomplish tasks. After some bigger players in the AI space adopted the standard, we saw an explosion in interest earlier this year.
So why do models need additional context? Well, for example, let's say a user prompts an LLM with the following: “Can you draft a follow-up email to Acme Corp that references our last call and ties in the pricing slide from my proposal deck?” No LLM in the world can successfully complete this task without additional context: access to the call transcript, proposal slide deck, knowledge of Acme Corp, etc. However, without a standard, the LLM needs a custom integration to access each piece of context, leading to fragmented, inconsistent experiences, thereby slowing down the development of AI applications.
MCP is just a standardized way (i.e., a protocol) for LLMs to access the context they need to accomplish tasks. There are plenty of great resources that get into the weeds of the protocol and the underlying transports, but for now, let’s just define the core entities at play:
- LLM: The large language model that interprets user input and any additional context.
- MCP host: The runtime environment (e.g., a chatbot) that connects an LLM with one or more MCP clients.
- MCP client: A component running inside the MCP host that maintains 1:1 connections with MCP servers and relays context to the LLM.
- MCP server: A service exposing context in a standardized format to MCP clients
- Context types: Four structured primitives — tools, resources, prompts, and sampling — that an MCP server can provide to an LLM.
These entities work together to create a consistent approach in providing additional context to models. This context can be anything from callable functions (i.e., tools) to text files (i.e., resources) to parameterized prompts that guide the LLM into a structured workflow.
To illustrate, let’s walk through a simple example of MCP in action. Let's say a user asks a chatbot something that requires additional context beyond the model’s training data, such as, "What's the weather in San Francisco today?"
The MCP client first ensures it has up-to-date tool definitions from connected MCP servers. It injects the get_weather
tool into the conversation context before sending everything to the LLM. The LLM recognizes the need for additional context to answer the query and invokes get_weather(city="San Francisco")
with the appropriate parameters.
The MCP client routes this tool call to the correct server, which executes the logic — calling a weather API — and returns the result. This data is passed back to the LLM, which is leveraged to craft a final response for the user.

Notably, the actual logic executed here is just a standard API call, which illustrates an important point. MCP does not replace APIs, it’s just an additional interface to make APIs more LLM friendly.
Tool calling… repackaged?
As we just saw, one of the first things people notice about MCP is that it’s not doing anything totally novel. Which leads to some common objections:
- It’s just tool calling repackaged
- It’s a thin wrapper on APIs
- Why not just use REST or gRPC?
Most importantly, people ask, "So what?" They see hype without substance.
Let’s attempt to address these critiques. In its simplest form, yes, you could just view MCP as tool calling repackaged (we’ll get to what makes it more in the next section). But let's go back to Anthropic’s mission statement for this protocol: it’s all about defining an open, universal standard. The focus here is on driving innovation through interoperability in this rapidly evolving space.
Still, many ask why we need a layer on top of existing APIs with well-defined standards? Consider a REST API defined by OpenAPI specification that's both machine and human readable. The spec defines things like what endpoints exist, what parameters are required, what the response might look like, etc. In other words, it tells you how to call an API.
What it doesn’t tell you is why or when to call it, or how to interpret the response — all things an LLM, especially autonomous agents, need to reason effectively. For example, a POST /todos
endpoint might let you create a task with a task_descriptio
n parameter, but it lacks any guidance on how it maps to a goal like “plan my day.”
In contrast, MCP wraps that same function add_task(task_description)
with a natural-language prompt that describes its intent — e.g., “Given a goal like {user_goal}
, break it down into tasks and call add_task
to add them to a todo list.” This additional context helps LLMs to reason about when and why to invoke tools, enabling goal-driven, not hardcoded, logic.
Additionally, REST APIs don’t support a concept called reflection — the ability for a client to ask a server, “What functions do you offer, and how do I call them?” That’s essential for dynamic, model-driven interactions. This requirement immediately leads people to look to gRPC, which does support reflection and defines callable functions with structured inputs and outputs (i.e., tools) via Protocol Buffers.
But gRPC wasn’t built with language models in mind. gRPC prioritizes performance and uses a compact binary format between machines. Its schemas are terse and lack natural language descriptions. While efficient, this makes gRPC interfaces difficult for LLMs to understand without extra tooling or translation. MCP, on the other hand, uses plain JSON with embedded descriptions and usage guidance for each tool, making it far easier for LLMs to interpret and use. MCP effectively serves as a bridge between traditional RPC systems and AI-driven tool use.
This combination of reflection and intention-based communication, describing the what instead of the how, enables more resilient systems. Steve Manuel aptly describes MCP as “the differential for modern APIs and systems” in his blog. This resilience is achieved by allowing clients to dynamically discover server capabilities at runtime instead of build time, and semantic parameters, which use human-readable and conceptually-focused definitions, making MCP-based integrations capable of seamlessly handling changes to the underlying API.
OK, so hopefully now you’re at least starting to see MCP as a necessary and valuable standard, but perhaps you're still confused why there's so much buzz around it. Let’s now take a look at something else MCP offers: enabling agentic workflows.
MCP for agentic workflows
A raw LLM simply maps inputs to outputs. An agentic LLM system gives the LLM:
- Tools to act
- A memory of past steps
- A way to loop and reason iteratively
- Optional goals or tasks
Therefore, when you hook an LLM up with tools, let it decide what tools to call, let it reflect on outcomes, and let it plan next steps — you’ve made it agentic. It can now decide what to do next without being told every step.
So what does this have to do with MCP? Well, as we mentioned, MCP can provide context beyond just tools. MCP servers can also provide parameterized prompts that effectively allow the MCP server to provide the next instruction to the LLM. This prompt chaining can open some very interesting doors.
What’s even more compelling is how MCP can surface related tools at the right time, without needing to cram every option into the prompt context. Rather than overengineering prompt descriptions to account for every possibility and force the LLM into a deterministic workflow, MCP allows a more modular approach: "Here’s the response from this tool call, and here are some tools that might help if this gets more complex." This makes the system more adaptive and scalable, while still giving the LLM the flexibility to explore new paths if the initial instruction isn’t fully deterministic.
In fact, with these capabilities, we have something resembling an agent that emerges from the interplay between
- LLM (reasoning and deciding)
- MCP servers (offering tools and chaining prompts)
- MCP client (managing the loop and execution)
- User (providing the goal)
Let’s take a look at this in action. We’re going to demonstrate a very simple agentic workflow where an LLM invokes tools from multiple MCP servers based on returned prompts. Here are the servers we're working with:
Todo List MCP Server
Calendar MCP Server
OK, so now let's imagine we have a chatbot with access to the context provided by these MCP servers. When a user provides a high-level goal like “I want to focus on deep work today,” the MCP client coordinates a modular, multi-server workflow to fulfill the request. It packages the user message, along with tool metadata and prompt instructions from all connected MCP servers, and sends it to the LLM. The LLM first selects a high-level planning tool plan_daily_tasks
from the Todo Server, which returns a prompt directing the LLM to break down the goal into actionable tasks using add_task
.
As tasks are created and the LLM is notified, the LLM reasons further and decides to schedule the tasks by invoking schedule_todo_task
, triggering the Calendar Server. That server responds with new prompt guidance to use schedule_event
, at which point the LLM finalizes the day’s plan with specific times.
Each tool interaction is routed and mediated by the MCP client, which manages the reasoning loop, coordinates tool execution, and tracks interaction state across the session. This forms a fully agentic workflow: the user sets the goal, the LLM reasons and decides, the MCP servers expose tools and dynamic prompts, and the MCP client orchestrates the process, enabling intelligent, composable automation across domains.

From a very basic and high-level prompt, we now have an agent that makes several decisions on its own to reach an end goal. Of course, there is little value in generating these tasks without knowing more about what the user wants to focus their deep work on, but improving this simply requires modifying the MCP server to have a more comprehensive and well-thought-out prompt.
MCP nesting
Things start becoming really interesting when you start to look beyond a single layer of MCP clients and servers. MCP servers can also be clients to other MCP servers. This nesting enables modularity, composition, and agent-like delegation, where one server can "delegate" part of its reasoning or functionality to another.
It’s like microservices for agents. Just as we moved from monoliths to microservices for backend applications, we’re now decoupling tool logic from the agent runtime using MCP servers. Based on the rapid addition of new MCP servers, it’s easy to imagine a vast and highly composable system of tooling that can be used like lego bricks to build out comprehensive workflows.
For example, you could have a dev-scaffolding
MCP server that acts as a high-level orchestrator that focuses on helping devs go from ideas to working code by coordinating several specialized, upstream MCP servers. When a user requests a new app feature (e.g., "Add a login feature"), the orchestrator server uses upstream servers — spec-writer
to generate an API spec, code-gen
to scaffold code from that spec, and test-writer
to produce corresponding test cases.
These collective MCP servers could also be used for environment-specific functionality. In other words, they expose the same interface (e.g., query_database
) but are configured for different environments. This would allow you to have a dev-app-server
that includes upstream MCP servers like a dev-db-server
using a SQlite database, a dev-auth-server
that returns mocked auth responses, and a dev-deploy-server
that wraps a local CLI tool. Then the prod-app-server
would point to correlate upstream servers tied to cloud-based deployments.
Platforms like mcp.run have already heavily leveraged this composability. Mcp.run allows you to install an extensible, dynamically updateable server that leverages an upstream registry of MCP servers they call servlets. These servlets do not need to be installed locally, but can run remotely on the mcp.run infrastructure. This is quite powerful for a number of reasons, but for the purposes of this blog, it highlights an important shift that is taking place in the MCP ecosystem.
Remote MCP servers
Today, the vast majority of MCP servers are run locally and leverage the standard input/output (stdio) transport for local communication. This means every MCP server must be installed locally alongside the MCP client. While great for quick testing, this greatly hampers the ability to have a larger network of tools agents for the following reasons:
- Limited interoperability: The beauty of MCP is in its ability to help stand up a diverse ecosystem of resilient services for LLMs and agents. Local MCP servers can't be easily shared across teams, tools, or agents — even with a registry, each user must manually install and configure them, which fragments the ecosystem.
- No central updates: Updates require every client to re-install or re-sync manually, increasing maintenance overhead and the risk of version drift across environments.
- Harder to secure and audit: It's more difficult to apply centralized security policies, monitor usage, or audit tool behavior when each instance is running locally and independently.
- Poor developer experience for consumers: Other developers or agents can’t simply “call” your server or discover its tools unless they clone your whole setup, which stifles reuse and composability.

Image credit: https://docs.mcp.run/blog/2024/12/18/universal-tools-for-ai/
With Anthropic’s recent updates to the MCP spec, they are clearly setting the stage for an explosion in growth in remote servers that will help address the above concerns. This is most evident in the addition of experimental support for a "Streamable HTTP" transport to replace the existing HTTP+SSE approach. This eliminates the hard requirement for persistent, stateful connections and defaults to a stateless MCP server.
Additionally, Anthropic recently updated the MCP spec to introduce MCP Authorization – based on OAuth 2.1. While a necessary step for remote MCP servers, there has been a lot of discussion around the issues introduced by this approach, namely, requiring the MCP server to act as both the resource and authorization server.
Cracks in the architecture: Scaling challenges for remote MCP
MCP servers going remote is inevitable—but not effortless. While the shift toward remote-first MCP servers promises extensibility and reuse, it also introduces a fresh set of operational and architectural challenges that demand attention. As we move away from tightly-coupled local workflows and embrace distributed composability, we must confront a number of critical issues that threaten both developer experience and system resilience.
Authentication and authorization woes
The MCP specification proposes OAuth 2.1 as a foundation for secure remote access, but its implementation details remain complex and problematic. MCP servers are expected to act as both authorization servers and resource servers. This dual responsibility breaks conventional security models and increases the risk of misconfiguration.
Unlike traditional APIs that can rely on well-established IAM patterns, MCP introduces novel identity challenges, particularly when chaining tools across nested servers with varying access policies.
Security risks and tool poisoning attacks
One of the more subtle but critical vulnerabilities in the MCP model has been recently outlined by Invariant Labs. In what they call “Tool Poisoning Attacks (TPAs),” malicious actors can inject harmful instructions directly into the metadata of MCP tools. Since these descriptions are interpreted by LLMs as natural context, a poisoned tool could quietly subvert agentic reasoning and coerce it to leak sensitive data, perform unintended actions, or corrupt decision logic.
These risks are exacerbated when MCP servers are publicly discoverable or shared across organizational boundaries, and no clear boundary exists to verify or constrain which tools are trustworthy.
Fragile infrastructure: High availability, load balancing, and failover
When local tools break, it’s a personal inconvenience. When remote MCP servers fail, it’s a systemic failure that can cascade across an entire agentic workflow. High availability becomes a hard requirement in this world, especially when toolchains depend on server chaining. A single upstream server going offline could stall the entire plan execution.
Yet today, MCP lacks a built-in mechanism for load balancing or failover. These are critical gaps that need addressing as we rely more heavily on distributed composition.
Developer onboarding and ecosystem fragmentation
With the proliferation of MCP servers, discoverability becomes a pressing concern. How do developers find trusted, maintained servers? How do they know what tools are available or how to invoke them? While Anthropic has hinted at a registry system in their roadmap, no robust discovery solution exists today.
Without clear strategies for documentation, onboarding, and governance, developers are left to navigate a fragmented ecosystem where reusability and collaboration suffer.
Context bloat and LLM bias
Remote composition sounds elegant — until you realize that each server added to a session expands the LLM context window. Tool metadata, parameter schemas, prompt templates: it all adds up, especially in high-churn, multi-agent environments.
And once tools are injected into context, there’s no guarantee they’ll be used wisely. LLMs are often biased toward invoking tools that appear in context, even when unnecessary. This can lead to redundant calls, bloated prompt chains, and inefficient workflows. A problem that will be exacerbated by the increasing number of remote servers being registered.
The gateway pattern: An old friend for a new interface
To folks who live and breathe APIs, many of these challenges sound . . . familiar. Authentication quirks? Load balancing? Developer onboarding? These are the kinds of problems that modern API management tooling—especially API gateways—have been solving for well over a decade.
And as we stated earlier, MCP doesn’t replace APIs. It simply introduces a new interface layer that makes APIs more LLM-friendly. In fact, many MCP servers are just clever wrappers around existing APIs. So, rather than reinvent the wheel, let’s explore how we can apply the battle-tested API gateway to the emerging world of remote MCP servers.
Auth? Already solved
Gateways are already great at managing authentication and authorization, especially in enterprise environments. Instead of relying on each MCP server to act as its own OAuth 2.1 provider (a pattern that introduces security and operational complexity), we can delegate auth to a central gateway that interfaces with proper identity providers and authorization servers.
This simplifies token handling, supports centralized policy enforcement, and adheres to real-world IAM patterns that organizations already trust.
Security, guardrails, and trust boundaries
The gateway could serve as a vital security layer that filters and enforces which MCP servers and tools are even eligible to be passed into an LLM context. This provides a natural checkpoint for organizations to implement allowlists, scan for tool poisoning patterns, and ensure that only vetted, trusted sources are ever included in agentic workflows.
In essence, a gateway becomes a programmable trust boundary that stands between your agents and the open-ended world of MCP. When used properly, this alone could neutralize a large class of Tool Poisoning Attacks.
Resilience, load balancing, and observability built in
When MCP servers are registered behind a gateway, we get automatic benefits: load balancing, failover, health checks, and telemetry. Gateways are built for high availability, and they’re designed to route requests to the healthiest upstream server.
This is critical for agentic workflows where the failure of one link could disrupt the entire chain. Add in monitoring and circuit breakers, and you’ve got the makings of a reliable, observable infrastructure layer that MCP currently lacks.
Gateway as developer experience engine
Modern API gateways don’t just route traffic, but anchor entire developer ecosystems. API portals, internal catalogs, usage analytics, and onboarding flows are all well-supported in today’s API management stacks. There’s no reason MCP should be different.
By exposing MCP servers through something like a gateway-managed developer portal, we can offer consistent discovery, documentation, and access control, turning a fragmented server sprawl into a curated marketplace of capabilities.
Tackling context bloat and client overhead
The final two problems — LLM context bloat and bias — are tougher nuts to crack. But this is where a future, more intelligent gateway could shine.
Imagine the gateway not just as a proxy, but as an adaptive MCP server. One that connects to upstream MCP servers, introspects their tools, and selectively injects relevant context based on the user’s prompt. It could maintain persistent upstream connections and handle tool registration dynamically, reducing the need for spawning redundant client processes and minimizing token bloat in the LLM context.
Tools like mcpx have started down this road already, but it makes sense to centralize and scale this capability in the gateway: after all, it’s already the front door to your organization’s APIs.
Conclusion
As AI agents evolve from novelties to core components of modern software, their need for structured, reliable access to tools and data becomes foundational. MCP introduces a powerful new interface for exposing that functionality, which enables agents to reason, plan, and act across services. But as MCP servers move toward a remote-first model, developer and operational complexity rise dramatically.
From authentication and load balancing to context management and server discovery, the road to remote MCP isn’t without potholes. Yet, many of these challenges are familiar to those who’ve spent time in the world of API infrastructure. That’s why an API gateway, long trusted for securing, scaling, and exposing HTTP services, may be the perfect solution to extend MCP into production-grade, enterprise-ready territory.
At Kong, we believe this convergence is already happening. With the Kong Konnect platform and Kong AI Gateway, organizations can begin to apply proven gateway patterns to emerging AI interfaces like MCP. From scalable auth to load balancing and developer onboarding, much of what’s needed for remote MCP is already here.
👉 Want to learn more? See how Kong is solving real-world MCP server challenges today.
Unleash the power of APIs with Kong Konnect

Model Context Protocol (MCP) FAQs
What is the Model Context Protocol (MCP)?
MCP (Model Context Protocol) is an open standard that defines how large language models (LLMs) can access external data, tools, and services. It streamlines the process of providing LLMs with the additional context they need—such as files, prompts, or callable functions—to complete tasks effectively.
Why do LLMs need additional context with MCP?
LLMs often rely on information beyond their trained dataset, such as transcripts, presentation decks, or enterprise data. MCP provides a standardized way for LLMs to retrieve this real-time context. As a result, it eliminates the need for custom integrations and reduces fragmented or inconsistent user experiences when building AI applications.
How does MCP differ from using existing REST or gRPC APIs?
Traditional REST or gRPC APIs specify how to call a service, but they don’t inherently guide an LLM on why or when to call these services. MCP not only defines callable functions (tools) using a structured, plain JSON format but also includes natural-language prompts that help the LLM interpret and use these functions with minimal custom code.
Is MCP just another form of tool calling?
MCP can certainly resemble “tool calling,” but it goes further by serving as a universal, open protocol. Instead of only exposing endpoints, MCP provides additional context in the form of descriptions and usage guidance. This enables better decision-making by AI models and promotes interoperability across multiple tools and services.
How does MCP enable agentic workflows?
MCP provides tools, memory, and optional prompts that enable LLMs to reason and act in an iterative loop. The LLM can autonomously decide which tools to call, interpret results, and plan next steps. This “agentic” approach lets the LLM work more independently, making dynamic choices to achieve a given user goal.
What are the main challenges with remote MCP servers?
Moving from local to remote MCP servers introduces complexities like secure authentication (OAuth 2.1), high availability, load balancing, and the risk of “tool poisoning” attacks. Without proper oversight and infrastructure, these distributed services can lead to fragility, poor user experiences, and security vulnerabilities.
What is tool poisoning in MCP?
Tool poisoning is when malicious actors embed harmful instructions directly into an MCP tool’s metadata or descriptions. Because MCP relies on natural-language guidance, a compromised tool could mislead the LLM into leaking sensitive data or taking unwanted actions. Guardrails like centralized security policies and careful tool vetting are key to preventing these attacks.
Why is the gateway pattern relevant for MCP?
Gateway solutions have long solved challenges like authentication, load balancing, and developer onboarding for traditional APIs. Similarly, by placing an API gateway in front of MCP servers, you can manage security, scale infrastructure, and provide a central point for discovering and documenting MCP-based services. This helps create a more reliable, enterprise-grade framework for remote MCP implementations.
How can a gateway improve the security of remote MCP servers?
Gateways centralize authentication and authorization, allowing you to apply consistent security policies across multiple MCP servers. They also enable the creation of trust boundaries, where only vetted tools are passed to the LLM. This reduces the risk of unauthorized access or tool poisoning and ensures robust compliance across your AI services.
How does MCP handle context bloat in multi-server scenarios?
When multiple MCP servers are added, each one injects tool metadata, prompts, and other information into the LLM’s context window. This can lead to context bloat, where the LLM is overwhelmed with unnecessary details. Techniques like selective tool injection, intelligent prompt chaining, and centralized orchestration via a gateway can mitigate these issues and keep the context more focused.
How are organizations like Kong addressing MCP challenges?
Kong applies proven API management principles to MCP with its Kong Konnect and Kong AI Gateway offerings. By leveraging existing gateway features—authentication, load balancing, developer portals, and more—Kong helps enterprises secure, scale, and govern remote MCP servers. This ensures smooth developer experiences and robust operational consistency as AI-driven workflows grow.
Where can I learn more about implementing remote MCP servers?
You can explore Kong’s blog posts on MCP, including details on security, governance, and best-practice patterns for integrating LLMs with enterprise services. Additionally, check Anthropic’s official MCP specification and platforms like mcp.run for insights into setting up composable and remote MCP servers.