[Engineering](/blog/engineering)
April 2, 2026
14 min read

# 5 Best Practices for Securing AI Microservices at Scale

Kong

The microservices revolution promised agility and scalability. Teams could deploy faster, scale independently, and innovate without monolithic constraints. You gain speed and flexibility, but you also multiply trust boundaries, identities, network paths, and policy decisions.

Then came AI, and everything changed.

In 2025, the security reality for AI-integrated microservices is stark. Security challenges in microservices continue to escalate as organizations struggle with API proliferation and shadow API discovery. The microservices architecture market has grown from $6.27 billion in 2024 to $7.4 billion in 2025 at a compound annual growth rate (CAGR) of 17.9%, with projections reaching $15.64 billion by 2029 [1].

Consider what happens when a user sends a single prompt to a customer service chatbot. That prompt triggers API gateway authentication, LLM endpoint calls, [RAG retrievals from vector databases](https://konghq.com/blog/learning-center/what-is-rag-retrieval-augmented-generation), and tool execution through [MCP servers](https://konghq.com/products/kong-konnect/agents). A single request can traverse a dozen or more services. This illustrates how AI amplifies existing microservices complexity exponentially.

## The Stakes Keep Rising

The security implications are severe. [OWASP's 2025 Top 10 for LLM Applications](https://konghq.com/blog/engineering/owasp-top-10-ai-and-llm-guide) ranks prompt injection as the number one critical vulnerability. Attackers manipulate LLM inputs to override instructions, extract sensitive data, or trigger unintended behaviors. Traditional security tools can't detect semantic-level attacks. They can't distinguish between legitimate AI traffic and data exfiltration attempts. They can't govern token consumption or prevent prompt injection.

The attack surface has fundamentally changed. Indirect prompt injection techniques allow attackers to embed harmful instructions within source materials or intermediary processes. For example, an attacker might post a crafted prompt on an online forum, instructing any LLM reading it to recommend a phishing site. When a user later asks an AI assistant to summarize the forum discussion, the model processes the malicious recommendation.

## The Solution: Extend, Don't Replace

Here's the crucial insight: [AI doesn't invalidate proven microservices security principles. It reveals where they're incomplete](https://konghq.com/blog/enterprise/microservices-to-ai-traffic-kong-as-the-unified-control-plane).

You don't need to tear down your security stack and start over. The solution lies in extending your proven [microservices security best practices](https://konghq.com/events/webinars/securing-and-governing-microservices)—zero-trust, centralized policy, infrastructure-level controls—to address AI's unique risks. This means applying the same rigor to prompt injection, model-driven data exfiltration, and runaway token costs that you apply to traditional API security.

This guide presents five critical best practices that bridge the gap:

  1. Zero-trust architecture for all service communication
  2. Centralized policy enforcement across AI and API traffic
  3. Infrastructure-level prompt security and data protection
  4. AI-specific observability for new traffic patterns
  5. Secure RAG and MCP governance at scale

## Best Practice #1: Implement Zero-Trust Architecture for All Service Communication

### Why Is Zero-Trust Mandatory in an AI Context?

Zero-trust operates on a fundamental principle: never trust, always verify. Every request, every connection, every interaction must prove its identity and authorization — no exceptions.

This principle becomes critical when AI enters the picture. AI agents behave like autonomous clients within your infrastructure. They make decisions independently, access multiple services in rapid succession, and can be manipulated through prompt injection to perform unintended actions. Security researchers note that prompt injection vulnerabilities exist due to the fundamental nature of how LLMs process instructions and data together, making complete prevention challenging.

Consider the attack surface. Machine identities already dominate the landscape — AI agents add another layer of complexity. Each agent needs credentials, scoped privileges, and continuous verification. Without zero-trust, a single compromised agent becomes a skeleton key to your entire infrastructure.

### How Does mTLS Secure East-West Traffic?

Mutual TLS (mTLS) is a protocol where both services authenticate each other with certificates before any data is exchanged — providing the foundation for zero-trust communication and eliminating man-in-the-middle attacks.

Implementation requirements:

  • Certificate generation: Issue unique certificates for each service and AI endpoint
  • Rotation policies: Enforce 90-day maximum certificate validity
  • Dedicated CA: Use an internal Certificate Authority for service certificates
  • Automated renewal: Implement automatic certificate rotation before expiration
  • Revocation lists: Maintain CRLs for compromised credentials

mTLS becomes especially critical for AI workloads. When an LLM endpoint communicates with a vector database for RAG retrieval, both sides must verify identity. When an AI agent calls internal APIs, every connection needs encryption and authentication.
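At the gateway edge, one way to sketch this is attaching a client certificate to the upstream LLM service in Kong's declarative (decK) configuration, so the gateway authenticates itself to the upstream. The service name, URL, certificate ID, and PEM contents below are placeholders, not a prescribed setup:

```yaml
# decK declarative config (sketch): Kong presents a client certificate
# when proxying to the LLM endpoint, so the upstream can verify the
# gateway's identity over mTLS. All names and IDs are illustrative.
certificates:
  - id: 0a1b2c3d-0000-0000-0000-000000000001
    cert: "<client certificate PEM>"
    key: "<client private key PEM>"
services:
  - name: llm-endpoint
    url: https://llm.internal:8443
    client_certificate:
      id: 0a1b2c3d-0000-0000-0000-000000000001
```

Pair this with a rotation pipeline that reissues the certificate well before its 90-day ceiling.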

### Why Isn't IP-Based Access Control Enough for AI Services?

IP addresses mean nothing in containerized environments — services spin up, scale, and disappear constantly, making cryptographic identity the only reliable foundation for access control.

Each service, including AI agents and LLM endpoints, needs its own cryptographic identity. Apply the principle of least privilege rigorously:

AI-specific requirements:

  • Service accounts: Issue unique identities for AI agents with time-bound credentials
  • Dynamic authorization: Adapt permissions based on context, not just identity
  • Short-lived tokens: Limit AI agent credentials to 15–30 minute lifespans
  • Scope limitations: Restrict read/write access based on actual requirements
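For example, a hedged sketch of enforcing short-lived agent tokens with Kong's `jwt` plugin (the route name is illustrative):

```yaml
# Sketch: reject agent tokens that lack an expiry or live too long.
plugins:
  - name: jwt
    route: ai-agent-tools        # illustrative route name
    config:
      claims_to_verify: [exp]    # token must carry an expiry claim
      maximum_expiration: 1800   # reject tokens valid for more than 30 minutes
```

This pushes the lifetime policy into the gateway, so an agent holding an over-long credential is refused regardless of which service minted it.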

The importance of identity-based controls is underscored by real-world incidents where services trusted network location over verified identity, leading to exploits.

### How Does a Service Mesh Help Enforce Zero-Trust at Scale?

A service mesh automates zero-trust enforcement across all microservices by injecting sidecar proxies alongside each workload — eliminating the inconsistency that comes from managing these controls manually.

These proxies handle:

  • Automatic mTLS: Encryption and authentication without code changes
  • Certificate management: Automated rotation and distribution
  • Policy enforcement: Consistent access control across all services
  • Traffic management: Load balancing and circuit breaking

For hybrid and multi-cloud deployments common in APAC, a service mesh ensures consistency. The same zero-trust policies apply whether services run in Kubernetes, VMs, or legacy infrastructure.

Kong Mesh provides turnkey zero-trust capabilities:

```yaml
type: Mesh
name: ai-production
mtls:
  enabledBackend: built-in
  backends:
    - name: built-in
      type: builtin
      dpCert:
        rotation:
          expiration: 24h   # data plane certificates rotate daily
```

Apply once. Every new microservice and AI pod inherits zero-trust automatically.
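On top of mesh-wide mTLS, Kong Mesh traffic permissions can pin exactly which service identities may talk to which. A sketch with hypothetical service names:

```yaml
# Sketch: only the customer-service agent may reach the LLM endpoint,
# even though both hold valid mesh certificates. Service names are illustrative.
type: TrafficPermission
name: agent-to-llm
mesh: ai-production
sources:
  - match:
      kuma.io/service: customer-service-agent
destinations:
  - match:
      kuma.io/service: llm-endpoint
```

Identity plus explicit permission means a compromised agent cannot wander to services it was never granted.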

Key Insight: "If zero-trust secures the connection, centralized policy secures the behavior."

## Best Practice #2: Centralize Security Policy Enforcement Across AI and API Traffic

### Why Do Fragmented Security Controls Create Blind Spots?

When different traffic types are managed by different tools, the result is inconsistent enforcement, multiple dashboards with no shared visibility, and exploitable gaps between them.

Organizations typically manage traffic in silos. User-facing APIs go through an API gateway. LLM endpoints hide behind custom authentication. Internal gRPC calls bypass both. Add AI endpoints to this, and the complexity becomes exponential.

Different teams implement different controls:

  • Platform teams secure user APIs with OAuth and rate limiting
  • ML teams build custom authentication for LLM endpoints
  • Data teams use another approach for internal microservices

This fragmentation creates exploitable weaknesses. Attackers find the least protected endpoint and pivot from there.

### Should AI Traffic Be Treated Differently from API Traffic?

No — AI traffic is API communication that requires the same rigorous controls. Whether the caller is a human, microservice, or AI agent, consistent policies must apply.

Essential controls for all traffic:

Authentication & Authorization

  • OAuth 2.0/OIDC for external clients
  • Service-to-service authentication via mTLS or JWT
  • API keys with rotation for AI services
  • Role-based access control with fine-grained permissions

Rate Limiting & Governance

  • Request limits for traditional APIs (e.g., 100 requests/minute)
  • Token budgets for LLM endpoints (e.g., 100,000 tokens/hour)
  • Concurrency caps for internal services
  • Cost controls preventing runaway AI spending
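For conventional request-based limits, a minimal sketch using Kong's bundled `rate-limiting` plugin (route name illustrative; token-denominated budgets for LLM routes need AI-aware rate limiting rather than request counts):

```yaml
# Sketch: cap a traditional API route at 100 requests per minute.
plugins:
  - name: rate-limiting
    route: orders-api      # illustrative route name
    config:
      minute: 100
      policy: local        # use redis/cluster policies for multi-node accuracy
```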

Audit & Compliance

  • Comprehensive logging for all requests
  • Correlation IDs across service calls
  • Regulatory compliance tracking
  • Anomaly detection and alerting
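As one example of centralizing the audit trail, Kong's `http-log` plugin can ship every request record to a shared logging sink (the endpoint is illustrative):

```yaml
# Sketch: forward request/response metadata for all traffic to a
# central log collector. The sink URL is a placeholder.
plugins:
  - name: http-log
    config:
      http_endpoint: https://logs.example.internal/kong
      method: POST
```

Applying it globally rather than per-route is what prevents the "some endpoints are logged, some aren't" gap attackers look for.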

### What Does a Unified Control Plane Give You?

A unified control plane is a single enforcement layer that applies the same policies, logging, and rules engine across all traffic types — eliminating the inconsistency of tool-per-team approaches.

Managing policies across diverse services and protocols demands centralization. A unified control plane provides:

  • Single source of truth: One dashboard for all policies
  • Consistent enforcement: Same rules engine for all traffic types
  • Unified logging: Centralized audit trails
  • Cross-cutting concerns: Authentication, rate limiting, and monitoring in one place

Organizations that invest in unified security platforms are better positioned to close these gaps than those stitching together fragmented tools.

[Kong AI Gateway](https://konghq.com/products/kong-ai-gateway) extends existing API gateway capabilities to AI traffic seamlessly. Your existing security policies automatically apply to AI endpoints, and AI-specific controls like prompt guards and token management layer on top — without creating new silos.
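A sketch of how this looks in practice: routing chat traffic through the gateway with Kong's `ai-proxy` plugin, so the same route-level plugins (auth, rate limiting, logging) govern the LLM call. The route name, model, and credential placeholder are illustrative:

```yaml
# Sketch: proxy chat completions through the gateway so existing
# route policies apply to LLM traffic too. Values are illustrative.
plugins:
  - name: ai-proxy
    route: chat                      # illustrative route name
    config:
      route_type: llm/v1/chat
      model:
        provider: openai
        name: gpt-4o                 # illustrative model choice
      auth:
        header_name: Authorization
        header_value: "Bearer <provider API key>"  # resolve from a vault in practice
```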

## Best Practice #3: Enforce Prompt Security and Data Protection at the Infrastructure Layer

### What New Attack Paths Does AI Introduce?

Unlike traditional API attacks, AI-specific threats operate at the semantic level — manipulating the meaning of inputs rather than exploiting authentication or protocol weaknesses, which makes them invisible to standard security tooling.

Traditional API security validates authentication, checks authorization, and logs requests — but it cannot inspect the semantic content of prompts. As of mid-2025, OWASP's Top 10 for LLM Applications ranks prompt injection as LLM01, identifying it as a critical vulnerability capable of causing sensitive data disclosure, unauthorized function access, and arbitrary command execution. Because prompt injection exploits the stochastic nature of how models process input, fool-proof prevention remains an open problem.

Modern attacks include:

  • Prompt injection: Override system instructions with malicious commands
  • Context poisoning: Inject false data into RAG retrievals
  • Data exfiltration: Trick models into revealing training data or context
  • Semantic manipulation: Alter model behavior through carefully crafted inputs

### How Should Prompt Security Be Enforced Before Inputs Reach the Model?

Infrastructure-level prompt security acts as a first line of defense by inspecting and sanitizing prompts before they reach the LLM — stopping attacks at the boundary rather than relying on the model to reject them.

Pattern Detection

  • Block known injection patterns ("ignore previous instructions")
  • Identify system prompt extraction attempts
  • Detect role-playing manipulations
  • Flag suspicious instruction sequences

Semantic Analysis

  • Use secondary models to classify prompt intent
  • Detect output format manipulation attempts
  • Identify unauthorized information requests
  • Score prompts for malicious probability

Context Isolation

  • Separate user input from system instructions
  • Use structured prompt templates
  • Implement clear data source delimiters
  • Prevent external data injection

Security research has identified over 30 distinct prompt injection techniques across ecosystems. Infrastructure-level guards must evolve continuously to counter these threats.
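As a minimal illustration of pattern detection at the boundary, Kong's `ai-prompt-guard` plugin can reject prompts matching known injection regexes before they reach the model. The patterns below are examples only, not a complete defense; real deployments need continuously updated pattern sets plus semantic analysis:

```yaml
# Sketch: block prompts matching well-known injection phrasings.
# Regexes are illustrative, not exhaustive.
plugins:
  - name: ai-prompt-guard
    config:
      deny_patterns:
        - "(?i)ignore (all )?previous instructions"
        - "(?i)reveal (your )?system prompt"
```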

### How Can Sensitive Data Be Protected Automatically in AI Pipelines?

Automatic data protection applies PII detection and redaction at both the input and output layer — preventing sensitive information from reaching the model and blocking it from appearing in generated responses.

Without these controls, sensitive information leaks through model outputs, often without any visible signal that a violation has occurred.

Pre-processing Protection

  • Detect and mask credit cards, SSNs, and passport numbers
  • Redact emails and phone numbers
  • Tokenize personally identifiable information
  • Replace names with placeholders

Post-processing Validation

  • Scan generated text for leaked PII
  • Block responses containing sensitive patterns
  • Alert on potential exfiltration attempts
  • Maintain comprehensive audit logs

For APAC organizations, automatic PII protection is also a compliance requirement. Regulations such as Australia's Privacy Act 1988 and Japan's APPI mandate appropriate technical and organizational safeguards for personal information, with specific controls determined by each organization's risk assessment.

## Best Practice #4: Extend Observability to Capture AI-Specific Traffic Patterns

### Why Does Traditional APM Miss AI Security Signals?

Standard monitoring tools track latency, errors, and throughput — but AI workloads produce a different class of signal that these tools are not designed to capture.

Critical AI-specific signals that standard APM misses include:

  • Token consumption patterns indicating attacks
  • Abnormal prompt sequences suggesting reconnaissance
  • Agent concurrency anomalies revealing potential abuse
  • Semantic cache performance affecting costs
  • Model routing anomalies indicating hijacking

Security professionals increasingly recognize that agentic AI and autonomous systems present significant attack vectors. You can't defend against what you can't see.

### What Metrics Should You Track for AI Observability?

AI observability requires a distinct set of metrics covering both security signals and operational health — standard infrastructure metrics alone are insufficient.

*(Figures: security metrics and operational metrics for AI microservices.)*

### How Does Observability Serve Both Security and Cost Governance?

Enhanced AI visibility simultaneously detects security threats and prevents runaway costs — the same signals that indicate an attack often indicate a budget problem.

Security Detection

  • Token spikes reveal data exfiltration attempts
  • Repeated prompts indicate attack reconnaissance
  • Unusual agent patterns suggest compromise
  • Service call patterns expose lateral movement

Cost Control

  • Identify expensive prompts for optimization
  • Detect runaway agents before budget drain
  • Track feature ROI with precision
  • Allocate costs to teams accurately

According to IBM's 2025 Cost of a Data Breach Report, the global average cost of a data breach fell 9% to $4.44 million — but US-specific costs rose 9% to a record $10.22 million, driven by increased regulatory fines. Faster identification and containment, enabled by better observability, directly impacts where your organization lands in that range.

### Why Is End-to-End Tracing Critical for AI Request Flows?

Without end-to-end tracing, a single AI request that spans multiple services becomes impossible to debug — and during a breach, impossible to investigate before damage is done.

A typical AI request flow touches multiple layers:

  1. User submits prompt → API gateway
  2. Gateway validates and enriches the request
  3. Request routes to the appropriate LLM
  4. LLM triggers the RAG pipeline
  5. Multiple services provide context
  6. Response flows back through the gateway
  7. Post-processing and filtering applied
  8. Final response delivered

Without correlation IDs linking each step, tracing a single session across dozens of services becomes extremely difficult — potentially allowing attackers to complete data exfiltration before patterns emerge.
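Propagating a correlation ID at the gateway is the usual first step. For example, Kong's `correlation-id` plugin stamps every request with a UUID that downstream services and log pipelines can carry through the whole chain:

```yaml
# Sketch: attach a UUID to every request so all hops in the AI
# request flow can be joined back into one trace.
plugins:
  - name: correlation-id
    config:
      header_name: Kong-Request-ID
      generator: uuid
      echo_downstream: true   # also return the ID to the caller
```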

[Kong's AI-specific analytics via Konnect](https://konghq.com/products/kong-konnect/features/api-observability) provide token usage dashboards, LLM performance metrics, anomaly detection alerts, cost allocation reports, and security incident correlation — all in a single control plane.

## Best Practice #5: Secure RAG Pipelines and MCP Servers Centrally

### How Do RAG and MCP Expand the AI Attack Surface?

RAG pipelines and MCP servers dramatically increase internal service communication — each introducing new endpoints, retrieval paths, and autonomous agent actions that fall outside traditional API security coverage.

RAG (Retrieval-Augmented Generation)

  • Each query triggers vector database searches
  • Multiple knowledge bases provide context
  • Embeddings retrieve sensitive documents
  • Results aggregate before LLM processing

MCP (Model Context Protocol)

  • Agents autonomously discover internal tools
  • Each tool represents an API endpoint
  • Agents chain multiple tools together
  • Actions execute without human oversight

Security research highlights the risk of malicious actors impersonating trusted tools or injecting poisoned prompt templates that silently alter agent behavior — a supply chain vulnerability that remains a significant challenge for many organizations.

### How Should RAG Retrieval Paths Be Secured?

Every RAG endpoint requires the same authentication and authorization controls applied to external APIs — unauthenticated retrieval paths are an exploitable boundary between user input and sensitive document stores.

Authentication Requirements

  • Service accounts for vector database access
  • Short-lived OAuth2 tokens
  • Certificate-based authentication for internal calls
  • API key rotation policies

Authorization Controls

  • Row-level security on documents
  • User context propagation through the pipeline
  • Attribute-based access control (ABAC)
  • Cross-tenant isolation

Data Protection

  • Encryption at rest for embeddings
  • TLS 1.3 for retrieval communications
  • Comprehensive audit logging
  • Data lineage tracking
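Tying these together, a decK-style sketch (service, route, and group names are all illustrative) that puts a RAG retrieval route behind JWT authentication and an ACL allow-list, so only the RAG pipeline's identity can query the vector store:

```yaml
# Sketch: the vector DB is only reachable through the gateway,
# with short-lived JWTs and an explicit consumer allow-list.
# All names and URLs are illustrative.
services:
  - name: vector-db
    url: https://vectordb.internal:8200
    routes:
      - name: rag-retrieval
        paths: [/retrieve]
    plugins:
      - name: jwt                  # callers present short-lived tokens
      - name: acl
        config:
          allow: [rag-pipeline]    # only this consumer group may retrieve
```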

### Why Do MCP Servers Need the Same Governance as APIs?

MCP tools are API endpoints — they expose internal capabilities to autonomous agents and must be governed with the same authentication, rate limiting, and audit controls applied to any external-facing API.

```yaml
mcpPolicy:
  name: agent-tool-access
  tools:
    - name: check-inventory
      auth: jwt
      rate_limit: 100/minute
      allowed_agents: [customer-service]

    - name: process-refund
      auth: mtls
      rate_limit: 10/minute
      allowed_agents: [customer-service]
      audit: enhanced

    - name: send-notification
      auth: oauth2
      rate_limit: 1000/hour
      allowed_agents: [all]
```

"The more powerful your AI agent, the more disciplined your control plane needs to be."

### How Does Kong Centralize RAG and MCP Security?

Kong AI Gateway secures both RAG pipelines and MCP servers through a single control plane — converting existing APIs into governed MCP tools and applying retrieval policies once across all AI applications.

Auto-generate secure MCP servers

  • Convert existing APIs to MCP tools
  • Inherit authentication and authorization
  • Apply rate limiting automatically
  • Enable comprehensive logging

Central RAG pipeline governance

```yaml
ragPolicy:
  name: finance-kb-access
  vectorDb: kong-vectordb-sg
  allowedCollections:
    - quarterly-reports
    - product-disclosures
  maxTokens: 4096
  piiRedaction: enabled
```

Apply once. Reuse across all AI applications.

## Conclusion: Extend, Don't Replace, Your Security Fundamentals

The AI revolution doesn't require abandoning proven security principles. It demands extending them to address new challenges.

As of mid-2025, these five practices represent the current best-practice consensus for AI-era microservices security. The threats are real and growing, with security research continuing to identify new prompt injection patterns and attack techniques that bypass existing defenses.

But so are the defenses. By implementing zero-trust architecture, centralizing policy enforcement, protecting against prompt injection, extending observability, and securing advanced AI patterns, organizations build resilient systems that capture AI's benefits while managing its risks.

Remember: AI spotlights where existing security is incomplete, not invalid. Organizations that recognize this distinction and act on it will thrive in the AI era.

## Next Steps

Ready to secure your AI-augmented microservices at scale? Kong's unified API platform provides the comprehensive foundation you need.

[Request a Demo](https://konghq.com/contact-sales) to see Kong in action.

Explore our solutions:

  • [Kong Mesh](https://konghq.com/products/kong-mesh) - Zero-trust across hybrid clouds
  • [Kong AI Gateway](https://konghq.com/products/kong-ai-gateway) - Centralized AI and API governance
  • [Documentation](https://docs.konghq.com) - Implementation guides and best practices

## Frequently Asked Questions

**What is zero-trust architecture and why does it matter for AI microservices?** Zero-trust is a security model that requires every request, connection, and interaction to verify its identity and authorization — no implicit trust is granted based on network location. For AI microservices, this is critical because AI agents act as autonomous clients that make independent decisions, access multiple services rapidly, and are vulnerable to prompt injection. Without zero-trust, a single compromised agent can move laterally across your entire infrastructure.

What is prompt injection and why can't traditional security tools stop it? Prompt injection is an attack where malicious input manipulates an LLM into overriding its instructions, disclosing sensitive data, or executing unintended actions. OWASP ranks it as the top critical vulnerability for LLM applications in 2025. Traditional security tools operate at the protocol layer — they validate authentication and log requests, but cannot inspect the semantic content of a prompt, making these attacks invisible to standard tooling.
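
A gateway-level pre-filter can catch the crudest injection attempts before a prompt reaches the model. The patterns below are illustrative only; pattern lists are easy to evade on their own and are typically paired with model-based classifiers:

```python
import re

# Hypothetical pre-filter: a few well-known injection phrases.
# Heuristics like these are a first line of defense, not a complete one.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now (in )?developer mode",
    r"reveal (your|the) system prompt",
]

def flag_prompt(prompt: str) -> bool:
    """Return True if the prompt matches a known injection pattern."""
    lowered = prompt.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)
```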

How is AI traffic different from regular API traffic? Functionally, it isn't — and that's the point. AI traffic is API communication and should be governed with the same controls: authentication, rate limiting, audit logging, and access control. The difference is that AI traffic introduces additional signals to monitor, such as token consumption, prompt patterns, and model routing behavior, and additional attack vectors like context poisoning and semantic manipulation.
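
One of those additional signals can be enforced directly: a per-client token budget over a sliding window, applied alongside an ordinary request rate limit. A simplified in-memory sketch:

```python
import time
from collections import deque

class TokenBudget:
    """Hypothetical per-client token budget over a sliding time window,
    applied alongside standard request-count rate limits."""

    def __init__(self, max_tokens: int, window_seconds: float = 60.0):
        self.max_tokens = max_tokens
        self.window = window_seconds
        self.events = deque()  # (timestamp, tokens) pairs
        self.used = 0

    def allow(self, tokens: int, now=None) -> bool:
        """Admit the request only if the window's budget can absorb it."""
        now = time.monotonic() if now is None else now
        # Drop spend that has aged out of the window.
        while self.events and now - self.events[0][0] > self.window:
            _, spent = self.events.popleft()
            self.used -= spent
        if self.used + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True
```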

What is mTLS and when should it be used in AI infrastructure? Mutual TLS (mTLS) is a TLS configuration in which both the client and server authenticate each other with certificates before exchanging data. Unlike standard TLS, which only authenticates the server, mTLS provides bidirectional identity verification. It should be used for all east-west (service-to-service) traffic in AI infrastructure — particularly between LLM endpoints, vector databases, and internal APIs.
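
In Python, for example, requiring a client certificate on the server side takes only a few lines with the standard `ssl` module (the certificate paths are illustrative and left optional here):

```python
import ssl

def build_mtls_server_context(certfile=None, keyfile=None, cafile=None):
    """Server-side TLS context that demands a client certificate (mTLS)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Require the peer to present a certificate signed by a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)   # this server's identity
    if cafile:
        ctx.load_verify_locations(cafile)        # CA that signs client certs
    return ctx
```

The client side mirrors this with its own certificate and key, so each end of the connection proves its identity before any application data flows.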

What is a RAG pipeline and what are its security risks? Retrieval-Augmented Generation (RAG) is a pattern where an LLM queries external knowledge sources — typically vector databases — to supplement its responses with relevant context. Each retrieval introduces an unauthenticated boundary between user input and sensitive documents. Security risks include context poisoning (injecting false data into retrievals), unauthorized document access through insufficient authorization controls, and PII exposure through unredacted embeddings.
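
One mitigation for unauthorized document access is a post-retrieval authorization filter: check each retrieved chunk's ACL against the caller's entitlements before anything enters the prompt. A minimal sketch, assuming chunks carry an `allowed_groups` field (a hypothetical metadata convention):

```python
# Hypothetical post-retrieval authorization: filter vector-store hits
# against the caller's group memberships before they reach the prompt.
def authorize_chunks(chunks, caller_groups):
    """Keep only chunks whose ACL intersects the caller's groups."""
    allowed = []
    for chunk in chunks:
        acl = set(chunk.get("allowed_groups", []))
        if acl & set(caller_groups):
            allowed.append(chunk)
    return allowed
```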

What is MCP and why does it need API-level governance? Model Context Protocol (MCP) is a standard that allows AI agents to autonomously discover and invoke internal tools. Each tool is an API endpoint exposed to an agent that can chain multiple tools together and execute actions without human oversight. Without governance, MCP tools become uncontrolled access points into internal systems — requiring the same authentication, rate limiting, and audit controls as any external-facing API.
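
The simplest such control is a per-agent tool allowlist enforced in front of every invocation. The agent and tool names below are hypothetical:

```python
# Hypothetical allowlist: each agent identity may invoke only the MCP
# tools it has been explicitly granted; everything else is denied.
TOOL_POLICY = {
    "support-agent": {"search_docs", "create_ticket"},
}

def authorize_tool_call(agent_id: str, tool_name: str) -> bool:
    """Deny by default; permit only explicitly granted tools."""
    return tool_name in TOOL_POLICY.get(agent_id, set())
```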

Why is traditional APM insufficient for AI workloads? Standard application performance monitoring tracks latency, errors, and throughput. AI workloads generate a different class of signal: token consumption anomalies, abnormal prompt sequences, agent concurrency patterns, and model routing behavior. These signals indicate both security threats and runaway costs, but are invisible to tools not designed to capture them.
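
As a baseline, token-consumption anomalies can be flagged with a simple z-score against recent history (production systems would use richer models, but the principle is the same):

```python
import statistics

def is_token_anomaly(history, latest, threshold=3.0):
    """Flag a request whose token count sits more than `threshold`
    standard deviations above the historical mean."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest > mean
    return (latest - mean) / stdev > threshold
```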

How should organizations handle PII in AI pipelines? PII protection must be applied at both the input and output layers. Pre-processing should detect and mask sensitive data — credit card numbers, SSNs, emails — before prompts reach the model. Post-processing should scan generated responses for leaked PII and block any output containing sensitive patterns. For organizations in APAC, this is also a regulatory requirement under frameworks such as Australia's Privacy Act 1988 and Japan's APPI.
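
A minimal pre-processing mask might look like the following. The regexes are illustrative; real deployments typically rely on dedicated, locale-aware PII detection:

```python
import re

# Illustrative masks only -- production systems should use purpose-built
# PII detection rather than a handful of regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def mask_pii(text: str) -> str:
    """Replace detected PII with typed placeholders before the prompt
    reaches the model."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

The same function can run as a post-processing pass over model output, blocking or redacting any response in which a placeholder substitution occurred.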

What credentials should AI agents use, and how long should they be valid? Each AI agent should have a unique service account with a cryptographic identity and scoped permissions limited to its actual operational requirements. Credentials should be short-lived — 15 to 30 minutes maximum — and automatically rotate. This limits the blast radius of a compromised agent identity.
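
The lifecycle can be sketched as a credential object with a 15-minute TTL and rotation. Real systems would mint signed tokens (e.g., JWTs) through an identity provider rather than opaque strings in memory:

```python
import secrets
import time

class AgentCredential:
    """Hypothetical short-lived credential for an agent service account."""

    TTL_SECONDS = 15 * 60  # short-lived, per the guidance above

    def __init__(self, agent_id, scopes):
        self.agent_id = agent_id
        self.scopes = frozenset(scopes)  # least-privilege scope set
        self.rotate()

    def rotate(self):
        """Issue a fresh secret and reset the expiry window."""
        self.token = secrets.token_urlsafe(32)
        self.expires_at = time.time() + self.TTL_SECONDS

    def is_valid(self, now=None) -> bool:
        now = time.time() if now is None else now
        return now < self.expires_at
```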

What does a unified control plane provide that separate security tools don't? A unified control plane enforces the same policies, logging, and rules engine across all traffic types from a single location. Separate tools create inconsistent enforcement, multiple dashboards with no shared visibility, and exploitable gaps between teams. A unified approach means the same authentication, rate limiting, and audit controls apply whether traffic is coming from a human, a microservice, or an AI agent.


References

  1. OWASP Foundation. (2025). OWASP Top 10 for LLM Applications 2025. https://owasp.org/www-project-top-10-for-large-language-model-applications/
  2. IBM Corporation & Ponemon Institute. (2025). Cost of a Data Breach Report 2025. https://www.ibm.com/reports/data-breach
  3. The Business Research Company. (2025). Microservices Architecture Global Market Report 2025. https://www.giiresearch.com/report/tbrc1843907-microservices-architecture-global-market-report.html
  4. Check Point Software. (2025). OWASP Top 10 for LLM Applications 2025: Prompt Injection. https://www.checkpoint.com/cyber-hub/what-is-llm-security/prompt-injection/