Monitoring the health of your production system involves keeping track of various data points in real time in order to derive insights from them. Day to day, monitoring can provide early indications of problems, giving the team time to investigate and fix them before a system fails completely. If you're running tests in production, such as canary releases or blue-green deployments, a monitoring tool is essential for measuring the impact of the release so you can decide whether to roll back or roll out.
Most monitoring solutions provide a dashboard so you can see the state of your system, along with configurable alerts to notify you when values pass a specified threshold. When setting alerts, keep in mind the signal-to-noise ratio: if a threshold is set too low, alerts fire so often that teams soon learn to ignore them and will miss the important ones. Over time, you may get to know the signs that mean something is likely to fail, and you can adjust the thresholds accordingly to give you time to take pre-emptive action.
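One common way to improve the signal-to-noise ratio is to alert only when a value stays above its threshold for several consecutive samples, rather than on every spike. The following is a minimal sketch of that idea, not how any particular monitoring tool works; the `ThresholdAlert` class, the 5% error-rate threshold, and the sample values are all made up for illustration.

```python
from collections import deque


class ThresholdAlert:
    """Fires only after `consecutive` samples in a row exceed `threshold`."""

    def __init__(self, threshold, consecutive=3):
        self.threshold = threshold
        # Keep only the most recent `consecutive` breach flags.
        self.recent = deque(maxlen=consecutive)

    def observe(self, value):
        """Record one sample and return True if the alert should fire."""
        self.recent.append(value > self.threshold)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(self.recent)


# Hypothetical error-rate samples (fraction of failed requests per minute).
error_rate_alert = ThresholdAlert(threshold=0.05, consecutive=3)
samples = [0.02, 0.09, 0.03, 0.06, 0.07, 0.08]
fired = [error_rate_alert.observe(s) for s in samples]
print(fired)
```

With these samples, the isolated spike to 0.09 does not fire the alert; only the sustained run of 0.06, 0.07, 0.08 does. Raising `consecutive` trades faster detection for fewer false alarms.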
In a microservices system, traffic is typically funneled through an API gateway, which provides access to the upstream services. This makes the gateway a good vantage point for integrating a monitoring solution. Kong's API gateway supports a number of plugins for monitoring the health of your system, with options to store the data yourself or send it to a hosted service. Kong Enterprise includes Kong Vitals to monitor metrics for upstream microservices, such as request counts and status codes.
While monitoring collects metrics and aggregates data over time to show trends, distributed tracing focuses on a single operation. Although originally developed as a way of measuring and improving performance, distributed tracing is also valuable when debugging in a microservice-based system.
In a monolithic application, developers can use a stack trace to understand the context of an error. With microservices, by contrast, a single request can spawn multiple requests to individual services hosted on different machines. If a failure occurs at some point in the system, every interaction between that service and other parts of the system is a possible cause, and looking at the individual log files won't give you the full context. Likewise, an increase in latency could be attributable to any one component, or to several. In order to debug the issue, you need to be able to piece together the chain of requests that led to it. Distributed tracing makes this possible.
With distributed tracing, an identifier is applied to every request coming into the system and propagated to each request sent to the individual microservices as well as any child requests that they generate. To provide a complete trace, the identifier should also be included in the response sent to the client. This means that if a user reports an error, the identifier in the response they received can be used to replay the entire transaction, locate the relevant logs and identify the root cause.
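The propagation described above can be sketched in a few lines. This is an illustrative outline only: the `X-Trace-Id` header name, the helper functions, and the fake upstream are hypothetical, and real tracing systems define their own header formats and also track per-span identifiers.

```python
import uuid

# Hypothetical header name, chosen for illustration only.
TRACE_HEADER = "X-Trace-Id"


def ensure_trace_id(incoming_headers):
    """Reuse the caller's trace ID if present, otherwise mint a new one."""
    return incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex


def outgoing_headers(trace_id, extra=None):
    """Build headers for a child request, carrying the same trace ID."""
    headers = dict(extra or {})
    headers[TRACE_HEADER] = trace_id
    return headers


def handle_request(incoming_headers, call_upstream):
    """Handle one request: propagate the ID upstream and echo it to the client."""
    trace_id = ensure_trace_id(incoming_headers)
    body = call_upstream(outgoing_headers(trace_id))
    # Returning the ID lets a user quote it when reporting an error.
    return {"headers": {TRACE_HEADER: trace_id}, "body": body}


# A stand-in for an upstream service that records the IDs it receives.
seen_by_upstream = []


def fake_upstream(headers):
    seen_by_upstream.append(headers[TRACE_HEADER])
    return "ok"


response = handle_request({}, fake_upstream)
print(response["headers"][TRACE_HEADER] == seen_by_upstream[0])
```

Because every log line written while handling the request can include the same identifier, searching the logs of each service for that value reconstructs the chain of calls.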
It's best to implement distributed tracing early on, as it requires instrumentation of all the code in your system. Missing a span (i.e., any one of the individual segments in a transaction) will create a blind spot in your trace, which can make it harder to get to the bottom of a problem. With Kong's API gateway, you can enable distributed tracing with the Zipkin plugin, which adds identifiers to requests and reports data to a Zipkin server.
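As a rough sketch of what enabling the plugin might look like, the snippet below builds (but does not send) a request to Kong's Admin API that turns on the Zipkin plugin globally. It assumes the Admin API is reachable at `http://localhost:8001`; the Zipkin host name and the sampling ratio of 0.1 are made up for illustration. Check the plugin documentation for your Kong version before relying on any of these values.

```python
import json
import urllib.request

# Assumption: Kong's Admin API is listening on localhost:8001.
KONG_ADMIN_URL = "http://localhost:8001"
# Hypothetical Zipkin collector host; /api/v2/spans is Zipkin's span endpoint.
ZIPKIN_ENDPOINT = "http://zipkin.example.internal:9411/api/v2/spans"


def build_zipkin_plugin_request(admin_url, http_endpoint, sample_ratio):
    """Build a POST request that enables the Zipkin plugin for all services."""
    payload = {
        "name": "zipkin",
        "config": {
            "http_endpoint": http_endpoint,  # where spans are reported
            "sample_ratio": sample_ratio,    # fraction of requests to trace
        },
    }
    return urllib.request.Request(
        url=f"{admin_url}/plugins",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_zipkin_plugin_request(KONG_ADMIN_URL, ZIPKIN_ENDPOINT, 0.1)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
# That call is left out so the sketch runs without a live gateway.
```

A sampling ratio below 1 traces only a portion of traffic, which keeps overhead down at the cost of not having a trace for every request.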
One of the advantages of microservices over a monolithic architecture is the ability to deploy changes quickly. By identifying issues in production proactively and being able to debug the cause of the problem quickly, you can react and roll out fixes (or roll back changes) promptly and minimize the financial or reputational damage from a failure. Having identified the cause of the problem and deployed a fix, the next step is to add an automated test to prevent a similar issue occurring in the future. Creating a system of continuous feedback and improvement helps to build a more robust product and increases confidence in your system, without slowing down the development and deployment process. That in turn means that teams can continue to innovate and respond to evolving user needs, ensuring your product remains valuable to your users.