How to Track Service Level Objectives with Kong and OpenTelemetry
In this blog post, we will explore how organizations can leverage Kong and OpenTelemetry to establish and monitor Service Level Objectives (SLOs) and manage error budgets more effectively. By tracking performance metrics and error rates against predefined thresholds, teams can prioritize their work based on the impact on user experience and business objectives. This approach helps optimize the balance between innovation and reliability.
To track SLOs, we will use Kong Gateway with its OpenTelemetry plugin, the OpenTelemetry (OTel) Collector, and Dynatrace.
What is an SLA, SLI, and SLO?
Before diving into the implementation, it helps to define SLA, SLI, and SLO, because every step of tracking SLOs with Kong and OpenTelemetry builds on these three terms.
Service level agreement (SLA)
Often, you might hear questions like, "What is the SLA to resolve this ticket? Should it be resolved within an hour, a day, or a week?" The answers depend on the severity of the ticket and the resolution time defined in the SLA document. An SLA is a formal agreement between a service provider and a customer that outlines the expected level of service. In our context, we will define SLAs for API/service availability and error rates.
Service level indicator (SLI)
When implementing services using APIs, we measure their performance using metrics such as latency, throughput, and error rates to ensure the application system meets customer expectations. An SLI is a quantitative metric used to measure the performance of a service. SLIs can be expressed as rates, averages, percentiles, etc.
Service level objective (SLO)
Once we define the SLIs, we need to set target values (thresholds), or ranges of target values, that represent the level of service the customer expects. Each such target is an SLO. For example, the error rate over a month should be below 1%. We will explore detailed examples in the following sections.
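To make the relationship between an SLI and an SLO concrete, here is a minimal sketch of the "error rate below 1% per month" example. The class, function names, and request counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Slo:
    """A target for one SLI: the measured value must stay below `threshold`."""
    name: str
    threshold: float  # e.g. 0.01 for "fewer than 1% of requests fail"


def error_rate(total_requests: int, failed_requests: int) -> float:
    """SLI: fraction of requests that failed in the measurement window."""
    if total_requests == 0:
        return 0.0  # no traffic, so nothing failed
    return failed_requests / total_requests


def meets(slo: Slo, measured: float) -> bool:
    """True when the measured SLI is within the SLO target."""
    return measured < slo.threshold


monthly_errors = Slo(name="monthly error rate", threshold=0.01)
sli = error_rate(total_requests=20_000, failed_requests=150)  # hypothetical counts
print(f"{monthly_errors.name}: {sli:.2%} -> met: {meets(monthly_errors, sli)}")
```

The SLI is the measurement (`error_rate`), the SLO is the target it is compared against (`Slo.threshold`), and the SLA is the agreement that makes that target a commitment to the customer.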
Understanding components required to track SLO
To track SLOs using Kong and OpenTelemetry, we need to understand the components involved. Below is an overview of the end-to-end flow:
- Design overview: The design illustrates the flow of SLO tracking using Kong, OpenTelemetry, and Dynatrace. You can use any supported OpenTelemetry (OTel) collector and backend servers to collect and capture traces, logs, and metrics. Kong provides excellent documentation on supported OpenTelemetry collectors and backends.
[Figure: design overview of SLO tracking with Kong, OpenTelemetry, and Dynatrace]
- API exposure through Kong: APIs are exposed through the Kong Enterprise gateway, and we use the OpenTelemetry protocol to generate logs, traces, and metrics for our APIs. To achieve this, we need to add Kong's OpenTelemetry plugin either globally or locally (specific to a route or service). In the plugin configuration, we specify the endpoints of the logs and traces receivers. Don't worry, I will show you the practical steps in the next section.
- OTel Collector components: The OTel Collector consists of three main parts: receiver, processor, and exporter. The receiver collects logs, traces, and metrics from the Kong OpenTelemetry plugin. The processor performs necessary transformations, such as generating metrics from traces. Finally, the exporter sends the logs, traces, and/or metrics to the OTel backend server. In our case, we are using the Dynatrace collector. To enable metrics, we need to use the supported contrib versions of the collector.
- Visualization and monitoring with Dynatrace: To visualize and monitor the logs, traces, and performance metrics of our APIs, we will install Dynatrace. We can monitor logs and traces using Dynatrace's Logs and Distributed Traces applications. Additionally, we can set up email alerts, for example, if the error count reaches a defined threshold.
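The processor step described above, "generating metrics from traces," can be pictured as an aggregation over finished spans. The sketch below is a conceptual toy only, not how the OTel Collector is implemented: the `Span` type, its fields, and the rule that any 4xx/5xx status counts as an error are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified stand-in for one request span from the gateway."""
    service: str
    duration_ms: float
    http_status: int


def spans_to_metrics(spans):
    """Aggregate spans into per-service request, error, and latency metrics."""
    totals = defaultdict(lambda: {"requests": 0, "errors": 0, "total_ms": 0.0})
    for span in spans:
        entry = totals[span.service]
        entry["requests"] += 1
        entry["total_ms"] += span.duration_ms
        if span.http_status >= 400:  # assumption: any 4xx/5xx counts as an error
            entry["errors"] += 1
    return {
        service: {
            "requests": entry["requests"],
            "errors": entry["errors"],
            "avg_latency_ms": entry["total_ms"] / entry["requests"],
        }
        for service, entry in totals.items()
    }


received = [  # what a receiver might hand to the processor
    Span("HealthService", 120.0, 200),
    Span("HealthService", 680.0, 200),
    Span("HealthService", 100.0, 404),
]
exported = spans_to_metrics(received)  # what an exporter would then ship to the backend
print(exported)
```

The three stages map onto the three variables: `received` plays the receiver's output, `spans_to_metrics` the processor, and `exported` the payload an exporter would send on.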
Enough theory. Now it's time to perform some practical steps and see everything in action. Are you ready?
Setting up SLO tracking platform using Kong, OpenTelemetry, and Dynatrace
We've already discussed the importance and functionality of each component and tool required to track SLOs using Kong and OpenTelemetry. Now, let's get hands-on with the practical steps.
Define threshold for SLO
First things first, we need to define the thresholds (SLOs) for the APIs. These definitions should align with customer expectations and organizational SLAs. For simplicity, I've defined two SLOs:
[Figure: the two SLO definitions]
- Request Latency: We have set a target value of 600 ms for request latency per API. This means we expect the processing time for each API in our application to be no more than 600 milliseconds.
- Availability: We have defined a 99% availability target per hour for our application. This means that no more than 1% of API requests in any given hour may fail. For example, if our system receives 100 API requests in an hour and only 1 request fails, this represents a 1% failure rate, which is acceptable according to our Availability SLO.
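The two thresholds above translate directly into a small amount of arithmetic. The following sketch, with hypothetical function names, computes the hourly error budget (the number of requests that may fail) and checks both SLOs. It uses integer arithmetic so that the 1% boundary case is not affected by floating-point rounding.

```python
LATENCY_TARGET_MS = 600           # Request Latency SLO
AVAILABILITY_TARGET_PERCENT = 99  # Availability SLO, evaluated per hour


def allowed_failures(requests_in_hour: int) -> int:
    """Error budget: how many requests may fail in the hour without breaching the SLO."""
    return requests_in_hour * (100 - AVAILABILITY_TARGET_PERCENT) // 100


def availability_met(requests_in_hour: int, failed: int) -> bool:
    """True when the failures in the hour fit inside the error budget."""
    return failed <= allowed_failures(requests_in_hour)


def latency_met(duration_ms: float) -> bool:
    """True when a single request stayed within the latency target."""
    return duration_ms <= LATENCY_TARGET_MS


print(allowed_failures(100))     # budget for the example of 100 requests in an hour
print(availability_met(100, 1))  # 1 failure out of 100
print(availability_met(100, 2))  # 2 failures out of 100
print(latency_met(650))          # a 650 ms request
```

One consequence of a per-hour window is worth noting: with fewer than 100 requests in an hour the budget rounds down to zero, so a single failure already breaches the 99% target.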
Now, let's proceed with the installation. The diagram below shows the installation sequence.
[Figure: installation sequence]
Install Dynatrace
We assume you already have Dynatrace installed. If not, you can install Dynatrace on your server or create an account on the SaaS version of Dynatrace.
Since the installation of Dynatrace is not our main focus, I will skip the detailed installation steps. However, you can follow these steps to set up your SaaS Dynatrace account:
- Visit the Dynatrace website
- Start your free trial: Click on the "Start free trial" button. You will be prompted to fill out a registration form with your details.
- Fill out the registration form: Provide the necessary information, such as your name, email address, company name, and other relevant details.
- Verify your email: After submitting the form, you will receive a verification email. Click on the link in the email to verify your account.
- Set up your environment: Once your account is verified, you can log in to the Dynatrace platform and start setting up your monitoring environment.
After installing Dynatrace, you should have a Dynatrace account with a URL of the form https://ENV_ID.apps.dynatrace.com/, where ENV_ID is your Dynatrace environment ID.
Install Dynatrace OTel Collector
I am using an Ubuntu-based EC2 instance and Docker Compose to install the Dynatrace OTel Collector. Follow the steps below to install the appropriate Dynatrace OTel Collector:
docker-compose.yml
otel-collector-config.yaml
Run the following command to start the Dynatrace OTel Collector:
You should see output like this:
Verify the running container:
You should see output like this:
For detailed documentation on how to install the Dynatrace collector, refer to the Dynatrace Collector Official Documentation.
Note: Ensure that port 4318 is accessible from the Kong Gateway server. In this case, I modified the EC2 Security Group to open port 4318 for inbound traffic.
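One quick way to confirm the note above is to attempt a TCP connection to the collector from the Kong Gateway server. This is a minimal sketch using only the standard library. The function name is hypothetical, and a successful connection only shows that the port is reachable, not that the collector is configured correctly.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


# Example, run from the Kong Gateway server (replace the placeholder host):
# print(is_port_open("<collector-host>", 4318))
```

If this returns False, check the EC2 Security Group rule and any host firewall before looking at the Kong or collector configuration.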
Configure Kong Gateway
Assuming you already have Kong Gateway installed, you need to modify the kong.conf file with the following parameters and redeploy the Kong Gateway. This enables tracing in Kong Gateway and should be done before using the OpenTelemetry plugin.
Save and exit the file:
Restart Kong:
To capture debug-level logs, you can also enable:
If you have made all the changes correctly, great job! Now it's time to add the OpenTelemetry plugin to a route, service, or at the global level.
Configure Route/Service with OpenTelemetry Plugin
Add the OpenTelemetry plugin to your route or service using Kong Manager or the Admin API. Here is the Admin API command to do so:
Ensure that the logs and traces endpoints are OpenTelemetry collector endpoints and not the direct endpoints of Dynatrace.
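As a scripted alternative, the same Admin API call can be made from Python. The sketch below builds the request without sending it. It rests on several assumptions: the Admin API listens on its default address http://localhost:8001, the collector address is a placeholder, and the `traces_endpoint` and `logs_endpoint` field names match the plugin schema of recent Kong Gateway releases (older releases used a single `endpoint` field, so check the plugin reference for your version).

```python
import json
import urllib.request

ADMIN_API = "http://localhost:8001"               # assumption: default Admin API address
COLLECTOR = "http://otel-collector.example:4318"  # hypothetical collector address


def build_plugin_request(service: str) -> urllib.request.Request:
    """Build (but do not send) the request that enables the plugin on a service."""
    body = {
        "name": "opentelemetry",
        "config": {
            # Both endpoints point at the OTel Collector, not at Dynatrace directly.
            "traces_endpoint": f"{COLLECTOR}/v1/traces",
            "logs_endpoint": f"{COLLECTOR}/v1/logs",
        },
    }
    return urllib.request.Request(
        url=f"{ADMIN_API}/services/{service}/plugins",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_plugin_request("HealthService")
print(request.get_method(), request.full_url)

# To apply it against a running gateway:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to inspect the payload first, which helps when verifying that both endpoints really point at the collector.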
Tracking performance and error count as per SLO
I configured the OpenTelemetry plugin for my "HealthService" service and triggered a few requests through Postman. Now, it's time to check if the traces are being captured in the Dynatrace "Distributed Traces" application service, as shown below.
[Figures: traces captured in the Dynatrace "Distributed Traces" application]
- By monitoring the performance of the above APIs, we noticed that three API requests had processing times greater than 600 ms. These requests violate our Request Latency SLO.
- By monitoring the logs of the Kong Gateway through Dynatrace's "Logs" application, as shown below, we noticed several 404 error messages. Each of these failed requests counts against our Availability SLO, and the SLO is breached once they exceed 1% of the requests in the hour.
[Figures: Kong Gateway logs in the Dynatrace "Logs" application]
- Based on the above monitoring, relevant teams need to prioritize API performance improvement work as soon as possible.
- We can also use Dynatrace's "Service Level Objective Definitions" service to automate SLO monitoring, or any other supported tool your organization prefers.
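The manual checks in the list above can also be expressed as a small report over observed requests. This is a sketch under stated assumptions: the `ObservedRequest` record and the sample data are hypothetical, and any 4xx/5xx status (including the 404s seen in the logs) is treated as a failed request, mirroring how this post counts them against availability.

```python
from dataclasses import dataclass

LATENCY_TARGET_MS = 600
AVAILABILITY_TARGET_PERCENT = 99


@dataclass
class ObservedRequest:
    """Hypothetical record of one request, as read from traces and logs."""
    path: str
    duration_ms: float
    status: int


def slo_report(requests):
    """Count slow and failed requests and evaluate both SLOs for one hour of traffic."""
    slow = [r for r in requests if r.duration_ms > LATENCY_TARGET_MS]
    failed = [r for r in requests if r.status >= 400]  # assumption: 4xx/5xx are failures
    succeeded = len(requests) - len(failed)
    # Integer form of succeeded / total >= 99 / 100, avoiding float rounding.
    availability_ok = succeeded * 100 >= AVAILABILITY_TARGET_PERCENT * len(requests)
    return {
        "slow_requests": len(slow),
        "failed_requests": len(failed),
        "latency_slo_met": not slow,
        "availability_slo_met": availability_ok,
    }


sample = (  # hypothetical hour of traffic
    [ObservedRequest("/health", 150.0, 200)] * 5
    + [ObservedRequest("/health", 720.0, 200)] * 3
    + [ObservedRequest("/missing", 90.0, 404)] * 2
)
report = slo_report(sample)
print(report)
```

In practice the same counts would come from the backend's own SLO or alerting features. The sketch only shows which comparisons those features are making.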
Conclusion
Let's summarize what we have achieved:
- Understanding SLA, SLI, and SLO: We began by learning the definitions and importance of Service Level Agreements (SLA), Service Level Indicators (SLI), and Service Level Objectives (SLO).
- Components required to track SLOs: We explored the necessary components for tracking SLOs, including Kong, OpenTelemetry, and Dynatrace.
- Defining SLOs: We defined target values for Request Latency and Availability.
- Installation and configuration: We installed Dynatrace, set up the OpenTelemetry Collector, and modified the Kong configuration to export logs, traces, and metrics to Dynatrace.
- Tracking and monitoring: Finally, we tracked, visualized, and monitored the defined service performance according to the SLOs.
In summary, organizations can effectively use Kong Gateway with the OpenTelemetry plugin and monitoring and alerting tools to track and manage SLOs, ensuring a balance between innovation and reliability.