Observability is a critical part of Kong's API Gateway. In this post, we'll describe two options to monitor Kong Gateway using Prometheus.
Prometheus is an open source systems monitoring toolkit, originally built at SoundCloud, that is now widely adopted. StatsD was originally a simple daemon developed by Etsy to aggregate and summarize application metrics. The Prometheus project provides a StatsD exporter that collects metrics sent in StatsD format and exposes them as Prometheus metrics.
Kong Gateway supports both of the above for integrating with Prometheus: Prometheus can pull metrics directly from the gateway, or a StatsD exporter can sit in between to offload some of the work from the gateway.
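As a quick sketch of the pull model, the snippet below enables the Prometheus plugin globally through Kong's Admin API and then fetches the metrics endpoint once, the way a Prometheus scrape would. The Admin API address (localhost:8001) and the /metrics path reflect common defaults but are assumptions here; depending on your configuration, the metrics endpoint may instead be exposed on the Status API.

```python
import requests

ADMIN_API = "http://localhost:8001"  # assumed default Admin API address

# Enable the Prometheus plugin globally so every service and route is instrumented.
requests.post(f"{ADMIN_API}/plugins", json={"name": "prometheus"}).raise_for_status()

# Prometheus would normally scrape this endpoint on its own schedule; fetching it
# once confirms that metrics are exposed in the Prometheus text format.
print(requests.get(f"{ADMIN_API}/metrics").text[:500])
```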
Kong Gateway is built on top of OpenResty/Nginx, which uses a multi-process, single-threaded architecture. To collect and aggregate metrics from the different worker processes, we implement the Prometheus plugin on top of Nginx's shared memory.
Nginx handles requests in a non-blocking way and is normally very efficient. However, every read and write to that shared memory takes a mutex around the critical section, blocking the other worker processes from handling requests while they wait for the lock. When the plugin is used to track high-cardinality metrics, this can significantly degrade Kong Gateway's performance, especially its long-tail latencies. For such use cases we recommend the StatsD plugin as an alternative.
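To make the cost of that critical section concrete, here is a small Python analogy rather than Kong's actual Lua implementation: several worker processes updating one shared counter must all serialize on the same lock, which mirrors how Nginx workers serialize on the shared-memory mutex when recording metrics.

```python
from multiprocessing import Process, Value

def worker(counter, updates):
    # Every increment acquires the lock that protects the shared value, so the
    # workers serialize on this critical section, much like Nginx workers
    # serialize on the shared-memory mutex when they record metrics.
    for _ in range(updates):
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)  # one shared counter, analogous to one shared-memory metric
    procs = [Process(target=worker, args=(counter, 100_000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 400000: correct, but every update paid for the lock
```

The counter ends up correct, but every single update paid the synchronization cost; with many high-cardinality metrics, and a scraper traversing all of them, that cost shows up as long-tail latency.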
We’ll explain how to use these two plugins in the following sections.

Prometheus
In a previous version of Kong Gateway, we found some performance issues with the Prometheus plugin. For example, in the production environment of one of our enterprise customers, the requests from Prometheus to pull metrics caused sporadic latency spikes, sometimes as long as three seconds, in other requests.
As the investigation progressed, we found that the Prometheus plugin collects metrics with some expensive function calls because it stores many high-cardinality metrics in Nginx's shared memory. So when the Prometheus service performed its periodic pull of the metrics, it triggered high overhead in Nginx and affected real request latency. (See the issues on GitHub for more information.)
As a result, unlike older releases, in Kong Gateway 3.0 the Prometheus plugin doesn't export status code, latency, bandwidth, and upstream health check metrics by default, to avoid the costly overhead of collecting them.
Because these metrics need to be added up or reset over the life of each connection, and because they carry many different labels, they take up a lot of shared memory that must be traversed every time Prometheus polls for metrics, leading to higher latencies. They can still be turned on manually by setting the config fields status_code_metrics, latency_metrics, bandwidth_metrics, and upstream_health_metrics respectively. Enabling those metrics will impact performance if you have a lot of Kong entities, so for those cases we recommend using the StatsD plugin with its push model.
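If you do decide to turn these metric families back on, here is a minimal sketch of doing so through the Admin API with Python. The Admin API address (localhost:8001) is assumed to be Kong's default, and the snippet assumes the Prometheus plugin has already been enabled globally; the four config field names are the ones described above.

```python
import requests

ADMIN_API = "http://localhost:8001"  # assumed default Admin API address

# Find the existing global Prometheus plugin entity.
plugins = requests.get(f"{ADMIN_API}/plugins").json()["data"]
prometheus = next(p for p in plugins if p["name"] == "prometheus")

# Turn the optional metric families back on. Each flag corresponds to a metric
# group that Kong Gateway 3.0 no longer exports by default.
resp = requests.patch(
    f"{ADMIN_API}/plugins/{prometheus['id']}",
    json={
        "config": {
            "status_code_metrics": True,
            "latency_metrics": True,
            "bandwidth_metrics": True,
            "upstream_health_metrics": True,
        }
    },
)
resp.raise_for_status()
```

Remember that this reintroduces the collection overhead described above, so enable only the metric groups you actually scrape.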