Starting with the 3.3 release, Kong Gateway includes a new implementation of the internal queues that are used by several plugins to decouple the production of data in the proxy path and its submission to a receiving server, such as a log server. We'll walk you through why queues are needed, why a new implementation was required, how the new implementation works and how queues are configured.
Why Are Queues Needed?
Some plugins like http-log and Datadog need to send information about the request that was processed by Kong Gateway to another server – called the upstream server throughout this post. Typically, this sending is done in the log phase of the plugin to avoid introducing any latency to request or response. Directly sending the data from the log handler to the upstream server on every request is not desirable, as that creates a large number of concurrent upstream requests if Kong is under high load.
The solution is to introduce a queue into which items to be sent are placed. One process is then started, taking entries off the queue and sending them to the upstream server. This approach not only has the advantage of reducing the possible concurrency on the upstream server, it also helps to deal with temporary outages due to network issues or administrative changes being made.
Furthermore, queue entries can be grouped into batches, allowing treatment of the whole batch. This allows sending them to the upstream server in one request, potentially reducing resource usage both in Kong Gateway and in the upstream server.
Queues have two ends. On the producing end, entries are put onto the queue, and on the consuming end, they are taken off the queue and processed. Typically, entries are put onto the queue by a plugin callback function executing in the log phase of processing a request. This happens in the context of an individual request handled by Kong Gateway and is referred to as the "consumer side" throughout this article. The actual handling and processing of the queued data is performed by a callback function provided by the plugin. It runs in the context of a timer that is started by the queue library, and referred to as the "consumer side".
Limitations of the Previous Queue Implementation
An early form of queueing was introduced into the http-log plugin in version 1.0.0 of Kong Gateway in 2018, and a revised implementation was provided in Version 1.1.2 in 2019. Both implementations suffered from a couple of limitations:
Queues were unbounded, so it was possible that a worker's memory was completely exhausted if an upstream server went out of service and entries kept piling up in the queue.
When Kong Gateway was gracefully shut down with unsent queued items being present, these items would be discarded.
There was very little observability into the state of internal queues and their activity.
We found that addressing these issues in the existing queue implementation would be difficult, so a completely new queue design was developed. One major issue with the existing implementation was that it used several timers to deal with substates of queue consumption. The number of timers that were created was high, creating performance and resource consumption issues inside of the Gateway and making it difficult to handle graceful shutdown in a controlled manner.
Furthermore, the previous implementation used producer-side logic of multiple queue entries into one batch that could then be sent to the upstream server in one unit, reducing the number of requests that need to be made. This meant that in each plugin instance, there would be a local buffer containing the current batch of entries, and a timer that'd eventually expire and cause the current batch to be put onto the queue and the next one to be started. This approach had a couple of downsides:
Batches always contained just the entries collected by one plugin instance, giving up on the possibility to group entries across all instances configured to send to the same upstream server.
Each plugin instance needed a separate timer in order to control the batching process, which could create a substantial increase in timer usage in large installations.
Due to the buffering in each plugin instance, flushing all buffered data in the event of a graceful shutdown would have been complicated.
As the queue contains batches of entries, limiting the number of entries that could be waiting on the queue would be more involved than determining the length of the queue itself.
Due to these reasons, we decided that it would be better to perform the batching of entries on the consumer side of the queue.
How the New Queue Implementation Works
The new queue implementation significantly improves upon the old one in a number of ways, and adds some new features. It should be noted that these queues serve as structural elements for components within Kong Gateway, not as a generalized, application level queueing implementation. In particular, queues only live in the main memory of each worker process and are not shared between workers. Also, as they are main memory based, queued content is not preserved under abnormal operational situations like loss of power or unexpected worker process shutdown due to memory shortage or program errors.
Queue Capacity Limit
The old queue implementation initially did not have any capacity limits, so it was possible that a worker process exhausted all available memory if data was produced fast enough and the upstream server would be slow or unavailable. A temporary fix to that issue was eventually released that put a hard-coded limit on the number of batches that could be sitting on a queue at any given time, but it was difficult to configure precisely and not documented beyond its existence.
In the new implementation, the maximum number of entries that can be queued at any time can be controlled by the max_entries queue parameter. When a queue has this many entries queued and another entry is enqueued by the producer, the oldest entry on the queue is deleted to make space for the new entry.
The queue code provides warning log entries when it reaches a capacity threshold of 80% and when it has started to delete entries from the queue. Log entries are also written to indicate that the situation has normalized. We plan to improve the observability of queue operations in an upcoming effort to generalize log, metrics and trace reporting across all of Kong Gateway.
Each plugin puts each entry to be sent onto the queue, and the consumer code creates a batch of up to the defined maximum size from the queue and sends it to the upstream server. The timing of batch creation is moved to the consuming side, removing the need to create additional timers on the producer (plugin) side. Queue flushing in the event of a graceful shutdown is straightforward to implement by periodically checking for the shutdown event on the consumer side and stopping to wait for more entries to be batched when a shutdown was initiated. Finally, queue entry capacity calculations don't require introspection of queue entries as the number of entries of the queue is what needs to be and can now be limited. Note that string capacity limits still require tracking of the queue contents, but only on one level.
Reduced Timer Usage
As explained, timers were used in the old queue implementation both on the producer side, in order to control batch creation, as well as on the consumer side, to control retries. For each retry, an additional timer would be created. All these timers could add up to significant numbers, and because new timers cannot be created while Kong is shutting down, flushing queues in that event would not be possible using the normal processing logic. For that reason, and to make the overall logic of queue processing easier to understand, the consumer code was rewritten using semaphores to facilitate the communication between producers and the consumer and to control retries.
Only one timer per queue is now used to start queue processing in the background. Once the queue is empty, the timer handler terminates and a new timer is created as soon as a new entry is pushed onto the queue.
Flush on Worker Shutdown
With the architectural change to use only one timer per queue, handling graceful shutdown became straightforward. Effectively, if anything needs to be flushed when a shutdown is initiated, the timer handler function for the queue will already be running.
Retry Logic and Parametrization
The old implementation of queues provided a retry mechanism, but it was not easy to control how retries were attempted. In the new implementation, we decided to overhaul both the algorithm and its parameterization to make tuning it easier and more transparent.
When a batch fails to process, the queue library can automatically retry, under the assumption that the failure is temporary (i.e. due to network problems or upstream unavailability). Before retrying, the library waits a certain amount of time. The wait time for the first retry is determined by the initial_retry_delay queue parameter. After each subsequent failed retry of the batch, the wait time is doubled, providing an exponential back off pattern for retries. The maximum amount of time to wait between retries can be determined by the max_retry_delay queue parameter. Finally, the max_retry_time parameter can be set to determine for how long a failed batch should be retried overall.
When reimplementing the queue library, it was determined that it would be best if queue related parameters for all plugins were unified into a common schema. This differs from the earlier configuration method for these parameters, where they were set alongside other plugin parameters at the same level. The unified queue parameter set consists of the following parameters:
Configuring the Batching Behavior
The batching behavior of queues is controlled by the max_batch_size and max_coalescing_delay parameters. max_batch_size defines the maximum number of entries that can be put into one batch. It must be small enough so that the resulting chunk of data can actually be handled by the upstream server. It is important to assess this parameter carefully as, depending on how max_coalescing_delay is set, the actual batches sent under normal operating conditions can be much smaller. Only in abnormal situations the number of entries will be reaching max_batch_size. Inadequate configuration and testing of this parameter may lead to amplified issues during high load scenarios.
When max_batch_size is set to 1, each entry that is put onto the queue is sent immediately to the upstream server. In this configuration, the upstream server must, on average, be able to process as many requests per second as Kong Gateway processes requests using the plugin sending to that upstream. Plugin queues can only provide for buffering in transient failure or overload situations.
The max_coalescing_delay parameter determines how long the queuing system will wait for more entries to arrive on the queue before building a batch and sending it to the upstream. It should be set so that the number of requests made to the upstream server stays within acceptable limits while not accumulating too much sent data in the worker process. The latter is of particular importance when graceful shutdown is a consideration, as queues will need to be flushed within the shutdown grace period before the worker process is forcefully shut down.
Configuring Capacity Limits
The capacity of all queues can be configured through the max_entries queue parameter. It defines the number of entries that can at most be waiting on a queue at any given point in time. As many plugins use higher level data structures as queue entries, it is generally not possible to convert between a certain number of entries and actual main memory usage. For that reason, it is important to back any assumptions about sensible parameter values by experiments that involve a configuration close to the production configuration. For low to medium load situations, the default parameters should provide for some headroom to ride over transient outages.
Also note that the capacity parameters are per queue, and the number of queues in the system also need to be considered when tuning. Generally, every plugin instance uses a separate queue. The exception is the http-log plugin, which uses one queue per upstream server, effectively sharing the queue between all plugin instances that send their logs to the same upstream server. Note that queues are always local to an nginx worker process, so the total number of queues in a system equals to the number of per per worker multiplied by the number of workers configured.
In many cases,you will want to set the per-queue limit so that the overall memory usage of Kong Gateway will stay within the system’s limits facing an outage of a small number, but not all upstreams of all queues. It is also advisable to perform automatic on-line log analysis of Kong Gateway’s logs and trigger an operational alarm when the queue 80% capacity limit is reached. It is planned to extend Kong Gateway’s observability mechanisms, which will eventually make it possible to directly monitor queue utilization by the way of metrics that are published by Kong Gateway.
The default retry behavior of queues should be suitable for a number of situations. Some users may want to implement different retry policies, and some are exemplified here:
If no retries are desired, the max_retry_time queue parameter should be set to zero. This will cause batches that fail not to be retried. A log message will be provided.
No exponential back off
If no exponential back off between retries is desired (see “Retry logic and parametrization”), the initial_retry_delay and max_retry_delay should both be set to the desired delay between retries. The absolute number of maximum retries can then be set by setting max_retry_time to the max_retry_delay multiplied by the maximum number of retries that should be made for a batch.
Migrating Plugin Parameters
By default, Kong Gateway will automatically convert the queueing related parameters used in previous releases to the new queue parameters, when possible. The exception is the old max_retries parameter, which limited the number of retries to an absolute value. This parameter is not carried over and users must reconfigure their plugins using the new parameters as outlined above.
The following conversions are automatically made for the plugins that used queues in Kong Gateway:
These conversions are automatically performed at runtime when Kong Gateway finds the legacy parameter to be set to a value different from the old default value. Thus, if a user did not make manual adjustments to a parameter, the defaults for the new parameters will be applied.
When a user configures one of the plugins using a legacy parameter, Kong Gateway will log a deprecation message.
This article provided an overview of the design of the new queueing library introduced in Kong Gateway 3.3. You should now be familiar with how queues work and how they can be tuned to specific load situations. As of release 3.3, queues are used by the Datadog, HTTP Log, OpenTelemetry, StatsD and Zipkin Plugins. We expect them to be incorporated into more plugins as the need arises.