Rate limiting in distributed systems
Synchronization Policies
If you want to enforce a global rate limit across a cluster of multiple nodes, you must set up a policy to enforce it. If each node tracked its own rate limit, a consumer could exceed the global rate limit by spreading requests across different nodes. The greater the number of nodes, the more likely the consumer is to exceed the global limit.
The simplest way to enforce the limit is to set up sticky sessions in your load balancer so that each consumer gets sent to exactly one node. The disadvantages are a lack of fault tolerance and scaling problems when nodes become overloaded.
A better solution that allows more flexible load-balancing rules is to use a centralized data store such as Redis or Cassandra. A centralized data store will collect the counts for each window and consumer. The two main problems with this approach are the increased latency of making requests to the data store, and race conditions, which we will discuss next.

Race Conditions
One of the biggest problems with a centralized data store is the potential for race conditions under high-concurrency request patterns. This happens when you use a naïve "get-then-set" approach: you retrieve the current rate limit counter, increment it, and then push it back to the datastore. The problem with this model is that, in the time it takes to perform a full read-increment-store cycle, additional requests can come through, each attempting to store a stale (lower) counter value. This allows a consumer sending a very high rate of requests to bypass rate-limiting controls.
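Here's a minimal sketch of that racy pattern, assuming Redis as the backing datastore (the key name is illustrative):

```sh
# Naïve get-then-set: two separate round trips to the datastore.
# Any request that executes between the GET and the SET reads the same
# stale value, so concurrent increments are silently lost.
count=$(redis-cli GET ratelimit:consumer42:minute)
redis-cli SET ratelimit:consumer42:minute $((count + 1))
```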

One way to avoid this problem is to put a "lock" around the key in question, preventing any other process from reading or writing the counter. However, a lock quickly becomes a significant performance bottleneck and does not scale well, especially when using remote servers like Redis as the backing datastore.
A better approach is a "set-then-get" mindset: rely on atomic operations, which datastores implement in a very performant fashion, to increment and check counter values in a single step without an explicit lock.
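For example, Redis's INCR performs the read and write as one atomic step. A minimal sketch, with the key name, limit, and window length as illustrative values:

```sh
# Atomically increment the counter and read back the new value in a
# single operation; no other request can interleave a stale write.
count=$(redis-cli INCR ratelimit:consumer42:minute)

# On first creation, give the key a TTL matching the window length.
if [ "$count" -eq 1 ]; then
  redis-cli EXPIRE ratelimit:consumer42:minute 60
fi

# Reject the request once the counter exceeds the limit.
if [ "$count" -gt 5 ]; then
  echo "429 Too Many Requests"
fi
```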
Another disadvantage of a centralized data store is the increased latency of checking the rate limit counters. Unfortunately, even checking a fast data store like Redis adds milliseconds of latency to every request.
To make these rate limit determinations with minimal latency, perform the checks locally in memory. This means relaxing the rate check conditions and using an eventually consistent model. For example, each node can run a data sync cycle with the centralized data store: it periodically pushes a counter increment for each consumer and window to the datastore, which updates the values atomically, and then retrieves the updated totals to refresh its in-memory copy. This cycle of converge → diverge → reconverge among the nodes in the cluster is eventually consistent.
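A sketch of one node's sync cycle, again assuming Redis (the delta value is illustrative):

```sh
# Push the locally accumulated delta for this consumer/window. INCRBY
# applies it atomically and returns the new cluster-wide total, which
# the node uses to refresh its in-memory counter.
local_delta=17   # requests this node counted since the last sync
cluster_total=$(redis-cli INCRBY ratelimit:consumer42:minute "$local_delta")
echo "refreshing in-memory counter to $cluster_total"
```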

The rate at which nodes converge should be configurable. Shorter sync intervals result in less divergence of data points when traffic is spread across multiple nodes in the cluster (e.g., when sitting behind a round-robin balancer), whereas longer sync intervals put less read/write pressure on the datastore and less overhead on each node when fetching newly synced values.
Quickly set up rate limiting with Kong API Gateway
Kong is an open source API gateway that makes it very easy to build scalable services with rate limiting. It scales from a single Kong Gateway node to massive, globe-spanning Kong clusters.
Kong Gateway sits in front of your upstream APIs as their main entry point. While processing requests and responses, Kong Gateway executes any plugins you have added to the API.

Kong API Gateway's rate limiting plugin is highly configurable. It:
- Offers flexibility to define multiple rate limit windows and rates for each API and consumer
- Includes support for local memory, Redis, Postgres, and Cassandra backing datastores
- Offers a variety of data synchronization options, including synchronous and eventually consistent models
You can quickly try Kong Gateway on one of your dev machines to test it out. My favorite way to get started is to use the AWS CloudFormation template, since I get a pre-configured dev machine in just a few clicks. Just choose one of the HVM options and set your instance size to t2.micro, as these are affordable for testing. Then SSH into a command line on your new instance for the next step.
Adding an API on Kong Gateway
The next step is adding an API on Kong Gateway using the admin API, which can be done by setting up a Route and a Service entity. We will use httpbin, a free testing service for APIs, as our example upstream service; its /get endpoint mirrors back the request data as JSON. We also assume Kong Gateway is running on the local system at the default ports.
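Something like the following two requests does the trick, assuming the admin API is listening on its default port 8001 (the service name is up to you):

```sh
# Create a Service pointing at the upstream httpbin endpoint
curl -i -X POST http://localhost:8001/services \
  --data name=httpbin \
  --data url=https://httpbin.org/get

# Create a Route so that requests to /test are proxied to that Service
curl -i -X POST http://localhost:8001/services/httpbin/routes \
  --data 'paths[]=/test'
```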
Now Kong Gateway is aware that every request sent to "/test" should be proxied to httpbin. We can make the following request to Kong Gateway on its proxy port to test it:
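```sh
# Kong Gateway's proxy listens on port 8000 by default
curl -i http://localhost:8000/test
```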
It's alive! Kong Gateway received the request and proxied it to httpbin, which has mirrored back the headers for my request and my origin IP address.
Adding Basic Rate-Limiting
Let's go ahead and protect it from an excessive number of requests by adding the rate-limiting functionality using the community edition Rate-Limiting plugin, with a limit of 5 requests per minute from every consumer:
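```sh
# Attach the rate-limiting plugin to the httpbin Service created above,
# allowing at most 5 requests per minute
curl -i -X POST http://localhost:8001/services/httpbin/plugins \
  --data name=rate-limiting \
  --data config.minute=5
```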
If we now make more than five requests within a minute, Kong Gateway responds with an error along these lines (the exact body can vary by version):
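```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit-Minute: 5
X-RateLimit-Remaining-Minute: 0

{ "message": "API rate limit exceeded" }
```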
Looking good! We have added an API on Kong Gateway, and we added rate-limiting in just two HTTP requests to the admin API.
The plugin defaults to rate limiting by IP address, using fixed windows and synchronizing across all nodes in your cluster via your default datastore. For other options, including rate limiting per consumer or using another datastore like Redis, please see the documentation.
For a more detailed tutorial, check out Protecting Services With Kong Gateway Rate Limiting >>
The Advanced Rate Limiting plugin adds support for the sliding window algorithm for better control and performance. The sliding window prevents your API from being overloaded near window boundaries, as explained in the sections above. For low latency, it keeps an in-memory table of the counters and can synchronize across the cluster using asynchronous or synchronous updates. This gives you the low latency of local checks while remaining scalable across your entire cluster.
The first step is to start a free trial of Konnect. You can then configure the rate limit, the window size in seconds, and how often to sync the counter values. It's straightforward to use, and you can get this robust control with a simple API call:
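A sketch of that call, assuming the Enterprise plugin name rate-limiting-advanced attached through the same admin API (double-check the parameter names against the plugin documentation for your version): 5 requests per 60-second sliding window, synced every 10 seconds:

```sh
# Attach the advanced plugin: sliding window of 60 seconds, limit of 5,
# with node counters synced to the cluster every 10 seconds
curl -i -X POST http://localhost:8001/services/httpbin/plugins \
  --data name=rate-limiting-advanced \
  --data config.limit=5 \
  --data config.window_size=60 \
  --data config.sync_rate=10
```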
The enterprise edition also supports Redis Sentinel, which makes Redis highly available and more fault-tolerant. You can read more in the rate limiting plugin documentation.
Other features include an admin GUI, additional security features such as role-based access control, analytics, and professional support. If you're interested in learning more about the Enterprise edition, just contact Kong's sales team to request a demo.
Want more tutorials?