How We Got a 12% Increase in RPS and a 37% Drop in Latency
Continuous Performance Testing
At Kong, we run performance testing in CI in every commit or pull request that has a potential performance impact, as well as on each release. Thanks to the performance testing framework and its integration with Github Actions, we can easily get basic metrics like RPS and latency. Also, flame graphs to pinpoint the significant part that draws down performance.
With that workflow in place, we figured one of the most significant parts of Kong's hotpath is Nginx variable accesses.
The flame graph above is generated by counting each function involved in executing the test. The wider the bar appears in the flame graph, the more CPU time is taken in that function. In the above flame graph, resty.core.var:var_get and resty.core.var:var_get are APIs to access Nginx variables; they add up to take 21% of samples in Kong's access phase.
An Nginx variable starts with an "$" in the Nginx configuration; it’s widely used in almost every Nginx module. We usually rely on Nginx variables to read the current request's state (like $host for the current host) or set some state for the current request (like $request_uri for rewriting path). Nginx variables are also used to interact with other Nginx modules. It also allows OpenResty to read the module's state (through variable read) or set their behaviors (by variable write).
If you are already familiar with Kong, you may see variables being used in many places, especially when Kong is routing the requests. Our team spotted the Nginx variable access as a potential bottleneck because it took a large chunk in the flame graph.
To solve the bottleneck, let's first look at how Nginx variables are accessed in OpenResty.
Nginx Variable in OpenResty
To access variables in OpenResty, we use the ngx.var API, including getting and setting the variable. The Lua part of ngx.var, implemented at lua-resty-core, uses a Lua metatable to turn ordinary Lua table access into variable get and set operations.
Let's examine var_get. The function ultimately invokes ngx_http_lua_ffi_var_get through FFI. The implementation of ngx_http_lua_ffi_var_get is in ngx_http_lua_variable.c of lua-nginx-module. The first part deals with regex captures first, which aren’t relevant for this article. Then comes this part:
The core logic uses ngx_hash_strlow, which calculates the hash of the variable's name accessed and stores the lower case form of name_data. Then with the hash, we can perform a hash table lookup and find the instance of the variable for the current request with ngx_http_get_variable. Note that Nginx configuration declares the variables themselves at config parsing time, but variable values are only available in the ngx_request_t instance at run time.
Nginx Variable in Nginx Modules
Let's hold the analysis first and look at how the Nginx module access variables. We will use ngx_http_memcached_module as an example. The module expects a special variable "$memcached_key" to exist to use as the key for caching. Two parts seem to relate to this variable. First one is in function ngx_http_memcached_pass:
This function is the handler for the memcache_pass directive, so it will be called during the config phase, i.e., when Nginx loads its configuration. Note the function ngx_http_get_variable_index, where the numeric index is retrieved from the current conf context cf and stored into a member index of the current module's context mlcf.
The second occurrence is in function ngx_http_memcached_create_request, which explained in its name, is being called in every request:
Note function ngx_http_get_indexed_variable is used to get variable value by passing the index previously stored in module context mlcf.
What Are the Differences
Now we can identify the difference between how Nginx variables are being used. If we search for occurrences of ngx_http_get_variable versus ngx_http_get_indexed_variable in Nginx official modules source code, most modules use the later one; there are two exceptions: the perl module and the ssi module:
One might already notice that those using ngx_http_get_indexed_variable always expect the variable to be accessed already in configuration time, where ngx_http_get_variable is used in cases where the variable appears in the dynamically interpreted script. This is also the reason OpenResty chooses the hash-based API to access variables, as it's nearly impossible to know what variables will be used afterward before actually running the Lua code.
Why Accessing Unindexed Variable Is Slow
From the flame graph above, we can see that ngx_hash_strlow and ngx_hash_find are also taking noticeable time in each request. Moreover, the hash that is also being used only contains those variables that are "known at configuration time," including variables defined by set directive and some common ones like $host. Variables and those prefix variables like $http_ are not included in hash and will go through a loop for matching, ngx_http_get_variable can be described as the following pseudocode:
On the other hand, ngx_http_get_indexed_variable uses array index to access element variables, no hash find or prefix matching happens.
In this case, is there a way to use the faster path in the ngx_http_get_indexed_variable?
Get Variable "Indexed"
The answer from our side is to proactively tell OpenResty the common variables that will always be used later, so they can be indexed at config time and accessed with the index at run time.
Firstly we introduce a new directive lua_kong_load_var_index to ensure Nginx will index a variable; you can even use it to index prefix variables like $http_HEADER to speed up such use cases. Since this function is called in the config phase as ngx_http_get_variable_index requires, it's not easy to immediately return those indexes to Lua. If it's not previously indexed, just calling ngx_http_get_variable_index will also populate the index into the Nginx variable index array.
Secondly, we introduce a new FFI function to let Lua code pull all known indexes and their corresponding names from Nginx's variable index array.
We use the returned indexes and names on the Lua side to build a Lua table that maps names to numeric indexes.
Finally, we introduce a new FFI function to get variables by index. With get, ngx_http_get_indexed_variable is sufficient to return the variable value; with set, we will use index to find the variable and invoke its set_handler or set the value directly. On the Lua side, we first lookup the Lua table for numeric index and fall back to the original unindexed way only if the index is not found.
Those operations are a drop-in replacement for the current ngx.var API. They're all done in our lua-kong-nginx-module, so end users don't (and can't) manage or modify indexes when accessing variables. The wrong index number could lead to undefined behaviors. Users only need to add the lua_kong_load_var_index directive to cache desired variables to gain performance. The result is that the existing code will automatically take advantage of indexed variable access with the existing safe OpenResty Lua API.
We did a ballpark test of accessing variables with and without an index. It showed our approach boosted the speed at 73% for set and 85% for get.
We also observed our base test for Kong has a 12% increase of requests per second (RPS) and a 37% drop in latency.
In the flame graph with our improvements, we can shrink the samples down to less than 10% in the access phase.
Performance optimization doesn't have an end here. That includes the ones we perform for Nginx variables. There are a couple of ways we can further improve. We can add a function style API like kong.var.get(key) and kong.var.set(key, value) to directly invoke FFI functions without the need for metatable magic.