Scaling up an API proxy in OCaml
At Mashape, we’ve been building a new product, APIAnalytics, to let anyone easily track and monitor anything related to HTTP in their applications. To be successful, it needs to be extremely easy to plug any application into it. There are many ways to achieve that with as little friction as possible, such as agents and libraries for every popular language, but for now let’s talk about HTTP proxies.
Some of our requirements for HARchiver, our official lightweight proxy for APIAnalytics, are:
- No dependencies
- Sub-millisecond latency
- Blazing®™ Fast©
- 100% reliable
We’re going to be running HARchiver on the live Mashape.com API hub, so any failure is out of the question: it could take down the apps of hundreds of thousands of developers who rely on the Mashape platform.
After eliminating all the languages that require some sort of runtime (Node, Java, etc.) and the ones that don’t prioritize safety and correctness (C, C++, etc.), we’re not left with many options. Rust is also still far too immature and untested. Fortunately, this turns out to be a perfect use case for my new favorite language, OCaml.
For those unfamiliar with it, OCaml is a functional programming language that values (in this order) speed, correctness and safety. It was released in 1996, has been gaining a fair amount of popularity in the last few years, and sits close to D, Dart and ASP in the Q1 2014 programming language rankings.
Surprisingly, reaching feature completeness took less time than expected. It was the performance tuning that turned out to be more challenging, through no fault of the language itself. OCaml currently lives in a weird state, split between two alternative standard libraries, because the one bundled with the compiler is aimed at compiler development and is therefore quite minimal. One, called Core, is made by Jane Street Capital, while the other, called Batteries, is community-built. The exact same thing happens with the async I/O engine: Jane Street makes Async, but the community has mostly rallied around Lwt. I’m using both Core and Lwt.
Making it faster
All the following numbers were measured on a fairly weak laptop with an i5-4200U CPU, testing against a local nginx server serving a small (110-byte) static file. I’ll be using siege for the load testing. Each test is run 3 times, and only the median result is kept.
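Keeping the median of three runs, rather than the mean, means a single unusually slow or fast run (say, one disturbed by background activity on the laptop) can’t skew the reported figure. A minimal stdlib-only sketch of that selection step, using a hypothetical `median` helper:

```ocaml
(* Median of a list of benchmark results (e.g. requests/sec from each run).
   For an odd number of runs this is the middle value after sorting;
   for an even number it returns the upper of the two middle values. *)
let median (runs : float list) : float =
  match List.sort compare runs with
  | [] -> invalid_arg "median: empty list"
  | sorted -> List.nth sorted (List.length sorted / 2)
```

With three runs, one outlier in either direction is simply discarded.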
To provide a baseline, let’s add a second nginx server configured as a reverse proxy. HARchiver will need to achieve similar performance despite doing a lot more processing.
All the siege tests are run with
siege -c 500 -b -t 10S
That means: 500 concurrent client connections (-c 500), each sending GET requests back to back with no delay between them (-b, benchmark mode), for 10 seconds (-t 10S).
siege -> nginx proxy -> nginx server
And now, let’s test it with the first, unoptimized version of HARchiver between the client and the server, all three running on the same machine.
siege -> HARchiver -> nginx server
Briefly, this is what HARchiver needs to do:
- Parse the incoming request, grab the headers and the query
- Open a connection to the remote host and pass everything through
- Parse the response and grab the headers
- Compute timings and sizes, including the lengths of the request and response bodies, then build a JSON representation of the whole exchange
- Send it off via ZMQ
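The "compute timings and sizes, then build JSON" step can be sketched with nothing but the standard library. This is an illustration, not HARchiver’s code: the `exchange` record, its field names, the JSON layout and the split of the total time into send/wait/receive are all hypothetical choices made for the example.

```ocaml
(* Hypothetical record of one proxied exchange; field names are
   illustrative, not the real APIAnalytics schema. Times are in seconds. *)
type exchange = {
  meth : string;                        (* "method" is an OCaml keyword *)
  url : string;
  status : int;
  req_headers : (string * string) list;
  req_body_size : int;
  resp_body_size : int;
  started_at : float;     (* request arrived at the proxy *)
  upstream_sent : float;  (* request fully forwarded upstream *)
  first_byte : float;     (* first response byte received *)
  finished_at : float;    (* response fully relayed to the client *)
}

(* Quote and escape a string for inclusion in a JSON document. *)
let json_string s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (fun c ->
      match c with
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | '\r' -> Buffer.add_string b "\\r"
      | '\t' -> Buffer.add_string b "\\t"
      | c when Char.code c < 0x20 ->
          Buffer.add_string b (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

(* Seconds to whole milliseconds. *)
let ms seconds = int_of_float (Float.round (seconds *. 1000.))

let to_json e =
  let headers =
    e.req_headers
    |> List.map (fun (k, v) ->
           Printf.sprintf "{\"name\":%s,\"value\":%s}" (json_string k)
             (json_string v))
    |> String.concat ","
  in
  Printf.sprintf
    "{\"method\":%s,\"url\":%s,\"status\":%d,\"headers\":[%s],\
     \"requestBodySize\":%d,\"responseBodySize\":%d,\
     \"timings\":{\"send\":%d,\"wait\":%d,\"receive\":%d},\"total\":%d}"
    (json_string e.meth) (json_string e.url) e.status headers
    e.req_body_size e.resp_body_size
    (ms (e.upstream_sent -. e.started_at))
    (ms (e.first_byte -. e.upstream_sent))
    (ms (e.finished_at -. e.first_byte))
    (ms (e.finished_at -. e.started_at))
```

Building the string directly with a `Buffer` and `Printf` avoids pulling in a JSON library, in keeping with the "no dependencies" requirement; a real implementation would also need to escape non-ASCII consistently and carry the response headers.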
Even so, the result doesn’t look great: the average response time doubled.
After plenty of small performance improvements, I used the excellent Lwt_pool module to maintain a group of system DNS resolvers and split the load between them.
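Lwt_pool itself keeps a bounded set of lazily created resources: `Lwt_pool.create` takes a maximum size and a function that creates a resource, and `Lwt_pool.use` lends one to a callback, making callers wait when all of them are busy. The post doesn’t show how the resolvers were wired into it, so as a deliberately simpler stand-in, here is a synchronous, stdlib-only round-robin rotation that illustrates only the load-splitting idea. It is not how Lwt_pool works internally, and the type and function names are hypothetical.

```ocaml
(* A fixed set of members (e.g. resolver handles) handed out in rotation,
   so consecutive lookups are spread evenly across them. *)
type 'a rotation = { members : 'a array; mutable next : int }

let make_rotation n (create : int -> 'a) =
  if n <= 0 then invalid_arg "make_rotation: size must be positive";
  { members = Array.init n create; next = 0 }

(* Run [f] with the next member in turn, then advance the cursor. *)
let with_next r f =
  let m = r.members.(r.next) in
  r.next <- (r.next + 1) mod Array.length r.members;
  f m
```

Seven lookups over three members land on members 0, 1, 2, 0, 1, 2, 0, so the load differs by at most one between members.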
After noticing insane DNS traffic to the system resolver, I wrote a homemade generic async caching system, leveraging the fantastic OCaml type system.
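The cache itself isn’t shown in the post. As a minimal sketch, assuming a time-to-live (TTL) expiry policy and a caller-supplied clock (both assumptions, as are all the names), a cache that stays generic in its key and value types might look like this:

```ocaml
(* A generic cache with per-entry expiry. ['k] and ['v] are left abstract,
   so the same code works for any key and value types. *)
type ('k, 'v) cache = {
  table : ('k, 'v * float) Hashtbl.t;  (* value and its expiry time *)
  ttl : float;                         (* seconds an entry stays fresh *)
}

let create_cache ~ttl = { table = Hashtbl.create 64; ttl }

(* Return the cached value for [key] if it is still fresh at time [now];
   otherwise call [fetch], store its result with a new expiry, and return it. *)
let find_or_fetch c ~now key fetch =
  match Hashtbl.find_opt c.table key with
  | Some (v, expires) when now < expires -> v
  | _ ->
      let v = fetch key in
      Hashtbl.replace c.table key (v, now +. c.ttl);
      v
```

Because `'v` is abstract, an Lwt program could instantiate it with a promise type, so that concurrent lookups for the same host share a single in-flight query; a failed promise would then need to be evicted rather than kept until it expires.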
Unfortunately, since the beginning of the project, HARchiver had been crashing under heavy load (800+ client threads) with some really ugly errors:
(Unix.Unix_error "Invalid argument" select "")
Fatal error: exception (Unix.Unix_error "Operation not permitted" send "")
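That, together with the fact that it crashed at lower thread counts when restarted immediately after a crash, led me to realize that Lwt was using the select() system call as its engine for waiting on sockets. select() can only watch file descriptors numbered below FD_SETSIZE (1024 with glibc on Linux), and OCaml’s Unix.select reports EINVAL ("Invalid argument") for descriptors beyond that limit; since every proxied request holds both a client socket and an upstream socket, 800 concurrent clients need well over 1024 descriptors. After a few tweaks to switch Lwt to its libev engine, all the stability problems disappeared and performance improved considerably.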
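The exact tweaks aren’t shown in the post. With current Lwt releases, selecting the libev engine is typically a single call made before entering the main loop. This sketch assumes Lwt’s Unix library was built with libev support (for example by installing the conf-libev opam package before lwt); without it, constructing the engine fails at runtime. The printed message is just a placeholder.

```ocaml
(* Switch Lwt's event loop to libev before running any Lwt code.
   On Linux, libev can use epoll, which has no FD_SETSIZE ceiling. *)
let () = Lwt_engine.set (new Lwt_engine.libev ())

(* From here on, every Lwt I/O wait goes through libev instead of select(). *)
let () = Lwt_main.run (Lwt_io.printl "event loop running on libev")
```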
And there we go: all the objectives are met, and performance is reasonably close to the extremely fast nginx, even though HARchiver does far more processing per request. If you’ve been paying attention, you may have noticed that the average response time is 5ms longer than the baseline. This is because HARchiver is pushed harder than nginx at these benchmark settings, and response times suffer as a result. Under realistic production load (up to 500 req/sec), going through HARchiver adds no more than 1ms compared to hitting the server directly.
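A rough way to relate those two numbers, under crude assumptions: in benchmark mode each of siege’s N = 500 clients fires its next request as soon as the previous one completes, so N requests are always in flight. Little’s law then ties the average response time W to the throughput X. If we further assume the saturated bottleneck handles one request at a time with service time S (so X = 1/S), any extra per-request cost is multiplied by N:

$$
N = X \cdot W \;\Rightarrow\; W = \frac{N}{X} = N \cdot S,
\qquad
\Delta W = N \cdot \Delta S \;\Rightarrow\; \Delta S = \frac{5\,\text{ms}}{500} = 10\,\mu\text{s}
$$

Under those assumptions, the 5ms gap corresponds to only about 10 µs of extra work per request at the bottleneck, amplified by queueing; at production load, where requests rarely queue, that amplification largely disappears.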
A Node.js implementation
Finally, let’s compare it to a quick and dirty Node.js test implementation that does all the same processing that the OCaml proxy does.
Granted, it hasn’t been optimized as much as the OCaml version, but the difference is massive.