Microservices Testing Guide
As microservices architectures are adopted more and more widely, organizations need to adapt their testing strategies to capitalize on the advantages of a loosely coupled system.
The shift towards microservices is closely related to both the rise in popularity of agile software development practices and DevOps cultures. These three trends share a common goal: building products that are responsive to user needs while maintaining high quality and high availability. A system built on a microservices architecture consists of multiple loosely coupled services, each responsible for a single business function. Breaking down a system into multiple individual services means that each service can be developed in parallel by autonomous teams, speeding up development cycles. Loose coupling between services provides the flexibility to update or replace these individual services with limited impact on the rest of the system. This is further supported by deploying to a containerized, cloud-hosted infrastructure, where services can be scaled independently according to need.
The approach to software testing often used for traditional, monolithic architectures – running tests in a staging environment after code completion before deploying the entire application to production – would undermine many of the benefits of a microservices architecture. With a monolithic system, the whole system must be deployed each time a change is released, which means the internal dependencies are always available to test. For a microservices-based application, where the aim is to develop and deploy each service independently in order to enable shorter development cycles, a testing strategy that covers the individual services, connections between services and the functionality of the entire system is required.
Testing is the process of checking that your code and your application as a whole behave as you expect them to in a given set of circumstances. In the software industry, approaches to testing range from attempting to verify every possible edge case (with the aim that users will never encounter a bug) to “if it compiles, ship it.” In reality, there is no one true approach that can be applied to all cases. As with so many aspects of software development, context is everything. The appropriate level of testing for a particular application depends on a number of factors, including the purpose of the application, the resources available and the level of risk that is considered acceptable.
Assuming you’re going to invest in some degree of testing on your microservices-based application, what can be said is that you should aim to automate many of those tests. Manual testing is not only a labor-intensive, time-consuming activity, it’s also inefficient and error-prone when used for repetitive tasks. To put it bluntly, people get bored when asked to follow a test script and click through the same set of steps over and over again. By automating the process, tests can be run more frequently and consistently, and testers can focus on adding value with other types of testing.
Furthermore, automated tests can be incorporated into a continuous integration and continuous deployment pipeline to provide the team with feedback on the state of their application as part of the build and deployment process. However, even automated tests can be slow to run, particularly if they involve spinning up dependent services to test against, so the key is to choose which tests to run at each stage of the pipeline so that the results can be usefully fed back into the next development cycle. After all, what’s the point of running tests if you don’t act on them?
The Test Pyramid
The Test Pyramid, originally described by Mike Cohn in the context of agile and service-oriented architectures (SOA), remains a useful model for approaching testing in a microservices context. The premise is that, for any given system, the bulk of automated testing should take the form of unit tests, followed by a smaller number of service tests, with UI tests making up the minority.
The pyramid reflects the cost and time involved in writing tests versus the value of running those tests; unit tests are quick to write, test very specific functionality with little duplication and are relatively quick to run, whereas testing the same test cases through the GUI would take much longer and involve testing GUI elements repeatedly with no additional value.
While the general principle that the number of tests should decrease as their granularity increases is still good advice, the trend towards using container-based, cloud-hosted infrastructure for microservices-based systems adds a new layer of complexity. As Cindy Sridharan describes, these are distributed, complex systems, which means their behavior can be unpredictable, and it is not possible to emulate exactly what will happen in production in a staging environment. This means some level of post-production testing is beneficial for providing a high quality, highly available system.
Types of Test
Before we consider the various types of test, it’s worth noting that names used to describe tests vary between teams and organizations, and you can easily go down a rabbit hole with these discussions if you’re so inclined. Ultimately, what a test is called doesn’t matter, as long as the people working together on development of a system all understand what they mean by a particular name.
At the bottom of the testing pyramid are unit tests, which focus on the smallest testable unit in your codebase, typically a method or a function. Unit tests are written in the same language as the rest of your code and verify that a particular unit of your codebase does what you intend it to do. Being small and lightweight, unit tests can be run automatically with every commit or every build, providing developers with immediate feedback on their work so they can address bugs before they lose context.
Unit tests follow the same simple structure as any other test: given a particular circumstance, when x happens, then the result should be y. The given element introduces the potential for dependencies on other classes or functions. Dependencies on other collaborators can either be allowed (so-called sociable unit tests) or replaced with test doubles in order to ensure isolation (so-called solitary unit tests). A test double, such as a mock or a stub, replaces a class or function with a fake version that returns a response that you define at the start of the test. There’s no need to adopt sociable or solitary unit testing to the exclusion of the other; you can use both in the same program according to what is most appropriate to your situation.
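As a minimal sketch of a solitary unit test, the example below uses Python’s built-in unittest.mock to replace a collaborator with a test double whose response is defined at the start of the test. The PriceService class and its currency-converter collaborator are hypothetical names invented for illustration:

```python
from unittest.mock import Mock

class PriceService:
    """Hypothetical unit under test: computes a display price
    using a currency-converter collaborator."""
    def __init__(self, converter):
        self.converter = converter

    def display_price(self, amount_usd, currency):
        # When: the converter supplies the exchange-rate calculation.
        converted = self.converter.convert(amount_usd, currency)
        # Then: the result is rounded to two decimal places.
        return round(converted, 2)

def test_display_price_rounds_converted_amount():
    # Solitary unit test: the collaborator is replaced with a mock,
    # so PriceService is exercised in complete isolation.
    converter = Mock()
    converter.convert.return_value = 8.4567   # pre-defined response
    service = PriceService(converter)
    assert service.display_price(10, "EUR") == 8.46
    converter.convert.assert_called_once_with(10, "EUR")
```

A sociable version of the same test would simply pass in the real converter instead of the mock; both styles can coexist in the same test suite.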
There are plenty of unit test frameworks and tools out there to make writing unit tests for your code easier. If you’re practicing Test Driven Development (TDD), you start by writing the test before you write the code that satisfies the test. Even if you’re not following TDD principles, writing unit tests as you develop your application is good practice and avoids them being treated as an optional extra to add later if there is time. On the other hand, the “test coverage” metric should not become a goal in itself, let alone a KPI for management to track. Striving to meet an arbitrary percentage of test coverage does not necessarily improve the quality of a product and may come at the expense of other more valuable work. As we said before, context is everything.
In his automated test pyramid, Mike Cohn included a middle layer of “service tests” that covered test cases that cannot be addressed with unit tests but which can be run without using the GUI. In the microservices context, the service test layer can be broken down further. Each microservice typically consists of multiple modules, and neither the interactions between these nor their behavior as a whole are covered by unit tests. Similarly, the interactions between individual microservices, via REST APIs or other message-based network protocols, also need to be tested.
Often microservices depend on calls to external modules, such as a data store or file system, and to other services. Integration tests verify that the interactions between these modules and services work as intended.
As with unit tests, when writing integration tests, you can choose whether to use a test double to control dependencies, or test with the actual dependency. In a monolithic architecture, testing against the actual dependency is relatively straightforward, as the modules all exist in the same process. However, in a microservices-based system, testing with the actual dependency not only takes longer (because the calls have to be made over the network) but also requires that the module or service be spun up and the network connections established in order to run the test. This can have a significant impact on the time involved and can cause tests to fail if the external service is not available. It also undermines the advantage of building a system of loosely coupled microservices, namely that each service can be developed and deployed independently.
For these reasons, integration testing for microservices is often broken down into testing the integration against a test double and testing that the double matches the external module with a contract test. This isolates whether the integration functions correctly from whether the external module behaves as expected. A double can be a simple mock or stub with pre-defined logic, or a more sophisticated API simulator or virtual service that mimics more complex behavior created with tools such as Wiremock and Hoverfly. The test double can either exist in-process (such as an in-memory database) or be accessed over a network protocol using a tool such as mountebank.
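To make the in-process option concrete, the sketch below runs real data-access code against an in-memory SQLite database standing in for a hypothetical production data store; the table and function names are invented for illustration:

```python
import sqlite3

def save_order(conn, order_id, total):
    """Data-access code under test: writes an order row."""
    conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)",
                 (order_id, total))
    conn.commit()

def fetch_order_total(conn, order_id):
    """Data-access code under test: reads an order's total back."""
    row = conn.execute("SELECT total FROM orders WHERE id = ?",
                       (order_id,)).fetchone()
    return row[0] if row else None

def test_order_round_trip():
    # In-process test double: an in-memory database replaces the real
    # store, so no network connection or external service is needed.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
    save_order(conn, "ord-1", 42.50)
    assert fetch_order_total(conn, "ord-1") == 42.50
```

Because the double is in-process, the test runs in milliseconds and cannot fail due to an unavailable external service; a separate contract test covers whether the double still matches the real data store’s behavior.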
While an integration test looks at the interactions between two modules, a component test verifies the behavior of an entire component in isolation. How you define a component depends on your application, but a useful starting point is to think of each individual microservice as a component. In this case, the tests will check that the business function for which the microservice is responsible is in fact met.
As with integration tests, test doubles can be used to simulate external components that the system under test depends on. Running a test double in process is usually quicker, whereas using a test double over the wire also exercises the integration point.
While the same functionality could also be tested with end-to-end tests of the entire system, running a component test removes dependencies on the rest of the system (which may be at different stages of development and which would have to be spun up in a test environment). Running a component test is quicker and shortens the feedback loop, which means developers can act on the results sooner.
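To illustrate the over-the-wire option, the sketch below hand-rolls a tiny HTTP stub (a stand-in for tools like Wiremock or mountebank) and exercises a hypothetical client function through a real network call, so the integration point itself is covered; all names are invented for illustration:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubHandler(BaseHTTPRequestHandler):
    """Over-the-wire test double: returns a canned response to any GET."""
    def do_GET(self):
        body = json.dumps({"sku": "abc-123", "in_stock": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

def check_stock(base_url, sku):
    """Code under test: calls the (stubbed) inventory service."""
    with urllib.request.urlopen(f"{base_url}/stock/{sku}") as resp:
        return json.load(resp)["in_stock"]

def test_check_stock_against_stub():
    server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}"
        # The request travels over a real socket, so serialization and
        # HTTP handling are exercised along with the client logic.
        assert check_stock(url, "abc-123") is True
    finally:
        server.shutdown()
```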
Contracts are used to define what an external service, such as another microservice in the same system or a third-party system, will provide to the sub-system under test. When test doubles have been used to enable an integration test or a component test, an automated contract test can be added to verify that the assumptions in the test double are still valid by checking that the real service still meets the terms of the contract. If the contract test fails, then the interface has changed – either intentionally or unintentionally – and the integration test is no longer valid.
Within an organization, a consumer-driven contract approach can help teams working on different services remain aware of dependencies on their service and alert them when those needs are no longer met. The team building the service that calls the API (the consumer) writes the tests that define what they need from the interface, using a tool such as Pact. The consumer shares these tests with the team providing the API (the producer). The producer team can then include the tests in their build pipeline to ensure they continue to meet their consumers’ needs. If the contract is broken, this is a prompt for the relevant teams to have a conversation about what they need from the service, rather than a demand that the change be reverted.
The consumer-driven contract approach is particularly useful in larger enterprises where teams inevitably become more siloed. For smaller organizations with co-located teams, an API contract may be too heavyweight; teams can rely on talking to each other to remain aware of dependencies.
Where a system makes calls to a third-party service over a public API, it’s not usually possible to rely on a consumer-driven contract. Instead, an external API contract can serve to alert the team when a change is introduced that has not been otherwise communicated or noticed. When this happens, the system, the integration tests and the contract tests all need to be updated to reflect the changes to the third-party service.
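Hand-rolling the idea (tools like Pact formalize and automate it), a contract can be expressed as the fields and types the consumer actually reads from a response, and a contract test checks the provider’s real response against it. The field names below are hypothetical:

```python
# The contract: the fields the consumer reads from the provider's
# response, with the types it assumes. Extra fields are ignored, so
# the provider remains free to evolve its API additively.
ORDER_CONTRACT = {"id": str, "status": str, "total": float}

def meets_contract(response, contract):
    """Return True if the response carries every field the consumer
    needs, with the expected type."""
    return all(
        field in response and isinstance(response[field], expected_type)
        for field, expected_type in contract.items()
    )

# In a real pipeline this response would come from a live call to the
# provider; a sample response stands in for it here.
provider_response = {"id": "ord-1", "status": "shipped",
                     "total": 42.5, "extra_field": 1}
assert meets_contract(provider_response, ORDER_CONTRACT)

# If the provider renames "status" to "state", the contract test fails,
# prompting a conversation between the teams before anything breaks.
renamed = {"id": "ord-1", "state": "shipped", "total": 42.5}
assert not meets_contract(renamed, ORDER_CONTRACT)
```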
End-to-End and UI Tests
At the top of the testing pyramid are UI tests. These are sometimes referred to as end-to-end tests, although it is possible to test your system’s UI without performing an end-to-end test. Tests that are driven through the UI can be automated (using tools such as Selenium, for example), but they tend to be brittle and time-consuming, both to write and to run. For these reasons, it’s sensible to limit UI tests to cases that cannot be covered by finer-grained tests further down the pyramid, hence the overlap with end-to-end tests.
End-to-end tests exercise your entire application by emulating user workflows. This is where a deployment pipeline that builds a testing or staging environment for your entire system can save a lot of manual effort. Even if all previous tests in your pipeline have passed, it’s only by running end-to-end tests that you can verify that the system does what it was built to do. As well as testing the “happy paths,” end-to-end tests should also test failure routes to ensure errors are handled as expected.
A complete staging or testing environment is also a good place to conduct other cross-system tests, such as load testing and performance benchmarking, before releasing your changes to production.
Despite the many layers of automated testing described above, there is still value in some types of manual testing. Ideally, these tests should be conducted in a testing or staging environment, which can be refreshed easily to avoid contamination between tests.
Exploratory testing gives testers free rein to try and break the system as creatively as they can. By definition, exploratory testing should not follow a script. Instead, the aim is to look at the system from new angles. The results of exploratory testing can then be fed into the automated tests to prevent the same issues recurring in future.
User acceptance testing ensures that the system meets the user needs and the business objectives for building the application. Often acceptance testing is carried out by people who have not been involved in writing the code, such as product managers or usability specialists, and who can therefore bring a fresh perspective to the tests.
Testing in Production
To some organizations, the idea of testing on real end users is anathema. However, particularly in a cloud-hosted environment, the reality is that end users are always testers of our systems, because it’s impossible to test every single combination of circumstances that could occur. While the automated and manual pre-production testing described above can provide a certain degree of confidence in a system, extending your test strategy into the production environment can further improve the quality and availability of the system.
Chaos engineering was developed specifically to test complex, distributed systems operating at scale. This is a type of failure testing, usually run in production, that verifies the resiliency of the system when something goes wrong. Despite the name, the aim should be to contain the negative impact of each experiment and minimize pain to end users. Although running in production is recommended in order to perform a realistic test, there is no reason not to start by testing the proposed hypothesis in a staging environment in order to catch any avoidable errors.
Keeping a close eye on the health and use of your production system can provide early warnings before a service fails for platform reasons, such as lack of disk space or compute resources. Most monitoring systems cover these metrics as standard. Extending your monitoring solution to look at business metrics or KPIs in real-time can provide further insights into the state of your system and flag up potential issues before any real damage is done. For example, for an online shopping service, a drop in the expected number of transactions might indicate a failure in a related part of the system that is preventing users from completing a purchase.
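As a toy illustration of the shopping example, a business-metric check might compare the current transaction count against a recent baseline and flag a drop below some fraction of it; the numbers and the 50% threshold below are purely illustrative:

```python
def transactions_look_healthy(recent_counts, current_count, threshold=0.5):
    """Flag a potential failure when the current transaction count drops
    below a fraction of the recent baseline (a simple moving average)."""
    if not recent_counts:
        return True  # no baseline yet, so nothing to compare against
    baseline = sum(recent_counts) / len(recent_counts)
    return current_count >= threshold * baseline

# Baseline of ~100 transactions per interval: 80 is within tolerance,
# but 40 falls below 50% of baseline and should trigger an alert.
assert transactions_look_healthy([98, 102, 100], 80) is True
assert transactions_look_healthy([98, 102, 100], 40) is False
```

A production monitoring system would run a check like this continuously against a metrics store and route failures to an alerting channel, rather than asserting inline.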
While monitoring provides information on your platform and business metrics, if you want to understand performance bottlenecks or debug complex issues in a system where interactions touch multiple services, distributed tracing is required. Distributed tracing instruments your services so that individual requests can be tracked and the details recorded centrally. Armed with trace data, developers can identify where latency has crept into the system to slow down requests or identify the root cause of a problem. If you’re using the Kong API gateway to manage and route requests through your system, you can add the Zipkin plugin to enable distributed tracing in your production environment.
Like the canary in a coal mine, a canary release is a way of testing a new version of your software in production while limiting the potential damage if something goes wrong, as the changes can be rolled back easily. One of the benefits of a microservices-based system is that each service can be deployed independently. With a canary release, you deploy the new version of a service to production while the old version is still live, divert a small proportion of the traffic to the new version, and monitor its performance. If the service fails or any performance metrics are negatively impacted, all traffic is diverted back to the old, stable version while the logs are analyzed and the issues fixed. If no issues are found, more and more traffic is diverted to the new service until the old version is no longer receiving requests and can be taken offline.
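The traffic-splitting at the heart of a canary release can be sketched as weighted routing; in practice this decision is made by a gateway or load balancer rather than application code, and the handler names here are illustrative:

```python
import random

def route_request(stable_handler, canary_handler, canary_weight, rng=random):
    """Divert a fraction of traffic (canary_weight, 0.0 to 1.0) to the
    new version; all remaining requests go to the stable version."""
    handler = canary_handler if rng.random() < canary_weight else stable_handler
    return handler()

# At weight 0.0 no traffic reaches the canary; at 1.0 all of it does.
# A rollout gradually raises the weight between these extremes while
# monitoring the canary's error rates and latency.
assert route_request(lambda: "v1", lambda: "v2", 0.0) == "v1"
assert route_request(lambda: "v1", lambda: "v2", 1.0) == "v2"
```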
If you’re using the Kong API gateway, you can set up a canary release using the Canary Release plugin. The plugin is ideal if you have multiple instances of each version of the service, and it supports more sophisticated behavior, such as targeting the canary to specific user groups and progressing the release automatically.
Using blue-green deployments makes the most of an automated deployment pipeline by providing a staging environment that is as close to production as possible with a simple method for rolling back a release in the event of a failure. A blue-green deployment requires two almost-identical environments. At any time, one is staging and the other is in production. Once you’ve run your pre-production tests in staging (which might be blue at this point) and are ready to release, you switch traffic from the green environment to the blue one. The blue environment is now live, but the green one is still available if you need to roll back the changes. Once you’re confident that the changes are stable, the green environment can become a staging environment for the next change to be released. Running a blue-green deployment with the Kong API gateway is straightforward and requires just a simple update through the admin API.
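The key property of blue-green is that both release and rollback are a single atomic swap of which environment is live. A minimal sketch, with the router and environment names invented for illustration:

```python
class BlueGreenRouter:
    """Tracks which environment is live; switching and rolling back
    are both the same single, atomic swap."""
    def __init__(self, live="green", idle="blue"):
        self.live, self.idle = live, idle

    def switch(self):
        # Release: the idle (staging) environment goes live, and the
        # previously live one stays warm, ready for rollback.
        self.live, self.idle = self.idle, self.live

router = BlueGreenRouter()   # green is live, blue is staging
router.switch()              # release: blue goes live
assert (router.live, router.idle) == ("blue", "green")
router.switch()              # rollback: green is live again
assert router.live == "green"
```

With a gateway such as Kong, the swap corresponds to repointing the upstream that receives production traffic, which is why the rollback path stays cheap.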
Closing the Feedback Loop
The process of testing your software is never complete; users will use your system in unexpected ways, external dependencies will change without warning, and environments will evolve. Whenever an issue is found during exploratory testing or in production, or a test fails higher up the automated testing pyramid, it provides an opportunity to write a test that catches the root cause of the error. Rather than reproducing such an issue with an end-to-end test, try to cover the root cause of the failure with a unit test or a service test. Pushing tests down the pyramid means developers can get feedback on their code sooner, before they lose context and before other work is built on top of it.
In the context of a microservices-based application, using the test pyramid model allows testing to be conducted early on in the development process while accounting for dependencies in a loosely coupled system. This in turn means microservices can be deployed independently, enabling small, incremental changes and frequent releases which deliver value to your users regularly. By getting your system into your users’ hands sooner, you can validate whether it meets their needs and collect feedback to incorporate into your next development cycle. Automated testing is essential to a continuous integration and deployment pipeline and drives a virtuous circle of agile software development, but it doesn’t mean you should stop there. Manual testing and post-production testing are both important elements in a testing strategy, and their findings can be used to make the development process more efficient over time.
Want to learn more?
Request a demo to talk to our experts, get answers to your questions, and explore your needs.