Does anyone care about your non-functional tests? (part 2)

This is part 2 of my series on non-functional testing, and it focuses specifically on performance testing. If you are interested, check out the previous blog post as well.

TL;DR: skip to the “The end result” section below if you just want to read what we ended up doing.

Performance tests basics

Before we dive into our performance testing setup, a brief explanation of the terms might be helpful.

Different subsets of performance testing

Performance testing is a broad term with several variations and subsets. In general, the aim of these non-functional tests is to understand how your system behaves under a certain load: how it handles close-to-production traffic, what happens when you stress it with more traffic than it was designed to handle, etc.

In our case, we decided to start with load and stress testing.

Load testing means testing your application at the traffic levels you expect on your production servers. This way you can simulate production-like load and measure how your system behaves.

Stress testing, on the other hand, tests your application with a load much higher than you usually expect. It is a great way to check what will happen during sudden traffic spikes that your current setup won’t have the horsepower to process.
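To make the difference concrete: a load test and a stress test can share the same test plan and differ only in how much concurrency they generate. A common first approximation for sizing a closed-loop JMeter thread group is Little's law, concurrent users ≈ requests per second × (average response time + think time). The Python sketch below uses a hypothetical expected traffic figure and an arbitrary stress multiplier of 3; neither number comes from our setup.

```python
import math


def threads_needed(target_rps: float, avg_response_s: float, think_time_s: float = 0.0) -> int:
    """Little's law: concurrency = throughput x time each virtual user spends per request."""
    if target_rps <= 0 or avg_response_s <= 0 or think_time_s < 0:
        raise ValueError("rate and response time must be positive, think time non-negative")
    return math.ceil(target_rps * (avg_response_s + think_time_s))


def load_profile(expected_rps: float, avg_response_s: float,
                 think_time_s: float = 0.0, stress_multiplier: float = 3.0) -> dict:
    """Thread counts for a load run (expected traffic) and a stress run (a multiple of it)."""
    return {
        "load": threads_needed(expected_rps, avg_response_s, think_time_s),
        "stress": threads_needed(expected_rps * stress_multiplier, avg_response_s, think_time_s),
    }


if __name__ == "__main__":
    # Hypothetical numbers: 200 req/s expected, 250 ms responses, 1 s think time
    print(load_profile(expected_rps=200, avg_response_s=0.25, think_time_s=1.0))
```

Response times usually grow under stress, so the stress figure is best read as a lower bound on the threads needed to actually reach that rate.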

The tool – JMeter

To run our performance tests, we chose one of the most popular tools: JMeter. We also wanted to run distributed tests from multiple nodes that can generate high load when needed, and to use an external cloud provider, so that we have a consistent setup against which we can compare different runs.

One of the recommended solutions was to set up JMeter nodes in a master-slave configuration (or primary-secondary, as is probably better suited these days).

There are a lot of tutorials online which can help you if you decide to go down that path.

The issue we had with this approach is that, for it to work, all the nodes have to be on the same network, or at least the primary node has to be able to “see” the secondary nodes.
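For reference, this is roughly how the primary drives the secondaries. The Python sketch below only builds the command line; the host addresses, test plan name and `threads` property are hypothetical. `-n` (non-GUI mode), `-t` (test plan), `-l` (results file), `-R` (remote hosts), `-G` (global property sent to the servers) and `-X` (stop the remote servers when the test ends) are standard JMeter command-line options.

```python
def distributed_jmeter_cmd(test_plan: str, results_file: str,
                           remote_hosts: list, global_props: dict = None) -> list:
    """Build a JMeter command that runs a test plan on several remote (secondary) nodes."""
    if not remote_hosts:
        raise ValueError("at least one secondary host is required")
    cmd = ["jmeter", "-n", "-t", test_plan, "-l", results_file,
           "-R", ",".join(remote_hosts)]
    # -G properties are pushed from the primary to every secondary
    for key, value in sorted((global_props or {}).items()):
        cmd.append(f"-G{key}={value}")
    cmd.append("-X")  # shut the remote servers down after the run
    return cmd
```

The primary talks to each secondary over Java RMI (port 1099 by default), which is why every secondary has to be reachable from the primary before `subprocess.run(cmd, check=True)` can succeed.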

Distributed testing in Cloud environment?

In the case of AWS (which we use for this particular use case), this would mean setting up several EC2 instances in the same VPC (Virtual Private Cloud).

Sounds easy, right?

Yes, it is, but it also means that we have to take care of provisioning the EC2 instances, monitoring when they are up and healthy, and terminating them after our tests are done to keep costs down.

This looks like a lot of setup and heavy lifting for what we wanted.

Also, some of our tests ran for only around 10 minutes, but EC2 is paid by the hour, which means paying for 60 minutes to use 10: 5x more than we need for each run.
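The arithmetic behind that estimate, as a tiny Python sketch that assumes every instance is billed in whole hours (per-second billing now applies to many EC2 instances, so it is worth checking the current pricing model for your instance type):

```python
import math


def billed_minutes(run_minutes: float, billing_increment_minutes: float = 60) -> float:
    """Round a run up to the next whole billing increment (60 = hourly billing)."""
    return math.ceil(run_minutes / billing_increment_minutes) * billing_increment_minutes


def overpayment_ratio(run_minutes: float, billing_increment_minutes: float = 60) -> float:
    """How many times the actually used time we end up paying for."""
    return billed_minutes(run_minutes, billing_increment_minutes) / run_minutes
```

For a 10-minute run the ratio is 6.0: we pay for six times the time we use, i.e. five times more than needed.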

There should be a better way!

We had already used ECS (Elastic Container Service) on quite a few projects, and its flexibility was impressive. Could this approach be useful here?

All you have to do (this statement is oversimplified) is Dockerize your service and run it on AWS serverless infrastructure. This way you don’t have to care about EC2 instances: you don’t have to monitor them, stop them manually when needed, etc. The only issue we faced was that these distributed instances can’t communicate with each other directly (they are not in a single network), so the primary-secondary concept was not going to work.

The plan

After some back and forth, some failed ideas, and some hair inevitably lost, we ended up with the following:

An ECS cluster in AWS that can run multiple tasks through Fargate.
Each task runs a Docker image that we have previously pushed to ECR (Elastic Container Registry).
The results of the performance tests are stored in an S3 bucket.
When all tasks are done, a job imports those results into Grafana, where they are aggregated and displayed.

ECS cluster

Our cluster is set up to execute tasks on Fargate. Using the AWS CLI, all these steps can be automated, so nothing has to be done manually once the initial setup is complete.

The screenshot above shows 2 tasks running in parallel, executing performance tests.
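As an illustration of that automation, the Python sketch below builds the `aws ecs run-task` calls needed to start a given number of Fargate tasks. The cluster, task definition, container name, subnet and security group values are hypothetical. A single RunTask call starts at most 10 tasks, so larger runs are split into batches. `--started-by` carries the run ID, so the tasks of one run can later be found with `aws ecs list-tasks --started-by`, and a container override passes the same ID into the container as a (hypothetical) `RUN_ID` variable.

```python
import json

MAX_TASKS_PER_CALL = 10  # RunTask starts at most 10 tasks per call


def run_task_commands(cluster: str, task_definition: str, total_tasks: int,
                      subnets: list, security_groups: list, run_id: str,
                      container_name: str = "jmeter") -> list:
    """Return one `aws ecs run-task` argument list per batch of up to 10 Fargate tasks."""
    if total_tasks < 1:
        raise ValueError("total_tasks must be at least 1")
    network = ("awsvpcConfiguration={subnets=[%s],securityGroups=[%s],assignPublicIp=ENABLED}"
               % (",".join(subnets), ",".join(security_groups)))
    # The container name must match the one in the task definition
    overrides = json.dumps({"containerOverrides": [{
        "name": container_name,
        "environment": [{"name": "RUN_ID", "value": run_id}],
    }]})
    commands, remaining = [], total_tasks
    while remaining > 0:
        batch = min(remaining, MAX_TASKS_PER_CALL)
        commands.append([
            "aws", "ecs", "run-task",
            "--cluster", cluster,
            "--task-definition", task_definition,
            "--launch-type", "FARGATE",
            "--count", str(batch),
            "--started-by", run_id,
            "--network-configuration", network,
            "--overrides", overrides,
        ])
        remaining -= batch
    return commands
```

Each command can be executed with `subprocess.run(cmd, check=True, capture_output=True, text=True)`. The task ARNs in the JSON output can then be passed to `aws ecs wait tasks-stopped` to block until the run has finished. The run ID has to satisfy the `startedBy` limits (up to 128 letters, numbers, hyphens and underscores), and `assignPublicIp=ENABLED` assumes public subnets without a NAT gateway, so the tasks can still pull the image from ECR.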

Docker image

Our JMeter Docker image is simple: it is based on Alpine Linux (the alpine base image), with the dependencies for JMeter and S3 installed.
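Apart from JMeter and the S3 tooling, the image only needs an entrypoint. Below is a hypothetical Python entrypoint. It assumes Python is also installed in the image and that the S3 dependency is the AWS CLI. It runs the test plan in non-GUI mode and uploads the results to S3 under a per-run prefix. `RUN_ID`, `RESULTS_BUCKET`, `THREADS`, `DURATION_S` and the `/tests/plan.jmx` path are illustrative names, not part of our actual setup.

```python
import os
import subprocess
import uuid


def jmeter_cmd(test_plan: str, results_file: str, threads: int, duration_s: int) -> list:
    # -n: non-GUI mode, -t: test plan, -l: results file.
    # -J sets JMeter properties that the plan can read, e.g. ${__P(threads,1)}
    return ["jmeter", "-n", "-t", test_plan, "-l", results_file,
            f"-Jthreads={threads}", f"-Jduration={duration_s}"]


def results_key(run_id: str, task_id: str) -> str:
    # One S3 prefix per run and one object per task, so results can be grouped by run
    return f"runs/{run_id}/{task_id}.jtl"


def main() -> None:
    # Hypothetical environment variables, injected per run (e.g. via ECS container overrides)
    run_id = os.environ["RUN_ID"]
    bucket = os.environ["RESULTS_BUCKET"]
    threads = int(os.environ.get("THREADS", "50"))
    duration_s = int(os.environ.get("DURATION_S", "600"))
    results_file = "/tmp/results.jtl"

    subprocess.run(jmeter_cmd("/tests/plan.jmx", results_file, threads, duration_s), check=True)
    # A random ID keeps objects from parallel tasks apart; the task's IAM role needs write access
    key = results_key(run_id, uuid.uuid4().hex)
    subprocess.run(["aws", "s3", "cp", results_file, f"s3://{bucket}/{key}"], check=True)
```

The image's entrypoint would call `main()`. The `-J` properties only take effect if the test plan actually reads them through `__P`.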

The end result

On each run, which happens every night or can be triggered manually when needed, the following happens:

The Docker image is built, so any changes in the repo are packaged into the new version of the image.
The image is tagged. Currently, we simply overwrite the previous image by reusing the latest tag, but if you want to be able to run a previous version of the image, you need to introduce a versioning scheme.
The image is pushed to ECR, which is just a private Docker image repository in AWS.
Fargate tasks are started from the new image. Depending on the parameters the build was executed with, we control how much load we generate: it can be just 2 tasks (== 2 separate instances) or 200.
The deployment script waits until all tasks report that they have finished executing. This means the tests have run and the results are stored in an S3 bucket (the results of each run are grouped together in S3, so we know which belong to a given run).
The next step is to grab all the reports from S3 and import them into Grafana. More precisely, we import the records directly into the InfluxDB database that Grafana uses as its data source. Each run stores its results with a given tag, so we can group runs by the load being executed, etc. In Grafana we have already created a bunch of graphs (which we still iterate on and improve), and we monitor all runs from there.
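The import step could look roughly like the following Python sketch. It turns a CSV results file (JTL) into InfluxDB line protocol and tags every point with the run it belongs to. The `run` and `load` tag names and the `jmeter` measurement are hypothetical. The sketch also assumes the JTL was written as CSV with a header row, with columns such as `timeStamp`, `elapsed`, `label`, `responseCode` and `success`.

```python
import csv
import io


def _escape_tag(value: str) -> str:
    # Commas, equals signs and spaces must be backslash-escaped in tag values
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _escape_str_field(value: str) -> str:
    # Backslashes and double quotes must be escaped inside string field values
    return value.replace("\\", "\\\\").replace('"', '\\"')


def jtl_to_line_protocol(jtl_csv: str, run_id: str, load: str,
                         measurement: str = "jmeter") -> list:
    """Convert a CSV JTL (with a header row) into line protocol, one point per sample."""
    points = []
    for row in csv.DictReader(io.StringIO(jtl_csv)):
        tags = ",".join([
            f"run={_escape_tag(run_id)}",
            f"load={_escape_tag(load)}",
            f"label={_escape_tag(row['label'])}",
        ])
        success = "true" if row["success"].strip().lower() == "true" else "false"
        fields = ",".join([
            f"elapsed={int(row['elapsed'])}i",
            f'responseCode="{_escape_str_field(row["responseCode"])}"',
            f"success={success}",
        ])
        timestamp_ns = int(row["timeStamp"]) * 1_000_000  # JTL timestamps are epoch milliseconds
        points.append(f"{measurement},{tags} {fields} {timestamp_ns}")
    return points
```

The resulting lines can be sent through InfluxDB's write API or the `influx` CLI. Because every point carries a `run` tag, Grafana queries can filter or group by run.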

Below is an example of the data gathered from a performance run (that’s one of the good runs; we have some pretty bad ones as well 😉):

Top left: throughput per minute (in thousands); top right: average response time (ms); bottom: successful vs. failed requests.
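In our setup these panels come from Grafana queries over InfluxDB. The aggregation behind them is still simple enough to show directly, as a Python sketch that assumes samples have already been parsed into `(timestamp_ms, elapsed_ms, success)` tuples:

```python
from collections import defaultdict


def per_minute_stats(samples) -> dict:
    """Group (timestamp_ms, elapsed_ms, success) samples into one-minute buckets.

    Returns {minute_start_ms: {"requests", "failed", "avg_ms"}}, ordered by minute.
    """
    buckets = defaultdict(list)
    for timestamp_ms, elapsed_ms, success in samples:
        minute_start = timestamp_ms - timestamp_ms % 60_000
        buckets[minute_start].append((elapsed_ms, success))
    stats = {}
    for minute_start in sorted(buckets):
        entries = buckets[minute_start]
        stats[minute_start] = {
            "requests": len(entries),  # throughput for that minute
            "failed": sum(1 for _, ok in entries if not ok),
            "avg_ms": sum(elapsed for elapsed, _ in entries) / len(entries),
        }
    return stats
```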

This whole setup enables us to constantly monitor our performance metrics. If the development team publishes a release candidate that degrades the performance of the application, we can detect it before our customers start calling.
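One way to turn this into an automatic check is to compare each nightly run against a known-good baseline and fail the build when response times regress beyond a tolerance. Here is a minimal Python sketch. The 10% tolerance is an arbitrary, hypothetical choice, and comparing averages is a simplification (percentiles such as p95 are often more robust against outliers).

```python
from statistics import mean


def is_regression(baseline_ms, candidate_ms, tolerance: float = 0.10) -> bool:
    """True if the candidate's average response time exceeds the baseline's by more than `tolerance`."""
    if not baseline_ms or not candidate_ms:
        raise ValueError("both runs need at least one sample")
    return mean(candidate_ms) > mean(baseline_ms) * (1 + tolerance)
```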

For example, look at the success rate by request below. It is clear that one of the requests has a serious issue, and it only came into the spotlight thanks to these tests.
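The success rate per request is simply the share of successful samples for each request label. Below is a Python sketch, assuming `(label, success)` pairs taken from the results file and a hypothetical 99% threshold for flagging a request:

```python
from collections import Counter


def success_rate_by_label(samples) -> dict:
    """samples: iterable of (label, success) pairs. Returns {label: success percentage}."""
    total, ok = Counter(), Counter()
    for label, success in samples:
        total[label] += 1
        if success:
            ok[label] += 1
    return {label: 100.0 * ok[label] / total[label] for label in sorted(total)}


def failing_labels(rates: dict, threshold: float = 99.0) -> list:
    """Labels whose success rate falls below the threshold."""
    return [label for label, rate in rates.items() if rate < threshold]
```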

We now know how much traffic our servers can handle, how long it takes for scaling to kick in, which potential bottlenecks the team should address first, and so on.

Besides monitoring the data coming from the performance tests themselves, we also have to monitor the application under test and how it behaves. With the metrics provided by CloudWatch, that’s pretty easy to set up.
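As one example of pulling CloudWatch numbers for the same time window as a test run, here is a Python sketch that builds an `aws cloudwatch get-metric-statistics` call. It assumes the application under test runs as an ECS service, so it queries `CPUUtilization` in the `AWS/ECS` namespace. The cluster and service names are hypothetical, and an application behind a load balancer or on plain EC2 would use a different namespace and dimensions.

```python
from datetime import datetime, timezone


def cpu_metrics_cmd(cluster: str, service: str, start: datetime, end: datetime,
                    period_s: int = 60) -> list:
    """Build a CloudWatch query for an ECS service's CPU during a test window."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 in UTC
    return [
        "aws", "cloudwatch", "get-metric-statistics",
        "--namespace", "AWS/ECS",
        "--metric-name", "CPUUtilization",
        "--dimensions", f"Name=ClusterName,Value={cluster}", f"Name=ServiceName,Value={service}",
        "--start-time", start.astimezone(timezone.utc).strftime(fmt),
        "--end-time", end.astimezone(timezone.utc).strftime(fmt),
        "--period", str(period_s),
        "--statistics", "Average", "Maximum",
    ]
```

Running it with `subprocess.run(cmd, check=True, capture_output=True, text=True)` returns JSON datapoints that can be lined up against the throughput graphs of the same run.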

The beauty of working with data is there is no need to guess anymore!

This post was written by Iskren Dimov