Continuous Delivery testing levels

This blog post is a summary of thoughts discussed between me, Andreas Rehn (@andreasrehn) and Patrik Boström (@patbos).

A key part of Continuous Delivery is automated testing, and even the simplest delivery pipeline will consist of several different testing stages. There are unit tests, integration tests, user acceptance tests, etc. But what defines the different test levels?

We realized that we often mean different things when talking about each testing level, and this was especially true for integration tests. For me, integration tests can be tests that exercise the integrations within one component, e.g. testing an internal API or the integration between a couple of business objects interacting with each other, a database etc. This is how the Arquillian (an integration testing framework for Java) community refers to integration testing. Another kind of integration test is one that exercises an actual integration with e.g. a third party web service. What we’ve been referring to when talking about integration tests in the context of Continuous Delivery is testing a component in a fully integrated environment, from the outside rather than the inside, so-called black box testing. These tests are often more functional by nature.

We came to the conclusion that we would like to redefine the terminology for the latter type of integration testing to avoid confusion and fuzziness. Since these kinds of tests are more functional, testing the behavior and flows of the component, we decided to start calling them component tests instead. That leaves us with the following levels of testing in the early stages of a delivery pipeline:

* Unit tests
* Smoke tests
* Component tests
* Integration tests

When should you run the different tests? You want feedback as soon as possible, but you don’t want too big a test suite too early in the pipeline, as this could severely delay the feedback. It’s inefficient to force developers to run a five+ minute build before each commit. Therefore you should divide your test suite into different phases. The first phase typically includes unit tests and smoke tests. The second phase runs the component tests in a fully integrated, production-like environment. The third phase executes integration tests, e.g. with Arquillian. Certain integration tests will not need to run in a fully integrated environment, depending on the context, but there are definite benefits to running all of them in such an environment. These tests can also cover integrations towards databases, third party dependencies etc.
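One way to keep these phases apart in the build, assuming JUnit 4 is used, is to tag tests with categories and let each pipeline phase run only its own category. A minimal sketch, where the ComponentTest marker interface and the test class are made up for illustration:

import org.junit.Test;
import org.junit.experimental.categories.Category;

// marker interface declared elsewhere, e.g.: public interface ComponentTest {}
@Category(ComponentTest.class)
public class OrderFlowTest
{
  @Test
  public void shouldCompleteOrderFlow()
  {
    // drives the component in the fully integrated environment
  }
}

The build can then include or exclude categories per phase, so the fast unit and smoke suites give commit feedback quickly while the slower suites run later in the pipeline.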

To be fully confident in the quality of your releases you need to make use of these different tests, as they all fulfill a specific purpose. It is worth considering, though, in which phase certain tests should be placed, as you don’t want to rerun tests in different phases. If you’re validating an algorithm, the unit test phase is probably the most appropriate, while database queries fit well into the integration test phase, and user interface and functional tests belong with the component tests. This raises the question: how much should you actually test? As that is a topic of its own, we’ll leave it for another time.

Conclusion:
Unit tests – testing atomic pieces of code on their own. Typically tested with a unit testing framework.
Integration tests – putting atomic pieces together into moving parts, testing integration points, internal APIs, database interactions etc. Typically tested with Arquillian and/or with a unit testing framework along with mocks and stubs.
Component tests – functional tests of the component, so-called black box testing. Often tested with Selenium, acceptance testing frameworks or through web service calls, depending on the component. Also a subject for testing with Arquillian.
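For illustration, a component test in this sense could drive the deployed component through a web service call and assert only on externally visible behavior. A minimal sketch using JUnit 4 and java.net.HttpURLConnection, where the URL and endpoint are hypothetical:

import java.net.HttpURLConnection;
import java.net.URL;

import org.junit.Assert;
import org.junit.Test;

public class ProductEndpointTest
{
  @Test
  public void shouldExposeProductOverHttp() throws Exception
  {
    // hypothetical endpoint in the fully integrated test environment
    URL url = new URL("http://test-env.example.com/products/42");
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("GET");

    // black box: only assert on what the component exposes to the outside
    Assert.assertEquals(200, connection.getResponseCode());
  }
}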

Tommy Tynjä
@tommysdk

Testing the presence of log messages with java.util.logging

Sometimes there is value in creating a unit test to assert that a specific log message actually gets printed. It might be for audit logs or for making sure that system misconfigurations get logged properly. A couple of years ago my colleague Daniel blogged about how to create a custom Log4j appender and use it in your unit tests to assert the presence of certain log messages. Read about it here.

Today I was resolving an issue in the Arquillian (the open source integration testing framework for Java) codebase. This involved logging a warning in a certain use case. I obviously wanted to test my code by adding test cases for the different use cases, asserting that the log message got printed out correctly. I’ve used the approach of asserting log messages in unit tests many times in the past, but I’ve always used Log4j in those cases. This time around I was forced to solve the problem for plain java.util.logging (JUL), which Arquillian uses. Fun, as I’m always up for a challenge.

What I did was similar to the Log4j approach: I add a custom log handler and attach it to the logger of the affected class. I create an output stream, wrap it in a StreamHandler and attach that handler to the logger. As long as I have a reference to the output stream, I can get hold of the logged contents and use them in my assertions. Example below using JUnit 4:

private static Logger log = Logger.getLogger(AnnotationDeploymentScenarioGenerator.class.getName()); // matches the logger in the affected class
private static OutputStream logCapturingStream;
private static StreamHandler customLogHandler;

@Before
public void attachLogCapturer()
{
  logCapturingStream = new ByteArrayOutputStream();
  Handler[] handlers = log.getParent().getHandlers();
  customLogHandler = new StreamHandler(logCapturingStream, handlers[0].getFormatter()); // reuse the formatter of the parent logger's handler
  log.addHandler(customLogHandler);
}

public String getTestCapturedLog() throws IOException
{
  customLogHandler.flush(); // StreamHandler buffers, so flush to make sure all log records reach the stream
  return logCapturingStream.toString();
}

… then I can use the above methods in my test case:

@Test
public void shouldLogWarningForMismatchingArchiveTypeAndFileExtension() throws Exception
{
  final String expectedLogPart = "unexpected file extension";

  new AnnotationDeploymentScenarioGenerator().generate(
        new TestClass(DeploymentWithMismatchingTypeAndFileExtension.class));

  String capturedLog = getTestCapturedLog();
  Assert.assertTrue(capturedLog.contains(expectedLogPart));
}

 

Tommy Tynjä
@tommysdk

Test data – part 1

When you run an integration or system test, i.e. a test that spans one or more logical or physical boundaries in the system, you normally need some test data, as most non-trivial operations depend on some persistent state in the system. Even if the test tries to follow the advice of favoring verification of behavior over state, you may still need specific input to even achieve a certain behavior. For example, if you want to test an order flow for a specific type of product, you must know how to add a product of that type to the basket, e.g. by knowing a product name.

But, and here is the problem, if you don’t have strict control of that data it may change over time, so suddenly your test will fail.

When unit testing, you’ll want to use mocks or fakes for dependencies (and have well factored code that lets you easily do that), but here I’m talking about tests where you specifically want to use the real dependency.

Basically, there are only two robust ways to manage test data:

  1. Each test creates the data it needs.
  2. Create a managed set of data that covers all of your test needs.

You can also use a combination of the two.

For the first strategy, either you take an idempotent approach, so that you just ensure a certain state, or you create and delete the data for each run. In some cases you can use transactions to be able to safely parallelize your tests and not modify persistent state: just open one at the start of the test and then abort it instead of committing at the end. Obviously you cannot test functionality that depends on transactions this way.
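As a sketch of the transaction variant, assuming a JDBC database and JUnit 4 (the connection details, table and class names are made up), the test opens a transaction, creates the rows it needs and rolls back instead of committing:

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class OrderRepositoryIT
{
  private Connection connection;

  @Before
  public void openTransaction() throws Exception
  {
    // hypothetical test database URL and credentials
    connection = DriverManager.getConnection("jdbc:postgresql://localhost/testdb", "test", "test");
    connection.setAutoCommit(false); // keep all changes in one open transaction
  }

  @Test
  public void shouldFindOrdersForCustomer() throws Exception
  {
    // create the data this test needs, inside the transaction
    PreparedStatement insert = connection.prepareStatement(
        "insert into orders (customer_id, total) values (?, ?)");
    insert.setLong(1, 4711L);
    insert.setBigDecimal(2, new BigDecimal("99.90"));
    insert.executeUpdate();

    // ... exercise the code under test through the same connection and assert on the result
  }

  @After
  public void rollback() throws Exception
  {
    connection.rollback(); // abort instead of committing, leaving no persistent state behind
    connection.close();
  }
}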

The second strategy is a lot easier if you already have a clear separation between reference data, application data and transactional data.

By reference data I mean data that changes with very low frequency, is often of limited size and has a list or key/value structure. Examples could be a list of supported languages or a zip code to address lookup. This should be fairly easy to keep in one authoritative, version controlled location, either in bulk or as deltas.

The term application data is not as established as reference data. It is data that affects the behavior of the application. It is not modified by normal end user actions, but is continuously modified by developers or administrators. Examples could be articles in a CMS or sellable products in an eCommerce website. This data is crucial for tests. It’s typically the data that tests use as input or for assertions.

The challenge here is to keep the production data and the test data set in sync. Ideally there should be a process that makes it impossible (or at least hard) to update the former without updating the latter. However, there are often many complicating factors: the data can be in another system owned by another team and without a good test double, the data can be large, or it can have complex relationships or dependencies that sometimes very few fully grasp. Often it is managed by non-technical people, so their tool set, knowledge and skills are different.

Unit or component tests can often overcome these challenges by mocking other systems or creating arbitrary test data and verifying behavior rather than exact state, but acceptance tests cannot do that. We sometimes need to verify that a specific product can be ordered, not a fictional one created by the test.
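For contrast, a unit level test can sidestep the data problem entirely by mocking the collaborator and verifying the interaction rather than persistent state. A minimal sketch with JUnit 4 and Mockito, where ProductCatalog, Product and OrderService are made-up types:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.Test;

public class OrderServiceTest
{
  @Test
  public void shouldLookUpProductWhenAddedToBasket()
  {
    // hypothetical collaborator, mocked so no real product data is needed
    ProductCatalog catalog = mock(ProductCatalog.class);
    when(catalog.findByName("some product")).thenReturn(new Product("some product"));

    OrderService service = new OrderService(catalog);
    service.addToBasket("some product");

    // verify the behavior (the interaction), not exact persistent state
    verify(catalog).findByName("some product");
  }
}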

Finally, transactional data is data continuously created by the application. It is typically large, fast growing and of medium complexity. Examples could be orders, article comments and logs.

One challenge here is how to handle old, ‘obsolete’ data. You may have data stored that is impossible to generate in the current application because the business rules (and the corresponding implementation) have changed. For the test data it means you cannot use the application to create the test data, if that was your strategy. Obviously, this can make the application code more complicated, and for the test code you hopefully have it organized so that it’s easy to correlate an acceptance test to the changed business rule and easy to change it accordingly. The tests may get more complicated because there can now, e.g., be different behavior for customers with an ‘old’ contract. This may be hard for new developers on the team who only know the current behavior of the app. You may even have seemingly contradicting assertions.

Another problem can be the sheer size. This can be mitigated by having a strategy for aggregating, compacting and/or extracting data. This is normally easy if you plan for it up front, but can be hard when your database is 100 TB. I know that hardware is cheap, but having a 100 TB DB is inconvenient.

The line between application data and transactional data is not always clear cut. For example when an end user performs an action, such as a purchase, he may become eligible for certain functionality or products, thus having altered the behavior of the application. It’s still a good approach though to keep the order rows and the customer status separated.

I hope to soon write more on the tougher problems in automated testing and on managing test data specifically.

Marcus Philip
@marcus_phi