All posts by Tommy Tynjä

Twitter: @tommysdk

Docker on Mac OS X using CoreOS

Docker is on everybody’s lips these days. It’s an open-source software project that leverages Linux kernel resource isolation to allow independent so-called containers to run within a single Linux instance, thus avoiding the overhead of virtual machines/hypervisors while still offering full container isolation. Docker is therefore a feasible approach for automated and scalable software deployments.

Many developers (including myself) are nowadays developing on Mac OS X, which is not Linux. It is possible to use Docker on OS X, but one should be aware of what this implies. As OS X is not based on Linux, it lacks the kernel features which would allow you to run Docker containers natively on your system, so you still need to have a Linux host somewhere. Docker provides you with something called boot2docker, which essentially is a Linux distribution (based on Tiny Core Linux) built specifically for running Docker containers.

If you want a more general-purpose Linux VM, for instance because you want to use it for other tasks than just running Docker containers, an alternative to boot2docker is to use CoreOS as your Docker host VM. CoreOS is quite lightweight (but is obviously bigger than boot2docker) and comes bundled with Docker. Setting up a fresh CoreOS instance to run with Vagrant is easy:

mkdir ~/coreos
cd ~/coreos
echo 'VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
config.vm.box = "coreos"
config.vm.box_url = "http://storage.core-os.net/coreos/amd64-generic/dev-channel/coreos_production_vagrant.box"
config.vm.network "private_network",
ip: "192.168.0.100"
end' > Vagrantfile
vagrant up
vagrant ssh
core@localhost ~ $ docker --version
Docker version 0.9.0, build 2b3fdf2

Now you have a CoreOS Linux VM available which you can use as your Docker host.

If you want to mount a directory to be shared between OS X and your CoreOS host, just add the following line to the Vagrantfile, substituting paths that exist on your system:
config.vm.synced_folder "/Users/tommy/share", "/home/core/share", id: "core", :nfs => true, :mount_options => ['nolock,vers=3,udp']

Happy hacking!

Past and upcoming events

We have been unusually busy at Diabol during the past few months, speaking at various software conferences on a variety of Continuous Delivery related topics. We enjoy sharing our experiences and meeting new and familiar faces to discuss topics that we’re passionate about, such as Continuous Delivery, DevOps and automation.

We have also arranged a Continuous Delivery seminar of our own, which attracted 20 top IT-management professionals from various well known Swedish enterprises. The seminar was a great success, with interesting presentations and good discussions among the attendees.

The next upcoming event where we will be presenting is the first edition of the Continuous Delivery Conference in Bussum, Netherlands on December 4th. Andreas Rehn will present “From dinosaur to unicorn in 12 months: how to push continuous delivery maturity to the next level”.

Past events where we have been presenting lately, together with video recording or presentation material:

If you plan to attend a conference where we’re speaking or just attending, come by and say hi! We look forward to talking to you!

Diabol proudly presents Continuous Delivery seminar

Diabol is proud to arrange a seminar completely dedicated to Continuous Delivery, to be kicked off in less than a week on September 30th in Stockholm. This is an exclusive invite-only event where the top IT-management attendees will learn how Continuous Delivery can help their organizations become more efficient in developing and delivering software. Our hand-picked speakers will present how Continuous Delivery and delivery process automation have changed their respective organizations into lean business machines. Instead of dealing with the painful, manual, repetitive tasks which are commonly associated with a traditional release and deploy process, their employees can now focus on innovation and creating business value.

Event speakers:

  • Stefan Berg, former CIO at Com Hem, will present: “From average to top performer in less than a year!”
  • Tomas Riha, Agile Architect at Volvo Group Telematics, will present: “From hobby project to Continuous Delivery as a Service for the entire organization”

Make sure you keep visiting this channel for more news on Continuous Delivery!

Feature switches in practice

Feature switches (or feature flags, toggles etc.) are a programming technique which has gained a lot of attention through the concepts of Trunk Based Development and Continuous Delivery. Feature switches allow you to shield code that is not yet production ready while it is still being committed to mainline in version control. This allows you to work on development tasks on mainline and to continuously integrate your code while avoiding the burdens of branching. Another useful benefit is that you can decide which functionality to run in production by switching functionality on/off. The best thing is that this technique is very easy to implement: you basically just need to start doing it! In this blog post I’ll show you how easy it is to do this in Java.

In my current project we are integrating with a third party service which our system depends heavily on. While our system will continue to work if that third party service becomes unavailable, it still means a loss in revenue to the business. Therefore we want to be able to monitor this integration point closely and provide mechanisms to be able to troubleshoot it efficiently. As the communication between these systems is web service based through SOAP, we found it very useful to be able to log the entire payloads sent and received between the two systems. This feature is an ideal candidate for feature switching.

I implemented a feature which allows us to decide at runtime whether we should log every SOAP message sent and received to a file on the file system. This happens asynchronously so as not to affect application throughput too much. The feature is switched off in production by default, but allows us to turn it on if we need to troubleshoot integration failures.

The most basic feature switch to implement would just be a simple if-statement:

boolean xmlLogFeatureIsEnabled = false;
if (xmlLogFeatureIsEnabled) {
	logToFile(xml);
}

But instead of hardcoding the feature switch state, we want this to be dynamically evaluated so we can change the behavior on a running system without the need for restarts or too much manual labor. To be able to do this we use a small framework called Togglz, which allows you to very easily create feature switches which you can then manage at runtime.

First, we create a feature definition enumeration which implements org.togglz.core.Feature:

public enum FeatureDefinition implements Feature {

    @Label("Log XML to file")
    LOG_XML_TO_FILE;

    public boolean isActive() {
        return FeatureContext.getFeatureManager().isActive(this);
    }
}

Then, we implement org.togglz.core.manager.TogglzConfig which will keep track of the feature states:

@ApplicationScoped
public class FeatureConfiguration implements TogglzConfig {

    @Resource
    private DataSource datasource; // javax.sql.DataSource

    public Class<? extends Feature> getFeatureClass() {
        return FeatureDefinition.class;
    }

    public StateRepository getStateRepository() {
        return new CachingStateRepository(new JDBCStateRepository(datasource), 10, TimeUnit.MINUTES);
    }

    public UserProvider getUserProvider() {
        return new NoOpUserProvider();
    }
}

We use dependency injection in our project, so this allows us to easily inject a datasource in our feature configuration which Togglz can use to store the feature states in. We then apply a 10 minute cache for the feature state reload so that Togglz won’t have to look up the state in the database each time a feature state is evaluated. Please note that you might want to implement the configuration a bit more robustly than in the example above. When we want to switch a feature on/off it is merely a matter of updating a database column value.
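To give an idea of what such a cache does, here is a minimal sketch in plain Java. It is not how Togglz implements its caching; the class name CachedFeatureState, the loader and the injected clock are all assumptions made for illustration. The state is loaded on first use and then reused until the time-to-live has passed:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Illustrative only: caches a feature state for a fixed time-to-live before reloading it. */
final class CachedFeatureState {

    private final BooleanSupplier loader; // the expensive lookup, e.g. a database query (hypothetical)
    private final LongSupplier clock;     // supplies the current time in nanoseconds
    private final long ttlNanos;

    private boolean cachedValue;
    private long loadedAt;
    private boolean loaded;

    CachedFeatureState(BooleanSupplier loader, long ttl, TimeUnit unit, LongSupplier clock) {
        this.loader = loader;
        this.clock = clock;
        this.ttlNanos = unit.toNanos(ttl);
    }

    /** Returns the cached state, reloading it once the time-to-live has expired. */
    synchronized boolean isActive() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlNanos) {
            cachedValue = loader.getAsBoolean();
            loadedAt = now;
            loaded = true;
        }
        return cachedValue;
    }
}
```

With System::nanoTime as the clock and a time-to-live of 10 minutes, a change made in the database becomes visible to the application within at most 10 minutes. That is the trade-off to keep in mind when picking the cache duration.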

Finally, we just change the if-statement encapsulating the feature method call to:

if (FeatureDefinition.LOG_XML_TO_FILE.isActive()) {
    logToFile(xml);
}

And that’s it! This is all we need to do to be able to dynamically switch features on/off in a running Java system. This technique is very useful when exercising Continuous Delivery ways of working where each commit is a potential production release. As you can see, feature switches allow you to commit your changes to version control without necessarily exposing them to your end users.

To see this in action, feel free to check out my Togglz example project which uses a simple servlet to demonstrate the behavior.
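The asynchronous file logging mentioned earlier in this post could look roughly like the sketch below. This is not the code from our project; the class name SwitchedPayloadLogger and its constructor parameters are assumptions for illustration. The switch is passed in as a BooleanSupplier, so a method reference such as FeatureDefinition.LOG_XML_TO_FILE::isActive would fit, and the actual file write is handed over to a single background thread:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Illustrative only: appends payloads to a file on a background thread while the switch is on. */
final class SwitchedPayloadLogger implements AutoCloseable {

    private final BooleanSupplier featureSwitch; // evaluated on every call, so it can change at runtime
    private final Path logFile;
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    SwitchedPayloadLogger(BooleanSupplier featureSwitch, Path logFile) {
        this.featureSwitch = featureSwitch;
        this.logFile = logFile;
    }

    /** Returns immediately; the file write happens on the writer thread. */
    void log(String xml) {
        if (!featureSwitch.getAsBoolean()) {
            return; // switched off: no work is queued
        }
        writer.execute(() -> {
            try {
                byte[] line = (xml + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
                Files.write(logFile, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                // Logging must never break the caller; report and carry on.
                System.err.println("Could not log payload: " + e.getMessage());
            }
        });
    }

    /** Stops accepting payloads and waits briefly for queued writes to finish. */
    @Override
    public void close() {
        writer.shutdown();
        try {
            writer.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A single writer thread keeps the payloads in the order they were submitted. The calling thread only pays for evaluating the switch and queueing the task.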

 

Tommy Tynjä
@tommysdk

Slimmed down immutable infrastructure

Last weekend we had a hackathon at Diabol. The topics were all related in some way to DevOps and Continuous Delivery. My group of four focused on slim microservices with immutable infrastructure. Since we believe in automated delivery pipelines for software development and infrastructure setup, the next natural step would be to merge the two. Ideally, one would produce a machine image that contains everything needed to run the current application. The servers would be immutable, since we don’t want anyone making manual changes to a running environment. Rather, the changes should be checked in to version control and a new server would be created based on the automated build pipeline for the infrastructure.

The problem with traditional machine images running on e.g. VMware or Amazon is that they tend to be very large; a couple of gigabytes is not an unusual size. Images of that size become cumbersome to work with as they take a long time to create and ship over a network. Therefore it is desirable to keep server images as small as possible, especially since you might create and tear down servers ad hoc for e.g. test purposes in your delivery pipeline. Linux is a very common server operating system, but many Linux distributions are shipped with features that we are very unlikely to ever use on a server, such as C compilers or utility programs. And since we adopt immutable servers, we don’t even need things such as editors, man pages or even ssh!

Docker is an interesting solution for slimmed down infrastructure and full stack machine images, which we evaluated during the hackathon. After getting our hands dirty for a couple of hours, we were quite pleased with its capabilities. We’ll definitely keep it on our radar and continue our evaluation of it.

Since we’re mostly operating in the Java space, I also spent some time looking at how we could save some size on our machine images by slimming down the JVM. Since a delivery pipeline will be triggered several times a day to deploy, test etc., every megabyte saved will increase the pipeline throughput. But why should you slim down the JVM? Well, the JVM also contains features (or libraries) that are highly unlikely to ever be used on a server, such as audio, the AWT and Swing UI frameworks, JavaFX, fonts, cursor images etc. The standard installation of the Java 8 JRE is around 150 MB. Unfortunately the core library of Java, rt.jar, is 66 MB in size, which sets a lower bound on the size of a working JVM (unless you start removing the class files inside it too). Without too much work, I was able to safely remove a third of the size of the standard JRE installation by removing libraries such as the aforementioned ones, landing on a bit under 100 MB while still being able to run our application. Although this practice might not be suitable for production use for technical or even legal reasons, it’s still interesting to see how much we typically install on our servers although it’ll never be used. The much anticipated Project Jigsaw, which will introduce modularity to Java SE, has been postponed several times. Hopefully it can be incorporated into Java 9, enabling us to decide which modules we actually want to use for our particular use case.

Our conclusion for the time spent on this topic during the hackathon is that Docker is an interesting alternative to traditional machine image solutions, which not only allows, but also encourages slim servers and immutable infrastructure.

Tommy Tynjä
@tommysdk

Test categorization in deployment pipelines

Have you ever gotten tired of waiting for those long running tests in CI to finish so you can get feedback on your latest code change? Chances are that you have. A common problem is that test suites tend to grow too large, making the feedback loop an enemy instead of a companion. This is a problem when building delivery pipelines for Continuous Delivery, but also for more traditional approaches to software development. A solution to this problem is to divide your test suite into separate categories, or stages, where tests are grouped according to similarity or type. The categories can then be arranged so that the quickest tests and those most likely to fail are executed first, to enable faster feedback to the developers.

An example of a logical grouping of tests in a deployment pipeline:

Commit stage:
* Unit tests
* Component smoke tests
These tests execute fast and will be executed by the developers before committing changes into version control.

Component tests:
* Component tests
* Integration tests
These tests are to be run in CI and can be further categorized so that e.g. component tests that are most likely to catch failures will execute first, before more thorough testing.

End user tests:
* Functional tests
* User acceptance tests
* Usability/exploratory testing

As development continues, it is important to maintain these test categories so that the feedback loop can be kept as optimal as possible. This might involve moving tests between categories, further splitting up test suites or even grouping categories that might be able to run in parallel.

How is this done in practice? You’ve probably encountered code bases where all these different kinds of tests (unit, integration and user acceptance tests) have been scattered throughout the same test source tree. In the Java world, Maven is a commonly used build tool. Generally, its model supports running unit and integration tests separately out of the box, but it still expects tests to be in the same structure, differentiated only by a naming convention. This isn’t practical if you have hundreds or thousands of tests for a single component (or Maven module). To have a maintainable test structure and make effective use of test categorization, splitting up tests into different source trees is desirable, for example:

src/test – unit tests
src/test-integration – integration tests
src/test-acceptance – acceptance tests

Gradle is a build tool which makes it easy to leverage this kind of test categorization. Changing build tool might not be practically possible for many reasons, but it is fully possible to leverage Gradle’s capabilities from your existing build tool. You want to use the right tool for the job, right? Gradle is an excellent tool for this kind of job.

Gradle makes use of source sets to define what source code tree is production code and which is e.g. test code. You can easily define your own source sets, which is something you can use to categorize your tests.

The test categories in the example above can be defined in your build.gradle as follows:

sourceSets {
  main {
    java {
      srcDir 'src/main/java'
    }
    resources {
      srcDir 'src/main/resources'
    }
  }
  test {
    java {
      srcDir 'src/test/java'
    }
    resources {
      srcDir 'src/test/resources'
    }
  }
  integrationTest {
    java {
      srcDir 'src/test-integration/java'
    }
    resources {
      srcDir 'src/test-integration/resources'
    }
    compileClasspath += sourceSets.main.runtimeClasspath
  }
  acceptanceTest {
    java {
      srcDir 'src/test-acceptance/java'
    }
    resources {
      srcDir 'src/test-acceptance/resources'
    }
    compileClasspath += sourceSets.main.runtimeClasspath
  }
}

To be able to run the different test suites, set up a Gradle task for each test category as appropriate for your component, such as:

task integrationTest(type: Test) {
  description = "Runs integration tests"
  testClassesDir = sourceSets.integrationTest.output.classesDir
  classpath += sourceSets.test.runtimeClasspath + sourceSets.integrationTest.runtimeClasspath
  useJUnit()
  testLogging {
    events "passed", "skipped", "failed"
  }
}

task acceptanceTest(type: Test) {
  description = "Runs acceptance tests"
  testClassesDir = sourceSets.acceptanceTest.output.classesDir
  classpath += sourceSets.test.runtimeClasspath + sourceSets.acceptanceTest.runtimeClasspath
  useJUnit()
  testLogging {
    events "passed", "skipped", "failed"
  }
}

test {
  useJUnit()
  testLogging {
    events "passed", "skipped", "failed"
  }
}

Unit tests in src/test will be run by default. To run integration tests located in src/test-integration, invoke the integrationTest task by executing “gradle integrationTest”. To run acceptance tests located in src/test-acceptance, invoke the acceptanceTest task by executing “gradle acceptanceTest”. These commands can then be used to tailor your test suite execution throughout your deployment pipeline.

A full build.gradle example file that shows how to set up test categories as described above can be found on GitHub.

The above example shows how tests can be logically grouped to avoid waiting for that one big test suite to run for hours, just to report a test failure on a simple test case that should have been reported instantly during the test execution phase.


Tommy Tynjä
@tommysdk

Continuous Delivery testing levels

This blog post is a summary of thoughts discussed between me, Andreas Rehn (@andreasrehn) and Patrik Boström (@patbos).

A key part of Continuous Delivery is automated testing, and even the simplest delivery pipeline will consist of several different testing stages. There are unit tests, integration tests, user acceptance tests etc. But what defines the different test levels?

We realized that we often mean different things regarding each testing level, and this was especially true when talking about integration tests. For me, integration tests can be tests that test the integrations within one component, e.g. testing an internal API or the integration between a couple of business objects interacting with each other, a database etc. This is how the Arquillian (an integration testing framework for Java) community refers to integration testing. Another kind of integration tests are those testing an actual integration with e.g. a third party web service. What we’ve been referring to when talking about integration tests in the context of Continuous Delivery is testing a component in a fully integrated environment and testing the component from the outside rather than the inside, so-called black box testing. These are often more functional by nature.

We came to the conclusion that we would like to redefine the terminology for the latter type of integration testing to avoid confusion and fuzziness. Since these kinds of tests are more functional tests, testing the behavior and flows of the component, we decided to start calling them component tests instead. That leaves us with the following levels of testing in the early stages of a delivery pipeline:

* Unit tests
* Smoke tests
* Component tests
* Integration tests

When should you run the different tests? You want feedback as soon as possible, but you don’t want too big a test suite too early in the pipeline as this could severely delay the feedback. It’s inefficient to force developers to run a five+ minute build before each commit. Therefore you should divide your test suite into different phases. The first phase typically includes unit tests and smoke tests. The second phase will run the component tests in a fully integrated production-like environment. The third phase will execute integration tests, e.g. with Arquillian. Certain integration tests will not need to be run in a fully integrated environment, depending on the context, but there are definitely benefits of running all of them in such an environment. These tests can also test integrations towards databases, third party dependencies etc.

To be fully confident in the quality of your releases you need to make use of these different tests, as they all fulfill a specific purpose. It is worth considering, though, in what phase certain tests should be placed, as you don’t want to rerun tests in different phases. If you’re validating an algorithm, the unit test phase is probably the most appropriate phase, while testing your database queries fits well into the integration test phase, and user interface and functional tests fit as component tests. This raises the question: how much should you actually test? As that is a topic of its own, we’ll leave that for another time.

Conclusion:
Unit tests – testing atomic pieces of code on their own. Typically tested with a unit testing framework.
Integration tests – putting atomic pieces together into moving parts, testing integration points, internal APIs, database interactions etc. Typically tested with Arquillian and/or with a unit testing framework along with mocks and stubs.
Component tests – functional tests of the component, so-called black box testing. Often tested with Selenium, acceptance testing frameworks or through web service calls, depending on the component. Also a subject for testing with Arquillian.

Tommy Tynjä
@tommysdk

Testing the presence of log messages with java.util.logging

Sometimes there is value in creating a unit test to assert that a specific log message actually gets printed. It might be for audit logs or making sure that system misconfigurations get logged properly. A couple of years ago my colleague Daniel blogged about how to create a custom Log4j appender and to use that in your unit tests to assert the presence of certain log messages. Read about it here.

Today I was resolving an issue in the Arquillian (the open source integration testing framework for Java) codebase. This involved logging a warning in a certain use case. I obviously wanted to test my code by adding a test case for the different use cases, asserting that the log message got printed out correctly. I’ve used the approach of asserting log messages in unit tests many times in the past, but I’ve always used Log4j in those cases. This time around I was forced to solve the problem for plain java.util.logging (JUL), which Arquillian uses. Fun, as I’m always up for a challenge.

What I did was similar to the Log4j approach. I needed to add a custom log handler and attach it to the logger in the affected class. I create an output stream, which I attach to a StreamHandler. I then attach the StreamHandler to the logger. As long as I have a reference to the output stream, I can then get the logged contents and use that in my assertions. Example below using JUnit 4:

private static Logger log = Logger.getLogger(AnnotationDeploymentScenarioGenerator.class.getName()); // matches the logger in the affected class
private static OutputStream logCapturingStream;
private static StreamHandler customLogHandler;

@Before
public void attachLogCapturer()
{
  logCapturingStream = new ByteArrayOutputStream();
  Handler[] handlers = log.getParent().getHandlers();
  customLogHandler = new StreamHandler(logCapturingStream, handlers[0].getFormatter());
  log.addHandler(customLogHandler);
}

public String getTestCapturedLog() throws IOException
{
  customLogHandler.flush();
  return logCapturingStream.toString();
}

… then I can use the above methods in my test case:

@Test
public void shouldLogWarningForMismatchingArchiveTypeAndFileExtension() throws Exception
{
  final String expectedLogPart = "unexpected file extension";

  new AnnotationDeploymentScenarioGenerator().generate(
        new TestClass(DeploymentWithMismatchingTypeAndFileExtension.class));

  String capturedLog = getTestCapturedLog();
  Assert.assertTrue(capturedLog.contains(expectedLogPart));
}
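For trying the technique outside of Arquillian, the same idea can be condensed into a sketch that only depends on the JDK. The class name LogCapture, the helper captureLog and the logger name are made up for this illustration. Unlike the test fixture above, this version uses a SimpleFormatter directly instead of borrowing the formatter from the parent logger, and it removes the handler again when done so that handlers do not pile up between tests:

```java
import java.io.ByteArrayOutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

/** Illustrative only: captures what a java.util.logging logger emits while an action runs. */
final class LogCapture {

    static String captureLog(Logger logger, Runnable action) {
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        StreamHandler handler = new StreamHandler(captured, new SimpleFormatter());
        handler.setLevel(Level.ALL); // let the logger's own level decide what gets through
        logger.addHandler(handler);
        try {
            action.run();
        } finally {
            handler.flush();               // StreamHandler buffers, so flush before reading
            logger.removeHandler(handler); // avoid leaking handlers between tests
        }
        return captured.toString();
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("example.Deployment"); // hypothetical logger name
        String output = captureLog(log, () -> log.warning("unexpected file extension"));
        System.out.println(output.contains("unexpected file extension"));
    }
}
```

The flush before reading matters: without it, the captured stream may still be empty when the assertion runs.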

 

Tommy Tynjä
@tommysdk

Writing integration tests with an in-memory Mongo DB

As I mentioned in my previous post, I’ve been working closely with the document oriented NoSQL database Mongo DB lately. As an advocate of sustainable systems development (test driven development that is), I took the lead in our team for designing tests for our business logic towards Mongo. For relational databases there are a lot of options; a common solution is to test against an in-memory database such as H2. For NoSQL databases the options are not always as generous from an automated test perspective. Luckily, we found an in-memory version of Mongo DB from Flapdoodle which is easy to use and fits our use case perfectly. If your (Java) code base relies on Maven, just add the following dependency:

<dependency>
    <groupId>de.flapdoodle.embed</groupId>
    <artifactId>de.flapdoodle.embed.mongo</artifactId>
    <version>1.27</version>
    <scope>test</scope>
</dependency>

Then, in your test class (here based on JUnit), use the provided classes to start an in-memory Mongo DB, preferably in a @Before method, and tear down the instance in an @After method, as in the example below:

public class TestInMemoryMongo {

    private static final String MONGO_HOST = "localhost";
    private static final int MONGO_PORT = 27777;
    private static final String IN_MEM_CONNECTION_URL = MONGO_HOST + ":" + MONGO_PORT;

    private MongodExecutable mongodExe;
    private MongodProcess mongod;
    private Mongo mongo;

    /**
     * Start in-memory Mongo DB process
     */
    @Before
    public void setup() throws Exception {
        MongodStarter runtime = MongodStarter.getDefaultInstance();
        mongodExe = runtime.prepare(new MongodConfig(Version.V2_0_5, MONGO_PORT, Network.localhostIsIPv6()));
        mongod = mongodExe.start();
        mongo = new Mongo(MONGO_HOST, MONGO_PORT);
    }

    /**
     * Shutdown in-memory Mongo DB process
     */
    @After
    public void teardown() throws Exception {
        if (mongod != null) {
            mongod.stop();
            mongodExe.stop();
        }
    }

    @Test
    public void shouldAssertSomeInteractionWithMongo() {
        // Create a connection to Mongo using the IN_MEM_CONNECTION_URL property
        // Execute some business logic towards Mongo
        // Assert the expected behaviour
    }

    protected MongoPersistenceContext getMongoPersistenceContext() {
        // Returns an instance of the class containing business logic towards Mongo
        return new MongoPersistenceContext();
    }
}

… then you have an in-memory Mongo DB available for your test cases. The private member named “mongo” in the example above is of type com.mongodb.Mongo, which you can use for interaction with the Mongo database. As you notice, all you basically need is JUnit and Mongo, nothing more. But life is not always as easy as the simplest examples. In our case, we leverage EJBs and other components which rely on running inside a container. As I’m a contributor to the JBoss Arquillian project, a Java framework for integration tests, I was obviously curious about making the in-memory Mongo DB available when executing tests inside the container. If you use Arquillian already like we do, the transition is smooth. Just make sure to put the in-memory Mongo and Java driver on your classpath. The following Arquillian test extends the JUnit test class above, but with a bean injection for the business class under test:

@RunWith(Arquillian.class)
public class TestInMemoryMongoWithArquillian extends TestInMemoryMongo {

    @Deployment
    public static Archive getDeployment() {
        return ShrinkWrap.create(WebArchive.class, "mongo.war")
                .addClass(MongoPersistenceContext.class)
                .addAsManifestResource(EmptyAsset.INSTANCE, "beans.xml")
                .addAsWebInfResource(new StringAsset("<web-app></web-app>"), "web.xml")
                .addAsLibraries(DependencyResolvers.use(MavenDependencyResolver.class)
                        .artifact("de.flapdoodle.embed:de.flapdoodle.embed.mongo:jar:1.27")
                        .artifact("org.mongodb:mongo-java-driver:jar:2.9.1")
                        .resolveAs(JavaArchive.class));
    }

    @Inject MongoPersistenceContext mpc;

    @Override
    protected MongoPersistenceContext getMongoPersistenceContext() {
        return mpc;
    }
}

I’ve uploaded a full example project, based on Java 7, Maven, Mongo DB, Arquillian and Apache TomEE (embedded) on GitHub to demonstrate how to use an in-memory Mongo DB in a unit test as well as with Arquillian inside TomEE. This should serve as a good base when starting to write automated tests for your business logic towards Mongo.

 

Tommy Tynjä
@tommysdk

Applying indexes in Mongo DB

For the past year I’ve been consulting at a client where we’ve been using the document oriented NoSQL database Mongo DB in production (currently v2.0.5). Primarily we store PDF documents together with some arbitrary metadata, but in some use cases we also store a lot of completely dynamic documents, where there might be no similar columns shared between documents in the same collection.

For one of our collections, which holds PDF documents in a GridFS structure (~ 1 TB of data/node), we sometimes need to query for documents based on a couple of certain keys. If these keys are not indexed, queries can take a very long time to execute. How indexes are handled is very well explained in the Mongo documentation. By default, GridFS provides us with indexes for the _id and filename + uploadDate fields. To view indexes on the current collection, execute the following command from the Mongo shell:

db.myCollection.getIndexes();

Ideally, you want your indexes to reside completely within RAM. The following command returns the size of the current index:

db.myCollection.totalIndexSize();

To apply a new index for keyX and keyY, execute the following command:

db.myCollection.ensureIndex({"keyX":1, "keyY":1});

Applying the index took roughly a minute per node in our environment. The index should be displayed when executing db.myCollection.getIndexes();

{
        "v" : 1,
        "key" : {
                "keyX" : 1,
                "keyY" : 1
        },
        "ns" : "myDatabase.myCollection",
        "name" : "keyX_1_keyY_1"
}

After applying the index, ensure that the total size of the index is still manageable (less than available memory). Now, executing a query based on the indexed fields should yield its result much faster:

db.myCollection.find({"keyX":9281,"keyY":3270});

Tommy Tynjä
@tommysdk