
Amazon Cloud Certification

AWS is extremely hot right now — according to Ryan Kroonenburg of A Cloud Guru, there is an estimated shortage of approximately 1.7 million certified AWS professionals worldwide. The cloud is now the uncontested default for startups and for reorganisations of IT infrastructure in general.

Up until quite recently, there used to be five AWS certifications:

[Image: the five AWS certifications]

The Associate certifications can be approached in any order, but it’s considered a good idea to start with the Solutions Architect Associate, as it lays the foundation for almost everything else, then the Developer Associate, and finally the SysOps Administrator Associate.

The two Professional ones are considered difficult, the AWS Solutions Architect Professional in particular: some people say it’s one of the toughest IT certifications in the field. Both require extensive preparation; practical experience with AWS is of course invaluable.

To progress to the Professional certifications, Associate level certifications are required:

[Image: the AWS certification tiers]

To qualify for the Architect Professional, you must be certified as an Architect Associate. The DevOps Professional certification requires that you have either the Developer Associate or the SysOps Administrator Associate.

After working in the Amazon cloud every day since 2012, I thought it might be a good idea to get certified for the kind of work I’ve already been doing. Even more so since many of my colleagues at Diabol AB already have AWS certifications and have worked in the cloud for years.

So, a few days ago I passed the AWS Solutions Architect Associate certification.

[Image: AWS Certified Solutions Architect Associate badge]

My end goal is to take all five. In addition to the Solutions Architect Professional, the DevOps Professional certification is particularly interesting given that Diabol AB specialises in Continuous Delivery and has deep expertise in DevOps and Lean/Agile methodologies.

So, to prepare for the Pro ones, I’m taking all three Associate certifications. Since their contents overlap to a large extent — I’d say they are 70% similar — it makes sense to study them together. If all goes well, I’ll have all three before the end of this month.

After that, I need to concentrate on a few AWS technologies and services which I haven’t used yet — AWS is such a rich environment now that nobody knows the entire platform: you have to specialise — before I go for the DevOps Professional. I don’t know exactly when I’ll sit for that. And finally, after some further preparation, the AWS Solutions Architect Professional.

And to anyone who has one of the Associate level certifications, I’d say: get the other two. It’s not that much work.

In the coming weeks, I’ll be posting updates covering the various certifications, including thoughts and reflections on the contents and relative difficulties of the respective exams, and also a little on how I’ve studied for them.

The power of Jenkins JobDSL

At my current client we define all of our Jenkins jobs with Jenkins Job Builder so that we can have all of our job configurations version controlled. Every push to the version control repository that hosts the configuration files triggers an update of the jobs in Jenkins, ensuring that the state in version control is always reflected in our Jenkins setup.

We have six jobs that are essentially the same for each of our projects, differing only slightly in the input parameters to each job. These jobs are responsible for deploying the application binaries to different environments.

Jenkins Job Builder supports templating, but our experience is that such templates tend to become quite hard to maintain in the long run. And since they are just YAML files, you are quite restricted in what you can do. JobDSL job configurations, on the other hand, are written in Groovy. This allows you to insert actual code execution into your job configuration scripts, which can be very powerful.

In our case, with JobDSL we can create one common job configuration and then iterate over the job specific parameters to create the necessary jobs for each environment. We can also write utility methods in Groovy and invoke them from our job configurations. So instead of having to maintain similar Jenkins Job Builder configurations in YAML, we can do it much more concisely with JobDSL.

Below is an example of a JobDSL configuration file (Groovy code) which generates six jobs from a parameterized job template:

// Helpers that split a qualifier such as "us-staging" into region and environment type
class Utils {
    static String environment(String qualifier) { qualifier.substring(0, qualifier.indexOf('-')).toUpperCase() }
    static String environmentType(String qualifier) { qualifier.substring(qualifier.indexOf('-') + 1) }
}

[
    [qualifier: "us-staging"],
    [qualifier: "eu-staging"],
    [qualifier: "us-uat"],
    [qualifier: "eu-uat"],
    [qualifier: "us-live"],
    [qualifier: "eu-live"]
].each { Map environment ->

    job("myproject-deploy-${environment.qualifier}") {
        description "Deploy my project to ${environment.qualifier}"
        parameters {
            stringParam('GIT_SHA', null, null)
        }
        scm {
            git {
                remote {
                    url('ssh://git@my_git_repo/myproject.git')
                    credentials('jenkins-credentials')
                }
                branch('$GIT_SHA')
            }
        }
        deliveryPipelineConfiguration("Deploy to " + Utils.environmentType("${environment.qualifier}"),
                Utils.environment("${environment.qualifier}") + " environment")
        logRotator(-1, 30, -1, -1)
        steps {
            shell("""deployment/azure/deploy.sh ${environment.qualifier}""")
        }
    }
}
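
A configuration script like this does not apply itself: it is typically processed by a seed job using the Jenkins Job DSL plugin, which regenerates the jobs whenever the repository holding the DSL scripts changes. Below is a minimal sketch of such a seed job, written in the same DSL; the repository URL, branch and script locations are assumptions for illustration:

job('seed') {
    scm {
        git {
            remote {
                url('ssh://git@my_git_repo/jenkins-jobs.git') // hypothetical configuration repo
            }
            branch('master')
        }
    }
    triggers {
        scm('H/5 * * * *') // poll for changes; a push hook would work as well
    }
    steps {
        dsl {
            external('jobs/**/*.groovy') // hypothetical location of the DSL scripts
            removeAction('DELETE') // jobs removed from the repo are deleted from Jenkins
        }
    }
}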

If you need help getting your Jenkins configuration into good shape, just contact us and we will be happy to help you! You can read more about us at diabol.se.

Tommy Tynjä
@tommysdk

Test categorization in deployment pipelines

Have you ever gotten tired of waiting for those long running tests in CI to finish so you can get feedback on your latest code change? Chances are that you have. A common problem is that test suites tend to grow too large, making the feedback loop an enemy instead of a companion. This is a problem when building delivery pipelines for Continuous Delivery, but also for more traditional approaches to software development. A solution to this problem is to divide your test suite into separate categories, or stages, where tests are grouped according to similarity or type. The categories can then be arranged so that the quickest tests and those most likely to fail execute first, to give faster feedback to the developers.

An example of a logical grouping of tests in a deployment pipeline:

Commit stage:
* Unit tests
* Component smoke tests
These tests execute fast and will be executed by the developers before committing changes into version control.

Component tests:
* Component tests
* Integration tests
These tests are to be run in CI and can be further categorized so that e.g. component tests that are most likely to catch failures will execute first, before more thorough testing.

End user tests:
* Functional tests
* User acceptance tests
* Usability/exploratory testing

As development continues, it is important to maintain these test categories so that the feedback loop can be kept as optimal as possible. This might involve moving tests between categories, further splitting up test suites or even grouping categories that might be able to run in parallel.

How is this done in practice? You’ve probably encountered code bases where all these different kinds of tests (unit, integration, user acceptance) are scattered throughout the same test source tree. In the Java world, Maven is a commonly used build tool. Generally, its model supports running unit and integration tests separately out of the box, but it still expects the tests to live in the same structure, differentiated only by a naming convention. This isn’t practical if you have hundreds or thousands of tests for a single component (or Maven module). To have a maintainable test structure and make effective use of test categorization, splitting tests into different source trees is desirable, for example:

src/test – unit tests
src/test-integration – integration tests
src/test-acceptance – acceptance tests

Gradle is a build tool which makes it easy to take advantage of this kind of test categorization. Changing build tool might not be practically possible for many reasons, but it is fully possible to use Gradle’s capabilities alongside your existing build tool. You want to use the right tool for the job, right? Gradle is an excellent tool for this kind of job.

Gradle makes use of source sets to define which source tree is production code and which is, for example, test code. You can easily define your own source sets, which is something you can use to categorize your tests.

Defining the test categories in the example above can be done in your build.gradle such as:

sourceSets {
  main {
    java {
      srcDir 'src/main/java'
    }
    resources {
      srcDir 'src/main/resources'
    }
  }
  test {
    java {
      srcDir 'src/test/java'
    }
    resources {
      srcDir 'src/test/resources'
    }
  }
  integrationTest {
    java {
      srcDir 'src/test-integration/java'
    }
    resources {
      srcDir 'src/test-integration/resources'
    }
    compileClasspath += sourceSets.main.runtimeClasspath
  }
  acceptanceTest {
    java {
      srcDir 'src/test-acceptance/java'
    }
    resources {
      srcDir 'src/test-acceptance/resources'
    }
    compileClasspath += sourceSets.main.runtimeClasspath
  }
}

To be able to run the different test suites, set up a Gradle task for each test category as appropriate for your component, such as:

task integrationTest(type: Test) {
  description = "Runs integration tests"
  testClassesDir = sourceSets.integrationTest.output.classesDir
  classpath += sourceSets.test.runtimeClasspath + sourceSets.integrationTest.runtimeClasspath
  useJUnit()
  testLogging {
    events "passed", "skipped", "failed"
  }
}

task acceptanceTest(type: Test) {
  description = "Runs acceptance tests"
  testClassesDir = sourceSets.acceptanceTest.output.classesDir
  classpath += sourceSets.test.runtimeClasspath + sourceSets.acceptanceTest.runtimeClasspath
  useJUnit()
  testLogging {
    events "passed", "skipped", "failed"
  }
}

test {
  useJUnit()
  testLogging {
    events "passed", "skipped", "failed"
  }
}

Unit tests in src/test will be run by default. To run the integration tests located in src/test-integration, invoke the integrationTest task by executing “gradle integrationTest”. To run the acceptance tests located in src/test-acceptance, invoke the acceptanceTest task by executing “gradle acceptanceTest”. These commands can then be used to tailor your test suite execution throughout your deployment pipeline.

A full build.gradle example file that shows how to setup test categories as described above can be found on GitHub.

The above example shows how tests can be logically grouped to avoid waiting for that one big test suite to run for hours, just to report a test failure on a simple test case that should have been reported instantly during the test execution phase.
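
To tie the categories into a deployment pipeline, the commands above can be mapped to consecutive pipeline jobs. Below is a minimal sketch in Jenkins Job DSL (Groovy), the tool used elsewhere on this blog; the job names and the downstream-trigger wiring are assumptions for illustration:

// One pipeline job per test category, quickest first; each job triggers the next on success
def categories = ['test', 'integrationTest', 'acceptanceTest']
categories.eachWithIndex { String category, int i ->
    job("myproject-${category}") {
        steps {
            gradle(category) // runs e.g. "gradle integrationTest"
        }
        if (i + 1 < categories.size()) {
            publishers {
                downstream("myproject-${categories[i + 1]}", 'SUCCESS')
            }
        }
    }
}

With this arrangement, a failing unit test stops the pipeline before the slower integration and acceptance suites even start.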


Tommy Tynjä
@tommysdk

How to validate your YAML files from the command line

I like using Hiera with Puppet. In my Puppet pipeline I just added YAML syntax validation for the Hiera files in the compile step. Here’s how:

# ...
GIT_DIFF_CMD="git diff --name-only --diff-filter=ACMR $OLD_REVISION $REVISION"
declare -i RESULT=0
set +e # Don't exit on error. Collect the errors instead.
YAML_PATH_LIST=`$GIT_DIFF_CMD | grep -F 'hieradata/' | grep -F '.yaml'`
echo 'YAML files to check syntax:'; echo "$YAML_PATH_LIST"; echo "";
for YAML_PATH in $YAML_PATH_LIST; do
  ruby -e "require 'yaml'; YAML.load_file('${YAML_PATH}')"
  RESULT+=$?
done
# ...
exit $RESULT

The ruby one-liner inside the loop does the actual validation: YAML.load_file parses the file, and an invalid file raises an exception which makes ruby exit with a non-zero status.
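
For those who prefer to keep such checks in the Groovy/JVM tooling used elsewhere on this blog, the same validation can be sketched with the SnakeYAML library; the dependency version and invocation are assumptions:

// validate-yaml.groovy -- usage: groovy validate-yaml.groovy hieradata/*.yaml
@Grab('org.yaml:snakeyaml:1.33')
import org.yaml.snakeyaml.Yaml

def yaml = new Yaml()
int failures = 0
args.each { String path ->
    try {
        new File(path).withInputStream { yaml.load(it) } // throws on invalid syntax
        println "OK: ${path}"
    } catch (Exception e) {
        println "Syntax error in ${path}: ${e.message}"
        failures++
    }
}
System.exit(failures) // non-zero exit code if any file failed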

If you read my previous post you can see that we have managed to migrate to Git. Hurray!

Is your delivery pipeline an array or a linked list?

The fundamental data structure of a delivery pipeline and its implications

A delivery pipeline is a system. A system is something that consists of parts that create a complex whole, where the essence lies largely in the interaction between the parts. In a delivery pipeline we can see the activities in it (build, test, deploy, etc.) as the parts, and their input/output as the interactions. There are two fundamental ways to define interactions in order to organize a set of parts into a whole, a system:

  1. Top-level orchestration, aka array
  2. Parts interact directly with other parts, aka linked list

You could also consider sub-levels of organization, which would form a tree. A sub-level could define its interactions in the same way as its parent, or differently.

My question is: Is one approach better than the other for creating delivery pipelines?

I think the number one requirement on a pipeline is maintainability. So better here would mean mainly more maintainable, that is: easier and quicker to create, to reason about, to reuse, to modify, extend and evolve even for a large number of complex pipelines. Let’s review the approaches in the context of delivery pipelines:

1. Top-level orchestration

This means having one config (file) that defines the whole pipeline. It is like an array.

An example config could look like this:

globals:
  scm: commit
  build: number
triggers:
  scm: github org=Diabol repo=delivery-pipeline-plugin.git
stages:
  - name: commit
    tasks:
      - build
      - unit_test
  - name: test
    vars:
      env: test
    tasks:
      - deploy: continue_on_fail=true
      - smoke_test
      - system_test
  - name: prod
    vars:
      env: prod
    tasks:
      - deploy
      - smoke_test

The tasks, like build, are defined (in isolation) elsewhere. Travis, Bamboo and Go do it this way.

2. Parts interact directly

This means that as part of the task definition, you have not only the main task itself, but also what should happen (e.g. triggering other jobs) when the task succeeds or fails. It is like a linked list.

An example task config:

name: build
triggers:
  - scm: github org=Diabol repo=delivery-pipeline-plugin.git
steps:
  - mvn: install
post:
  - email: committer
    when: on_fail
  - trigger: deploy_test
    when: on_success

The default way of creating pipelines in Jenkins seems to be this approach: using upstream/downstream relationships between jobs.
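
As an illustration, here is a minimal sketch in Jenkins Job DSL of two such directly linked jobs; the job names, build steps and notification settings are assumptions mirroring the config above:

// 'Linked list' style: each job declares its own downstream neighbour
job('build') {
    steps {
        maven('install')
    }
    publishers {
        mailer('', false, true) // notify the committers on failure
        downstream('deploy_test', 'SUCCESS') // trigger the next link on success
    }
}

job('deploy_test') {
    steps {
        shell('./deploy.sh test') // hypothetical deploy script
    }
    publishers {
        downstream('smoke_test', 'SUCCESS')
    }
}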

Tagging

There is also a supplementary approach to creating order: tagging parts, aka Inversion of Control. In this case, the system materializes bottom-up. You could say that the system behavior is an emergent property. An example config where the tasks are tagged with a stage:

- name: build
  stage: commit
  steps:
    - mvn: install
    ...

- name: integration_test
  stage: commit
  steps:
    - mvn: verify -PIT
  ...

Unless complemented with something, there is no way to order things in this approach. But it’s useful for adding another layer of organization, e.g. for an alternative view.

Comparisons to other systems

Maybe we can shed light on our question by comparing with how we organize other complex systems around us.

Example A: (Free-market) Economic Systems, aka getting a shirt

1. Top-level organization

Go to the farmer, buy some cotton, hand it to the weaver, get the fabric back, and hand that to the tailor together with your measurements.

2. Parts interact directly

There are some variants.

  1. The farmer sells the cotton to the weaver, who sells the fabric to the tailor, who sews a lot of shirts and sells one that fits.
  2. Buy the shirt from the tailor, who bought the fabric from the weaver, who bought the cotton from the farmer.
  3. The farmer sells the cotton to a merchant who sells it to the weaver. The weaver sells the fabric to a merchant who sells it to the tailor. The tailor sells the shirts to a store. The store sells the shirts.

The variations are basically about different flows of information, pull or push, and whether or not there are middlemen.

Conclusion

Economic systems tend to be organized the second way. There is an efficient coordination mechanism through demand and supply, with price as the mediator; ultimately, the system is driven by the self-interest of the actors. It’s questionable whether this is a good metaphor for a delivery pipeline. You can consider deploying the artifact to be the interest of a deploy job, but what is the mediating (price) mechanism? And unless we have a common shared measure of value, such as money, how can we optimize globally?

Example B: Assembly line, aka build a car

Software process thinking has historically suffered a lot from broken metaphors of factories and construction, but let’s do it anyway.

1. Top-level organization

The chief engineer designs the assembly line using the blueprints. Each worker knows how to do his task, but does not know what’s happening before or after.

2. Parts interact directly

Well, strictly speaking this is more of an old-style workshop than an assembly line. The lathe worker gets some raw material, turns the cylinders and brings them to the engine assembler, who assembles the engine and hands it over to …, etc.

Conclusion

It seems the assembly line approach has won, but not in its Taylorist form. I might do the wealth of experience and research on this subject an injustice by oversimplifying here, but it seems to me that two frameworks for achieving the desired quality and cost with an assembly line have emerged:

  1. The Toyota way: The key to quality and cost goals is that everybody cares and everybody counts. Everybody is concerned about global quality and looks out for improvements, and everybody has the right to ‘stop the line’ if there is a concern. The management layer underpins this by focusing on long-term goals such as the global quality vision and the learning organization.
  2. Teams: A multi-functional team follows the product from start to finish. This requires a wider range of skills in each worker, so it entails higher labour costs. The benefit is strong ownership, which leads to higher quality and continuous improvement.

The approaches are not mutually exclusive and in software development we can actually see both combined in various agile techniques:

  • Continuous improvement is part of Scrum and Lean for Software methodologies.
  • It’s every team member’s responsibility if a commit fails in a pipeline step.

Conclusion

For parts interacting directly, it seems that unless we have an automatic mediation mechanism, we will need a ‘planned economy’, and that failed, right? And top-level organization needs to be complemented with grassroots-level involvement, or quality will suffer.

Summary

My take is that top-level organization is superior, because you need to stress the holistic view. But it needs to be complemented with the possibility to improve individual steps without always having to consider the whole. This is achieved by having the team that uses the pipeline own it, with management supporting them through modern lean and agile management ideas.

Final note

It should be noted that many desirable general features of a system framework that can ease maintenance if rightly used, such as inheritance, aggregation, templating and cloning, are orthogonal to the organizational principle we talk about here. These features can actually be more important for maintainability. But my experience is that the organizational principle puts a cap on the level of complexity you can manage.

Marcus Philip
@marcus_phi