How to validate your yaml files from command line

I like using Hiera with Puppet. In my  puppet pipeline I just added YAML syntax validation for the Hiera files in the compile step. Here’s how:

# ...
GIT_DIFF_CMD="git diff --name-only --diff-filter=ACMR $OLD_REVISION $REVISION"
declare -i RESULT=0
set +e # Don't exit on error. Collect the errors instead.
YAML_PATH_LIST=`$GIT_DIFF_CMD | grep -F 'hieradata/' | grep -F '.yaml'`
echo 'YAML files to check syntax:'; echo "$YAML_PATH_LIST"; echo "";
  ruby -e "require 'yaml'; YAML.load_file('${YAML_PATH}')"
# ...
exit $RESULT

The line in bold does the actual validation.

If you read my previous post you can see that we have managed to migrated to git. Hurray!

Introducing Delivery Pipeline Plugin for Jenkins

In Continuous Delivery visualisation is one of the most important areas. When using Jenkins as a build server it is now possible with the Delivery Pipeline Plugin to visualise one or more delivery pipelines in the same view even in full screen. Perfect for information radiators.

The plugin uses the upstream/downstream dependencies of jobs to visualize the pipelines.

Fullscreen view
Work view

A pipeline consists of several stages, usually one stage will be the same as one job in Jenkins. An example of a pipeline which can consist of both build, unit test, packaging and analyses the pipeline can be quite long if every Jenkins job is a stage. So in the Delivery Pipeline Plugin it is possible to group jobs into the same stage, calling the Jenkins jobs tasks instead.





The version showed in the header is the version/display name of the first Jenkins job in the pipeline, so the first job has to define the version.

The plugin also has possibility to show what we call a Aggregated View which shows the latest execution of every stage and displays the version for  that stage.

Is your delivery pipeline an array or a linked list?

The fundamental data structure of a delivery pipeline and its implications

A delivery pipeline is a system. A system is something that consists of parts that create a complex whole, where the essence lies largely in the interaction between the parts. In a delivery pipeline we can see the activities in it (build, test, deploy, etc.) as the parts, and their input/output as the interactions. There are two fundamental ways to define interactions in order to organize a set of parts into a whole, a system:

  1. Top-level orchestration, aka array
  2. Parts interact directly with other parts, aka linked list

You could also consider sub-levels of organization. This would form a tree. The sub-level of interaction could be defined in the same way as its parents or not.

My question is: Is one approach better than the other for creating delivery pipelines?

I think the number one requirement on a pipeline is maintainability. So better here would mean mainly more maintainable, that is: easier and quicker to create, to reason about, to reuse, to modify, extend and evolve even for a large number of complex pipelines. Let’s review the approaches in the context of delivery pipelines:

1. Top-level orchestration

This means having one config (file) that defines the whole pipeline. It is like an array.

An example config could look like this:

  scm: commit
  build: number
  scm: github org=Diabol repo=delivery-pipeline-plugin.git
  - name: commit
      - build
      - unit_test
  - name: test
      env: test
      - deploy: continue_on_fail=true
      - smoke_test
      - system_test
  - name: prod
      env: prod
      - deploy
      - smoke_test

The tasks, like build, is defined (in isolation) elsewhere. TravisBamboo and Go does it this way.

2. Parts interact directly

This means that as part of the task definition, you have not only the main task itself, but also what should happen (e.g. trigger other jobs) when the task success or fails. It is like a linked list.

An example task config:

name: build
  - scm: github org=Diabol repo=delivery-pipeline-plugin.git
  - mvn: install
  - email: committer
    when: on_fail
  - trigger: deploy_test
    when: on_success

The default way of creating pipelines in Jenkins seems to be this approach: using upstream/downstream relationships between jobs.


There is also a supplementary approach to create order: Tagging parts, aka Inversion of Control. In this case, the system materializes bottom-up. You could say that the system behavior is an emerging property. An example config where the tasks are tagged with a stage:

- name: build
  stage: commit
    - mvn: install

- name: integration_test
  stage: commit
    - mvn: verify -PIT

Unless complemented with something, there is no way to order things in this approach. But it’s useful for adding another layer of organization, e.g. for an alternative view.

Comparisons to other systems

Maybe we can enlighten our question by comparing with how we organize other complex system around us.

Example A: (Free-market) Economic Systems, aka getting a shirt

1. Top-level organization

Go to the farmer, buy some cotton, hand it to weaver, get the fabric from there and hand that to the tailor together with size measures.

2. Parts interact directly

There are some variants.

  1. The farmer sells the cotton to the weaver, who sells the fabric to the tailor, who sews a lot of shirts and sells one that fits.
  2. Buy the shirt from the tailor, who bought the fabric from the weaver, who bought the cotton from the farmer.
  3. The farmer sells the cotton to a merchant who sells it to the weaver. The weaver sells the fabric to a merchant who sells it to the tailor. The tailor sells the shirts to a store. The store sells the shirts.

The variations is basically about different flow of information, pull or push, and having middle-mens or not.


Economic systems tends to be organized the second way. There is an efficient system coordination mechanism through demand and supply with price as the deliberator, ultimately the system is driven by the self-interest of the actors. It’s questionable whether this is a good metaphor for a delivery pipeline. You can consider deploying the artifact as the interest of a deploy job , but what is the deliberating (price) mechanism? And unless we have a common shared value measurement, such as money, how can we optimize globally?

Example B: Assembly line, aka build a car

Software process has historically suffered a lot from using broken metaphors to factories and construction, but lets do it anyway.

1. Top-level organization

The chief engineer designs the assembly line using the blueprints. Each worker knows how to do his task, but does not know what’s happening before or after.

2. Parts interact directly

Well, strictly this is more of an old style work shop than an assembly line. The lathe worker gets some raw material, does the cylinders and brings them to the engine assembler, who assembles the engine and hands that over to …, etc.


It seems the assembly line approach has won, but not in the tayloristic approach. I might do the wealth of experiences and research on this subject injustice by oversimplification here, but to me it seems that two frameworks for achieving desired quality and cost when using an assembly line has emerged:

  1. The Toyota way: The key to quality and cost goals is that everybody cares and that the everybody counts. Everybody is concerned about global quality and looks out for improvements, and everybody have the right to ‘stop the line’ if their is a concern. The management layer underpins this by focusing on the long term goals such as the global quality vision and the learning organization.
  2. Teams: A multi-functional team follows the product from start to finish. This requires a wider range of skills in a worker so it entails higher labour costs. The benefit is that there is a strong ownership which leads to higher quality and continuous improvements.

The approaches are not mutually exclusive and in software development we can actually see both combined in various agile techniques:

  • Continuous improvement is part of Scrum and Lean for Software methodologies.
  • It’s all team members responsibility if a commit fails in a pipeline step.


For parts interacting directly it seems that unless we have an automatic deliberation mechanism we will need a ‘planned economy’, and that failed, right? And top-level organization needs to be complemented with grass root level involvement or quality will suffer.


My take is that the top-level organization is superior, because you need to stress the holistic view. But it needs to be complemented with the possibility for steps to be improved without always having to consider the whole. This is achieved by having the team that uses the pipeline own it and management supporting them by using modern lean and agile management ideas.

Final note

It should be noted that many desirable general features of a system framework that can ease maintenance if rightly used, such as inheritance, aggregation, templating and cloning, are orthogonal to the organizational principle we talk about here. These features can actually be more important for maintainability. But my experience is that the organizational principle puts a cap on the level of complexity you can manage.

Marcus Philip