Category Archives: DevOps

Conference retrospectives

The first week of June was a busy one for us, with representation at a couple of conferences taking place in Stockholm: the inaugural CoDe Continuous Delivery & DevOps conference on June 2nd and Agila Sverige (Agile Sweden) on June 3rd and 4th. Here’s a small retrospective on both of those events.

The CoDe conference was quite small (a bit less than 200 people) but had a very good vibe to it. We were silver sponsors of the event and had our Jenkins Delivery Pipeline Plugin on display in our booth, which gained a lot of interest from attendees. It led to many interesting discussions on CI/CD servers, automation, capabilities and visualization. Our feeling was that the attendees had a lot of genuine interest, knowledge and awareness of Continuous Delivery, which fueled these interesting discussions.

Our Marcus Philip presented “Continuous Delivery for Databases” at the CoDe conference, where he talked about how to transition a legacy database application with a lot of PL/SQL code into a streamlined process where all changes are version controlled, traceable and automatically deployed. The talk covered the whole spectrum of the problem: why they weren’t doing Continuous Delivery, why they should do it, thoughts on different solutions to the problem, how they did the actual implementation and how it panned out. Many attendees enjoyed the presentation and Marcus also got the opportunity to present the talk again at the Oracle Stockholm Meetup on June 16th. Cool!

Slides from the talk can be found here: https://speakerdeck.com/marcusphi/continuous-delivery-for-databases.

Agila Sverige is a conference which strongly encourages discussions among the attendees. All talks are lightning talks and there is plenty of time allocated for open space discussions and breaks, allowing you to network and share experiences with others. The conference is held in Swedish but is probably one of the best when it comes to agile software development practices. Two or three years ago many talks and discussions at this conference focused on why to practice agile development; this year it was apparent that the industry has matured on all levels, since the majority of talks and open space sessions instead focused on how to practice agile development.

Tommy Tynjä presented a visionary talk on how tomorrow’s software can be structured and delivered in his presentation “Next Generation Continuous Delivery”. The talk attracted a lot of interest: the room was packed for the presentation, and Tommy had some interesting discussions on the topic afterwards.

Slides from his talk can be found here: http://www.slideshare.net/tommysdk/next-generation-continuous-delivery
There is also a video recording available at: https://agilasverige.solidtango.com/video/next-generation-continuous-delivery.

We give both of these conferences a thumbs up and hope to see you at either of them next year as well! We always look forward to meeting people to discuss thoughts, experiences, problems and visions regarding Continuous Delivery, DevOps and agile software development methodologies. But if you don’t want to wait until next year, don’t hesitate to contact us!

Next week is conference week!

Next week will be a conference-heavy one in Stockholm! The inaugural CoDe Continuous Delivery & DevOps conference takes place on June 2nd, while Agila Sverige (Agile Sweden) occupies June 3rd and 4th. Both conferences target agile software development practices and Diabol is very well represented at these events. We will have one speaker at each conference presenting a talk related to Continuous Delivery.

Marcus Philip will present “Continuous Delivery for databases” at the CoDe conference, while Tommy Tynjä will give his talk “Next Generation Continuous Delivery” at Agila Sverige, where he takes a look at how the software of tomorrow can be built and delivered.

So if you’re interested in learning more about Continuous Delivery and what we do, don’t hesitate to come by!

See you at a conference very soon!

AWS Cloudformation introduction

AWS Cloudformation is a concept for modelling and setting up your Amazon Web Services resources in an automated and unified way. You create templates that describe your AWS resources, e.g. EC2 instances, EBS volumes, load balancers etc., and Cloudformation takes care of all provisioning and configuration of those resources. When Cloudformation provisions resources, they are grouped into something called a stack. A stack is typically represented by one template.

The templates are specified in JSON, which allows you to easily version control the templates describing your AWS infrastructure alongside your application code.
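
To make the template idea concrete, here is a minimal, hedged sketch (not taken from our actual setup): a tiny template describing a single EC2 instance, written to disk and syntax-checked with the AWS CLI. The file name, parameter and AMI id are placeholders.

#!/usr/bin/env bash
set -e

# Write a minimal CloudFormation template (placeholder AMI id)
cat > webserver-stack.json <<'EOF'
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Example stack with a single EC2 instance",
  "Parameters": {
    "InstanceType": { "Type": "String", "Default": "t2.micro" }
  },
  "Resources": {
    "WebServer": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "InstanceType": { "Ref": "InstanceType" },
        "ImageId": "ami-12345678"
      }
    }
  }
}
EOF

# Validate the template syntax before checking it in
aws cloudformation validate-template --template-body file://webserver-stack.json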

Cloudformation fits perfectly into Continuous Delivery ways of working, where the whole AWS infrastructure can be automatically set up and maintained by a delivery pipeline. There is no need for a human to provision instances, set up security groups, create DNS record sets etc., as long as the Cloudformation templates are developed and maintained properly.

You use the AWS CLI to execute the Cloudformation operations, e.g. to create or delete a stack. What we’ve done at my current client is to put the AWS CLI calls into bash scripts, which are version controlled alongside the templates. This means we don’t have to remember all the options and arguments necessary to perform the Cloudformation operations, and we can use the same scripts both on developer workstations and in the delivery pipeline.
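
As an illustration, here is a hedged bash sketch of what such a wrapper script might look like; the stack naming convention, template location and parameters are invented for this example and are not the actual scripts from the project.

#!/usr/bin/env bash
set -eu

STACK_NAME="${1:?usage: $0 <stack-name> <create|delete>}"
ACTION="${2:?usage: $0 <stack-name> <create|delete>}"

case "$ACTION" in
  create)
    # Create the stack from a version controlled template
    aws cloudformation create-stack \
      --stack-name "$STACK_NAME" \
      --template-body "file://templates/${STACK_NAME}.json" \
      --parameters ParameterKey=InstanceType,ParameterValue=t2.micro
    ;;
  delete)
    # Tear the stack down again
    aws cloudformation delete-stack --stack-name "$STACK_NAME"
    ;;
  *)
    echo "Unknown action: $ACTION" >&2
    exit 1
    ;;
esac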

A drawback of Cloudformation is that not all AWS features are available through that interface, which might force you to create workarounds depending on your needs. For instance, Cloudformation does not currently support the creation of private DNS hosted zones within a Virtual Private Cloud. We solved this by using the AWS CLI to create the private DNS hosted zone in the bash script responsible for setting up our DNS configuration, prior to performing the actual Cloudformation operation which makes use of that private DNS hosted zone.
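
A hedged sketch of that kind of workaround: create the private hosted zone with the plain AWS CLI first, then run the Cloudformation operation that depends on it. The zone name, VPC id, region and stack details below are placeholders, not our actual configuration.

# Create the private hosted zone, associated with a VPC (making it private)
aws route53 create-hosted-zone \
  --name internal.example.com \
  --caller-reference "internal-zone-$(date +%s)" \
  --vpc VPCRegion=eu-west-1,VPCId=vpc-12345678

# ...then create the stack that relies on that zone
aws cloudformation create-stack \
  --stack-name my-app \
  --template-body file://templates/my-app.json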

While Cloudformation is a superb way of setting up resources in AWS compared to managing those resources manually, e.g. through the web UI, you can go one step further and enforce restrictions on your account so that resources can only be created through Cloudformation. This is something we currently use at my current client for our production environment setup, to ensure that the proper ways of working are followed.

I’ve created a basic Cloudformation template example which can be found here.

Tommy Tynjä
@tommysdk

Agile Configuration Management – part 1

On June 5th I held a lightning talk on Agile Configuration Management at the Agila Sverige 2014 conference. The 10-minute format does not allow for digging very deep. In this series of blog posts I expand on this topic.

Over the last year I have led a long journey towards agile and devopsy automated configuration management for infrastructure at my client, a medium-sized IT department. It’s part of a larger initiative of moving towards mature Continuous Delivery. We sit in a team that traditionally has been responsible for maintaining the test environments, but as part of the CD initiative we’ve been pushing to transform this into providing and maintaining a delivery platform for all environments.

The infrastructure part was initiated when we were to set up a new system and had a lot of machines to configure for that. Here was a golden window of opportunity to introduce modern configuration management (CM) automation tools. Note that nobody asked us to do this, it was just the only decent thing to do. Consequently, nobody told us what tools to use and how to do it.

The requirement was thus to configure the servers up to the point where our delivery pipeline, implemented with Jenkins, could deploy the applications, and to maintain them. The main challenge was that we needed to support a large number of Java web applications with slightly different configuration requirements.

Goals

So we set out to find tools and build a framework that would support agile and devopsy CM. We’re building something PaaS-like. More specifically the goals we set up were:

  1. Self service model
    It’s important not to create a new silo. We want the developers to be able to get their work done without involving us. There is no configuration manager or other command-and-control function. The developers are already doing application CM, it’s just not acknowledged as CM.
  2. Infrastructure as Code
    This means that all configuration for servers is managed and versioned together as code, and that the code and only the code can affect the configuration of the infrastructure. When we do this we can apply all the good practices we know well from software development, such as unit testing, collaboration, diff, merge, etc.
  3. Short lead times for changes
    Short means minutes to hours rather than weeks. Who wants to wait 5 days rather than 5 minutes to see the effect of a change? Speeding up the feedback cycle is the most important factor for being able to experiment, learn and get things done.

Project phases

Our journey had different phases, each with its own context, goals and challenges.

1. Bootstrap

At the outset we address a few systems and use cases. The environments are addressed one after the other. The goal is to build up knowledge and create drafts for frameworks. We evaluate some, but not all, tools. Focus is on getting something simple working. We look at Puppet and Ansible but go for the former, as Ansible was very new and not yet at version 1.0. The support systems, such as the Puppet master, are still manually managed.

We use a centralized development model in this phase. There are few committers. We create an svn repository for the Puppet code and the code is all managed together, although we luckily realize already now that it must be structured and modularized, inspired by Craig Dunn’s blog post.

2. Scaling up

We address more systems and the production environment. This leads to the framework expanding to handle more variations in use cases. There are more committers now, as some phase-one early adopters are starting to contribute. It’s a community development model. The code is still shared between all teams, but as outlined below each team deploys independently.

The framework is a moving target and the best way to not become legacy is to keep moving:

  • We increase automation, e.g. the Puppet installations are managed with Ansible.
  • We migrate from svn to git.
  • Hiera is introduced to separate code and data for Puppet.
  • Full pipelines per system are implemented in Jenkins.
    We use the Puppet dynamic environments pattern, keep the Puppet agent daemon stopped and use Ansible to trigger a Puppet agent run from the Jenkins job, so that we can update the systems independently (a minimal sketch of this trigger follows below the list).
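
As referenced above, here is a minimal, hedged sketch of how such a trigger could look when invoked from a Jenkins job. The inventory group, environment name and the omission of privilege escalation are assumptions for the example, not our actual job configuration.

# Run the (normally stopped) Puppet agent once against a specific dynamic
# environment on the hosts belonging to one system, triggered from Jenkins
ansible webservers -i inventory/test -m command \
  -a "puppet agent --test --environment feature_x"

# The PROD dry-run step is the same command with --noop added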

The Pipeline

As Continuous Delivery consultants, we of course wanted to build a pipeline for the infrastructure changes that we could be proud of.

Steps

  1. Static checks (Parse, Validate syntax, Compile)
  2. Apply to CI  (for all systems)
  3. Apply to TEST (for given system)
  4. Dry-run (--noop) in PROD (for given system)
  5. PROD Release notes/request generation (for given system)
  6. Apply in PROD (for given system)

The first two steps are automatic and executed for all systems on each commit. Then the pipeline forks and the rest of the steps are triggered manually per system.
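
For reference, a hedged sketch of what the static checks step might run; the directory layout and the choice of puppet-lint for style checks are assumptions on my part, not the team’s actual job definition.

#!/usr/bin/env bash
set -e

# Parse/validate syntax of all Puppet manifests
find manifests modules -name '*.pp' -print0 | xargs -0 -n1 puppet parser validate

# Syntax-check ERB templates (the idiom from the Puppet documentation)
for t in $(find modules -name '*.erb'); do
  erb -P -x -T '-' "$t" | ruby -c
done

# Style checks
puppet-lint manifests/ modules/

# A catalog compile step could also be added here (e.g. puppet master --compile <node>),
# depending on the Puppet version in use.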

Complexity increases

We’re doing well, but the complexity has increased. There is some coupling in that the code base is monolithic and shared between several teams/systems. There are upsides to this: everyone benefits from improvements and additions. We also early on had to structure the code base and avoid a big ball of mud that solves only one use case.

Another form of coupling is that some servers (e.g. load balancers) are shared, which forces us to implement blocking in the Jenkins apply jobs so that they do not collide.

There is some unfamiliarity with the development model, so there is some uncertainty about responsibilities – who tests and deploys what, and when? Most developers, including my own team, are also largely unfamiliar with how to test infrastructure.

Looking at our pipeline we could tell that something was not quite right:

Puppet Complex Pipeline in Jenkins

 

In the next part of this blog series we will see how we addressed these challenges in phase 3: Increase independence and quality.

Introducing Delivery Pipeline Plugin for Jenkins

In Continuous Delivery, visualisation is one of the most important areas. When using Jenkins as a build server, it is now possible with the Delivery Pipeline Plugin to visualise one or more delivery pipelines in the same view, even in full screen. Perfect for information radiators.

The plugin uses the upstream/downstream dependencies of jobs to visualize the pipelines.

Fullscreen view

Work view

A pipeline consists of several stages; usually one stage will correspond to one job in Jenkins. For a pipeline that consists of build, unit test, packaging and analysis steps, the pipeline can become quite long if every Jenkins job is a stage. So in the Delivery Pipeline Plugin it is possible to group several jobs into the same stage, where the individual Jenkins jobs are called tasks.

Stage

Task

The version shown in the header is the version/display name of the first Jenkins job in the pipeline, so the first job has to define the version.

The plugin also has the possibility to show what we call an Aggregated View, which shows the latest execution of every stage and displays the version for that stage.

Metrics, metrics everywhere with Graphite

What useful metrics does your application provide, and how accessible are they?
In my experience, metrics are often bolted onto an application by operations just before going live, or maybe even afterwards when you start experiencing strange problems and realize that the only way of knowing how the application performs is by looking at CPU usage and the like. Even though CPU, IO and memory usage can be very helpful for ops, they are probably not very useful metrics when looking at how your application performs in business terms. You need to build metrics into your application, and it should be as natural and common as any other logging you put there. Live metrics and stats presented in appealing graphs are priceless feedback for practically everybody in the organisation, ranging from operations and development to marketing, sales and even executives. Since all those people have very different views on what useful metrics are, you need to start pushing out metrics for everything. You never know when you will need them, and since it is so easy there’s really no excuse for not doing it. With very little effort you can be the graphing hero, and hopefully cool dashboards with customized live metric graphs will start to pop up everywhere.

Install Graphite

Graphite is a cool little project that allows you to collect and aggregate metrics and, in a very easy and flexible way, create customized real-time graphs on demand. It is a Python/Django app with a web front end that hooks into Apache. The data aggregator is called Carbon and is essentially a Python daemon that slurps data from a UDP port. The whole “package” can be a bit tricky to install (at least when you are on RHEL); it depends on some image processing libraries and such, but you will get it done in an hour or two at the most, just follow the install instructions. Needless to say, it must be installed on a server that is accessible from where the applications are running so they can push metrics to it on a UDP port, but I’m sure there’s one lying around running some old monitoring tools or something. There are default examples of all config files, so once all the Python packages and dependencies are installed you will be up n’ running in no time and can start to push metrics to Carbon.
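
As a rough, hedged outline of the install (the package names are those published on PyPI; the full dependency list, the Apache/mod_wsgi wiring and platform specifics are left out, so follow the official install instructions for the details):

# Install the three Graphite components (they install under /opt/graphite by default)
sudo pip install whisper carbon graphite-web

# Copy the example configs and start the Carbon daemon
cd /opt/graphite/conf
sudo cp carbon.conf.example carbon.conf
sudo cp storage-schemas.conf.example storage-schemas.conf
sudo /opt/graphite/bin/carbon-cache.py start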

Start pushing metrics

The way you push data to Carbon is extremely easy, just send a UDP packet (UDP for low-cost fire-and-forget communication) like this:

node-123.myCoolApplication.environment.activeSessions 87 1320316143

The first part is a unique metric key, which in a clustered environment should also include the node identifier. The second part is the actual metric value, so in this case there are 87 active sessions. The last part is a timestamp.
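
For a quick test before wiring anything into the application, you can push such a metric straight from the shell; the Graphite host name below is a placeholder and the nc flags may differ slightly between netcat variants:

echo "node-123.myCoolApplication.environment.activeSessions 87 $(date +%s)" \
  | nc -u -w1 graphite.example.com 2003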

This kind of metric should preferably be pushed regularly with some scheduling utility like Quartz or similar, but you can of course also push metrics as events of business transactions, like this:

node-123.myCoolApplication.service.buyBook.success 1 1320316143

In this case I push a metric for the event of one book being sold successfully. These metrics will be scattered in time but are nevertheless very useful when you look at them cumulatively for trends or compare them with other technical metrics.

It is also very important that you measure failures, since they can provide powerful insights compared to other metrics. So in the buyBook service I would also push this metric every time it fails for some reason:

node-123.myCoolApplication.service.buyBook.failed 1 1320316143

My advice is to take a few minutes to think about a good naming convention for your metric keys, since it will have some impact on the way you can aggregate data and graph it later, and you don’t want to change a key once you have started to measure it.

Here’s a simple Java utility class that would do the trick:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GraphiteLogger {
    private static final Logger LOGGER = LoggerFactory.getLogger(GraphiteLogger.class);
    private String graphiteHost;
    private int graphitePort;
    private boolean enabled;
    private String nodeIdentifier;

    public static GraphiteLogger getDefaultLogger() {
        String gHost = "localhost"; // get it from application startup properties or something
        int gPort = 2003; // get it from application startup properties or something
        boolean enabled = true; // good thing to have an on/off switch in application config
        return new GraphiteLogger(gHost, gPort, enabled);
    }

    public GraphiteLogger(String graphiteHost, int graphitePort, boolean enabled) {
        this.enabled = enabled;
        this.graphiteHost = graphiteHost;
        this.graphitePort = graphitePort;
        try {
            this.nodeIdentifier = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException ex) {
            LOGGER.warn("Failed to determine host name", ex);
        }
        if (this.graphiteHost == null || this.graphiteHost.length() == 0 ||
            this.nodeIdentifier == null || this.nodeIdentifier.length() == 0 ||
            this.graphitePort < 0 || !logToGraphite("connection.test", 1L)) {
            LOGGER.warn("Failed to create GraphiteLogger, graphiteHost, graphitePort or nodeIdentifier could not be defined properly: " + about());
            this.enabled = false;
        }
    }

    public final String about() {
        return new StringBuffer().append("{ graphiteHost=").append(this.graphiteHost)
                .append(", graphitePort=").append(this.graphitePort)
                .append(", nodeIdentifier=").append(this.nodeIdentifier).append(" }").toString();
    }

    public void logMetric(String key, long value) {
        logToGraphite(key, value);
    }

    public boolean logToGraphite(String key, long value) {
        Map<String, Long> stats = new HashMap<String, Long>();
        stats.put(key, value);
        return logToGraphite(stats);
    }

    public boolean logToGraphite(Map<String, Long> stats) {
        if (stats.isEmpty()) {
            return true;
        }

        try {
            logToGraphite(nodeIdentifier, stats);
        } catch (Throwable t) {
            LOGGER.warn("Can't log to graphite", t);
            return false;
        }
        return true;
    }

    private void logToGraphite(String nodeIdentifier, Map<String, Long> stats) throws Exception {
        long curTimeInSec = System.currentTimeMillis() / 1000;
        StringBuffer lines = new StringBuffer();
        for (Entry<String, Long> stat : stats.entrySet()) {
            String key = nodeIdentifier + "." + stat.getKey();
            lines.append(key).append(" ").append(stat.getValue()).append(" ").append(curTimeInSec).append("\n"); // newline after each line, even the last one
        }
        logToGraphite(lines);
    }

    private void logToGraphite(StringBuffer lines) throws Exception {
        if (this.enabled) {
            LOGGER.debug("Writing [{}] to graphite", lines.toString());
            byte[] bytes = lines.toString().getBytes();
            InetAddress address = InetAddress.getByName(graphiteHost);
            DatagramPacket packet = new DatagramPacket(bytes, bytes.length, address, graphitePort);
            DatagramSocket dsocket = new DatagramSocket();
            try {
                dsocket.send(packet);
            } finally {
                dsocket.close();
            }
        }
    }
}

As easily as you log info and debug messages to your logging framework of choice, you can now use this class to push technical and business metrics to Graphite everywhere in your app:

public class BookService {
    private static final GraphiteLogger GRAPHITELOGGER = GraphiteLogger.getDefaultLogger();

    public void buyBook(..) {
        try {
            // do your service stuff
        } catch (ServiceException e) {
            // do your exception handling
            GRAPHITELOGGER.logMetric("bookstore.service.buyBook.failed", 1L);
        }
        GRAPHITELOGGER.logMetric("bookstore.service.buyBook.success", 1L);
    }
}

Start Graphing

Now when you have got Graphite up n’ running and your app is pushing all sorts of useful metrics to it, you can start with the fun part: graphing! Graphite comes with a web front end for elaborating with graphs; just browse to it on the installed Apache (it defaults to the document root). There you can browse your metric keys, create graphs in a graph composer, and apply miscellaneous functions and rendering options. From here you can also access the documentation and some experimental features for flot and events.
However, the really useful interface Graphite provides is the URL for rendering a graph on demand. A URL like this:

http://localhost:8000/render?target=keepLastValue(integral(sum(usbeta13.epsos-web.service.*.failed.*)))&target=keepLastValue(integral(sum(usbeta13.epsos-web.service.*.success.*)))&from=20111024

will give you a PNG image of a graph of the sum of all service calls (successful and failed) accumulated over time from 2011-10-24 (the from parameter in the URL).

Yes, it is that easy!

There’s also a great number of functions you can apply to your data, e.g. integral, cumulative, sum, average, max, min, etc., and a lot of parameters to customize the graph with colors, fonts, texts and so on. So just go crazy and define all the graphs you can think of, put them on a self-refreshing web page, or embed them in a wiki or some other dashboard mash-up you may already have.

And if you find the graphs a bit crude and want to do something more fancy, you can pull the raw data instead by adding these parameters to the URL:

&rawData=true&format=csv

And then use your favorite graph tool and do whatever cool tricks you want. The formats available are raw | csv | json. A cool thing to try would be to pull the raw data in JSON format into a Grails app and do some eye-candy charts with Google Charts… I’ll put that on my list of cool things to try.
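
For example, a hedged one-liner that pulls the failed-call series from the render URL above as CSV (host, port and the target expression are the same as in that URL):

curl -s "http://localhost:8000/render?target=keepLastValue(integral(sum(usbeta13.epsos-web.service.*.failed.*)))&from=20111024&rawData=true&format=csv" \
  -o failed-calls.csv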

Find the useful graphs

Now you have all the tools in place to make really useful dashboards about your application’s real business performance, in addition to the technical performance. You can graph all kinds of interesting stuff in real time and compare metrics that can give you very valuable insight. Let’s say you are running a business with a site of some sort and you want to see the business impact of newly released features: make sure you push a metric to Graphite when you deploy, and then graph deploys against whatever business metric you are interested in (e.g. sold books). Hopefully you will see a boost after each deploy that contains new cool features, and if not, maybe you have something to think about. Like this you can combine technical metrics and business value metrics to see patterns and trends, which can be really useful for a lot of people in the organisation.

Make them visible

Put the graphs on the biggest displays you can find, in a place where as many people as possible can see them. Make sure they are updated frequently enough to provide real-time information, and continuously improve, create new and remove old graphs that weren’t really useful. If you don’t have access to big dashboard displays, maybe instead write a small script that will pick useful graphs on a daily basis and email them throughout the company; just be sure to spread the knowledge that the graphs provide.

And again, don’t forget to measure failures. Many times, just visualizing the problems, in a sometimes painful way, for everyone to see will give a boost in quality, because nobody wants to be the bad guy and everybody wants to be a hero like you!

Andreas Rehn
@andreasrehn

DevOps movement

Patrick Debois – I want to send you a thank you for having seen me! For the past few years I have had great difficulty defining myself and what my role as a consultant is. I notice how I start to ramble a bit when colleagues ask what I do, even though I have always known that what I do and strive for is completely right and very important! I didn’t know what was missing until a few months ago, when I came across a presentation by Patrick Debois that defined an entirely new professional role – DevOps. Suddenly everything fell into place! It was incredible: there are more people like me and some of my colleagues at Diabol. More people who understand that software development can be done much more efficiently than we generally do it today.

I have worked as a DevOp for several years without knowing it myself. My path there went via 10+ years of enterprise Java development to becoming a dedicated application server (GlassFish) expert, a role that more naturally belongs to the operations department, and with that came an insight into how little these two departments (development and operations) actually talk to each other. A lack of communication that is enormously costly.

That was the first bit I started working on, but it felt like trying to untangle a beaver dam; everything was so cemented that it could not be done with a few new tools and well-worded emails. Heavier artillery was needed. Management must understand the problem and truly want to do something about it, so that the culture of the company changes at its core. Once you are there, you suddenly realize that the developer–operations problem is only the small tip of the iceberg. This is where the whole DevOps movement enters the stage. A DevOp defines and builds the factory that will produce the system. The whole flow: how business value is best captured, how ideas and features are handled, and how changes (code, configuration), quality assurance and deployment to production are carried out as efficiently as possible. Culture, process and technology (tools). What is new is the insight that this work has a fundamental value for the company. Just as much value, if not more, than the system itself that runs in production. The better this work is done, the easier it becomes for the company to let its systems follow the development of the company and its market, cost-effectively and with high quality. Car factories have been refining this for a long time. Production costs, changeover costs, quality costs, material costs, working capital, inventory costs… these are all concepts that are just as relevant for a software-developing IT department, yet they are talked about far too little, and even less is done about them. What is talked about instead is salary costs and the hourly rates of programmers. If those get too cheap, so that quality drops, more testers are sent in as a remedy without blinking. Terms you hear, such as bug-fixing periods, entry criteria, hand-overs, manual regression testing, clean-up sprints, rewrites, maintenance windows, planned downtime, patch releases, etc., are all terms that should set off a big warning bell for product owners: “Why should we spend time and money on that?” No, in with a new culture, automated tests, constant improvement, automatic deploys, continuous delivery, cross-functional teams, pragmatic architecture. That is business value!