Category Archives: Continuous Delivery

Retrospective from a JFokus presentation

I was thrilled to get the chance to present my experience with DevOps and Continuous Delivery at JFokus 2012! This is a short story about that experience.

Back in October 2011, when my proposal was accepted, I immediately started reading Garr Reynolds’ excellent book Presentation Zen. It makes presenting seem really simple, and its recommendations are crystal clear. However, the conventional bullet-point presentation turns out to be considered a disaster in the book. Dropping the bullets keeps the slides clean and lean, but I soon realized it also leaves the presenter alone on the stage: in front of 150 or so people, you need to remember everything you have planned to say about each slide. I thought I could pull it off, but a rehearsal one week before the event made me think twice. It was a disaster! Being somewhat stubborn, though, I decided to stick to the plan, and luckily it went 100% better at the actual event.

DevOps and Continuous Delivery really are things I’m passionate about, and I hope that helped me deliver something meaningful to those who showed up. I got the feeling that people were listening, and hopefully they left with some thoughts that can help their companies become more productive and successful.

One question I got afterwards was: “Is anyone actually doing this?” Well, I’d like to pass the question on: send me a tweet @danielfroding if you are. I know of at least four places, two of which I have been heavily involved in, that have implemented an automated deployment pipeline and develop their systems according to Continuous Delivery principles.

My slides are here: http://www.jfokus.se/jfokus12/preso/jf12_RetrospectiveFromTheYearOfDevOps.pdf

JFokus really is a great conference, and I’m proud that we have such an event in Stockholm! A huge thanks to the organizers for setting it up!

See you next year!

Culture Hacks

At DevOpsDays in Gothenburg in mid-October I attended an open session on the topic “Cultural Hacks”. It was one of the most interesting open sessions, and I want to share the ideas that came up.

Why culture needs to be hacked

By definition, culture cannot be replaced like a tool, or even like people in an organization, so to change a culture you need to “hack” it. So what hacks can you use to start a cultural change in a company? I guess it depends on what you want to change, but in this discussion the target was the old devs vs. ops culture. What we want is a devops culture where developers and operations talk to each other, collaborate and strive for the same goal: getting good software out the door and into production in a controlled way, as often as possible (to put it in one very simplified sentence).

Proposed hacks

  • Metrics that provide transparency throughout the company: Measure everything and make it available to everyone. Not only technical metrics like server load or disk IO, but also useful business metrics, and combine them in every possible way to find the really useful and interesting correlations.
  • Hackathons: These would hopefully get people who do not normally interact with each other to talk (maybe about something completely outside the scope of work), collaborate, share ideas and thereby learn from each other.
  • Ops engineers in dev teams: With shorter feedback loops and tighter collaboration, devs will learn more about ops and infrastructure, how their code behaves in production and what they can do to help in that area. At the same time, ops get involved in the development process early on, can contribute deployment scripts, server provisioning scripts, tuning etc., and can push for dev requirements that make deployment and operations tasks better and easier.
  • Daily stand-ups: This is obvious as I see it, but nevertheless very important, and if you do it right it can definitely make way for cultural change in a team.
  • Transparent backlog: Exposing your team’s backlog will hopefully create a better understanding of what you are doing and why. I guess the main purpose is to encourage better prioritization and communication with those who request your time and services.
  • Fail cake: A funny, harmless little hack: whoever is responsible for a production failure has to buy cake for the team. That is punishment enough for that person, and since everybody likes cake, no one can be too angry with him/her either. The purpose is of course to strive for better quality and to embrace failure, in the sense that it will happen: learn how to handle it, prevent it the next time and learn from others’ mistakes.
  • Exchange with other non-competing companies: This hack proposed exchanging people between two companies that have a lot to learn from each other but do not compete in the same market. I like this one: a day, a week, a month or whatever you think is appropriate will surely make people learn new stuff, bring home good things and also learn to avoid the bad ones. Knowledge exchange happens all the time at conferences and tech talks, but actually exchanging people to work at other companies is, I guess, not very common.
  • Tech talks with external speakers: This hack proposed lifting the tech talks that many companies have, but keep very internal, by bringing in external speakers. That would hopefully spice up the discussions and make more people come and learn new stuff. Keep an eye open for when interesting people are in town for some event. A 20-minute tech talk over lunch is often no big deal for them to squeeze in, and it does not have to cost you a lot either since they are already in town.
  • Give root access to developers: This hack, proposed by an ops guy of course, sounds to me a lot like a “chaos monkey” experiment. However, I think there is a big psychological point in giving devs root access: you have the power to do stuff, but also the responsibility to make sure you do not mess up. It will hopefully erase some of the invisible boundaries between development and operations.
  • Draw a picture: A simple but important thing you can do to spread knowledge, get people talking and build truly cross-functional teams. This was also mentioned in Mitchell Hashimoto’s talk at DevOpsDays in Gothenburg as a key part of bringing devs closer to ops.
  • Framing problems: To be honest, I can’t really remember what this hack was about. Suggestions are welcome…
  • Make people feel safe, give all the credit and take all the blame: A good way of getting people who are reluctant to change to take the first step and try something new.
  • Take advantage of compelling events: I can’t remember exactly how the discussion around this went, but I guess it is about keeping an eye open for things you can use as an “excuse” to impose a change that would normally be rejected.
  • Subjective metrics: A funny, pretty harmless little hack that I’ve seen around, e.g. letting each individual in the team show a smiley for their mood. The purpose is to create a more open environment and encourage communication. You can track the level of satisfaction in a team and maybe correlate it with how the team is actually performing. However, you have to be careful not to intrude on people’s personal integrity.
  • Force a “confidence level” on every check-in: I guess this is somewhat related to the subjective metrics above. I like the idea, and I think the cultural change it will hopefully create is getting people to think more about the quality of the code they are checking in. If someone checks in with a low confidence level, you can ask them why: are you checking in crappy code? If they check in with a high confidence level, you can ask whether they are really so sure it will work and not break anything. I guess it will be like when you first start with Scrum: the first few times the team will be over-optimistic, but in time they will learn where their level is. I’ve also seen related subjective metrics, e.g. “commit karma”, where everybody starts at 100% and loses karma whenever a commit breaks something. Someone with low karma has a harder time getting their code into production than someone with high karma.
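The “commit karma” idea in the last bullet is concrete enough to sketch. The class below is a hypothetical illustration, not something presented in the session: the penalty per breaking commit and the review threshold are made-up parameters, and “a harder time getting code into production” is modelled simply as requiring extra review.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "commit karma": everyone starts at 100%, each breaking
// commit costs a fixed penalty, and low karma triggers extra review before production.
final class CommitKarma {
    private static final int START = 100;
    private final int penalty;   // assumed: percentage points lost per breaking commit
    private final int threshold; // assumed: below this, changes need extra review
    private final Map<String, Integer> karma = new HashMap<>();

    CommitKarma(int penalty, int threshold) {
        this.penalty = penalty;
        this.threshold = threshold;
    }

    int karmaOf(String developer) {
        return karma.getOrDefault(developer, START);
    }

    void recordBreakingCommit(String developer) {
        // karma never drops below zero
        karma.put(developer, Math.max(0, karmaOf(developer) - penalty));
    }

    boolean needsExtraReview(String developer) {
        return karmaOf(developer) < threshold;
    }
}
```

With a penalty of 20 and a threshold of 70, a developer who breaks the build twice drops to 60% and needs extra review, while everyone else stays at 100%.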
 
 

Andreas Rehn
@andreasrehn

Metrics, metrics everywhere with Graphite

What useful metrics does your application provide, and how accessible are they?
In my experience, application metrics are often bolted on by operations just before going live, or even afterwards, when strange problems start to appear and you realize that the only way of knowing how the application performs is to look at CPU usage and the like. Even though CPU, IO and memory usage can be very helpful for ops, they are probably not very useful when you want to know how your application performs in business terms. You need to build metrics into your application, and it should be as natural and common as any other logging you put there. Live metrics and stats presented in appealing graphs are priceless feedback for practically everybody in the organisation, from operations and development to marketing, sales and even executives. Since all those people have very different views on what useful metrics are, you need to start pushing out metrics for everything. You never know when you will need them, and since it is so easy there is really no excuse for not doing it. With very little effort you can be the graphing hero, and hopefully cool dashboards with customized live graphs will start to pop up everywhere.

Install Graphite

Graphite is a cool little project that lets you collect and aggregate metrics and create customized real-time graphs on demand in a very easy and flexible way. It is a Python/Django app with a web front end that hooks into Apache. The data aggregator is called Carbon, and it is essentially a Python daemon that slurps data from a port. Note that Carbon’s plaintext listener accepts TCP on port 2003 by default, while its UDP listener is off until you set ENABLE_UDP_LISTENER = True in carbon.conf. The whole “package” can be a bit tricky to install (at least on RHEL) since it depends on some image processing libraries and such, but if you follow the install instructions you will get it done in an hour or two at most. Needless to say, it must be installed on a server that is reachable from where the applications are running so they can push metrics to it, but I’m sure there’s one lying around running some old monitoring tools or something. There are example versions of all config files, so once all the Python packages and dependencies are installed you will be up n’ running in no time and can start pushing metrics to Carbon.

Start pushing metrics

The way you push data to Carbon is extremely easy: just send a UDP packet (UDP for low-cost, fire-and-forget communication) like this:

node-123.myCoolApplication.environment.activeSessions 87 1320316143

The first part is a unique metric key, which in a clustered environment should also include the node identifier. The second part is the actual metric value, so in this case there are 87 active sessions. The last part is a Unix timestamp in seconds.
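To make the line format concrete, here is a minimal sketch (CarbonPlaintext and its methods are hypothetical names) that builds one plaintext line and sends it to Carbon over TCP. TCP is used here because Carbon’s plaintext listener accepts TCP on port 2003 by default, while its UDP listener has to be enabled in carbon.conf; the GraphiteLogger further down uses UDP instead.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for Carbon's plaintext protocol: "key value timestamp\n"
final class CarbonPlaintext {
    // Builds one line; Carbon expects a newline after every line, including the last
    static String line(String key, long value, long epochSeconds) {
        return key + " " + value + " " + epochSeconds + "\n";
    }

    // Opens a TCP connection, writes the line and closes the connection again
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(line.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}
```

A connection per line is fine for the occasional metric; if you push a lot, keeping one connection open or batching several lines per write would be the obvious next step.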

Metrics like this should preferably be pushed regularly with a scheduling utility such as Quartz, but you can of course also push metrics as events of business transactions, like this:

node-123.myCoolApplication.service.buyBook.success 1 1320316143

Here I push a metric for the event of one book being sold successfully. These metrics will be scattered in time, but they are nevertheless very useful when you look at them cumulatively for trends or compare them with other technical metrics.
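For the regularly pushed kind of metric, such as active sessions, you do not necessarily need Quartz. As an alternative, here is a minimal sketch using the JDK’s ScheduledExecutorService; GaugeReporter is a hypothetical name, and the gauge (a LongSupplier) and the sink that receives finished lines are assumptions for illustration. In a real setup the sink would hand the line to a UDP or TCP sender.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical periodic reporter built on the JDK scheduler instead of Quartz
final class GaugeReporter implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Samples the gauge immediately and then every periodSeconds,
    // passing a finished plaintext line to the sink
    void report(String key, LongSupplier gauge, long periodSeconds, Consumer<String> sink) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                long now = System.currentTimeMillis() / 1000;
                sink.accept(key + " " + gauge.getAsLong() + " " + now + "\n");
            } catch (RuntimeException e) {
                // an exception escaping a periodic task cancels all its future runs,
                // so a failed sample is dropped here (log it in real code)
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```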

It is also very important to measure failures, since they can provide powerful insights compared to other metrics. So in the buyBook service I would also push this metric every time it fails for some reason:

node-123.myCoolApplication.service.buyBook.failed 1 1320316143

My advice is to take a few minutes to think about a good naming convention for your metric keys, since it will affect the way you can aggregate and graph data later, and you don’t want to change a key once you have started measuring it.
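One way to keep a naming convention from drifting is to build every key in a single place. The helper below is hypothetical: it joins segments with dots and replaces dots and whitespace inside a segment with underscores, because in Graphite every dot starts a new level in the metric tree.

```java
import java.util.StringJoiner;

// Hypothetical helper that assembles metric keys from segments in a fixed order
final class MetricKeys {
    static String key(String... segments) {
        StringJoiner joiner = new StringJoiner(".");
        for (String segment : segments) {
            // a dot inside a segment would create an accidental extra level in Graphite
            String cleaned = segment.trim().replaceAll("[.\\s]+", "_");
            if (cleaned.isEmpty()) {
                throw new IllegalArgumentException("empty key segment");
            }
            joiner.add(cleaned);
        }
        return joiner.toString();
    }
}
```

For example, MetricKeys.key("bookstore", "service", "buyBook", "failed") yields bookstore.service.buyBook.failed.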

Here’s a simple Java utility class that does the trick:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GraphiteLogger {
    private static final Logger LOGGER = LoggerFactory.getLogger(GraphiteLogger.class);
    private final String graphiteHost;
    private final int graphitePort;
    private boolean enabled;
    private String nodeIdentifier;

    public static GraphiteLogger getDefaultLogger() {
        String gHost = "localhost"; // get it from application startup properties or something
        int gPort = 2003;           // get it from application startup properties or something
        boolean enabled = true;     // good thing to have an on/off switch in application config
        return new GraphiteLogger(gHost, gPort, enabled);
    }

    public GraphiteLogger(String graphiteHost, int graphitePort, boolean enabled) {
        this.enabled = enabled;
        this.graphiteHost = graphiteHost;
        this.graphitePort = graphitePort;
        try {
            // dots would add extra levels to the Graphite key tree, so flatten an FQDN
            this.nodeIdentifier = InetAddress.getLocalHost().getHostName().replace('.', '_');
        } catch (UnknownHostException ex) {
            LOGGER.warn("Failed to determine host name", ex);
        }
        if (this.graphiteHost == null || this.graphiteHost.isEmpty()
                || this.nodeIdentifier == null || this.nodeIdentifier.isEmpty()
                || this.graphitePort < 0 || !logToGraphite("connection.test", 1L)) {
            LOGGER.warn("Failed to create GraphiteLogger, graphiteHost, graphitePort or nodeIdentifier could not be defined properly: " + about());
            this.enabled = false;
        }
    }

    public final String about() {
        return new StringBuilder().append("{ graphiteHost=").append(this.graphiteHost)
                .append(", graphitePort=").append(this.graphitePort)
                .append(", nodeIdentifier=").append(this.nodeIdentifier).append(" }").toString();
    }

    public void logMetric(String key, long value) {
        logToGraphite(key, value);
    }

    public boolean logToGraphite(String key, long value) {
        Map<String, Long> stats = new HashMap<>();
        stats.put(key, value);
        return logToGraphite(stats);
    }

    public boolean logToGraphite(Map<String, Long> stats) {
        if (stats.isEmpty()) {
            return true;
        }
        try {
            logToGraphite(nodeIdentifier, stats);
        } catch (Exception e) {
            LOGGER.warn("Can't log to graphite", e);
            return false;
        }
        return true;
    }

    private void logToGraphite(String nodeIdentifier, Map<String, Long> stats) throws Exception {
        long curTimeInSec = System.currentTimeMillis() / 1000;
        StringBuilder lines = new StringBuilder();
        for (Map.Entry<String, Long> stat : stats.entrySet()) {
            String key = nodeIdentifier + "." + stat.getKey();
            // Carbon's plaintext protocol expects a newline after every line, even the last one
            lines.append(key).append(' ').append(stat.getValue()).append(' ').append(curTimeInSec).append('\n');
        }
        logToGraphite(lines);
    }

    private void logToGraphite(StringBuilder lines) throws Exception {
        if (this.enabled) {
            LOGGER.debug("Writing [{}] to graphite", lines);
            byte[] bytes = lines.toString().getBytes(StandardCharsets.UTF_8);
            InetAddress address = InetAddress.getByName(graphiteHost);
            DatagramPacket packet = new DatagramPacket(bytes, bytes.length, address, graphitePort);
            DatagramSocket dsocket = new DatagramSocket();
            try {
                dsocket.send(packet); // fire-and-forget: UDP gives no delivery guarantee
            } finally {
                dsocket.close();
            }
        }
    }
}

Just as easily as you log info and debug messages to your logging framework of choice, you can now use this to push technical and business metrics to Graphite from anywhere in your app:

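public class BookService {
    private static final GraphiteLogger GRAPHITELOGGER = GraphiteLogger.getDefaultLogger();

    // the isbn parameter and ServiceException stand in for your own service signature
    public void buyBook(String isbn) {
        try {
            // do your service stuff
            processPurchase(isbn);
            // only count a success when the service call completed without throwing
            GRAPHITELOGGER.logMetric("bookstore.service.buyBook.success", 1L);
        } catch (ServiceException e) {
            // do your exception handling
            GRAPHITELOGGER.logMetric("bookstore.service.buyBook.failed", 1L);
        }
    }

    private void processPurchase(String isbn) throws ServiceException {
        // hypothetical placeholder for the actual purchase logic
    }
}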

Start Graphing

Now that you have Graphite up n’ running and your app is pushing all sorts of useful metrics to it, you can start with the fun part: graphing! Graphite comes with a web front end for experimenting with graphs; just browse to it on the Apache instance where it is installed (by default as the document root). There you can browse your metric keys, create graphs in a graph composer, apply various functions and rendering options, etc. From there you can also access the documentation and some experimental features for flot and events.
However, the really useful interface Graphite provides is the URL for rendering a graph on demand. For example, this URL:

http://localhost:8000/render?target=keepLastValue(integral(sum(usbeta13.epsos-web.service.*.failed.*)))&target=keepLastValue(integral(sum(usbeta13.epsos-web.service.*.success.*)))&from=20111024

will give you a PNG image with two graphs, the sum of all failed and the sum of all successful service calls, each accumulated over time from 2011-10-24.

Yes, it is that easy!
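If you generate these URLs from code, for dashboards or scripts, it is safer to URL-encode the targets, since they contain characters such as parentheses and commas. A minimal sketch with a hypothetical RenderUrl helper; only the /render endpoint and its target and from parameters come from Graphite itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper that builds a Graphite /render URL with one or more targets
final class RenderUrl {
    static String build(String baseUrl, List<String> targets, String from) {
        StringBuilder url = new StringBuilder(baseUrl).append("/render?");
        for (String target : targets) {
            // encodes ( ) , and similar; letters, digits and . - * _ are left as they are
            url.append("target=").append(URLEncoder.encode(target, StandardCharsets.UTF_8)).append('&');
        }
        url.append("from=").append(URLEncoder.encode(from, StandardCharsets.UTF_8));
        return url.toString();
    }
}
```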

There’s also a great deal of functions you can apply to your data, e.g. integral, cumulative, sum, average, max and min, and a lot of parameters to customize the graph with colors, fonts, texts etc. So just go crazy: define all the graphs you can think of and put them on a self-refreshing web page, embed them in a wiki or in some other dashboard mash-up you may already have.

And if you find the graphs a bit crude and want to do something fancier, you can just pull the raw data by adding these parameters to the URL:

&rawData=true&format=csv

Then use your favorite graphing tool and do whatever cool tricks you want. The available formats are raw | csv | json. A cool thing to try would be to pull the raw data in JSON format into a Grails app and make some eye-candy charts with Google Charts… I’ll put that on my list of cool things to try.
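If you go the raw data route with format=csv, each row holds one data point. The sketch below assumes rows of the form target,yyyy-MM-dd HH:mm:ss,value with an empty value where Graphite has no data; check this against the output of your own Graphite version before relying on it. GraphiteCsv and Point are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical parser for one row of Graphite's format=csv output (row layout assumed)
final class GraphiteCsv {
    static final class Point {
        final String target;
        final LocalDateTime time;
        final Double value; // null when Graphite has no data for this timestamp

        Point(String target, LocalDateTime time, Double value) {
            this.target = target;
            this.time = time;
            this.value = value;
        }
    }

    private static final DateTimeFormatter TIMESTAMP = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static Point parseRow(String row) {
        // split from the right, since the target itself may contain commas, e.g. sumSeries(a,b)
        int valueComma = row.lastIndexOf(',');
        int timeComma = valueComma < 0 ? -1 : row.lastIndexOf(',', valueComma - 1);
        if (timeComma < 0) {
            throw new IllegalArgumentException("not a target,timestamp,value row: " + row);
        }
        String target = row.substring(0, timeComma);
        if (target.length() >= 2 && target.startsWith("\"") && target.endsWith("\"")) {
            // in case the target is CSV-quoted: drop the quotes and undouble inner quotes
            target = target.substring(1, target.length() - 1).replace("\"\"", "\"");
        }
        LocalDateTime time = LocalDateTime.parse(row.substring(timeComma + 1, valueComma), TIMESTAMP);
        String value = row.substring(valueComma + 1).trim();
        return new Point(target, time, value.isEmpty() ? null : Double.valueOf(value));
    }
}
```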

Find the useful graphs

Now you have all the tools in place to build really useful dashboards showing your application’s real business performance in addition to its technical performance. You can graph all kinds of interesting things in real time and compare metrics that give you very valuable insight. Let’s say you are running a business with a site of some sort and you want to see the business impact of newly released features: make sure you push a metric to Graphite whenever you deploy, and then graph deploys against whatever business metric you are interested in (e.g. books sold). Hopefully you will see a boost after each deploy that contains cool new features, and if not, maybe you have something to think about. In this way you can combine technical metrics and business value metrics to see patterns and trends that can be really useful to a lot of people in the organisation.
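A sketch of the deploy idea, under a few assumptions: the deploy script pushes a hypothetical bookstore.deploys metric with the value 1 (no node prefix, since a deploy concerns the whole application), and the graph overlays it on books sold using Graphite’s drawAsInfinite function, which draws a vertical line wherever the value is above zero. DeployMarkers and the key names are illustrative only.

```java
import java.util.List;

// Hypothetical helper: record deploys as a metric and overlay them on a business metric
final class DeployMarkers {
    // Plaintext line to send once per deploy, e.g. from the deploy script
    static String deployEvent(String application, long epochSeconds) {
        return application + ".deploys 1 " + epochSeconds + "\n";
    }

    // Render targets: vertical lines at each deploy, plus books sold summed over all nodes
    static List<String> overlayTargets(String application) {
        return List.of(
                "drawAsInfinite(" + application + ".deploys)",
                "sum(*." + application + ".service.buyBook.success)");
    }
}
```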

Make them visible

Put the graphs on the biggest displays you can find, in a place where as many people as possible can see them. Make sure they are updated frequently enough to provide real-time information, and continuously improve them: create new graphs and remove old ones that weren’t really useful. If you don’t have access to big dashboard displays, you could instead write a small script that picks useful graphs on a daily basis and emails them throughout the company; just be sure to spread the knowledge the graphs provide.

And again, don’t forget to measure failures. Often, just making the problems visible to everyone, sometimes painfully so, will boost quality, because nobody wants to be the bad guy and everybody wants to be a hero like you!

Andreas Rehn
@andreasrehn

Deployment pipelines and Continuous Delivery

A deployment pipeline is, to a large extent, an assembly line for introducing changes into an existing system, or, if you like, an automation of the release process.

What is the benefit of pipelines? There are plenty, but two fundamental mechanisms stand out. The first is that an automated pipeline requires you to actually map out and settle which release process you have. This is where the biggest difficulty, and the biggest benefit, lies. Just modelling the process and agreeing on which stages should exist, and which activities belong in which stage, will make the work go much more smoothly. Adding a tool that performs some of the stages automatically is more or less a bonus, and it gives a good overview of which stages the process consists of. Far too many do it the other way around and bring in the tool first. In my experience, that never turns out well.

The second is that it fosters DevOps collaboration, since everyone gets insight into what happens on the way from development to production. A pipeline runs straight through Dev-QA-Ops, which is essential for everyone to work towards the same goal. The dev side gets an “API” to the release process that keeps them within the boundaries set by the production environment. Implemented correctly, it gives the QA department a button to push to install any version in a test environment. The ops department’s work also becomes more focused on building automated functions (robots on the assembly line) that support the release process, instead of doing recurring manual tasks that have to be repeated for every release. You have then gone from being a pure cost to investing in your assembly line.
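To illustrate the assembly-line idea, here is a toy model that is hypothetical and not tied to any of the products listed below: a pipeline is an ordered list of named stages, each a check that a version has to pass, and a version moves down the line until a stage rejects it. The stage names and checks in any usage are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of a deployment pipeline as an assembly line of ordered stages
final class Pipeline {
    // insertion order is the order of the stages on the line
    private final Map<String, Predicate<String>> stages = new LinkedHashMap<>();

    Pipeline stage(String name, Predicate<String> check) {
        stages.put(name, check);
        return this;
    }

    // Moves a version down the line and returns the stages it passed,
    // stopping at the first stage that rejects it
    List<String> run(String version) {
        List<String> passed = new ArrayList<>();
        for (Map.Entry<String, Predicate<String>> stage : stages.entrySet()) {
            if (!stage.getValue().test(version)) {
                break; // later stages never see a version that failed an earlier one
            }
            passed.add(stage.getKey());
        }
        return passed;
    }
}
```

The point of the model is the one made above: the hard part is agreeing on the stages and what each one checks; automating a check is just replacing a manual predicate with a robot.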

Today there is a plethora of products that handle pipelines, and new ones keep popping up.

ThoughtWorks has its flagship Go (formerly Cruise), which is built entirely around pipelines.
Jenkins/Hudson needs a plugin to do the same thing; the advantage is that both Jenkins and the plugin are free to use as open source.
Atlassian’s Bamboo can be combined with Jira and a plugin from SysBliss to create pipelines.
If you want better control over who does what, Nolio is a product that can handle user permissions in connection with pipelines.
AnthillPro is yet another product that is very good at building advanced deployment automation (pipelines).

Pipelines are very good, but there are a few traps you can step into, so you need to be well read, pragmatic and tread carefully.

If you want to read more about this, besides following this blog, you can read Jez Humble’s book: Continuous Delivery

Enterprise 2.0

What is enterprise 2.0? Well, it is the beginning of a future that will fundamentally change software development. How? Because companies that use enterprise 2.0 will wipe the floor with their competitors!

But what IS enterprise 2.0, then? By my own definition, it is the ability to build systems that handle large volumes of transactions (> 10,000 a day) while changes never take longer than one sprint to introduce.

A number of things must be in place for this to be possible, but today only a backward-looking, or possibly a “J2EE-burned”, organisation fails to see and seize that opportunity. By J2EE-burned I mean an organisation that about 10 years ago invested enormous sums in building an architecture according to J2EE and discovered far too late that it was worthless.

We live in a new era today: the frameworks are better, the technologies are better, the hardware is better and, perhaps above all, the JVM is better than it has ever been. So what does enterprise 2.0 take? Above all, two important things: creativity and self-confidence! Creativity to build systems that are simple enough to maintain while new functionality is constantly being added. Self-confidence to continuously improve (refactor) the system within a sprint without having to worry that “something might break”. It may sound simple, but it takes a lot of work for a team to get there: test-driven development, continuous integration that handles all builds 100% automatically, and an extensive test framework with automated regression tests. But it also takes a strong awareness among developers of what the production environment actually looks like.

To clarify what I mean by knowledge of the production environment: it requires, for example, knowledge of monitoring systems and alarm handling. If a system is to go into production the same day the sprint ends, you cannot start adapting a monitoring system afterwards. It is part of the system, and therefore part of the development, and it should be done by the cross-functional team within the sprint. Everything that today normally sits outside the sprint must fit inside it! Tests (unit, integration, regression, load), builds, configuration, acceptance, validation, etc. That is hard!

Imagine the business leader who gets an idea that clearly improves the company’s competitive position, and who has an IT department that can deliver ideas to customers in four weeks! That is a leader with a golden engine in the company, limited only by his or her own creativity in achieving success. That is enterprise 2.0!

Reflections on production pace

I have been a software development consultant in heavy, transaction-intensive projects for over 10 years now. It is time to start making some summaries and reflections. I believe we can do many things much better!

Development has really taken giant leaps since 1998. Back then you patched systems directly in production, even if they were business-critical. CVS was an upstart. Java was “too slow”, and UML was the hot new thing everyone was supposed to use, preferably in RUP projects. CORBA and business components were the future!

But then came the revolution: J2EE! Now you would focus on business logic and not think about anything else. Components would be produced at a furious pace, to be plugged into application server environments that could automatically connect to any “legacy” system whatsoever. But we all know what turn that took.

In fact, I agree with Rod Johnson when he talks about “the dark years of enterprise java”. Even though many have stepped out of J2EE in favour of Spring or Java EE, unfortunately a lot from that time lives on. Object orientation is completely lost, a multitude of layers exist to solve J2EE problems that no longer exist, owners and CM are terrified of change, the domain model is not something anyone works with at all, and systems grow in every direction instead of being continuously adjusted to the business.

What is worse, though, is that developers are seen as the villains in many projects today. People like to embed them in a mass of testers, QA staff, operations staff, handover documents, JUnit test reports and code reviews to solve a problem that no longer needs to exist! Think of what all this costs. How much time can a modern developer actually spend producing sensible code??

But I understand the people who embed the developers, just as much as I understand the developers who don’t complain about it. Development teams have produced unusable systems with J2EE for many years; they can’t handle the pressure! Meanwhile, product owners have little confidence in the teams. They are tired of hearing that it takes up to four months to get out their little feature that would let them land that important customer.

Therefore we, who think this is close to a socio-economic disaster, should rebel and show the projects we take part in that today we can produce systems of considerably higher quality at a considerably lower price, if only we are given the responsibility to do so!

But RESPONSIBILITY is the key word here! A developer who has been embedded this way for a long time has forgotten that it is what gets coded into the system that determines how well it works. It is not determined by how many bugs you “manage” to find in endless test periods, or by how little downtime an oversized operations department achieves for the system. It is how few bugs get built in and, above all, how well the system takes care of itself in production that determines how well the development team does its job, preferably without any system test outside the team.

I am sure that eliminating all the hours spent on handover documents and test meetings could halve the costs in many projects, if only developers took responsibility and product owners started trusting them.