Tag Archives: devops

Conference retrospectives

The first week of June was a busy one for us, with representation at a couple of conferences taking place in Stockholm: the inaugural CoDe Continuous Delivery & DevOps conference on June 2nd and Agila Sverige (Agile Sweden) on June 3rd and 4th. Here’s a small retrospective on both of those events.

The CoDe conference was quite small (a bit less than 200 people) but had a very good vibe to it. We were silver sponsors of the event and had our Jenkins Delivery Pipeline Plugin on display at our booth, which gained a lot of interest from attendees. It led to many interesting discussions about CI/CD servers, automation, capabilities and visualization. Our feeling was that the attendees had a lot of genuine interest, knowledge and awareness of Continuous Delivery, which fueled these discussions.

Our Marcus Philip presented “Continuous Delivery for Databases” at the CoDe conference, where he talked about how to transition a legacy database application with a lot of PL/SQL code into a streamlined process where all changes are version controlled, traceable and automatically deployed. The talk covered the whole spectrum of the problem: why they weren’t doing Continuous Delivery, why they should, thoughts on different solutions to the problem, how they did the actual implementation and how it panned out. Many attendees enjoyed the presentation and Marcus also got the opportunity to present the talk again at the Oracle Stockholm Meetup on June 16th. Cool!

Slides from the talk can be found here: https://speakerdeck.com/marcusphi/continuous-delivery-for-databases.

Agila Sverige is a conference which strongly encourages discussions among the attendees. All talks at the conference are lightning talks and there is plenty of time allocated for open space discussions and breaks, allowing you to network and share experiences with others. The conference is held in Swedish but is probably one of the best when it comes to agile software development practices. Two or three years ago many talks and discussions at this conference focused on why to adopt agile development practices. This year it was apparent that the industry has matured on all levels, since the majority of talks and open space sessions focused instead on how to practice agile development.

Tommy Tynjä presented a visionary talk on how tomorrow’s software can be structured and delivered in his presentation “Next Generation Continuous Delivery”. The talk gained a lot of interest, the room was packed for the presentation and Tommy had some interesting discussions on the topic afterwards.

Slides from his talk can be found here: http://www.slideshare.net/tommysdk/next-generation-continuous-delivery
There is also a video recording available at: https://agilasverige.solidtango.com/video/next-generation-continuous-delivery.

We give both of these conferences our thumbs up and hope to see you at either of them next year as well! We always look forward to meeting people to discuss thoughts, experiences, problems and visions regarding Continuous Delivery, DevOps and agile software development methodologies. But if you don’t want to wait until next year, don’t hesitate to contact us!

AWS CloudFormation introduction

AWS CloudFormation is a concept for modelling and setting up your Amazon Web Services resources in an automatic and unified way. You create templates that describe your AWS resources, e.g. EC2 instances, EBS volumes, load balancers etc., and CloudFormation takes care of all provisioning and configuration of those resources. When CloudFormation provisions resources, they are grouped into something called a stack. A stack is typically represented by one template.

The templates are specified in JSON, which allows you to easily version control the templates describing your AWS infrastructure alongside your application code.

CloudFormation fits perfectly into a Continuous Delivery way of working, where the whole AWS infrastructure can be automatically set up and maintained by a delivery pipeline. There is no need for a human to provision instances, set up security groups, create DNS record sets etc., as long as the CloudFormation templates are developed and maintained properly.

You use the AWS CLI to execute the CloudFormation operations, e.g. to create or delete a stack. What we’ve done at my current client is to put the AWS CLI calls into bash scripts, which are version controlled alongside the templates. This spares us from having to remember all the options and arguments necessary to perform the CloudFormation operations, and we can use the same scripts both on developer workstations and in the delivery pipeline.
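Just as an illustration (this is not the client setup described above, where the scripts wrap the AWS CLI), the same create and delete operations can be sketched with the AWS SDK for Java. The class name, parameters and error handling are made up, and credentials/region configuration is omitted:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.amazonaws.services.cloudformation.AmazonCloudFormationClient;
import com.amazonaws.services.cloudformation.model.CreateStackRequest;
import com.amazonaws.services.cloudformation.model.DeleteStackRequest;

public class StackOperations {
    private final AmazonCloudFormationClient cloudFormation = new AmazonCloudFormationClient();

    // Create a stack from a version controlled template file.
    public void createStack(String stackName, String templatePath) throws Exception {
        String templateBody = new String(Files.readAllBytes(Paths.get(templatePath)), StandardCharsets.UTF_8);
        cloudFormation.createStack(new CreateStackRequest()
                .withStackName(stackName)
                .withTemplateBody(templateBody));
    }

    // Tear the stack down again, e.g. for a disposable test environment.
    public void deleteStack(String stackName) {
        cloudFormation.deleteStack(new DeleteStackRequest().withStackName(stackName));
    }
}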

A drawback of CloudFormation is that not all features of AWS are available through that interface, which might force you to create workarounds depending on your needs. For instance, CloudFormation does not currently support the creation of private DNS hosted zones within a Virtual Private Cloud. We solved this by using the AWS CLI to create the private DNS hosted zone in the bash script responsible for setting up our DNS configuration, prior to performing the actual CloudFormation operation which makes use of that private DNS hosted zone.

As CloudFormation is a superb way of setting up resources in AWS, in contrast to managing those resources manually e.g. through the web UI, you can actually enforce restrictions on your account so that resources can only be created through CloudFormation. This is something we currently use at my current client for our production environment setup, to ensure that the proper ways of working are followed.

I’ve created a basic CloudFormation template example which can be found here.

Tommy Tynjä
@tommysdk

Agile Configuration Management – intermezzo

Why do I need agile configuration management?

The main reason for doing agile configuration management is that it’s a necessary means to achieve agile infrastructure and architecture. When you have an agile architecture, it becomes easy to make the right architectural choices.

Developers make decisions every day without realizing it. Solving a requirement by adding a little more code to the monolith and a new table in the DB is a choice, even though it may not be perceived as such. When you have an agile architecture, it’s easy to create (and maintain!) a new server or middleware component to address the requirement in a better way.

Continuous Delivery in the cloud

During the last year or so, I have been implementing Continuous Delivery and delivery pipelines for several different companies, from 100+ employees down to fewer than 5, and in several different businesses. It is striking to see that the problems, once you dig them out, are so similar in all companies. Even more striking, maybe, is that the solutions to the problems are so similar. Implementing a low cycle time, high throughput delivery pipeline in IT departments is very hard, but the challenges are very similar at most of the customers I have worked with. Implementing the somewhat immature tools and automation solutions that need to be connected and in sync (Go, Jenkins, Nolio, Rundeck, Puppet, Chef, Maven, Grails, Redmine, Liquibase, Selenium, Twist, etc.) is something that needs to be done over and over again.

This observation has given us the idea that continuous delivery is perfectly suited for a cloud offering. Therefore my colleagues and I have started an ambitious initiative to implement a complete “from-requirement-to-production-to-validation” delivery pipeline as a cloud service. We will do it using our collective experience from the different solutions we have built for our various customers.

Why is it that every IT department needs to be not only an expert in its business, but also an expert in the craft of systems development? This is really strange, in particular since the challenges seem to be so similar in every single IT department. Systems development should be all about implementing the business requirements as efficiently and correctly as possible, not so much about discussing how testing should be increased to maintain quality, or implementing solutions for deploying war files to JBoss. That kind of “development” is done in each and every company these days. Talk about breaking the DRY rule.

The solution to the problem above has traditionally been to try to purchase “off-the-shelf” systems that, at least on paper, fulfill the business needs. The larger the “off-the-shelf” system, the more business needs are fulfilled. But countless are the businesses that have come to realize that fulfilling on paper is not the same as fulfilling in production, with customers, making money. And worse, once the “off-the-shelf” system has been incorporated into every single corner of the company, it becomes very hard to adapt to changing markets and growing business requirements. And markets change rapidly these days – if in doubt, ask Nokia.

A continuous delivery pipeline solution might seem like a trivial thing when talking about large markets like the one Nokia is operating in. But there are many indicators that huge improvements can be achieved by relatively small means related to IT. Let me mention two strong indicators:

  1. Most of the really large new companies in the world all have leaders with a heavy legacy in hardcore systems development/programming (Oracle, Google, Apple, Flickr, Facebook, Spotify and Microsoft, to mention a few).
  2. A relatively new concept, “Lean Startup” by Eric Ries, takes an extremely IT-centric approach to business development as a whole. This concept has had a huge impact on Fortune 500 companies around the world, and Eric Ries has busy days coaching and talking to companies that are struggling to be innovative when the market they are in moves faster than they can handle. As an example, one of the large European banks recently put their new internet bank system “on ice” simply because they realized that it was going to be out of date before it was launched.

All in all, to cope with the extreme pace, systems development is becoming more and more industry-like. And traditional industries have spent more than 80 years optimizing production lines. No one would ever get the stupid idea to try to build a factory to produce a toy or a new component for a car without seeking professional guidance and assistance. I mean, what kind of robot from Motoman Robotics would you choose? That is simply not core business value!

More on our continuous delivery pipeline in the cloud solution will follow on this channel!

Retrospective from a JFokus presentation

I was thrilled to get the chance to present my experience with DevOps and Continuous Delivery at JFokus 2012! This is a short story describing the experience.

Back in October 2011 when my proposal was approved, I immediately started to read the great Presentation Zen book by Garr Reynolds. This book is great! It makes presenting seem really simple and the recommendations are crystal clear. However, it turns out that the guidance a normal bullet point presentation gives you is considered a disaster in the book. Skipping the bullet points leaves the presentation clean and lean, but I soon realized it also leaves the presenter alone on the stage. In front of 150 or so people, you need to remember everything you have planned to say on each slide. I thought I was going to pull it off, but a rehearsal one week before the event made me think twice. It was a disaster! But being somewhat stubborn, I decided to stick to the plan and luckily it went 100% better at the actual event.

DevOps and Continuous Delivery really are something I’m passionate about, and I hope that helped in my attempt to deliver something meaningful to those who showed up. I got the feeling that people were listening and hopefully left the presentation with some thoughts that can help their companies be more productive and successful.

One question I got afterwards was: “Is anyone actually doing this?” Well, I’d like to pass the question forward: send me a tweet @danielfroding if you are doing this. I know of at least four places, of which I have been highly involved in two, that have implemented an automated deployment pipeline and are developing their systems according to Continuous Delivery principles.

My slides are here: http://www.jfokus.se/jfokus12/preso/jf12_RetrospectiveFromTheYearOfDevOps.pdf

JFokus is really a great conference and I’m proud that we have such an event in Stockholm! A huge thanks to the organizers for setting this up!

See you next year!

Culture Hacks

At DevOpsDays in Gothenburg in mid-October I attended an open session on the topic “Cultural Hacks”. It was one of the most interesting open sessions and I just want to share the ideas that came up.

Why culture needs to be hacked

By definition, culture is not something that can be replaced like a tool or even people in an organization, and therefore, in order to change culture, it needs to be “hacked”. So what hacks can you do to start a cultural change in a company? Well, I guess it depends on what you want to change, but in this discussion it is the old devs vs. ops culture, and what we want is a devops culture where developers and operations talk to each other, collaborate and strive for the same goal: getting good software out the door and into production in a controlled way as often as possible (in one very simplified sentence).

Proposed hacks

  • Metrics that provide transparency throughout the company. Measure everything and make it available to everyone. Not only technical metrics like server load, disk I/O or whatever, but also useful business metrics, and combine them in every possible way to find the really useful and interesting correlations.
  • Hackathons would hopefully get people who do not normally interact with each other to talk (maybe about something completely out of work scope), collaborate and share ideas, and thereby learn from each other.
  • Ops engineers in dev teams. With shorter feedback loops and tighter collaboration, devs will learn more about ops and infrastructure, how their code behaves in production and what they can do to help in that area. At the same time the ops engineers get involved in the development process early on and can contribute with deployment scripts, server provisioning scripts, tuning etc. and enforce dev requirements that make deployment and operations tasks better and easier.
  • Daily standups. Well, this is obvious as I see it, but nevertheless very important, and if you do it right it can definitely make way for cultural change in a team.
  • Transparent backlog. Exposing your team’s backlog will hopefully create a bigger understanding of what you are doing and why. I guess the main purpose is to enforce better prioritization and communication between those who request your time and services.
  • Fail cake. A funny little harmless hack which means that the one responsible for a production failure has to buy cake for the team. Punishment enough for that person, and since everybody likes cake no one can be that angry with him/her either. The purpose is of course to strive for better quality, embrace failure in the sense that it will happen and you just have to learn how to handle it and prevent it the next time, and of course to learn from others’ mistakes.
  • Exchange with other non-competitor companies. This hack proposed a people exchange between two companies that have a lot to learn from each other but do not compete in the same market. I like this one; a day, a week, a month or whatever you think is appropriate will for sure make people learn new stuff, bring home new good things and also learn to avoid the bad things. I’m sure knowledge exchange happens all the time at conferences and tech talks, but to actually exchange people and work at other companies is, I guess, not very common.
  • Tech talks with external speakers. This hack proposed to lift the tech talks that many companies have but consider very internal, by bringing in external speakers. That would hopefully spice up the discussions and make more people come and learn new stuff. Keep an eye open for when interesting people are in town for some event. Many times a 20 min tech talk over lunch is no big deal to squeeze in and does not have to cost you a lot either, since they are already in town.
  • Give root access to developers. This hack, proposed by an ops guy of course, sounds to me a lot like a “chaos monkey” experiment. However, I think there is a big psychological point in giving devs root access, saying you have the power to do stuff but also the responsibility to make sure you do not mess up. It will hopefully erase some of the invisible boundaries between development and operations.
  • Draw a picture. Simple but also an important thing you can do to spread knowledge, get people to talk and build up truly cross-functional teams. This was also mentioned in Mitchell Hashimoto’s talk at DevOpsDays in Gothenburg as a key part of bringing devs closer to ops.
  • Framing problems. To be honest I can’t really remember what this hack was about. Suggestions are welcome…
  • Make people feel safe, give all the credit and take all the blame. A good way of getting people who are reluctant to change to take the first step and try something new.
  • Take advantage of compelling events. I can’t remember exactly how the discussion around this went, but I guess it is about keeping your eyes open for things that you can use as an “excuse” to introduce a change that would normally just be rejected.
  • Subjective metrics. A funny little, pretty harmless hack that I’ve seen around, e.g. letting each individual in the team present a smiley of their mood. The purpose is to create a more open environment and encourage communication. You can track the level of satisfaction in a team and maybe do some correlation to how they are actually performing. However, you have to be careful not to intrude on people’s personal integrity.
  • Force setting a “confidence level” on every checkin. I guess this is somewhat related to the subjective metrics above. I like the idea, and the cultural change it will hopefully create is to get people to think more about the quality of the code they are checking in. If someone checks in with a low confidence level you can ask them why: are you checking in crappy code? If they check in with a high confidence level you can also ask them if they are so sure this will work and not break anything. I guess it will be like when you first start off with Scrum: the first few times the team will be over-optimistic, but in time they will learn where their level is. I’ve also seen related subjective metrics, e.g. “commit karma” where everybody starts at 100% and is decreased if their commit breaks something. Someone with low karma has a harder time getting their code out in production than someone with high karma.
 
 

Andreas Rehn
@andreasrehn

Metrics, metrics everywhere with Graphite

What useful metrics does your application provide and how accessible are they?
In my experience, metrics for an application are often bolted on by operations before going live, or maybe even afterwards when you start experiencing strange problems and realize that the only way of knowing how the application performs is by looking at CPU usage and stuff like that. Even though CPU, I/O and memory usage can be very helpful for ops, they are probably not very useful metrics when looking at how your application performs in business terms. You need to build metrics into your application, and it should be as natural and common as any other logging you put there. Live metrics and stats presented in appealing graphs are priceless feedback for practically everybody in the organisation, ranging from operations and development to marketing, sales and even executives. Since all those people have very different views on what useful metrics are, you need to start pushing out metrics for everything. You never know when you will need them, and since it is so easy there’s really no excuse for not doing it. With very little effort you can be the graphing hero, and hopefully cool dashboards with customized live metrics graphs will start to pop up everywhere.

Install Graphite

Graphite is a cool little project that allows you to collect/aggregate metrics and, in a very easy and flexible way, create customized real-time graphs on demand. It is a Python/Django app with a web front end that hooks into Apache. The data aggregator is called Carbon and is essentially a Python daemon that slurps data from a UDP port. The whole “package” can be a bit tricky to install (at least when you are on RHEL); it depends on some image processing libraries and stuff, but you will get it done in an hour or two at the most, just follow the install instructions. Needless to say it must be installed on a server that is accessible from where the applications are running so they can push metrics to it on a UDP port, but I’m sure there’s one lying around running some old monitoring tools or something. There are default examples of all config files, so once all the Python packages and dependencies are installed you will be up n’ running in no time and can start to push metrics to Carbon.

Start pushing metrics

The way you push data to Carbon is extremely easy: just push a UDP packet (UDP for low-cost, fire-and-forget communication) like this:

node-123.myCoolApplication.environment.activeSessions 87 1320316143

The first part is a unique metric key, which in a clustered environment should also include the node identifier. The second part is the actual metric value, so in this case there are 87 active sessions. The last part is a timestamp.

This kind of metric should preferably be pushed regularly with some scheduling utility like Quartz or similar, but you can of course also push metrics as events of business transactions like this:

node-123.myCoolApplication.service.buyBook.success 1 1320316143

In this case I push a metric for the event of 1 book being sold successfully. These metrics will be scattered in time but are nevertheless very useful when you look at them cumulatively for trends or compare them with other technical metrics.

It is also very important that you measure failures, since they can provide powerful insights compared to other metrics. So in the buyBook service I would also push this metric every time it fails for some reason:

node-123.myCoolApplication.service.buyBook.failed 1 1320316143

My advice is to take a few minutes to think about a good naming convention for your metric keys, since it will have some impact on the way you can aggregate data and graph it later, and you don’t want to change a key once you have started to measure it.

Here’s a simple Java utility class that would do the trick:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GraphiteLogger {
    private static final Logger LOGGER = LoggerFactory.getLogger(GraphiteLogger.class);
    private String graphiteHost;
    private int graphitePort;
    private boolean enabled;
    private String nodeIdentifier;

    public static GraphiteLogger getDefaultLogger() {
        String gHost = "localhost"; // get it from application startup properties or something
        int gPort = 2003; // get it from application startup properties or something
        boolean enabled = true; // good to have an on/off switch in application config
        return new GraphiteLogger(gHost, gPort, enabled);
    }

    public GraphiteLogger(String graphiteHost, int graphitePort, boolean enabled) {
        this.enabled = enabled;
        this.graphiteHost = graphiteHost;
        this.graphitePort = graphitePort;
        try {
            this.nodeIdentifier = java.net.InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException ex) {
            LOGGER.warn("Failed to determin host name",ex);
        }
       if (this.graphiteHost==null || this.graphiteHost.length()==0 ||
           this.nodeIdentifier==null || this.nodeIdentifier.length()==0 ||
           this.graphitePort<0 || !logToGraphite("connection.test", 1L))
       {
            LOGGER.warn("Faild to create GraphiteLogger, graphiteHost graphitePost or nodeIdentifier could not be defined properly: " + about());
            this.enabled=false;
        }
    }

    public final String about() {
        return new StringBuffer().append("{ graphiteHost=").append(this.graphiteHost).append(", graphitePort=").append(this.graphitePort).append(", nodeIdentifier=").append(this.nodeIdentifier).append(" }").toString();
    }

    public void logMetric(String key, long value) {
        logToGraphite(key,value);
    }

    public boolean logToGraphite(String key, long value) {
        Map<String, Long> stats = new HashMap<String, Long>();
        stats.put(key, value);
        return logToGraphite(stats);
    }

    public boolean logToGraphite(Map<String, Long> stats) {
        if (stats.isEmpty()) {
            return true;
        }

        try {
            logToGraphite(nodeIdentifier, stats);
        } catch (Throwable t) {
            LOGGER.warn("Can't log to graphite", t);
            return false;
        }
        return true;
    }

    private void logToGraphite(String nodeIdentifier, Map<String, Long> stats) throws Exception {
        Long curTimeInSec = System.currentTimeMillis() / 1000;
        StringBuffer lines = new StringBuffer();
        for (Entry<String, Long> stat : stats.entrySet()) {
            String key = nodeIdentifier + "." + stat.getKey();
            lines.append(key).append(" ").append(stat.getValue()).append(" ").append(curTimeInSec).append("\n"); // newline after every line, even the last one
        }
        }
        logToGraphite(lines);
    }
    private void logToGraphite(StringBuffer lines) throws Exception {
        if (this.enabled) {
            LOGGER.debug("Writing [{}] to graphite", lines.toString);
            byte[] bytes = lines.toString().getBytes();
            InetAddress address = InetAddress.getByName(graphiteHost);
            DatagramPacket packet = new DatagramPacket(bytes, bytes.length,address, graphitePort);
            DatagramSocket dsocket = new DatagramSocket();
            try {
                dsocket.send(packet);
            } finally {
                dsocket.close();
            }
        }
    }
}

As easily as you log info and debug messages to your logging framework of choice, you can now use this to push technical and business metrics to Graphite everywhere in your app:

public class BookService {
    private static final GraphiteLogger GRAPHITELOGGER = GraphiteLogger.getDefaultLogger();

    public void buyBook(..) {
        try {
            // do your service stuff
            GRAPHITELOGGER.logMetric("bookstore.service.buyBook.success", 1L);
        } catch (ServiceException e) {
            // do your exception handling
            GRAPHITELOGGER.logMetric("bookstore.service.buyBook.failed", 1L);
        }
    }
}
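For metrics that are better sampled on a schedule, like the activeSessions example earlier, here is a minimal sketch using a plain ScheduledExecutorService from the JDK (the post mentions Quartz; this is just the simplest runnable illustration, and the class name, counter and metric key are made up):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class MetricsScheduler {
    private static final GraphiteLogger GRAPHITELOGGER = GraphiteLogger.getDefaultLogger();

    // Whatever holds the value you want to sample; here just a counter the application updates.
    private final AtomicLong activeSessions = new AtomicLong();

    public void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                // Sample and push the current value once a minute.
                GRAPHITELOGGER.logMetric("myCoolApplication.environment.activeSessions", activeSessions.get());
            }
        }, 0, 1, TimeUnit.MINUTES);
    }
}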

Start Graphing

Now that you have Graphite up n’ running and your app is pushing all sorts of useful metrics to it, you can start with the fun part: graphing! Graphite comes with a web front end for experimenting with graphs; just browse to it on the installed Apache (it serves as the document root by default). There you can browse your metric keys and create graphs in a graph composer, apply misc functions and rendering options etc. From here you can also access the documentation and some experimental features for Flot and events.
However, the really useful interface Graphite provides is the URL for rendering a graph on demand. A URL like this:

http://localhost:8000/render?target=keepLastValue(integral(sum(usbeta13.epsos-web.service.*.failed.*)))&target=keepLastValue(integral(sum(usbeta13.epsos-web.service.*.success.*)))&from=20111024

will give you a PNG image of a graph of the sum of all service calls (success and failed) accumulated over time from 2011-10-24.

Yes, it is that easy!

There’s also a great number of functions you can apply to your data, e.g. integral, cumulative, sum, average, max, min, etc., and there are also a lot of parameters to customize the graph with colors, fonts, texts etc. So just go crazy and define all the graphs you can think of and put them on a self-refreshing webpage, embed them in a wiki or some other dashboard mash-up you may already have.

And if you find the graphs a bit crude and want to do something more fancy, you can just pull the raw data by adding these parameters to the URL:

&rawData=true&format=csv

And then use your favorite graph tool and do whatever cool tricks you want. The formats available are raw | csv | json. A cool thing to try would be to pull the raw data in json format into a Grails app and do some eye-candy charts with Google Charts… I’ll put that on my list of cool things to try.
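As a small illustration of pulling that raw data programmatically, assuming a Graphite install on localhost:8000 and the metric keys used earlier in this post (both are assumptions), a few lines of Java can fetch the CSV and print it:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class GraphiteRawData {
    public static void main(String[] args) throws Exception {
        // Host, port and target are assumptions; point this at your own Graphite installation.
        String url = "http://localhost:8000/render"
                + "?target=sum(node-*.myCoolApplication.service.buyBook.success)"
                + "&from=-7days&rawData=true&format=csv";
        BufferedReader in = new BufferedReader(new InputStreamReader(new URL(url).openStream()));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // one CSV row per target and timestamp
            }
        } finally {
            in.close();
        }
    }
}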

Find the useful graphs

Now you have all the tools in place to make really useful dashboards about your application’s real business performance in addition to the technical performance. You can graph all kinds of interesting stuff in real time and compare metrics that can give you very valuable insight. Let’s say you are running a business with a site of some sort and you want to see the business impact of newly released features: make sure you push a metric to Graphite when you deploy, and then graph deploys vs. whatever business metric you are interested in (e.g. sold books). Hopefully you will see a boost after each deploy that contains new cool features, and if not, maybe you have something to think about. Like this you can combine technical metrics and business value metrics to see patterns and trends, which can be really useful for a lot of people in the organisation.
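A minimal sketch of that deploy marker, reusing the GraphiteLogger class from above (the class name and metric key are just examples):

public class DeployMarker {
    private static final GraphiteLogger GRAPHITELOGGER = GraphiteLogger.getDefaultLogger();

    // Call this from the deployment job right after a release has been rolled out, then graph
    // this series against e.g. service.buyBook.success to see the impact of each release.
    public void markDeploy() {
        GRAPHITELOGGER.logMetric("myCoolApplication.deploys", 1L);
    }
}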

Make them visible

Put the graphs on the biggest displays you can find, in a place where as many people as possible can see them. Make sure they are updated frequently enough to provide real-time information, and continuously improve, create new graphs and remove old ones that weren’t really useful. If you don’t have access to big dashboard displays, maybe instead write a small script that will pick useful graphs on a daily basis and email them throughout the company; just be sure to spread the knowledge that the graphs provide.

And again, don’t forget to measure failures. Many times, just visualizing the problems, in a sometimes painful way, to everyone will give a boost in quality, because nobody wants to be the bad guy and everybody wants to be a hero like you!

Andreas Rehn
@andreasrehn