Tag Archives: devops

Experiences from a highly productive development team

During the past year I’ve been working in a development team which arguable is the most productive team I’ve been around in my ten year career in software development. During the year, we have been able to achieve great things. We have been taking new systems and services live, delivering new features to both internal and external customers and decommissioning old services which have served their purpose in a fast pace. This article aims to address some of the key aspects that I have been observing, which makes this team so productive and efficient in how they work.

We seldom, if ever, perform work on our own. We have been exercising so called mob programming for a year now, which means the whole team works on the same problem, in front of a big shared monitor, with one keyboard and mouse being passed around the participants. This was something that the team just decided to try after attending a conference which had a presentation on mob programming, which then evolved into a standard way of working within the team. The team’s manager do not care how we work, as long as we deliver. Every morning we finish our daily scrum/standup meeting by deciding on the team formation. Sometimes a small task needs to be performed that might not be suitable for the mob. In that case we exercise pair programming while the mob continues. The benefits of everyone working together are e.g. knowledge sharing, that everyone knows what’s happening, architecture decisions can be made as we go etc. We also prefer constant pre-commit code review because at that point in time everybody is familiar with the code changes, the intent of the changes and what possible solutions were discarded, something that is not as clear with a separate post-commit review which also adds unnecessary overhead. A team moves quicker than an individual and the shared knowledge of the team is greater than of particular individuals. These effects were more or less immediate when we started with mob programming.

Continuous Delivery. We have a streamlined process where the whole delivery process is automated. For each code change that is pushed to a source code repository, a new build of the software is triggered. For each build we run unit, subsystem and regression tests in a fully automated fashion. This is our safety net and if all tests pass, we automatically continue deploying the code to our different environments. In roughly 30 minutes, the new code change has been deployed to production if it has passed all the gatekeeping. No manual interventions needed. This allows us to go fast and get quick feedback on whether a given code change is fit for purpose. A benefit of being able to ship new code changes as soon as possible is that you do not create an inventory of code changes. If a problem would arise, the developers would be up to speed and have the context of the particular code in mind if they would need to make any additional changes. Even though we tag our deployments automatically for traceability, it is also very easy to know what code is actually deployed in production.

Code that is in production brings value, code that isn’t, doesn’t.

Teams with end-to-end responsibility. The Amazon CTO Werner Vogels have been quoted saying: “You build it, you run it”. This aspect is really important for a development team to be able to take responsibility and accountability for the code they write. If you are on-call for the code you’ve written, you will pay attention to quality since you do not want to be woken up in the middle of the night due to technical issues. And if problems arise, you will be sure to solve them for good once they occur. Having to think about the operational aspects of software on a daily basis has made me a better developer. Not only do I write code that is easy to test (an effect of test-driven development), but I also keep in mind how the code is going to be deployed and run.

A team dedicated product owner (PO) is key. A year ago we had a PO that was split between our and another team. When that person was dedicated for our team only a few months later we noticed great effects. The PO was much more available for the needs of the development team. The PO was able answer questions and take decisions which shortened the feedback loop for the developers substantially. These days the PO even joins the programming mob to both gain an understanding of how particular problems are solved but also to bring domain specific expertise to the team.

Automate everything. Not everything is worth automating at first, but if it is obvious that a task is repetitive and thus error prone it will be worth automating. Automate the tedious task together in the team and everyone will understand the benefits of it and know the operating procedure. Time is a valuable asset and we only have so much of it. We need to use it wisely to be able to bring as much value for the company and our customers.

Don’t let a human do a scripts job.

The willingness to experiment and try out new things is of utmost importance. Not all experiments will bring the desired effect and some might even bring negative effect. But the culture should embrace continuous improvement and experimentation. It should be easy to discard an experiment and rollback.

You should never stop improving. No matter how good you feel about yourself, which processes you have in place, how fast you are able to deliver new features to your customers, you must not stop improving. The journey has no end. If you stop improving, someone of your competitors will eventually outplay you. Why has Usain Bolt been the best sprinter for such a long period of time? He probably works out harder than any of his competitors to stay on top of the game. You should too.

Tommy Tynjä
LinkedIn profile

Reasons why code freezes don’t work

This article is a continuation on my previous article on how to release software with quality and confidence.

When the big e-commerce holidays such as Black Friday, Cyber Monday and Christmas are looming around the corner, many companies are gearing up to make sure their systems are stable and able to handle the expected increase in traffic.

What many organizations do is introducing a full code freeze for the holiday season, where no new code changes or features are allowed to be released to the production systems during this period of the year. Another approach is to only allow updates to the production environment during a few hours during office hours. These approaches might sound logical but are in reality anti-patterns that neither reduces risk nor ensures stability.

What you’re doing when you stop deploying software is interrupting the pace of the development teams. The team’s feedback loop breaks and the ways of workings are forced to deviate from the standard process, which leads to decreased productivity.

When your normal process allows deploying software on a regular basis in an automated and streamlined fashion, it becomes just as natural as breathing. When you stop deploying your software, it’s like holding your breath. This is what happens to your development process. When you finally turn on the floodgates after the holiday season, the risk for bugs and deployment failures are much more likely to occur. As more changes are stacked up, the bigger the risk of unwanted consequences for each deployment. Changes might also be rushed, with less quality to be able to make it into production in time before a freeze. Keeping track of what changes are pending release becomes more challenging.

Keeping your systems stable with exceptional uptime should be a priority throughout the whole year, not only during holiday season. The ways of working for the development teams and IT organization should embrace this mindset and expectations of quality. Proper planning can go a long way to reduce risk. It might not make sense to push through a big refactoring of the source code during the busiest time of the year.

A key component for allowing a continuous flow of evolving software throughout the entire year is organizational maturity regarding monitoring, logging and planning. It should be possible to monitor, preferably visualised on big screens in the office, how the systems are behaving and functioning in real-time. Both technical and business metrics should be metered and alarm thresholds configured accordingly based on these metrics.

Deployment strategies are also of importance. When deploying a new version of a system, it should always be cheap to rollback to a previous version. This should be automated and possible to do by clicking just one button. Then, if there is a concern after deploying a new version, discovered by closely monitoring the available metrics, it is easy to revert to a previously known good version. Canary releasing is also a strategy where possible issues can be discovered before rolling out new code changes for all users. Canary releasing allows you to route a specific amount of traffic to specific version of the software and thus being able to compare metrics, outcomes and possible errors between the different versions.

When the holiday season begins, keep calm and keep breathing.

Tommy Tynjä
LinkedIn profile

Summary of Eurostar 2016

About Eurostar

Eurostar is Europe’s largest conference that is focused on testing and this year the conference was held in Stockholm October 31 – November 3. Since I have been working with test automation lately it seemed like a good opportunity to go my first test conference (I was there for two days). The conference had the usual mix of tutorials, presentations and expo, very much a traditional conference setup unlike the community driven style.

Key take away

Continuous delivery and DevOps changes much of the conventional thinking around test. The change is not primarily related to that you should automate everything related to test but that, in the same way as you drive user experience testing with things like A/B testing, a key enabler of quality is monitoring in production and the ability to quickly respond to problems. This does not mean that all automated testing is useless. But it calls for a very different mindset compared to conventional quality wisdom where the focus has been on finding problems as early as possible (in terms of phases of development). Instead there is increased focus on deploying changes fast in a gradual and controlled way with a high degree of monitoring and diagnostics, thereby being able to diagnose and remedy any issues quickly.


Roughly in order of how much I liked the sessions, here is what I participated in:

Sally Goble et. al. – How we learned to love quality and stop testing

This was both well presented and thought provoking. They described the journey at Guardian from having a long (two weeks) development cycle with a considerable amount of testing to the current situation where they deploy continuously. The core of the story was how the focus had been changed from test to quality with a DevOps setup. When they first started their journey in automation they took the approach of creating Selenium tests for their full manual regression test suite. This is pretty much scrapped now and they rely primarily on the ability to quickly detect problems in production and do fixes. Canary releases and good use of monitoring / APM, and investments in improved logging were the key enablers here.

Automated tests are still done on a unit, api and integration test level but as noted above really not much automation of front end tests.

Declan O´Riordan – Application security testing: A new approach

Declan is an independent consultant and started his talk claiming that the number of security related incidents continue to increase and that there is a long list of potential security breaches that one need to be aware of. He also talked about how continuous delivery has shrunk the time frame available for security testing to almost nothing. I.e., it is not getting any easier to secure your applications. Then he went on claiming that there has been a breakthrough in terms of what tools can with regard to security testing in the last 1-2 years. These new tools are categorised as IAST (Interactive Analysis Security Testing) and RASP (Runtime Application Self-Protection). While traditional automated security testing tools find 20-30% of the security issues in an application, IAST-tools find as much as 99% of the issues automatically. He gave a demo and it was impressive. He used the toolset from Contrast but there are other supplier with similar tools and many hustling to catch up. It seems to me that an IAST tool should be part of your pipeline before going to production and a RASP solution should be part of your production monitoring/setup. Overall an interesting talk and lots to evaluate, follow up on, and possibly apply.

Jan van Moll – Root cause analysis for testers

This was both entertaining and well presented. Jan is head of quality at Philips Healthcare but he is also an independent investigator / software expert that is called in when things go awfully wrong or when there are close escapes/near misses, like when a plane crashes.

No clear takeaway for me from the talk that can immediately be put to use but a list of references to different root cause analysis techniques that I hope to get the time to look into at some point. It would have been interesting to hear more as this talk only scratched the surface of the subject.

Julian Harty – Automated testing of mobile apps

This was interesting but it is not a space that I am directly involved in so I am not sure that there will be anything that is immediately useful for me. Things that were talked about include:

  • Monkey testing, there is apparently some tooling included in the Android SDK that is quite useful for this.
  • An analysis that Microsoft research has done on 30 000 app crash dumps indicates that over 90% of all crashes are caused by 10 common implementation mistakes. Failing to check http status codes and always expecting a 200 comes to mind as one of the top ones.
  • appdiff.com a free and robot-based approach to automated testing where the robots apply machine learned heuristics. Simple and free to try and if you are doing mobile and not already using it you should probably have a look.

Ben Simo – Stories from testing healthcare.gov

The presenter rose to fame at the launch of the Obamacare website about a year ago. As you remember there were lots of problems the weeks/months after the launch. Ben approached this as a user from the beginning but after a while when things worked so poorly he started to look at things from a tester point of view. He then uncovered a number of issues related to security, usability, performance, etc. He started to share his experience on social media, mostly to help others trying to use the site, but also rose to fame in mainstream media. The presentation was fun and entertaining but I am not sure there was so much to learn as it mostly was a run-through of all of the problems he found and how poorly the project/launch had been handled. So it was entertaining and interesting but did not offer so much in terms of insight or learning.

Jay Sehti – What happened when we switched our data center off?

The background was that a couple of years ago Financial Times had a major outage in one of their data centres and the talk was about what went down in relation to that. I think the most interesting lesson was that they had built a dashboard in Dashing showing service health across their key services/applications that each are made up of a number of micro services. But when they went to the dashboard to see what was still was working and where there were problems they realised that the dashboard had a single point of failure related to the data centre that was down. Darn. Lesson learned: secure your monitoring in the same way or better as your applications.

In addition to that specific lesson I think the most interesting part of this presentation was what kind of journey they had gone through going to continuous delivery and micro services. In many ways this was similar to the Guardian story in that they now relied more on monitoring and being able to quickly respond to problems rather than having extensive automated (front end) tests. He mentioned for example they still had some Selenium tests but that coverage was probably around 20% now compared to 80% before.

Similar to Guardian they had plenty of test automation at Unit/API levels but less automation of front end tests.

Tutorial – Test automation management patterns

This was mostly a walk-through of the website/wiki testautomationpatterns.wikispaces.com and how to use it. The content on the wiki is not bad as such but it is quite high-level and common-sense oriented. It is probably useful to browse through the Issues and Automation Patterns if you are involved in test automation and have a difficult time to get traction. The diagnostics tool did not appear that useful to me.

No big revelations for me during this tutorial if anything it was more of a confirmation of that the approach we have taken at my current customer around testing of backend systems is sound.

Liz Keogh – How to test the inside of your head

Liz, an independent consultant of BDD-fame, talked among other things about Cynefin and how it is applicable in a testing context. Kind of interesting but it did not create much new insight for me (refreshed some old and that is ok too).

Bryan Bakker – Software reliability: Measuring to know

In this presentation Bryan presented an approach (process) to reliability engineering that he has developed together with a couple of colleagues/friends. The talk was a bit dry and academic and quite heavily geared towards embedded software. Surveillance cameras were the primary example that was used. Some interesting stuff here in particular in terms of how to quantify reliability.

Adam Carmi – Transforming your automated tests with visual testing

Adam is CTO of an Israeli tools company called Applitools and the talk was close to a marketing pitch for their tool. Visual Testing is distinct from functional testing in that is only concerned with visuals, i.e., what the human eye can see. It seems to me that if your are doing a lot of cross-device, cross-browser testing this kind of automated test might be of merit.

Harry Collins – The critique of AI in the age of the net

Harry is a professor in sociology at Cardiff University. This could have been an interesting talk about scientific theory / sociology / AI / philosophy / theory of mind and a bunch of other things. I am sure the presenter has the knowledge to make a great presentation on any of these subjects but this was ill-prepared, incoherent, pretty much without point, and not very well-presented. More of a late-night rant in a bar than a keynote.


As with most conferences there was a mix of good and not quite so good content but overall I felt that it was more than worthwhile to be there as I learned a bunch of things and maybe even had an insight or two. Hopefully there will be opportunity to apply some of the things I learned at the customers I am working with.

Svante Lidman
LinkedIn profile

JavaOne Latin America summary

Last week I attended the JavaOne Latin America software development conference in São Paulo, Brazil. This was a joint event with Oracle Open World, a conference with focus on Oracle solutions. I was accepted as a speaker for the Java, DevOps and Methodologies track at JavaOne. This article intends to give a summary of my main takeaways from the event.

The main points Oracle made during the opening keynote of Oracle Open World was their commitment to cloud technology. Oracle CEO Mark Hurd made the prediction that in 10 years, most production workloads will be running in the cloud. Oracle currently engineers all of their solutions for the cloud, which is something that also their on-premise solutions can leverage from. Security is a very important aspect in all of their solutions. Quite a few sessions during JavaOne showcased Java applications running on Oracles cloud platform.

The JavaOne opening keynote demonstrated the Java flight recorder, which enables you to profile your applications with near zero overhead. The aim of the flight recorder is to have as minimal overhead as possible so it can be enabled in production systems by default. It has a user interface which provides valuable data in case you want to get an understanding of how your Java applications perform.

The next version of Java, Java 9, will feature long awaited modularity through project Jigsaw. Today, all of your public APIs in your source code packages are exposed and can be used by anyone who has access to the classes. With Jigsaw, you as a developer can decide which classes are exported and exposed. This gives you as a developer more control over what are internal APIs and what are intended for use. This was quickly demonstrated on stage. You will also have the possibility to decide what your application dependencies are within the JDK. For instance it is quite unlikely that you need e.g. UI related libraries such as Swing or audio if you develop back-end software. I once tried to dissect the Java rt.jar for this particular purpose (just for fun), so it is nice to finally see this becoming a reality. The keynote also mentioned project Valhalla (e.g. value types for classes) and project Panama but at this date it is still uncertain if they will be included in Java 9.

Mark Heckler from Pivotal had an interesting session on the Spring framework and their Spring cloud projects. Pivotal works closely with Netflix, which is a known open source contributor and one of the industry leaders when it comes to developing and using new technologies. However, since Netflix is committed to run their applications on AWS, Spring cloud aims to make use of Netflix work to create portable solutions for a wider community by introducing appropriate abstraction layers. This enables Spring cloud users to avoid changing their code if there is a need to change the underlying technology. Spring cloud has support for Netflix tools such as Eureka, Ribbon, Zuul, Hystrix among others.

Arquillian, by many considered the best integration testing framework for Java EE projects, was dedicated a full one hour session at the event. As a long time contributor to project, it was nice to see a presentation about it and how its Cube extension can make use of Docker containers to execute your tests against.

One interesting session was a non-technical one, also given by Mark Heckler, which focused on financial equations and what factors drives business decisions. The purpose was to give an understanding of how businesses calculate paybacks and return on investments and when and why it makes sense for a company to invest in new technology. For instance the payback period for an initiative should be as short as possible, preferably less than a year. The presentation also covered net present values and quantification’s. Transforming a monolithic architecture to a microservice style approach was the example used for the calculations. The average cadence for releases of key monolithic applications in our industry is just one release per year. Which in many cases is even optimistic! Compare these numbers with Amazon, who have developed their applications with a microservice architecture since 2011. Their average release cadence is 7448 times a day, which means that they perform a production release once every 11.6s! This presentation certainly laid a good foundation for my own session on continuous delivery.

Then it was finally time for my presentation! When I arrived 10 minutes before the talk there was already a long line outside of the room and it filled up nicely. In my presentation, The Road to Continuous Delivery, I talked about how a Swedish company revamped its organization to implement a completely new distributed system running on the Java platform and using continuous delivery ways of working. I started with a quick poll of how many in the room were able to perform production releases every day and I was apparently the only one. So I’m glad that there was a lot of interest for this topic at the conference! If you are interested in having me presenting this or another talk on the topic at your company or at an event, please contact me! It was great fun to give this presentation and I got some good questions and interesting discussions afterwards. You can find the slides for my presentation here.

Java Virtual Machine (JVM) architect Mikael Vidstedt had an interesting presentation about JVM insights. It is apparently not uncommon that the vast majority of the Java heap footprint is taken up by String objects (25-50% not uncommon) and many of them have the same value. JDK 8 however introduced an important JVM improvement to address this memory footprint concern (through string deduplication) in their G1 garbage collector improvement effort.

I was positively surprised that many sessions where quite non-technical. A few sessions talked about possibilities with the Raspberry Pi embedded device, but there was also a couple of presentations that touched on software development in a broader sense. I think it is good and important for conferences of this size to have such a good balance in content.

All in all I thought it was a good event. Both JavaOne and Oracle Open World combined for around 1300 attendees and a big exhibition hall. JavaOne was multi-track, but most sessions were in Portuguese. However, there was usually one track dedicated to English speaking sessions but it obviously limited the available content somewhat for non-Portuguese speaking attendees. An interesting feature was that the sessions held in English were live translated to Portuguese to those who lent headphones for that purpose.

The best part of the conference was that I got to meet a lot of people from many different countries. It is great fun to meet colleagues from all around the world that shares my enthusiasm and passion for software development. The strive for making this industry better continues!

Tommy Tynjä
LinkedIn profile

AWS Summit recap

This week, the annual AWS Summit took place in sunny Stockholm. This article aims to provide a recap of my impressions from the event.

It was evident that the event had grown from last year, with approximately 2000 people attending this year’s one day event at Waterfront Congress Centre. Only a few session were technical as most of the presentations just gave an overview of the different services and various use cases. I really appreciated the talks from different AWS customers who spoke about their use of AWS technologies and what problems they solved and how. I found it valuable to hear from different companies on how they leverage certain products in their production environments.

The opening keynote was long (2 hours!) and included a lot of sales talk. The main keynote speaker mentioned that 20 percent of the audience had never used any AWS services at all, which explains the thorough walkthrough of the different AWS products. One product which stood out was Amazon Inspector, which can detect and remediate security issues early in your AWS environment. It is not yet available in all regions, but is available in e.g. eu-west-1 (Ireland). It was also interesting to hear about migration of large amounts of data using Snowball, a physical device shipped to your datacenter, which allows you to move your data faster than over the Internet (except for the physical delivery of the device to and from your own datacenter).

It is undeniable that Internet of Things (IoT) is gaining traction and that the amount of connected devices around has grown exponentially the past few years. AWS provides several services for developing and running IoT services. With AWS IoT, your devices can securely communicate with your backend servers. What I found most interesting was the concept of device shadows. A shadow is an interface which allows you to communicate with a device even though it would be offline at the moment. In your application, you can communicate with the shadow without the need to care about whether the device is online or not. If you want to change the state of a device currently offline, you will update the shadow and when the device connects again, it will get the new desired state from the shadow.

At the startup track, we got to hear how Mojang leverages AWS for their Minecraft Realm concept. Instead of letting external parties host their game servers, they decided to go with AWS for Minecraft Realm, to allow for a more flexible infrastructure. An interesting aspect is that they had to develop their own algorithm for scaling out quickly, as in a gaming environment it is not acceptable to wait for five minutes for an auto scaling group to spin up new machines to meet the current demand from users. Instead, they have to use quite large instance types and have new servers on standby to be able to take on new traffic as it arrives. It is not trivial either to terminate instances where there is people playing, even though only a few, that wouldn’t provide a good user experience. Instead, they kindly inform the user that the server will terminate in five minutes and that usually makes the users change server. Not ideal but live migration is too far away at the moment. They still use old EC2 classic instances and they will have to do some heavy lifting to modernise their stack on AWS.

There was also a presentation from QuizUp on how they use infrastructure as code with Terraform to manage their AWS resources. A benefit they get from using Terraform instead of Cloudformation is to get an execution plan before actually applying changes. The drawback is that it is not possible to query Terraform for the current resources and their state directly from AWS.

In the world of relational databases in AWS (RDS), Aurora is an AWS developed database to maximise reliability, scalability and cost-effectiveness. It delivers up to five times the throughput of a standard MySQL running on the same hardware. It is designed to scale and to handle failures. It even provides an SQL extension to simulate failures:

Probably the most interesting session of the day was about serverless architecture using AWS Lambda. Lambda allows you to upload snippets of code, functions to AWS which runs them for you. No need to provision servers or think about scalability, AWS does that for you and you only pay for the time your code executes in units of 100 ms. The best thing about this talk was the peek under the hood. AWS leverages Linux containers (not Docker) to isolate the resources of the uploaded functions and to be able to run and scale these quickly. It also offers predictive capacity planning. An interesting part is that you can upload libraries which your code depends on as part of your function, so you could basically run a small microservice just by using Lambda. To deploy your function, you package it in a zip archive and use Cloudformation (specified as type AWS::Lambda::Function). You’re able to run your function inside of your VPC and thus leverage other resources available within your VPC.

All in all I thought this was a great event. If you didn’t attend I really recommend attending the next one – especially if you’re already using AWS.

As we at Diabol are standard partners with Amazon, not only can we assist you with your cloud platform strategies but also to tie that together with the full view of your systems development process. Don’t hesitate to contact us!

You can read more about us at diabol.se.

Tommy Tynjä

Diabol hjälper Klarna utveckla en ny plattform och att bli experter på Continuous Delivery

Klarna har sedan starten 2005 haft en kraftig tillväxt och på mycket kort tid växt till ett företag med över 1000 anställda. För att möta den globala marknadens behov av sina betalningstjänster behövde Klarna göra stora förändringar i både teknik, organisation och processer. Klarna anlitade konsulter från Diabol för att nå sina högt satta mål med utveckling av en ny tjänsteplattform och bli ledande inom DevOps och Continuous Delivery.


Efter stora framgångar på den nordiska marknaden och flera år av stark tillväxt behövde Klarna utveckla en ny plattform för sina betalningstjänster för att kunna möta den globala marknaden. Den nya plattformen skulle hantera miljontals transaktioner dagligen och vara robust, skalbar och samtidigt stödja ett agilt arbetssätt med snabba förändringar i en växande organisation. Tidplanen var mycket utmanande och förutom utveckling av alla tjänster behövde man förändra både arbetssätt och infrastruktur för att möta utmaningarna med stor skalbarhet och korta ledtider.


Diabols erfarna konsulter med expertkompetens inom Java, DevOps och Continuous Delivery fick förtroendet att stärka upp utvecklingsteamen för att ta fram den nya plattformen och samtidigt automatisera releaseprocessen med bl.a. molnteknik från Amazon AWS. Kompetens kring automatisering och verktyg byggdes även upp i ett internt supportteam med syfte att stödja utvecklingsteamen med verktyg och processer för att snabbt, säkert och automatiserat kunna leverera sina tjänster oberoende av varandra. Diabol hade en central roll i detta team och agerade som coach för Continuous Delivery och DevOps brett i utvecklings- och driftorganisationen.


Klarna kunde på rekordtid gå live med den nya plattformen och öppna upp på flera stora internationella marknader. Autonoma utvecklingsteam med stort leveransfokus kan idag på egen hand leverera förändringar och ny funktionalitet till produktion helt automatiskt vilket vid behov kan vara flera gånger om dagen.

Uttömmande automatiserade tester körs kontinuerligt vid varje kodförändring och uppsättning av testmiljöer i AWS sker också helt automatiserat. En del team praktiserar även s.k. “continuous deployment” och levererar kodändringar till sina produktionsmiljöer utan någon som helst manuell handpåläggning.

“Diabol har varit en nyckelspelare för att uppnå våra högt ställda mål inom DevOps och Continuous Delivery.”

– Tobias Palmborg, Manager Engineering Support, Klarna



Diabol migrerar Abdona till AWS och inför en automatiserad leveransprocess

Abdona tillhandahåller tjänster för affärsresehantering till ett flertal organisationer i offentlig sektor. I samband med en större utvecklingsinsats vill man också se över infrastrukturen för drift och testmiljöer för att minska kostnader och på ett säkert sätt kunna garantera hög kvalité och korta leveranstider. Diabol anlitades för ett helhetsåtagande att modernisera infrastruktur, utvecklingsmiljö, test- och leveransprocess.


Abdonas system består av en klassisk 3-lagersarkitektur i Java Enterprise och sedan lanseringen för 7 år sedan har endast mindre uppdateringar skett. Teknik och infrastruktur har inte uppdaterats och har med tiden blivit förlegade och svårhanterliga. Manuellt konfigurerade servrar, undermålig dokumentation och spårbarhet, knapphändig versionshantering, ingen kontinuerlig integration eller stabil byggmiljö, manuell test och deployment. Förutom dessa strukturella problem var kostnaden för hårdvara som satts upp manuellt för både test- och driftmiljö var omotiverad dyr jämfört med dagens molnbaserade alternativ.


Diabol började med att kartlägga problemen och först och främst ta kontroll över kodbasen som var utspridd över flera versionshanteringssytem. All kod flyttades till Atlassian Bitbucket och en byggserver med Jenkins sattes upp för att på ett repeterbart sätt bygga och testa systemet. Vidare så valdes Nexus för att hantera beroenden och arkivera de artifakter som produceras av byggservern. Infrastruktur migrerades till Amazon AWS av både kostnadsmässiga skäl, men också för att kunna utnyttja moderna verktyg för automatisering och möjligheterna med dynamisk infrastruktur. Applikationslager flyttades till EC2 och databasen till RDS. Terraform valdes för att automatisera uppsättningen av resurser i AWS och Puppet introducerades för automatisk konfigurationshantering av servrar. En fullständig leveranspipeline med automatiskt deployment implementerades i Jenkins.


Migrering till Amazon AWS har lett till drastiskt minskade driftkostnader för Abdona. Därtill har man nu en skalbar modern infrastruktur, fullständig spårbarhet och en automatisk leveranskedja som garanterar hög kvalitet och korta ledtider. Systemet är helt och hållet rekonstruerbart från kodbasen och testmiljöer kan skapas helt automatiskt vid behov.

Continuous Delivery for us all

During the last years I have taken a great interest in Continuous Delivery, or CD, and DevOps because the possibilities they give are very tempting, like:

  • releasing tested and tracable code to production for example an hour after check-in, all without any drama or escalation involved
  • giving the business side the control over when a function is installed or activated in production
  • bringing down organizational boundries, working more collaboratively and letting the teams get more responsibility and control over their systems and their situation
  • decreasing the amount of meetings and manual work that is being done

My CD interest grew to the point that I wanted to work more focused with this area so I started looking for a new job but when scanning the market I ran into a problem. If you do a search for work in Sweden within CD, and look at the knowledge requested, it is often not C#, MSBuild, TFS and Windows Server they are looking for and most of my background, knowledge and work experience is within that stack. This also concurred with my former experience because looking at other companies that are considered in the forefront as Google, Netflix, Amazon and Spotify they do not have their base in Microsoft technology.

At my former workplace, where we mainly used Microsoft products, we were a few driving forces who promoted and educated people in CD and also tried to implement it via a pilot project. Working with this project we never felt that Microsoft technology made it impossible (or even hard) to implement a CD way of working as it works regardless of your underlying technology. So why are Microsoft users not so good at being in the front, or at least not showing it, because CD is there and is possible to achieve with some hard work. My reflection over why is that Microsoft users (generally)

  • are more accustomed to using Microsoft technology and do not look around for what can complete or improve their situation. Linux and Java users are for example more used to finding and using products that solve more specific problems
  • don’t think that other products can be added in a smooth way to their environment and way of working
  • don’t have the same drive around technology as for example Linux and Java users, a drive they also are eager to show
  • can be a little content and don’t question their situation, or they see the problems but neglect to respond to them

This is something I want to change so more Microsoft based companies are shown in the “forefront” because all companies with IT, regardless of their choice of technology, have great benefits to gain from CD. Puppet Labs yearly conduct a “State Of DevOps” survey* to see how DevOps and CD is accepted in the world of it** and what difference, if any, that this makes. If you look at the result from the survey the results are very clear (https://puppetlabs.com/sites/default/files/2015-state-of-devops-report.pdf):

  • High performing IT organizations release code 30 times more frequent with 200 times shorter leadtime. They also have 60 times less “failures” and recover 168 times faster
  • Lean management and continuous delivery practices create the conditions for delivering value faster, sustainably
  • An environment with freedom and responsibility where it invests in the people and for example automation of releases give a lot not only to the employees but also to the organization
  • Being high performing is (under some conditions) achievable whether you work with greenfield, brownfield or legacy systems. If you don’t have the conditions right now that is something to work towards

To help you on your journey towards an effective, reliable and frequent delivery Diabol has developed a CD Maturity Model (http://www.infoq.com/articles/Continuous-Delivery-Maturity-Model) and this you can use to evaluate your current situation and see what abilities you need to evolve in order to be a high performing it-organization. And to be called high performing you need to be in at least an Advanced level in all dimensions, with a few exceptions that are motivated by your organizations circumstances. But remember that for every step on the model that you take you have great benefits and improvements to gain.

So if you work at a company which releases every 1, 3 or 6 months where every release is a minor or major project, how would it feel if you instead could release the code after every sprint, every week or even every hour without so much as a raised eyebrow? How would it be to know exactly what code is installed in each environment and have new code installed right after check-in? How about knowing how an environment is configured and also know the last time it was changed and why? This is all possible so let us at Diabol help you get there and you are of course welcome regardless of your choice in technology.

*If you would like to know more about the report from Puppet Labs and how it is made I can recommend reading http://itrevolution.com/the-science-behind-the-2013-puppet-labs-devops-survey-of-practice/

**I don’t mean that CD and DevOps are the same but this can be the topic of another blog post. They though have a lot in common and both are mentioned by Puppet Labs


Top class Continuous Delivery in AWS

Last week Diabol arranged a workshop in Stockholm where we invited Amazon together with Klarna and Volvo Group Telematics that both practise advanced Continuous Delivery in AWS. These companies are in many ways pioneers in this area as there is little in terms of established practices. We want to encourage and facilitate cross company knowledge sharing and take Continuous Delivery to the next level. The participants have very different businesses, processes and architectures but still struggle with similar challenges when building delivery pipelines for AWS. Below follows a short summary of some of the topics covered in the workshop.

Centralization and standardization vs. fully autonomous teams

One of the most interesting discussions among the participants wasn’t even technical but covered the differences in how they are organized and how that affects the work with Continuous Delivery and AWS. Some come from a traditional functional organisation and have placed their delivery teams somewhere in between development teams and the operations team. The advantages being that they have been able to standardize the delivery platform to a large extent and have a very high level of reuse. They have built custom tools and standardized services that all teams are more or less forced to use This approach depends on being able to keep at least one step ahead of the dev teams and being able to scale out to many dev teams without increasing headcount. One problem with this approach is that it is hard to build deep AWS knowledge out in the dev teams since they feel detached from the technical implementation. Others have a very explicit strategy of team autonomy where each team basically is in charge of their complete process all the way to production. In this case each team must have a quite deep competence both about AWS and the delivery pipelines and how they are set up. The production awareness is extremely high and you can e.g. visualize each team’s cost of AWS resources. One problem with this approach is a lower level of reusability and difficulties in sharing knowledge and implementation between teams.

Both of these approaches have pros and cons but in the end I think less silos and more team empowerment wins. If you can manage that and still build a common delivery infrastructure that scales, you are in a very good position.

Infrastructure as code

Another topic that was thoroughly covered was different ways to deploy both applications and infrastructure to AWS. CloudFormation is popular and very powerful but has its shortcomings in some scenarios. One participant felt that CF is too verbose and noisy and have built their own YAML configuration language on top of CF. They have been able to do this since they have a strong standardization of their micro-service architecture and the deployment structure that follows. Other participants felt the same problem with CF being too noisy and have broken out a large portion of configuration from the stack templates to Ansible, leaving just the core infrastructure resources in CF. This also allows them to apply different deployment patterns and more advanced orchestration. We also briefly discussed 3:rd part tools, e.g. Terraform, but the general opinion was that they all have a hard time keeping up with features in AWS. On the other hand, if you have infrastructure outside AWS that needs to be managed in conjunction with what you have in AWS, Terraform might be a compelling option. Both participants expressed that they would like to see some kind of execution plan / dry-run feature in CF much like Terraform have.

Docker on AWS

Use of Docker is growing quickly right now and was not surprisingly a hot topic at the workshop. One participant described how they deploy their micro-services in Docker containers with the obvious advantage of being portable and lightweight (compared to baking AMI’s). This is however done with stand-alone EC2-instances using a shared common base AMI and not on ECS, an approach that adds redundant infrastructure layers to the stack. They have just started exploring ECS and it looks promising but questions around how to manage centralized logging, monitoring, disk encryption etc are still not clear. Docker is a very compelling deployment alternative but both Docker itself and the surrounding infrastructure need to mature a bit more, e.g. docker push takes an unreasonable long time and easily becomes a bottleneck in your delivery pipelines. Another pain is the need for a private Docker registry that on this level of continuous delivery needs to be highly available and secure.

What’s missing?

The discussions also identified some feature requests for Amazon to bring home. E.g. we discussed security quite a lot and got into the technicalities of AIM-roles, accounts, security groups etc. It was expressed that there might be a need for explicit compliance checks and controls as a complement to the more crude ways with e.g. PEN-testing. You can certainly do this by extracting information from the API’s and process it according to your specific compliance rules, but it would be nice if there was a higher level of support for this from AWS.

We also discussed canarie releasing and A/B testing. Since this is becoming more of a common practice it would be nice if Amazon could provide more services to support this, e.g. content based routing and more sophisticated analytic tools.

Next step

All-in-all I think the workshop was very successful and the discussions and experience sharing was valuable to all participants. Diabol will continue to push Continuous Delivery maturity in the industry by arranging meetups and workshops and involve more companies that can contribute and benefit from this collaboration.  


Conference retrospectives

The first week of June was a busy one for us with representation on a couple of conferences taking place in Stockholm, the inaugural CoDe Continuous Delivery & DevOps conference on June 2nd and Agila Sverige (Agile Sweden) on June 3rd and 4th. Here’s a small retrospective on both of those events.

The CoDe conference was quite small (a bit less than 200 people) but had a very good vibe to it. We were silver sponsors of the event and we had our Jenkins Delivery Pipeline Plugin on display in our booth which gained a lot of interest from attendees. It led to many interesting discussions on CI/CD servers, automation, capabilities and visualization. Our feeling was that the attendees had a lot of genuine interest, knowledge and awareness about Continuous Delivery which fueled these interesting discussions.

Our Marcus Philip presented “Continuous Delivery for Databases” at the CoDe conference, where he talked about how to transition a legacy database application with a lot of PL/SQL code into a streamlined process where all changes are version controlled, traceable and automatically deployed. The talk covered the whole spectrum of the problem, starting with why they weren’t doing Continuous Delivery, why they should do it, thoughts on different solutions to the problem and how they did the actual implementation and how it panned out. Many attendees enjoyed the presentation and Marcus also got the opportunity to present the talk again at the Oracle Stockholm Meetup on June 16th. Cool!

Slides from the talk can be found here: https://speakerdeck.com/marcusphi/continuous-delivery-for-databases.

Agila Sverige is a conference which strongly encourages discussions among the conference attendees. All talks on the conference are lightning talks and there is plenty of time allocated for open space discussions and breaks, allowing you to network and share experiences with others. The conference is in Swedish but is probably one of the best when it comes to agile software development practices. Two or three years ago many talks and discussions on this conference focused on “why to practice agile development practices”. This year it was apparent that the industry has matured on all levels since the majority of talks and open space sessions rather focused on “how to practice agile development practices”.

Tommy Tynjä presented a visionary talk on how tomorrow’s software can be structured and delivered in his presentation “Next Generation Continuous Delivery”. The talk gained a lot of interest, the room was packed for the presentation and Tommy had some interesting discussions on the topic afterwards.

Slides from his talk can be found here: http://www.slideshare.net/tommysdk/next-generation-continuous-delivery
There is also a video recording available at: https://agilasverige.solidtango.com/video/next-generation-continuous-delivery.

We give our thumbs up for both of these conferences and we hope to see you on any of these events next year as well! We always look forward to meet people to discuss thoughts, experiences, problems and visions regarding Continuous Delivery, DevOps and agile software development methodologies. But if you don’t want to wait until next year, don’t hesitate to contact us!