AWS Summit recap

This week, the annual AWS Summit took place in sunny Stockholm. This article aims to provide a recap of my impressions from the event.

It was evident that the event had grown from last year, with approximately 2000 people attending this year’s one-day event at Waterfront Congress Centre. Only a few sessions were technical, as most of the presentations just gave an overview of the different services and various use cases. I really appreciated the talks from different AWS customers, who spoke about their use of AWS technologies, what problems they solved and how. I found it valuable to hear how different companies leverage certain products in their production environments.

The opening keynote was long (2 hours!) and included a lot of sales talk. The main keynote speaker mentioned that 20 percent of the audience had never used any AWS services at all, which explains the thorough walkthrough of the different AWS products. One product that stood out was Amazon Inspector, which assesses your AWS environment for security issues so that you can detect and remediate them early. It is not yet available in all regions, but is available in e.g. eu-west-1 (Ireland). It was also interesting to hear about migration of large amounts of data using Snowball, a physical device shipped to your datacenter, which allows you to move your data faster than over the Internet (apart from the time needed to physically deliver the device to and from your own datacenter).

It is undeniable that the Internet of Things (IoT) is gaining traction and that the number of connected devices has grown exponentially over the past few years. AWS provides several services for developing and running IoT services. With AWS IoT, your devices can securely communicate with your backend servers. What I found most interesting was the concept of device shadows. A shadow is an interface that allows you to communicate with a device even when it is offline. In your application, you communicate with the shadow without needing to care whether the device is online or not. If you want to change the state of a device that is currently offline, you update the shadow, and when the device connects again it gets the new desired state from the shadow.
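The shadow idea can be illustrated with a small in-memory sketch. This is not the AWS IoT API; the class and method names (DeviceShadow, Device, update_desired, connect) are made up for illustration, and only the split between desired and reported state follows the concept described above.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceShadow:
    """Hypothetical in-memory stand-in for a device shadow."""
    desired: dict = field(default_factory=dict)
    reported: dict = field(default_factory=dict)

    def update_desired(self, **changes):
        # The application writes here whether or not the device is online.
        self.desired.update(changes)

    def delta(self):
        # Keys whose desired value differs from what the device last reported.
        return {k: v for k, v in self.desired.items() if self.reported.get(k) != v}


class Device:
    """Hypothetical device that syncs with its shadow when it connects."""

    def __init__(self, shadow):
        self.shadow = shadow
        self.state = {}

    def connect(self):
        # On (re)connect the device applies any pending desired state,
        # then reports its new state back to the shadow.
        self.state.update(self.shadow.delta())
        self.shadow.reported = dict(self.state)


shadow = DeviceShadow()
lamp = Device(shadow)

shadow.update_desired(power="on", brightness=70)  # the lamp is offline at this point
pending = shadow.delta()                          # changes waiting for the device

lamp.connect()                                    # the lamp comes online and catches up
```

After the device has connected, the delta is empty again, which is how the application can tell that the device has caught up with the desired state.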

At the startup track, we got to hear how Mojang leverages AWS for their Minecraft Realms concept. Instead of letting external parties host their game servers, they decided to go with AWS for Minecraft Realms to allow for a more flexible infrastructure. An interesting aspect is that they had to develop their own algorithm for scaling out quickly, as in a gaming environment it is not acceptable to wait five minutes for an auto scaling group to spin up new machines to meet the current demand from users. Instead, they have to use quite large instance types and keep new servers on standby to be able to take on new traffic as it arrives. Terminating instances is not trivial either: shutting down a server where people are still playing, even if only a few, would not provide a good user experience. Instead, they kindly inform the users that the server will terminate in five minutes, which usually makes them change server. Not ideal, but live migration is too far away at the moment. They still use old EC2-Classic instances and will have to do some heavy lifting to modernise their stack on AWS.

There was also a presentation from QuizUp on how they use infrastructure as code with Terraform to manage their AWS resources. A benefit they get from using Terraform instead of CloudFormation is that they get an execution plan before actually applying changes. The drawback is that it is not possible to query Terraform for the current resources and their state directly from AWS.

In the world of relational databases in AWS (RDS), Aurora is a database developed by AWS to maximise reliability, scalability and cost-effectiveness. It delivers up to five times the throughput of a standard MySQL running on the same hardware. It is designed to scale and to handle failures. It even provides an SQL extension to simulate failures:
ALTER SYSTEM CRASH [INSTANCE | DISPATCHER | NODE ];
ALTER SYSTEM SIMULATE percentage_of_failure PERCENT
* READ REPLICA FAILURE
* DISK FAILURE
* DISK CONGESTION
FOR INTERVAL quantity [ YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND ]

Probably the most interesting session of the day was about serverless architecture using AWS Lambda. Lambda allows you to upload snippets of code (functions) to AWS, which runs them for you. There is no need to provision servers or think about scalability; AWS does that for you, and you only pay for the time your code executes, in units of 100 ms. The best thing about this talk was the peek under the hood. AWS leverages Linux containers (not Docker) to isolate the resources of the uploaded functions and to be able to run and scale them quickly. It also offers predictive capacity planning. An interesting part is that you can upload libraries that your code depends on as part of your function, so you could basically run a small microservice just by using Lambda. To deploy your function, you package it in a zip archive and use CloudFormation (specified as type AWS::Lambda::Function). You’re able to run your function inside your VPC and thus leverage other resources available within it.
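To give a feeling for how small such a function can be, here is a minimal sketch of a Python handler. The (event, context) signature is what Lambda expects for Python functions; the function name handler, the "name" field in the event and the shape of the returned dictionary are assumptions made for this example.

```python
import json


def handler(event, context):
    """Entry point in the shape Lambda expects for Python: (event, context)."""
    # "name" is a hypothetical field in the incoming event.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local invocation for illustration only; when deployed, the Lambda service
# supplies both arguments.
result = handler({"name": "Stockholm"}, None)
```

A file like this, zipped together with any libraries it depends on, is the kind of archive you would deploy.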

All in all I thought this was a great event. If you didn’t attend I really recommend attending the next one – especially if you’re already using AWS.

As we at Diabol are standard partners with Amazon, we can not only assist you with your cloud platform strategies but also tie them together with the full view of your systems development process. Don’t hesitate to contact us!

You can read more about us at diabol.se.

Tommy Tynjä
@tommysdk

The power of Jenkins JobDSL

At my current client we define all of our Jenkins jobs with Jenkins Job Builder so that we can have all of our job configurations version controlled. Every push to the version control repository which hosts the configuration files will trigger an update of the jobs in Jenkins so that we can make sure that the state in version control is always reflected in our Jenkins setup.

We have six jobs that are essentially the same for each of our projects but with only a slight change of the input parameters to each job. These jobs are responsible for deploying the application binaries to different environments.

Jenkins Job Builder supports templating, but our experience is that those templates usually become quite hard to maintain in the long run. And since it is just YAML files, you’re quite restricted in what you’re able to do. JobDSL job configurations, on the other hand, are built using Groovy. This allows you to insert actual code execution in your job configuration scripts, which can be very powerful.

In our case, with JobDSL we can create one common job configuration which we then iterate over with the job-specific parameters to create the necessary jobs for each environment. We can also use utility methods written in Groovy and invoke them in our job configurations. So instead of having to maintain several similar Jenkins Job Builder configurations in YAML, we can express the same thing much more concisely with JobDSL.

Below is an example of a JobDSL configuration file (Groovy code), which generates six jobs according to the parameterized job template:

class Utils {
    static String environment(String qualifier) { qualifier.substring(0, qualifier.indexOf('-')).toUpperCase() }
    static String environmentType(String qualifier) { qualifier.substring(qualifier.indexOf('-') + 1)}
}

[
    [qualifier: "us-staging"],
    [qualifier: "eu-staging"],
    [qualifier: "us-uat"],
    [qualifier: "eu-uat"],
    [qualifier: "us-live"],
    [qualifier: "eu-live"]
].each { Map environment ->

    job("myproject-deploy-${environment.qualifier}") {
        description "Deploy my project to ${environment.qualifier}"
        parameters {
            stringParam('GIT_SHA', null, null)
        }
        scm {
            git {
                remote {
                    url('ssh://git@my_git_repo/myproject.git')
                    credentials('jenkins-credentials')
                }
                branch('$GIT_SHA')
            }
        }
        deliveryPipelineConfiguration("Deploy to " + Utils.environmentType("${environment.qualifier}"),
                Utils.environment("${environment.qualifier}") + " environment")
        logRotator(-1, 30, -1, -1)
        steps {
            shell("""deployment/azure/deploy.sh ${environment.qualifier}""")
        }
    }
}
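To see what the configuration above expands to without running Jenkins, the string handling can be mirrored in plain Python. This is only an illustration: the function names below are stand-ins for the Groovy Utils methods, and the "stage" and "task" keys are labels chosen here for the two arguments passed to deliveryPipelineConfiguration.

```python
def environment(qualifier):
    # Mirrors Utils.environment: the part before the first '-', upper-cased.
    return qualifier[:qualifier.index("-")].upper()


def environment_type(qualifier):
    # Mirrors Utils.environmentType: everything after the first '-'.
    return qualifier[qualifier.index("-") + 1:]


qualifiers = ["us-staging", "eu-staging", "us-uat", "eu-uat", "us-live", "eu-live"]

# One entry per generated job, with the same names the Groovy template builds.
jobs = [
    {
        "name": f"myproject-deploy-{q}",
        "stage": "Deploy to " + environment_type(q),
        "task": environment(q) + " environment",
    }
    for q in qualifiers
]

for job in jobs:
    print(job["name"], "|", job["stage"], "|", job["task"])
```

Adding a seventh environment is then a one-line change to the list, which is the point of generating the jobs from a single template.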

If you need help getting your Jenkins configuration into good shape, just contact us and we will be happy to help you! You can read more about us at diabol.se.

Tommy Tynjä
@tommysdk

Choosing Atlassian cloud or on-premise, things to consider

Many companies are considering moving from an Atlassian Cloud instance to a self-hosted (on-premise) installation, or to a third-party hosting service. Here are some pros and cons that you and your company should consider:

Pros (on-premise):
+ Customization in the source code
+ Access to the entire Atlassian Plugins management library
+ No forced application upgrades
+ Access to the log files
+ Access to the DB
+ Potential cost savings from the licensing fees
+ Restrict the network access (e.g. host JIRA on a server only accessible to your company through a VPN)
+ Commercial and Academic licenses also give you a developer license that you can use for a second instance (usually a non-production one, where you can replicate your production instance and test new features, plugins, upgrades…)
+ Use your domain name
+ No storage limit
+ LDAP

Cons:
– Handling a self-signed or CA-signed certificate (you probably want to enforce SSL)
– Server administration (including the allocated resources)
– Upgrade plan
– Maintaining your own infrastructure
– Reliability (what if your production instance goes down?)

However, the best way to figure out what fits your company’s needs is to generate an evaluation license (my.atlassian.com) and give it a try on a test server (importing your Cloud data) to see whether it fulfills your final goal better. Instructions for doing that can be found here.


As an Atlassian Expert in Sweden, Diabol can help you improve your use of the Atlassian tools. Just contact us. You can read more about us at diabol.se.

Diabol helps Klarna develop a new platform and become experts in Continuous Delivery

Since its founding in 2005, Klarna has grown rapidly and in a very short time become a company with more than 1000 employees. To meet the global market’s demand for its payment services, Klarna needed to make major changes in technology, organisation and processes. Klarna engaged consultants from Diabol to reach its ambitious goals of developing a new service platform and becoming a leader in DevOps and Continuous Delivery.

Challenge

After great success in the Nordic market and several years of strong growth, Klarna needed to develop a new platform for its payment services in order to address the global market. The new platform had to handle millions of transactions daily and be robust and scalable, while also supporting an agile way of working with rapid change in a growing organisation. The schedule was very challenging and, in addition to developing all the services, both ways of working and infrastructure had to change to meet the challenges of large scale and short lead times.

Solution

Diabol’s experienced consultants, with expert skills in Java, DevOps and Continuous Delivery, were entrusted with strengthening the development teams to build the new platform while automating the release process using, among other things, cloud technology from Amazon AWS. Expertise in automation and tooling was also built up in an internal support team whose purpose was to support the development teams with tools and processes, so that they could deliver their services quickly, safely and automatically, independently of one another. Diabol had a central role in this team and acted as a coach for Continuous Delivery and DevOps broadly across the development and operations organisation.

Result

Klarna was able to go live with the new platform in record time and launch in several large international markets. Autonomous development teams with a strong delivery focus can today deliver changes and new functionality to production on their own, fully automatically, which when needed can be several times a day.

Exhaustive automated tests run continuously on every code change, and test environments in AWS are also set up fully automatically. Some teams also practise so-called “continuous deployment” and deliver code changes to their production environments without any manual intervention whatsoever.

“Diabol has been a key player in achieving our ambitious goals within DevOps and Continuous Delivery.”

– Tobias Palmborg, Manager Engineering Support, Klarna

Diabol migrates Abdona to AWS and introduces an automated delivery process

Abdona provides business travel management services to a number of public-sector organisations. In connection with a larger development effort, the company also wanted to review its infrastructure for operations and test environments in order to reduce costs and to safely guarantee high quality and short delivery times. Diabol was engaged for an end-to-end commitment to modernise infrastructure, development environment, and test and delivery process.

Challenge

Abdona’s system is a classic three-tier architecture in Java Enterprise, and since its launch 7 years ago only minor updates have been made. Technology and infrastructure had not been updated and had over time become outdated and hard to manage: manually configured servers, poor documentation and traceability, scant version control, no continuous integration or stable build environment, and manual testing and deployment. Beyond these structural problems, the cost of the manually provisioned hardware for both test and production environments was unjustifiably high compared with today’s cloud-based alternatives.

Solution

Diabol began by mapping the problems and, first and foremost, taking control of the code base, which was spread across several version control systems. All code was moved to Atlassian Bitbucket, and a build server with Jenkins was set up to build and test the system in a repeatable way. Nexus was chosen to manage dependencies and archive the artifacts produced by the build server. The infrastructure was migrated to Amazon AWS, both for cost reasons and to be able to use modern automation tools and the possibilities of dynamic infrastructure. The application tier was moved to EC2 and the database to RDS. Terraform was chosen to automate the provisioning of resources in AWS, and Puppet was introduced for automated configuration management of servers. A complete delivery pipeline with automated deployment was implemented in Jenkins.

Result

The migration to Amazon AWS has drastically reduced Abdona’s operating costs. In addition, Abdona now has a scalable, modern infrastructure, full traceability and an automated delivery chain that guarantees high quality and short lead times. The system can be fully reconstructed from the code base, and test environments can be created fully automatically on demand.

Puppet resource command

I have used Puppet for several years but had overlooked the puppet resource command until now. This command uses the Puppet RAL (Resource Abstraction Layer, the types and providers behind the Puppet DSL) to interact directly with the system. In plain language, this means that you can easily reverse engineer a system and get information about it directly in Puppet format on the command line.

The basic syntax is: puppet resource type [name]

Some examples from my Mac will make it clearer:

Get info about a given resource: me

$ puppet resource user marcus
user { 'marcus':
 ensure => 'present',
 comment => 'Marcus Philip',
 gid => '20',
 groups => ['_appserveradm', '_appserverusr', '_lpadmin', 'admin'],
 home => '/Users/marcus',
 password => '*',
 shell => '/bin/bash',
 uid => '501',
}

Get info about all resources of a given type: users

$ puppet resource user
user { '_amavisd':
 ensure => 'present',
 comment => 'AMaViS Daemon',
 gid => '83',
 home => '/var/virusmails',
 password => '*',
 shell => '/usr/bin/false',
 uid => '83',
}
user { '_appleevents':
 ensure => 'present',
 comment => 'AppleEvents Daemon',
 gid => '55',
 home => '/var/empty',
 password => '*',
 shell => '/usr/bin/false',
 uid => '55',
}
...
user { 'root':
 ensure => 'present',
 comment => 'System Administrator',
 gid => '0',
 groups => ['admin', 'certusers', 'daemon', 'kmem', 'operator', 'procmod', 'procview', 'staff', 'sys', 'tty', 'wheel'],
 home => '/var/root',
 password => '*',
 shell => '/bin/sh',
 uid => '0',
}

One use case for this command could be to extract the state of an existing server that I want to put under Puppet management into version-controlled code.

With the --edit flag the output is sent to a buffer that can be edited and then applied. And with attribute=value you can set attributes directly from the command line.

However, I think I will only use this command for read operations. The reason is that I think the main benefit of Puppet and other tools of the same ilk is not the abstractions they provide in themselves, but rather the capability to treat infrastructure as code, i.e. under version control and testable.

The reason I have overlooked this command is probably that I’ve mainly been applying Puppet to new, ‘greenfield’ servers, and that, as a software developer, I’m used to working from the model side rather than tinkering directly with the system on the command line. Anyway, now I have another tool in the belt. That always feels good.

Continuous Delivery for us all

Over the last few years I have taken a great interest in Continuous Delivery, or CD, and DevOps because the possibilities they give are very tempting, like:

  • releasing tested and traceable code to production, for example an hour after check-in, all without any drama or escalation involved
  • giving the business side control over when a function is installed or activated in production
  • bringing down organizational boundaries, working more collaboratively and letting the teams get more responsibility and control over their systems and their situation
  • decreasing the number of meetings and the amount of manual work that is being done

My CD interest grew to the point that I wanted to work in a more focused way in this area, so I started looking for a new job, but when scanning the market I ran into a problem. If you search for work in Sweden within CD and look at the knowledge requested, it is often not C#, MSBuild, TFS and Windows Server they are looking for, and most of my background, knowledge and work experience is within that stack. This also matched my earlier experience: other companies that are considered to be at the forefront, such as Google, Netflix, Amazon and Spotify, do not have their base in Microsoft technology.

At my former workplace, where we mainly used Microsoft products, a few of us were driving forces who promoted and educated people in CD and also tried to implement it via a pilot project. Working on this project, we never felt that Microsoft technology made it impossible (or even hard) to implement a CD way of working, as it works regardless of your underlying technology. So why are Microsoft users not so good at being at the front, or at least at showing it, when CD is there and is possible to achieve with some hard work? My reflection is that Microsoft users (generally)

  • are more accustomed to using Microsoft technology and do not look around for what could complement or improve their situation. Linux and Java users are, for example, more used to finding and using products that solve more specific problems
  • don’t think that other products can be added smoothly to their environment and way of working
  • don’t have the same drive around technology as, for example, Linux and Java users, a drive those users are also eager to show
  • can be a little content and don’t question their situation, or they see the problems but neglect to respond to them

This is something I want to change, so that more Microsoft-based companies are seen at the “forefront”, because all companies with IT, regardless of their choice of technology, have great benefits to gain from CD. Puppet Labs conducts a yearly “State Of DevOps” survey* to see how DevOps and CD are accepted in the world of IT** and what difference, if any, this makes. The results from the survey are very clear (https://puppetlabs.com/sites/default/files/2015-state-of-devops-report.pdf):

  • High-performing IT organizations release code 30 times more frequently with 200 times shorter lead times. They also have 60 times fewer “failures” and recover 168 times faster
  • Lean management and continuous delivery practices create the conditions for delivering value faster, sustainably
  • An environment of freedom and responsibility, where IT invests in people and in, for example, automation of releases, gives a lot not only to the employees but also to the organization
  • Being high performing is (under some conditions) achievable whether you work with greenfield, brownfield or legacy systems. If you don’t have the conditions right now, that is something to work towards

To help you on your journey towards effective, reliable and frequent delivery, Diabol has developed a CD Maturity Model (http://www.infoq.com/articles/Continuous-Delivery-Maturity-Model), which you can use to evaluate your current situation and see which abilities you need to evolve in order to become a high-performing IT organization. To be called high performing you need to be at least at the Advanced level in all dimensions, with a few exceptions that are motivated by your organization’s circumstances. But remember that every step you take on the model brings great benefits and improvements.

So if you work at a company which releases every 1, 3 or 6 months where every release is a minor or major project, how would it feel if you instead could release the code after every sprint, every week or even every hour without so much as a raised eyebrow? How would it be to know exactly what code is installed in each environment and have new code installed right after check-in? How about knowing how an environment is configured and also know the last time it was changed and why? This is all possible so let us at Diabol help you get there and you are of course welcome regardless of your choice in technology.

*If you would like to know more about the report from Puppet Labs and how it is made I can recommend reading http://itrevolution.com/the-science-behind-the-2013-puppet-labs-devops-survey-of-practice/

**I don’t mean that CD and DevOps are the same, but that can be the topic of another blog post. They do have a lot in common, though, and both are mentioned by Puppet Labs

Slow Jenkins Start up – Think twice before updating files in a loop

Some time ago I spent a day trying to figure out why our Jenkins Master was so slow (~30 min) to start up. We have around 1000 jobs and about 100 plugins. The large number of jobs makes us hit performance issues that are never a problem in smaller installations. The number of plugins makes the list of possible culprits long. Furthermore, it might be a combination of plugins causing the problem. And we may in fact have several separate issues.

Template Project + Job Config History = @#*$%*!?!

One issue that we have found is the combination of Template Project and Job Config History plugins. The Template Project implements ItemListener.onLoaded() and in a loop updates (twice) all projects using it (and we use it a lot). However, this seems to be some workaround that never(?) actually does any real work. Since the job is updated, this will trigger the Job Config History plugin (and maybe others listening to this). Which of course the Template plugin didn’t account for.

The Job Config History plugin potentially writes to disk several times for each job when triggered. Disk I/O is, as every programmer should know, relatively slow, but one write operation per job would be acceptable at startup if there is no other way. HOWEVER, there is a 500 ms sleep statement to avoid some clashing when writing to disk. 500 ms is an eon in the computer world! Disk operations are normally a lot faster than that. 50 ms would be a more reasonable value, provided that you can’t avoid a sleep call completely.

1000 jobs × 2 calls/job × 500 ms = 1000 s ≈ 17 minutes!

Oops! Well, that explains a large part of our slow startup time. The funny thing is that load, disk I/O, CPU and memory usage are low; we’re mostly just waiting. I had to do thread analysis to find this problem.
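The back-of-the-envelope calculation is easy to replay with other numbers. The sketch below assumes, as in the estimate above, that every call sleeps exactly once and that the calls run one after another; the function name is made up for this example.

```python
def startup_delay_seconds(jobs, calls_per_job, sleep_ms):
    """Total time spent sleeping when every call sleeps once, sequentially."""
    return jobs * calls_per_job * sleep_ms / 1000


current = startup_delay_seconds(1000, 2, 500)   # the 500 ms sleep found in the plugin
proposed = startup_delay_seconds(1000, 2, 50)   # the 50 ms value suggested above

print(f"500 ms sleep: {current:.0f} s (about {current / 60:.0f} minutes)")
print(f" 50 ms sleep: {proposed:.0f} s (about {proposed / 60:.1f} minutes)")
```

Under the same assumptions, a 50 ms sleep would bring the waiting down from roughly 17 minutes to under two.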

I reported JENKINS-24915. It is still marked as unresolved, but may of course have been resolved anyway. I haven’t tested recently since we chose to uninstall the Job Config History plugin, even though we like it and you could argue that it’s the Template Project plugin that is at fault.

Summary

Jenkins loads all jobs on startup, which is reasonable. But if, for a large number of jobs, this triggers a slow operation, then all hell can break loose.

In general, you should think twice before you implement remote or disk calls in a loop. Specifically for Jenkins, doing it in an event method that potentially is called for all jobs is not a good idea.
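One common way around the problem is to let the event method only record what changed and do the slow work once for the whole batch. The sketch below is a generic illustration, not Jenkins plugin code; DeferredWriter, on_job_loaded and flush are hypothetical names, and the persist callable stands in for whatever slow disk or remote call is involved.

```python
class DeferredWriter:
    """Collects per-job updates from an event callback and persists them in one pass."""

    def __init__(self, persist):
        self._persist = persist   # callable taking a dict; does the slow I/O once
        self._pending = {}

    def on_job_loaded(self, job_name, config):
        # Cheap: only records the change, no disk or network access here.
        self._pending[job_name] = config

    def flush(self):
        # One slow call for the whole batch instead of one per job.
        if self._pending:
            self._persist(dict(self._pending))
            self._pending.clear()


calls = []                                   # records every "slow" call that is made
writer = DeferredWriter(persist=calls.append)

for i in range(1000):
    writer.on_job_loaded(f"job-{i}", {"id": i})
writer.flush()
```

With 1000 jobs, the slow operation runs once instead of 1000 times, so any fixed cost per call (such as a sleep) is paid only once.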

How hard is continuous delivery for the database?

The CTO of DBmaestro recently blogged (http://www.dbmaestro.com/2015/11/why-do-we-talk-about-devops-like-its-a-new-concept/), arguing that despite DevOps being an idea that is several years old (and agile a lot older), “major companies are still not getting it right when it comes to DevOps and Agile, for that reason, they ARE relatively new concepts”. Furthermore, he concludes that “We need to bring the same processes for source code continuous delivery to DevOps for the Database”.

I definitely agree with this. Continuous delivery is necessary for the DB layer too. But further on he states that “ensuring safe continuous delivery of databases is not so simple”, and “database deployment automation is not a simple process”. Here our opinions may diverge slightly: I don’t think we need to emphasize the difficulties of bringing CD to the DB. Of course, no change in a software development process that impacts several people is ever very simple. But it’s not necessarily harder than continuous delivery of applications, even when using only free open source tools, or maybe precisely when using free open source tools. However, the challenges typically lie in the people part. If the team acknowledges that the current practices are a problem and is given the mandate to change, it is pretty straightforward.

Finally, I must add that the DB problem should often be solved by simplifying things instead of building elaborate tooling to be able to continue with sub-optimal practices. This is also largely a people problem.

We Continuously Deliver!