Category Archives: Infrastructure

AWS Summit recap

This week, the annual AWS Summit took place in sunny Stockholm. This article aims to provide a recap of my impressions from the event.

It was evident that the event had grown since last year, with approximately 2000 people attending this year’s one-day event at the Waterfront Congress Centre. Only a few sessions were technical, as most of the presentations just gave an overview of the different services and various use cases. I really appreciated the talks from AWS customers who spoke about their use of AWS technologies, which problems they solved and how. I found it valuable to hear from different companies how they leverage certain products in their production environments.

The opening keynote was long (2 hours!) and included a lot of sales talk. The main keynote speaker mentioned that 20 percent of the audience had never used any AWS services at all, which explains the thorough walkthrough of the different AWS products. One product which stood out was Amazon Inspector, which can detect and remediate security issues early in your AWS environment. It is not yet available in all regions, but is available in e.g. eu-west-1 (Ireland). It was also interesting to hear about migrating large amounts of data using Snowball, a physical device shipped to your datacenter, which allows you to move your data faster than over the Internet (the bottleneck instead being the physical delivery of the device to and from your own datacenter).

It is undeniable that the Internet of Things (IoT) is gaining traction and that the number of connected devices has grown exponentially over the past few years. AWS provides several services for developing and running IoT services. With AWS IoT, your devices can securely communicate with your backend servers. What I found most interesting was the concept of device shadows. A shadow is an interface which allows you to communicate with a device even if it is offline at the moment. In your application, you communicate with the shadow without having to care whether the device is online or not. If you want to change the state of a device that is currently offline, you update the shadow, and when the device connects again it will get the new desired state from the shadow.
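To make the shadow concept concrete, here is a minimal, self-contained sketch of the delta idea in Python. This is not the AWS IoT API itself; the function, device and state attributes are invented for illustration, and real shadows are JSON documents managed by the service.

```python
import json

def shadow_delta(reported, desired):
    """Return the attributes where the desired state differs from what the
    device last reported -- roughly what the shadow service delivers to a
    device as a 'delta' when it reconnects (flat state only, for brevity)."""
    return {k: v for k, v in desired.items() if reported.get(k) != v}

# The device went offline after reporting this state:
reported = {"power": "off", "brightness": 40}

# While the device was offline, the application updated the desired state
# in the shadow:
desired = {"power": "on", "brightness": 40}

# When the device reconnects, it only needs to act on the difference:
print(json.dumps(shadow_delta(reported, desired)))  # → {"power": "on"}
```

The key point the sketch illustrates is that the application only ever writes the desired state, and the device reconciles when it can.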

At the startup track, we got to hear how Mojang leverages AWS for their Minecraft Realms concept. Instead of letting external parties host their game servers, they decided to go with AWS for Minecraft Realms, allowing for a more flexible infrastructure. An interesting aspect is that they had to develop their own algorithm for scaling out quickly: in a gaming environment it is not acceptable to wait five minutes for an auto scaling group to spin up new machines to meet the current user demand. Instead, they have to use quite large instance types and keep new servers on standby to take on new traffic as it arrives. Nor is it trivial to terminate instances where people are playing; even if there are only a few, that wouldn’t provide a good user experience. Instead, they kindly inform the users that the server will terminate in five minutes, which usually makes them change server. Not ideal, but live migration is too far away at the moment. They still use old EC2 Classic instances and will have to do some heavy lifting to modernise their stack on AWS.

There was also a presentation from QuizUp on how they use infrastructure as code with Terraform to manage their AWS resources. A benefit they get from using Terraform instead of Cloudformation is an execution plan that shows what will happen before changes are actually applied. The drawback is that Terraform cannot query AWS directly for the current resources and their state.
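That workflow maps onto Terraform’s standard plan/apply cycle, roughly like this (a sketch; the plan file name is arbitrary):

```shell
# Compute and save an execution plan, showing exactly what would
# change without touching any resources:
terraform plan -out=my.tfplan

# Apply precisely the plan that was reviewed:
terraform apply my.tfplan
```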

In the world of relational databases in AWS (RDS), Aurora is a database engine developed by AWS to maximise reliability, scalability and cost-effectiveness. It delivers up to five times the throughput of standard MySQL running on the same hardware. It is designed to scale and to handle failures, and it even provides an SQL extension to simulate failures:
ALTER SYSTEM CRASH [ INSTANCE | DISPATCHER | NODE ];

ALTER SYSTEM SIMULATE percentage_of_failure PERCENT
  { READ REPLICA FAILURE | DISK FAILURE | DISK CONGESTION }
  FOR INTERVAL quantity [ YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND ];
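As a hypothetical example (the parameter values are made up for illustration, and the statement only works against an Aurora instance), injecting a partial read replica failure for five minutes would look something like:

```sql
-- Simulate a 25 percent read replica failure for five minutes:
ALTER SYSTEM SIMULATE 25 PERCENT READ REPLICA FAILURE FOR INTERVAL 5 MINUTE;
```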

Probably the most interesting session of the day was about serverless architecture using AWS Lambda. Lambda allows you to upload snippets of code, functions, to AWS, which runs them for you. There is no need to provision servers or think about scalability; AWS does that for you, and you only pay for the time your code executes, in units of 100 ms. The best thing about this talk was the peek under the hood. AWS leverages Linux containers (not Docker) to isolate the resources of the uploaded functions and to be able to run and scale them quickly. It also offers predictive capacity planning. An interesting part is that you can upload the libraries which your code depends on as part of your function, so you could basically run a small microservice just by using Lambda. To deploy your function, you package it in a zip archive and use Cloudformation (specified as type AWS::Lambda::Function). You’re able to run your function inside your VPC and thus leverage other resources available within it.
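As a sketch of what that looks like, a minimal function declaration in the Resources section of a Cloudformation template could be something like the following (bucket, key, handler and role names are invented for illustration; the exact properties depend on your setup):

```json
"MyFunction": {
  "Type": "AWS::Lambda::Function",
  "Properties": {
    "Handler": "index.handler",
    "Runtime": "nodejs",
    "Role": { "Fn::GetAtt": [ "MyLambdaExecutionRole", "Arn" ] },
    "Code": {
      "S3Bucket": "my-deployment-bucket",
      "S3Key": "functions/my-function.zip"
    }
  }
}
```

The zip archive referenced under Code is the packaged function mentioned above; to run the function inside a VPC you would additionally specify a VpcConfig property with subnet and security group IDs.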

All in all I thought this was a great event. If you didn’t attend I really recommend attending the next one – especially if you’re already using AWS.

As we at Diabol are standard partners with Amazon, we can not only assist you with your cloud platform strategy but also tie it together with the full view of your systems development process. Don’t hesitate to contact us!

You can read more about us at diabol.se.

Tommy Tynjä
@tommysdk

Diabol helps Klarna develop a new platform and become experts in Continuous Delivery

Since its start in 2005, Klarna has seen rapid growth and in a very short time become a company with over 1000 employees. To meet the global market’s demand for its payment services, Klarna needed to make major changes to technology, organisation and processes alike. Klarna engaged consultants from Diabol to reach its ambitious goals of developing a new service platform and becoming a leader in DevOps and Continuous Delivery.

Challenge

After great success in the Nordic market and several years of strong growth, Klarna needed to develop a new platform for its payment services in order to serve the global market. The new platform had to handle millions of transactions daily and be robust and scalable, while at the same time supporting an agile way of working with rapid changes in a growing organisation. The timeline was very challenging, and in addition to developing all the services, both ways of working and infrastructure had to change in order to meet the challenges of large-scale operation and short lead times.

Solution

Diabol’s experienced consultants, with expert competence in Java, DevOps and Continuous Delivery, were entrusted with reinforcing the development teams building the new platform, while also automating the release process using, among other things, cloud technology from Amazon AWS. Competence in automation and tooling was also built up in an internal support team, whose purpose was to support the development teams with tools and processes for delivering their services quickly, safely, automatically and independently of each other. Diabol had a central role in this team and acted as coach for Continuous Delivery and DevOps broadly across the development and operations organisation.

Results

Klarna was able to go live with the new platform in record time and launch in several large international markets. Autonomous development teams with a strong delivery focus can today deliver changes and new functionality to production entirely automatically on their own, which when needed can be several times a day.

Comprehensive automated tests run continuously on every code change, and test environments in AWS are set up fully automatically as well. Some teams even practise so-called “continuous deployment” and deliver code changes to their production environments without any manual intervention whatsoever.

“Diabol has been a key player in reaching our ambitious goals in DevOps and Continuous Delivery.”

– Tobias Palmborg, Manager Engineering Support, Klarna


Puppet resource command

I have used puppet for several years but had overlooked the puppet resource command until now. This command uses the Puppet RAL (Resource Abstraction Layer, the layer the Puppet DSL is built on) to interact directly with the system. In plain language, that means you can easily reverse engineer a system and get information about it directly in puppet format on the command line.

The basic syntax is: puppet resource type [name]

Some examples from my Mac will make it more clear:

Get info about a given resource: me

$ puppet resource user marcus
user { 'marcus':
 ensure => 'present',
 comment => 'Marcus Philip',
 gid => '20',
 groups => ['_appserveradm', '_appserverusr', '_lpadmin', 'admin'],
 home => '/Users/marcus',
 password => '*',
 shell => '/bin/bash',
 uid => '501',
}

Get info about all resources of a given type: users

$ puppet resource user
user { '_amavisd':
 ensure => 'present',
 comment => 'AMaViS Daemon',
 gid => '83',
 home => '/var/virusmails',
 password => '*',
 shell => '/usr/bin/false',
 uid => '83',
}
user { '_appleevents':
 ensure => 'present',
 comment => 'AppleEvents Daemon',
 gid => '55',
 home => '/var/empty',
 password => '*',
 shell => '/usr/bin/false',
 uid => '55',
}
...
user { 'root':
 ensure => 'present',
 comment => 'System Administrator',
 gid => '0',
 groups => ['admin', 'certusers', 'daemon', 'kmem', 'operator', 'procmod', 'procview', 'staff', 'sys', 'tty', 'wheel'],
 home => '/var/root',
 password => '*',
 shell => '/bin/sh',
 uid => '0',
}

One use case for this command could be to extract the state of an existing server that I want to put under puppet management into version-controlled code.

With the --edit flag the output is sent to a buffer that can be edited and then applied, and with attribute=value you can set attributes directly from the command line.
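For example (hypothetical invocations against the user from the examples above; both change system state and require appropriate privileges):

```shell
# Open the resource as puppet code in $EDITOR; changes are applied on exit:
$ puppet resource user marcus --edit

# Set an attribute directly from the command line:
$ puppet resource user marcus shell='/bin/zsh'
```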

However, I think I will only use this command for read operations. The reason is that the main benefit of puppet and other tools of the same ilk is not the abstractions they provide in themselves, but rather the capability to treat the infrastructure as code, i.e. under version control and testable.

The reason I have overlooked this command is probably that I’ve mainly been applying puppet to new, ‘greenfield’ servers, and that, as a software developer, I’m used to working from the model side rather than tinkering directly with the system on the command line. Anyway, now I have another tool in the belt. That always feels good.

Top class Continuous Delivery in AWS

Last week Diabol arranged a workshop in Stockholm where we invited Amazon together with Klarna and Volvo Group Telematics that both practise advanced Continuous Delivery in AWS. These companies are in many ways pioneers in this area as there is little in terms of established practices. We want to encourage and facilitate cross company knowledge sharing and take Continuous Delivery to the next level. The participants have very different businesses, processes and architectures but still struggle with similar challenges when building delivery pipelines for AWS. Below follows a short summary of some of the topics covered in the workshop.

Centralization and standardization vs. fully autonomous teams

One of the most interesting discussions among the participants wasn’t even technical but covered the differences in how they are organized and how that affects the work with Continuous Delivery and AWS. Some come from a traditional functional organisation and have placed their delivery teams somewhere in between the development teams and the operations team. The advantage is that they have been able to standardize the delivery platform to a large extent and reach a very high level of reuse. They have built custom tools and standardized services that all teams are more or less forced to use. This approach depends on being able to keep at least one step ahead of the dev teams and being able to scale out to many dev teams without increasing headcount. One problem with this approach is that it is hard to build deep AWS knowledge out in the dev teams, since they feel detached from the technical implementation. Others have a very explicit strategy of team autonomy, where each team basically is in charge of their complete process all the way to production. In this case each team must have quite deep competence both in AWS and in the delivery pipelines and how they are set up. The production awareness is extremely high, and you can e.g. visualize each team’s cost of AWS resources. One problem with this approach is a lower level of reusability and difficulties in sharing knowledge and implementation between teams.

Both of these approaches have pros and cons but in the end I think less silos and more team empowerment wins. If you can manage that and still build a common delivery infrastructure that scales, you are in a very good position.

Infrastructure as code

Another topic that was thoroughly covered was different ways to deploy both applications and infrastructure to AWS. CloudFormation is popular and very powerful but has its shortcomings in some scenarios. One participant felt that CF is too verbose and noisy and has built their own YAML configuration language on top of it. They have been able to do this since they have a strong standardization of their micro-service architecture and the deployment structure that follows. Other participants saw the same problem with CF being too noisy and have broken out a large portion of configuration from the stack templates into Ansible, leaving just the core infrastructure resources in CF. This also allows them to apply different deployment patterns and more advanced orchestration. We also briefly discussed third-party tools, e.g. Terraform, but the general opinion was that they all have a hard time keeping up with the features in AWS. On the other hand, if you have infrastructure outside AWS that needs to be managed in conjunction with what you have in AWS, Terraform might be a compelling option. Both participants expressed that they would like to see some kind of execution plan / dry-run feature in CF, much like Terraform has.

Docker on AWS

Use of Docker is growing quickly right now and was not surprisingly a hot topic at the workshop. One participant described how they deploy their micro-services in Docker containers, with the obvious advantage of being portable and lightweight (compared to baking AMIs). This is however done on stand-alone EC2 instances using a shared common base AMI, not on ECS, an approach that adds redundant infrastructure layers to the stack. They have just started exploring ECS and it looks promising, but questions around how to manage centralized logging, monitoring, disk encryption etc. are still not clear. Docker is a very compelling deployment alternative, but both Docker itself and the surrounding infrastructure need to mature a bit more; e.g. docker push takes unreasonably long and easily becomes a bottleneck in your delivery pipelines. Another pain point is the need for a private Docker registry, which at this level of continuous delivery needs to be highly available and secure.

What’s missing?

The discussions also identified some feature requests for Amazon to bring home. For example, we discussed security quite a lot and got into the technicalities of IAM roles, accounts, security groups etc. It was expressed that there might be a need for explicit compliance checks and controls as a complement to cruder methods such as pen testing. You can certainly do this by extracting information from the APIs and processing it according to your specific compliance rules, but it would be nice if there was a higher level of support for this from AWS.

We also discussed canary releasing and A/B testing. Since these are becoming more common practice, it would be nice if Amazon could provide more services to support them, e.g. content-based routing and more sophisticated analytics tools.

Next step

All in all I think the workshop was very successful, and the discussions and experience sharing were valuable to all participants. Diabol will continue to push Continuous Delivery maturity in the industry by arranging meetups and workshops, involving more companies that can contribute to and benefit from this collaboration.


AWS Cloudformation introduction

AWS Cloudformation is a concept for modelling and setting up your Amazon Web Services resources in an automatic and unified way. You create templates that describe your AWS resources, e.g. EC2 instances, EBS volumes, load balancers etc., and Cloudformation takes care of all provisioning and configuration of those resources. When Cloudformation provisions resources, they are grouped into something called a stack. A stack is typically represented by one template.

The templates are specified in JSON, which allows you to easily version control the templates describing your AWS infrastructure alongside your application code.

Cloudformation fits perfectly into a Continuous Delivery way of working, where the whole AWS infrastructure can be automatically set up and maintained by a delivery pipeline. There is no need for a human to provision instances, set up security groups, create DNS record sets etc., as long as the Cloudformation templates are developed and maintained properly.

You use the AWS CLI to execute the Cloudformation operations, e.g. to create or delete a stack. What we’ve done at my current client is to put the AWS CLI calls into bash scripts, which are version controlled alongside the templates. This saves us from having to remember all the options and arguments necessary to perform the Cloudformation operations, and we can use the same scripts both on developer workstations and in the delivery pipeline.
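A minimal sketch of such a wrapper script (stack name, template path and parameters are made up for illustration, not taken from the actual client scripts):

```shell
#!/bin/bash
# create-stack.sh - version-controlled wrapper around the AWS CLI,
# so nobody has to remember the Cloudformation options by heart.
set -e

STACK_NAME="myapp-${1:?usage: create-stack.sh <environment>}"
TEMPLATE="file://templates/myapp.json"

aws cloudformation create-stack \
    --stack-name "$STACK_NAME" \
    --template-body "$TEMPLATE" \
    --parameters ParameterKey=Environment,ParameterValue="$1" \
    --capabilities CAPABILITY_IAM
```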

A drawback of Cloudformation is that not all features of AWS are available through that interface, which might force you to create workarounds depending on your needs. For instance Cloudformation does not currently support the creation of private DNS hosted zones within a Virtual Private Cloud. We solved this by using the AWS CLI to create that private DNS hosted zone in the bash script responsible for setting up our DNS configuration, prior to performing the actual Cloudformation operation which makes use of that private DNS hosted zone.
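The workaround can be sketched like this (zone name, region and VPC ID are invented for illustration; passing a --vpc argument to create-hosted-zone is what makes the zone private):

```shell
# Create the private DNS hosted zone via the AWS CLI, before executing
# the Cloudformation operation that depends on it:
aws route53 create-hosted-zone \
    --name "internal.example.com" \
    --vpc VPCRegion=eu-west-1,VPCId=vpc-12345678 \
    --caller-reference "private-zone-$(date +%s)"
```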

As Cloudformation is a superb way of setting up resources in AWS, in contrast to managing those resources manually, e.g. through the web UI, you can actually enforce restrictions on your account so that resources can only be created through Cloudformation. This is something we currently use at my client for the production environment setup, to ensure that the proper ways of working are followed.

I’ve created a basic Cloudformation template example which can be found here.

Tommy Tynjä
@tommysdk

Agile Configuration Management – intermezzo

Why do I need agile configuration management?

The main reason for doing agile configuration management is that it is a necessary means to achieve agile infrastructure and architecture. When you have an agile architecture, it becomes easy to make the correct architectural choice.

Developers make decisions every day without realizing it. Solving a requirement by adding a little more code to the monolith and a new table in the DB is a choice, even though it may not be perceived as such. When you have an agile architecture, it’s easy to create (and maintain!) a new server or middleware to address the requirement in a better way.

Docker on Mac OS X using CoreOS

Docker is on everybody’s lips these days. It’s an open-source software project that leverages Linux kernel resource isolation to let independent, so-called containers run within a single Linux instance, thus avoiding the overhead of virtual machines/hypervisors while still offering full container isolation. Docker is therefore a feasible approach for automated and scalable software deployments.

Many developers (including myself) are nowadays developing on Mac OS X, which is not Linux. It is however possible to use Docker on OS X, but one should be aware of what this implies. As OS X is not based on Linux and therefore lacks the kernel features which would allow you to run Docker containers natively, you still need a Linux host somewhere. Docker provides something called boot2docker, which essentially is a Linux distribution (based on Tiny Core Linux) built specifically for running Docker containers.

In case you want to use a more general Linux VM, if you want to use it for other tasks than just running Docker containers for instance, an alternative for boot2docker is to use CoreOS as your Docker host VM. CoreOS is quite lightweight (but is obviously bigger than boot2docker) and comes bundled with Docker. Setting up a fresh CoreOS instance to run with Vagrant is easy:

mkdir ~/coreos
cd ~/coreos
echo 'VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "coreos"
  config.vm.box_url = "http://storage.core-os.net/coreos/amd64-generic/dev-channel/coreos_production_vagrant.box"
  config.vm.network "private_network", ip: "192.168.0.100"
end' > Vagrantfile
vagrant up
vagrant ssh
core@localhost ~ $ docker --version
Docker version 0.9.0, build 2b3fdf2

Now you have a CoreOS Linux VM available which you can use as your Docker host.

If you want to mount a directory to be shared between OS X and your CoreOS host, just add the following line with the proper existent paths in the Vagrantfile:
config.vm.synced_folder "/Users/tommy/share", "/home/core/share", id: "core", :nfs => true, :mount_options => ['nolock,vers=3,udp']

Happy hacking!

Slimmed down immutable infrastructure

Last weekend we had a hackathon at Diabol. The topics all somehow related to DevOps and Continuous Delivery. My group of four focused on slim microservices with immutable infrastructure. Since we believe in automated delivery pipelines for software development and infrastructure setup, the next natural step would be to merge the two. Ideally, one would produce a machine image that contains everything needed to run the current application. The servers would be immutable, since we don’t want anyone making manual changes to a running environment. Rather, changes should be checked in to version control, and a new server would be created based on the automated build pipeline for the infrastructure.

The problem with traditional machine images running on e.g. VMware or Amazon is that they tend to be very large; a couple of gigabytes is not an unusual size. Images of that size become cumbersome to work with, as they take a long time to create and to ship over a network. Therefore it is desirable to keep server images as small as possible, especially since you might create and tear down servers ad hoc, e.g. for test purposes in your delivery pipeline. Linux is a very common server operating system, but many Linux distributions ship with features that we are very unlikely to ever use on a server, such as C compilers or utility programs. And since we adopt immutable servers, we don’t even need things like editors, man pages or even ssh!

Docker is an interesting solution for slimmed-down infrastructure and full stack machine images, which we evaluated during the hackathon. After a couple of hours of getting our hands dirty, we were quite pleased with its capabilities. We’ll definitely keep it on our radar and continue our evaluation of it.

Since we’re mostly operating in the Java space, I also spent some time looking at how we could save some size on our machine images by potentially slimming down the JVM. Since a delivery pipeline will be triggered several times a day to deploy, test etc., every megabyte saved will increase the pipeline throughput. But why should you slim down the JVM? Well, the JVM also contains features (or libraries) that are highly unlikely to ever be used on a server, such as audio support, the AWT and Swing UI frameworks, JavaFX, fonts, cursor images etc. The standard installation of the Java 8 JRE is around 150 MB. It didn’t take long to shave off a third of that size by removing libraries such as the aforementioned ones. Unfortunately the core library of Java, rt.jar, is 66 MB in size, which sets a lower bound on the size of a working JVM (unless you start removing the class files inside it too). Without too much work, I was able to safely remove a third of the standard JRE installation, landing a bit under 100 MB and still running our application. Although this practice might not be suitable for production use, for technical or even legal reasons, it is still interesting to see how much we typically install on our servers even though it will never be used. The much anticipated Project Jigsaw, which will introduce modularity to Java SE, has been postponed several times. Hopefully it can be incorporated into Java 9, enabling us to decide which modules we actually want for our particular use case.
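The slimming itself was mostly a matter of deleting optional libraries from a copy of the JRE and re-testing the application after each removal. A rough sketch (assuming an Oracle JRE 8 on 64-bit Linux; exact paths vary between JRE builds, and your application’s needs determine what is actually safe to remove):

```shell
# Work on a copy of the JRE, never the original installation:
cp -r "$JAVA_HOME/jre" slim-jre

rm -f  slim-jre/lib/ext/jfxrt.jar      # JavaFX
rm -rf slim-jre/lib/fonts              # fonts
rm -rf slim-jre/lib/desktop            # desktop integration resources
rm -f  slim-jre/lib/amd64/libjsound.so # audio support

du -sh slim-jre   # verify the size reduction, then re-run your test suite
```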

Our conclusion for the time spent on this topic during the hackathon is that Docker is an interesting alternative to traditional machine image solutions, which not only allows, but also encourages slim servers and immutable infrastructure.

Tommy Tynjä
@tommysdk