Distributed version control systems

Distributed version control systems (DVCS) has been around for many years already, and is increasing in popularity all the time. There are however many projects that are still using a traditional version control system (VCS), such as Subversion. I have until recently, only been working with Subversion as a VCS. Subversion sure has its flaws and problems but mostly got the job done over the years I’ve been working with it. I started contributing to the JBoss ShrinkWrap project early this spring, where they use a DVCS in form of Git. The more I’ve been working with Git, the more I have been aware of the problems which are imposed by Subversion. The biggest transition for me has been to adopt the new mindset that DVCS brings. Suddenly I realized that my daily work has many many times been influenced on the way the VCS worked, rather than doing things the way that feels natural for me as a developer. I think this is one of the key benefits with DVCS, and I think you start being aware of this as soon as you start using a DVCS.

While a traditional VCS can be sufficent in many projects, DVCSs brings new interesting dimensions and possibilites to version control.

What is a distributed version control system?
The fundamental of a DVCS is that each user keeps an own self-contained repository on his/her computer. There is no need to have a central master repository, even if most projects have one, e.g. to allow continuous integration. This allows for the following characteristics:

* Rapid startup. Install the DVCS of choice and start committing instantly into your local repository.
* As there is no need for a central repository, you can pull individual updates from other users. They do not have to be checked in into a central repository (even if you use one) like in Subversion.
* A local repository allows you the flexibility to try out new things without the need to send them to a central repository and make them available to others just to get them under version control. E.g. it is not necessary to create a branch on a central server for these kind of operations.
* You can select which updates you wish to apply to your repository.
* Commits can be cherry-picked, which means that you can select individual patches/fixes from users as you like
* Your repository is available offline, so you can check in, view project history etc. regardless of your Internet connection status.
* A local repository allows you to check in often, even though your code might not even compile, to create checkpoints of your current work. This without interfering with other peoples work.
* You can change history, modify, reorder and squash commits locally as you like before other users get access to your work. This is called rebasing.
* DVCSs are far more fault-tolerant as there are many copies of the actual repository available. If a central/master repository is used it should be backed up though.

One of the biggest differences between Git and Subversion which I’ve noticed is not listed above and is the speed of the version control system. The speed of Git has really been blowing me away and in terms of speed, it feels like comparing a Bugatti Veyron (Git) with an old Beetle (Subversion). A project which would take minutes to download from a central Subversion repository is literally taking seconds with Git. Once, I actually had to investigate that my file system acutally contained all the files Git told me it downloaded, as it went so incredibly fast! I want to emphasize that Git is not only faster when downloading/checking out source code the first time, it also applies to commiting, retrieving history etc.

Squashing commits with Git
To be able to change history is something I’ve longed for in all these years working with Subversion. With a DVCS, it is possible! When I’ve been working on a new feature for instance, previously I’ve ususally wanted to commit my progress (as checkpoints, mentioned above) but in a Subversion environment this would screw things up for other team members. When I work with Git, it allows me the freedom to do what I’ve wanted to do during all these years, committing small incremental changes to the code base, but without disturbing other team members in their work. For example, I could add a new method to an interface, commit it, start working on the implementation, commit often, work some more on the implementation, commit some more stuff, then realize that I need to rethink some of the implementation, revert a couple of commits, redo the implementation, commit etc. All this without disturbing my colleagues working on the same code base. When I feel like commiting my work, I don’t necessarily want to bring in all small commits I’ve made at development time, e.g. just adding javadoc to a method in a commit. With Git I can do something called squash, which means that I can bunch commits together, e.g. bunch my latest 5 commits together to a single one, which I then can share with other users. I can even modify the commit message, which I think is a very neat feature.

Example: Squash the latest 5 commits on the current working tree
$ git rebase -i HEAD~5

This will launch a VI editor (here I assume you are familiar with it). Leave the first commit as pick, change the rest of the signatures to “squash”, such as:

pick 79f4edb Work done on new feature
pick 032aab2 Refactored
pick 7508090 More work on implementation
pick 368b3c0 Began stubbing out interface implementation
pick c528b95 Added new interface method

to:

pick 79f4edb Work done on new feature
squash 032aab2 Refactored
squash 7508090 More work on implementation
squash 368b3c0 Began stubbing out interface implementation
squash c528b95 Added new interface method

On the next screen, delete or comment all lines you don’t want and add a more proper commit message:

# This is a combination of 5 commits.
# The first commit's message is:
Added new interface method
# This is the 2nd commit message:
Began stubbing out interface implementation
...

to:

# This is a combination of 5 commits.
# The first commit's message is:
Finished work on new feature
#Added new interface method
# This is the 2nd commit message:
#Began stubbing out interface implementation
...

Save to execute the squash. This will leave you with a single commit with the message you provided. Now you can just share this single commit with other users, e.g. via push to the master repository (if used).

Another interesting aspect of DVCSs is that if you use master repository, it won’t get hit that often since you execute your commits locally before squashing things together and send them upstream. This makes DVCSs more attractive from a scalability point of view.

Summary
A DVCS does not enforce you to have a central repository and every user has its own local repository with full history. Users can work and commit locally before sharing code with other users. If you haven’t tried out DVCS yet, do it! It is actually as easy as stated earlier: Download, install and create your first repository! The concepts of DVCS may be confusing for a non-DVCS user at first, but there are a lot of tutorials out there and “cheat sheets” which covers the most basic (and more advanced) tasks. You will soon discover many nice features with the DVCS of your choice, making it harder and harder to go back to a traditional VCS. If you have experience from DVCSs, please share your experiences!

Tommy Tynjä
@tommysdk

One thought on “Distributed version control systems”

  1. Very nice post. Something that really surprised me was how easy is to switch branches! Even if you are not using any plugin in Eclipse. You just switch the branch, refresh the project folder from the IDE and you are in a whole different branch, so you dont need to download the branch again.

Comments are closed.