The Fundamentals of Version Control
What's that commit thing all about?

29 November 2017

In several of my previous posts, I’ve mentioned version control. Usually, I’m writing about something that I’ve done using a popular version control program called Git. Version control is fairly fundamental to modern software engineering practices, so for those posts I’ve assumed that the reader already has a basic grasp of the core concepts. Today, I’m going to be taking a moment to explain what those core concepts are.

What is Version Control?

Programming can be looked at as the process of creating and maintaining text files. A large program has a large number of text files with a lot of good stuff in them. Take a few people working on the same set of text files at the same time on different computers, and you have a recipe for utter chaos!

Luckily, the story doesn’t end there. There are programs for helping to manage this chaos. Version control is conceptually about tracking the changes that have been made to a file, combining changes made by multiple people, and keeping a record of what a program looked like at a certain point in time.

If you’re using a version control system, you can answer questions like these:

  1. What does the program look like if you combine the changes I just made with the ones that my co-worker just made?
  2. Was this bug in the version of the software that we released last week?
  3. Something is broken. It was working yesterday. What’s changed in the source code? Who made those changes so I can go ask them about it?
  4. Oh no, it was actually the changes that I made yesterday that broke everything. How do I undo my changes without affecting anyone else’s work?

Where do I Personally use Version Control?

As a professional software engineer, all of my software goes through some sort of version control workflow. For my hobby projects, I also use version control, and publish my projects on GitHub. The source code for this website (including all of the posts) is also just text files which I keep in version control.

I also use version control systems for other things. The tools are general purpose change tracking and syncing programs for working with files on a computer, typically with solid command line support and libraries that you can include in scripts. They can be used for incrementally backing up files. They can also be used as the backbone for automated file-synchronization systems.

Important Version Control Concepts

So, what are the core concepts that make up a version control system?

Commits

In version control, the most basic building block is the commit. When you create a new commit, the version control system will take the changes you’ve made and save them to a database. It’s sort of like if you copied the whole folder and labelled the copy VERSION 0.01, except it’s much more efficient.

If you pretend that programming is a computer game, then creating a commit is like creating a new save game.

Commits will also have some extra information. They will store who created the commit, a short description of the changes, and a unique identifier so you can refer to it later.

/assets/posts/vc/commit.svg
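
To make the idea of a commit concrete, here is a minimal sketch in Python. It is a toy model, not how any real version control system stores its data: the Commit class, the make_commit helper, and the choice to hash the contents to get an identifier are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    author: str                   # who created the commit
    message: str                  # short description of the changes
    files: tuple                  # snapshot as sorted (path, content) pairs
    parent: Optional[str] = None  # identifier of the previous commit, if any

    @property
    def commit_id(self) -> str:
        # Derive the identifier from the commit's own contents, so the same
        # snapshot and metadata always produce the same identifier.
        payload = repr((self.author, self.message, self.files, self.parent))
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def make_commit(author, message, files, parent=None):
    """Snapshot a {path: content} dict as a Commit (hypothetical helper)."""
    return Commit(author, message, tuple(sorted(files.items())), parent)


first = make_commit("alice", "Add README", {"README.txt": "Hello"})
second = make_commit("alice", "Expand README",
                     {"README.txt": "Hello, world"}, parent=first.commit_id)
print(second.commit_id[:8], second.message)
```

Each commit here stores a full snapshot, which is exactly the "copy the whole folder" approach. A real system gets its efficiency by avoiding storing unchanged content over and over.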

Branches

Say that I have version 1 on my machine, and so does a co-worker. We both make unrelated changes and commit them. The feature that I’m working on isn’t quite complete, so I don’t want my co-workers to have my changes yet. We keep working separately and making commits until we’re ready to combine the changes again.

Those two separate lines of development are called branches. If a branch is going to stay around for a while, most version control systems will allow you to give it a name.

Usually, one branch serves as the main branch, where every change that you want to keep eventually ends up. Different version control systems have different default names for this main branch, but it’s typically something like “master” or “trunk”.

/assets/posts/vc/branch.svg
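
One way to picture branches is as names that point at the newest commit on a line of development. The sketch below models that in Python. The commit names (c1, c2, and so on), the branch name my-feature, and the helper functions are all made up for this example, and other version control systems may represent branches differently.

```python
# Each commit records its parent, so history is a chain of identifiers.
parents = {
    "c1": None,   # version 1, shared by both developers
    "c2": "c1",   # my first change
    "c3": "c2",   # my second change
    "c4": "c1",   # my co-worker's change
}

# A branch is a name for the newest commit on a line of development.
branches = {"master": "c4", "my-feature": "c3"}


def history(branch):
    """Walk from the branch tip back to the first commit."""
    commit = branches[branch]
    chain = []
    while commit is not None:
        chain.append(commit)
        commit = parents[commit]
    return chain


def commit_on(branch, new_commit):
    """Add a commit to a branch: record its parent, then move the name."""
    parents[new_commit] = branches[branch]
    branches[branch] = new_commit


commit_on("my-feature", "c5")
print(history("my-feature"))  # ['c5', 'c3', 'c2', 'c1']
print(history("master"))      # ['c4', 'c1']
```

Both histories end at c1, the shared starting point. That common ancestor is what makes it possible to combine the two lines of development later.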

Repositories

Now that you have this collection of commits, and the commits are logically grouped into branches, you need to put them somewhere. We call the database that the commits are being written into a repository.

Typically, each project will have its own repository, but some companies prefer to put everything that they do into the same repository.

/assets/posts/vc/repository.svg

Merging

Eventually, after I’m ready to share my branch with everyone else, I need to combine it with the main branch. Combining two branches is called merging.

Version control systems typically automate as much of the merging as possible. For example, if I add file a.txt on my branch, and someone else adds b.txt, then the version control system can figure out that it should just add both files.

If the version control system doesn’t know how to merge changes, for example if we both changed the same line in a.txt in different ways, then the version control system will ask for manual intervention. We call that situation where manual intervention is necessary a merge conflict.

/assets/posts/vc/merge.svg
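
The merge rules described above can be sketched as a small Python function that compares both branches against their common starting point. This is a simplified illustration under a big assumption: it treats each file as a single unit, so any two different edits to the same file count as a conflict, whereas real tools compare files line by line. The merge function and the file contents are hypothetical.

```python
def merge(base, mine, theirs):
    """Three-way merge of {path: content} snapshots.

    Returns (merged, conflicts). Works on whole files rather than
    individual lines, which is coarser than a real merge.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(path), mine.get(path), theirs.get(path)
        if m == t:
            result = m              # both sides agree
        elif m == b:
            result = t              # only they changed it
        elif t == b:
            result = m              # only I changed it
        else:
            conflicts.append(path)  # both changed it differently
            continue
        if result is not None:      # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"readme.txt": "v1"}
mine = {"readme.txt": "v1", "a.txt": "from me"}
theirs = {"readme.txt": "v1", "b.txt": "from them"}
merged, conflicts = merge(base, mine, theirs)
print(sorted(merged), conflicts)  # ['a.txt', 'b.txt', 'readme.txt'] []

# Both sides edit the same file in different ways: a merge conflict.
_, conflicts = merge(base, {"readme.txt": "mine"}, {"readme.txt": "theirs"})
print(conflicts)  # ['readme.txt']
```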

Distributed vs Centralized

There are two popular paradigms when it comes to version control systems: distributed and centralized. The difference is that in distributed systems, everybody has a complete copy of the repository. If you’re working with a distributed system, you can create commits, branch, and merge even if you’re offline.

In a centralized system, there is a single repository server somewhere, and every time you want to do something version control related you connect to that server.

For my own work, I tend to prefer the distributed systems. This could be in part because, in South Africa, the Internet connections can be a bit unreliable at times.

Syncing (Cloning, Pushing and Pulling)

If you’re working with a distributed version control system, then there’s an extra step to consider around sharing your changes with others.

It isn’t enough to just make your commit; you also need to push it to the repository that the rest of your team is watching for changes.

When you join a new project, you will clone its repository, which is basically downloading a copy of it.

If you want to get changes that someone else has made, you need to fetch the changes, either from their repository or from a repository that they have pushed their changes to.

Often, you will find that you want to both fetch changes and merge them into your own branch. We call that pulling changes.
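
To see how cloning, pushing, fetching, and pulling relate to each other, here is a rough sketch in Python with three toy repositories: a shared one, mine, and a co-worker's. The Repo class and its methods are invented for illustration and only track a single branch. In particular, pull here just adopts the other repository's newest commit instead of doing a real merge.

```python
class Repo:
    """A toy repository: a commit store plus one branch pointer."""

    def __init__(self):
        self.commits = {}   # commit id -> description
        self.head = None    # newest commit on the main branch

    def commit(self, commit_id, description):
        self.commits[commit_id] = description
        self.head = commit_id

    def clone(self):
        """Download a complete copy of this repository."""
        copy = Repo()
        copy.commits = dict(self.commits)
        copy.head = self.head
        return copy

    def fetch(self, other):
        """Copy over commits we are missing, without moving our branch."""
        new = {k: v for k, v in other.commits.items() if k not in self.commits}
        self.commits.update(new)
        return sorted(new)

    def pull(self, other):
        """Fetch, then bring our branch up to date.
        (A real pull would merge; here we simply adopt the other head.)"""
        self.fetch(other)
        self.head = other.head

    def push(self, other):
        """Send our commits to another repository and update its branch."""
        other.commits.update(self.commits)
        other.head = self.head


shared = Repo()
shared.commit("c1", "Initial version")

mine = shared.clone()
theirs = shared.clone()

theirs.commit("c2", "Co-worker's change")
theirs.push(shared)

print(mine.fetch(shared))  # ['c2']
print(mine.head)           # c1
mine.pull(shared)
print(mine.head)           # c2
```

Notice that after the fetch, my repository knows about c2 but my branch still points at c1. Only the pull moves my branch forward.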

This Sounds Fantastic, Where Can I Get One?

There are many different options for getting started with version control, so I’m going to give a simple recommendation.

Firstly, try Git as your first version control system. It’s widely used in open source projects, so if you need to ask questions there will be many people able to help you. A good place to start is the free online book Pro Git, which walks you through actually using Git.

Secondly, if you’re working with somebody, then you’ll want to sign up for a Git repository in the cloud. GitHub and Bitbucket are both excellent options for getting started. They both have free options if you’re working on open source projects. Bitbucket also lets you have free private repositories if your team is small.

Using version control can initially seem a bit strange and arcane, but once you get used to having it, you’ll never want to be on a project without it again.