Blessed are the diffs: for they shall inherit the earth
Recently I’ve been working more with the sophisticate that is Docker, and it hasn’t escaped me that the foundations of the DevOps world are essentially composed of layer after layer of diffs.
For those readers who aren’t hardcore hackers, a diff in back-in-the-day Unix terms simply means a difference. As a Unix utility at least, it seems to have been around since the 1970s. The command cleverly lets you compare files or directories so that it’s easier to spot any differences between them. All modern-day Linux boffins will, I’m sure, attest that it’s still a highly useful command which frequently saves the day (if you’re curious, the GNU version can be found here: GNU diff).
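To see what a diff actually looks like, here is a minimal sketch using Python’s standard difflib module, whose unified_diff function produces the same style of output as `diff -u`. The file names and contents are hypothetical, chosen purely for illustration.

```python
import difflib

# Two hypothetical versions of a small config file, as lists of lines.
old = ["host = localhost\n", "port = 80\n", "debug = false\n"]
new = ["host = localhost\n", "port = 8080\n", "debug = false\n"]

# unified_diff mimics `diff -u`: '-' marks lines only in the old file,
# '+' marks lines only in the new one, and ' ' marks unchanged context.
diff = list(difflib.unified_diff(old, new, fromfile="app.conf.orig", tofile="app.conf"))
print("".join(diff), end="")
# --- app.conf.orig
# +++ app.conf
# @@ -1,3 +1,3 @@
#  host = localhost
# -port = 80
# +port = 8080
#  debug = false
```

Only the changed line and a little surrounding context appear, which is exactly why diffs are so handy for spotting changes in large files.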
Of course, any self-respecting coder will have been using revision control software for years. There are several to choose from, including the super-popular git, created, of course, by the venerable Linus Torvalds himself.
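As a coder, once you have committed (or saved) the first version of a new piece of code, whether it’s one line or a thousand lines long, git’s clever repositories don’t need to store the whole thing again every time you commit. Strictly speaking, git records each commit as a snapshot, but any file that hasn’t changed is stored only once and shared between commits, and when git packs a repository it stores many of the objects that have changed as differences (deltas) against similar versions.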
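The “stored only once” part comes from git naming every piece of file content by a hash of that content. The sketch below assumes a default SHA-1 repository, where a file’s object ID is the SHA-1 of the header `blob <size>\0` followed by the file’s bytes. The `store` dictionary and the `toy_commit` helper are hypothetical stand-ins for git’s object database, not git’s API, and real commits also write tree and commit objects.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object ID git gives a file's contents (SHA-1 repositories, the default)."""
    # Git hashes the header "blob <size in bytes>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Same value as `echo 'test content' | git hash-object --stdin`
print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Hypothetical toy object store: object ID -> file contents.
store: dict[str, bytes] = {}

def toy_commit(files: dict[str, bytes]) -> dict[str, str]:
    """Hypothetical helper: store each file once and return a path -> ID snapshot."""
    snapshot = {}
    for path, content in files.items():
        oid = git_blob_id(content)
        store.setdefault(oid, content)  # identical contents are only ever stored once
        snapshot[path] = oid
    return snapshot

v1 = toy_commit({"README": b"Hello\n", "main.c": b"int main(void) { return 0; }\n"})
v2 = toy_commit({"README": b"Hello, world\n", "main.c": b"int main(void) { return 0; }\n"})
print(len(store))  # 3: two versions of README, one shared copy of main.c
```

Both snapshots list main.c, but because its contents never changed they point at the same object, so it costs disk space only once.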
By leaning on diffs like this the process becomes uber-efficient: restoring previous versions can be done at breakneck speed and, as you’d imagine, storing even hundreds of thousands of lines of precious code in your repositories is kind to disk space.
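Here is a toy illustration of why storing a new version as a difference saves space while still letting you rebuild it on demand. It encodes the new version as “copy these lines from the old version” and “insert these new lines” instructions, loosely in the spirit of git’s packfile deltas (which are also built from copy and insert instructions), but this is a simplified sketch assuming line-based text, not git’s actual binary format. `make_delta` and `apply_delta` are hypothetical helper names.

```python
import difflib

def make_delta(base: list[str], target: list[str]) -> list[tuple]:
    """Encode target as ('copy', i1, i2) slices of base plus ('insert', lines) chunks."""
    delta = []
    matcher = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))           # reuse lines already in base
        elif tag in ("replace", "insert"):
            delta.append(("insert", target[j1:j2]))  # store only the new lines
        # 'delete' needs no instruction: those base lines are simply not copied
    return delta

def apply_delta(base: list[str], delta: list[tuple]) -> list[str]:
    """Rebuild the target version from the base plus the delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(base[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result

old_version = [f"line {n}\n" for n in range(1000)]
new_version = old_version[:500] + ["a brand new line\n"] + old_version[500:]

delta = make_delta(old_version, new_version)
print(delta)  # [('copy', 0, 500), ('insert', ['a brand new line\n']), ('copy', 500, 1000)]
assert apply_delta(old_version, delta) == new_version
```

A 1,001-line file is described by three small instructions, and rebuilding it is a quick walk through those instructions.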
Perhaps surprisingly for the uninitiated, Docker doesn't work too differently. Its inherent layering model affords Docker images the luxury of being lightweight and exceptionally performant, and to my mind at least the construction of Docker images is a thing of beauty.
Once you’ve decided on a base layer (such as Debian's), whether you download it directly or adjust it to your liking, it’s perfectly possible with a little tweaking to run your customised applications using an unfathomably thin slice of disk space on top of that base layer. There are no gold stars being handed out for immediately guessing how that might work.
Correct: the intelligent Docker, to all intents and purposes, also uses diffs. Whenever you make a change to an existing image you’re effectively adding a new layer which sits on top of any existing layers; in a Dockerfile, each RUN, COPY or ADD instruction creates one. If you’re generating too many layers to keep track of, a simple way to reduce their number is to chain commands together. For example, these two commands would be two different layers if each had its own RUN instruction, as they’re two distinct adjustments to the underlying layer(s); chained with the two ampersands inside a single RUN instruction, they produce just one:
RUN apt-get update && echo "Chris says hello"
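To make the stacking idea concrete, here is a toy model of how a container’s view of its filesystem can be assembled from layers: each layer only records what it adds, changes or deletes, and higher layers win. This is a simplified sketch, not Docker’s storage driver; the `Layer` type, `union_view` helper and file contents are hypothetical, and `None` stands in for the “whiteout” entries real image layers use to record deletions.

```python
from typing import Optional

# Hypothetical toy model: a layer maps file paths to contents, and None marks a
# file deleted in that layer (real images record deletions as "whiteout" entries).
Layer = dict[str, Optional[str]]

def union_view(layers: list[Layer]) -> dict[str, str]:
    """Flatten layers from bottom to top; a higher layer overrides lower ones."""
    view: dict[str, str] = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                view.pop(path, None)  # hidden by a deletion in this layer
            else:
                view[path] = content  # added or modified in this layer
    return view

base: Layer = {"/etc/os-release": "Debian", "/etc/motd": "Welcome", "/tmp/junk": "x"}
app: Layer = {"/srv/app.py": "print('hello')", "/etc/motd": "Chris says hello", "/tmp/junk": None}

container_fs = union_view([base, app])
print(sorted(container_fs))       # ['/etc/motd', '/etc/os-release', '/srv/app.py']
print(container_fs["/etc/motd"])  # Chris says hello
```

Note that the base layer itself is never modified: the app layer only holds its own three small changes, which is what keeps each extra layer so thin.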
This layering model dramatically reduces the amount of detail that Docker needs to remember, by which of course I actually mean save to disk. When there are a few Debian containers residing on a host, Docker treats the base layer as a shared dependency, storing it only once, and as each container is launched it stacks the layers holding that container’s own changes on top of it. By way of example, one base layer could serve your Web, database and SMTP servers as three distinct containers, with a few hundred megabytes of diffs in total being the only difference between them.
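A little arithmetic shows what sharing the base layer buys you. The layer names and sizes below are entirely hypothetical, picked only to mirror the Web, database and SMTP example; the point is the comparison between keeping one copy of each distinct layer and giving every container its own full copy.

```python
# Hypothetical layer names and sizes in MB, purely for illustration.
BASE = ("debian-base", 120)
images = {
    "web":  [BASE, ("web-server-files", 60)],
    "db":   [BASE, ("database-files", 150)],
    "smtp": [BASE, ("mail-server-files", 40)],
}

# The host keeps one copy of each distinct layer. Here layers are keyed by name;
# Docker itself identifies layers by a sha256 digest of their content.
stored = {name: size for layers in images.values() for name, size in layers}

shared_total = sum(stored.values())
naive_total = sum(size for layers in images.values() for _, size in layers)
print(shared_total, naive_total)  # 370 610
```

With these made-up numbers the three servers differ by 250 MB of their own layers, and sharing the base saves two full copies of it.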
How Copy-On-Write (COW) works with Docker images is a story for another day, but aside from that complexity the undeniably excellent layering model employed by Docker is remarkably simple.
Taking a lesson from the super-slick git and lightning-fast Docker, the next time you approach a complex problem I would encourage you to flex your lateral thinking muscles before meekly committing to a decision.
Simplicity, after all, is key in this brave new world.
If you've enjoyed reading the technical content on this site, please have a look at my Linux books, both published in 2016; some of my articles in Linux Magazine and Admin Magazine are also available on their respective websites.