No Longer My Favorite Git Commit
Six years ago, David Thompson wrote a popular blog post called “My favourite Git commit” celebrating a whimsically detailed commit message his co-worker wrote. I enjoyed the post at the time and have sent it to several teammates as a model for good commit messages.
I recently revisited Thompson’s article as I was creating my own guide to writing useful commit messages. When pressed to explain what made Thompson’s post such an effective example, I was surprised to find that I couldn’t. It was fun to read as an outside observer, but I couldn’t justify it as a model of good software engineering.
Thompson’s favorite commit 🔗︎
Here’s the commit message that so enamored Thompson and others at the time, including me:
Convert template to US-ASCII to fix error 🔗︎
I introduced some tests in a feature branch to match the contents of
/etc/nginx/router_routes.conf
. They worked fine when run withbundle exec rake spec
orbundle exec rspec modules/router/spec
. But when run asbundle exec rake
each should block failed with:ArgumentError: invalid byte sequence in US-ASCII
I eventually found that removing the
.with_content(//)
matchers made the errors go away. That there weren’t any weird characters in the spec file. And that it could be reproduced by requiring Puppet in the same interpreter with:rake -E 'require "puppet"' spec
That particular template appears to be the only file in our codebase with an identified encoding of
utf-8
. All others areus-ascii
:dcarley-MBA:puppet dcarley$ find modules -type f -exec file --mime {} \+ | grep utf modules/router/templates/routes.conf.erb: text/plain; charset=utf-8
Attempting to convert that file back to US-ASCII identified the offending character as something that looked like a whitespace:
dcarley-MBA:puppet dcarley$ iconv -f UTF8 -t US-ASCII modules/router/templates/routes.conf.erb 2>&1 | tail -n5 proxy_intercept_errors off; # Set proxy timeout to 50 seconds as a quick fix for problems # iconv: modules/router/templates/routes.conf.erb:458:3: cannot convert
After replacing it (by hand) the file identifies as
us-ascii
again:dcarley-MBA:puppet dcarley$ file --mime modules/router/templates/routes.conf.erb modules/router/templates/routes.conf.erb: text/plain; charset=us-ascii
Now the tests work! One hour of my life I won’t get back..
The “punchline” is that, after this lengthy preamble, Thompson shows the actual diff:
Yes, the commit message contains six paragraphs and five code snippets, all to describe a one-character whitespace change.
Favorite != best 🔗︎
It’s easy to see what’s appealing about this commit.
Most developers would document this change as simply “Fix whitespace character,” so it’s a pleasant surprise that someone went to such lengths to explain their process for investigating and fixing the bug.
It is a good commit message for all the reasons Thompson offers: it creates a searchable artifact and shares helpful insights about the developer’s tools and process.
This is not an attack on Thompson or even the original author of the commit. Thompson never claimed it was the “best” commit message, just his favorite.
That said, I now see flaws that prevent me from using it as a model commit message.
It buries the most important information at the end 🔗︎
When Thompson originally published his blog post, one of the most common critiques was that the commit message was too verbose. I found that criticism misguided.
Thorough details in a commit message are useful as long as they’re relevant, and Thompson’s were. They’d help less experienced teammates learn the author’s debugging process and toolset. They’d also give more experienced teammates an opportunity to see if the developer overlooked something or is unaware of a relevant tool.
The reason people perceived Thompson’s example as overly verbose is that it buries the most important information deep into the commit message.
Re-read the first paragraph:
I introduced some tests in a feature branch to match the contents of
/etc/nginx/router_routes.conf
. They worked fine when run withbundle exec rake spec
orbundle exec rspec modules/router/spec
. But when run asbundle exec rake
each should block failed with:ArgumentError: invalid byte sequence in US-ASCII
Three sentences and a code snippet into this commit message, and the reader still doesn’t have any information about what the change actually does.
Commit messages should present the most important information first and gradually transition to finer details. Journalists call this the inverted pyramid style of writing.
Journalists structure news reports in an inverted pyramid, where the information relevant to the most people is at the top.
If I’m scrolling through a commit history, I want to find out quickly if each commit is relevant. The commit message should provide a high-level summary of the change right from the start.
It never quite explains the problem 🔗︎
By the end of Thompson’s example commit message, do you even understand the change?
Here’s the closest it comes to explaining the problem:
That particular template appears to be the only file in our codebase with an identified encoding of utf-8. All others are us-ascii:
dcarley-MBA:puppet dcarley$ find modules -type f -exec file --mime {} \+ | grep utf modules/router/templates/routes.conf.erb: text/plain; charset=utf-8
The message says that routes.conf.erb
has UTF-8 encoding, but it never explains why. Fortunately, the project is open-source, so I can investigate myself.
The issue is on line 463 of routes.conf.erb
:
$ cat modules/router/templates/routes.conf.erb | head -n 463 | tail -n 1
# where civica QueryPayments calls are taking too long.
You can’t see the issue with a regular text editor or web browser, but if you dump the raw file bytes with a tool like xxd
, you see the issue:
$ cat modules/router/templates/routes.conf.erb \
| head -n 463 | tail -n 1 \
| xxd | head -n 1
00000000: 2020 23c2 a077 6865 7265 2063 6976 6963 #..where civic
^^ ^^
If you haven’t memorized the US-ASCII and UTF-8 tables, here are the first few characters of that line:
Byte representation | Text representation |
---|---|
0x20 | ' ' (space) |
0x20 | ' ' (space) |
0x23 | '#' |
0xc2 0xa0 | ' ' (UTF-8 non-breaking space) |
So, the file had the byte sequence 0xC2 0xA0
, which means it can’t be a US-ASCII file, as 0xC2
and 0xA0
both fall outside of the US-ASCII byte range.
The 0xC2 0xA0
sequence means any application that consumes routes.conf.erb
must interpret it with UTF-8 encoding, a newer and more internationally-friendly scheme for encoding text.
Thompson’s codebase used Ruby 1.9.3, which supported UTF-8 encoding, but it defaulted to US-ASCII if the file didn’t explicitly declare otherwise.
Digging through the source history, I found that commit 5a8607 originally introduced the UTF-8 character. That commit message doesn’t mention any reason for introducing the UTF-8 character, so it was likely an accident.
A Hacker News commenter floated a plausible theory about why that stray UTF-8 character appeared in routes.conf.erb
:
the likely origin of the invalid character is somebody using an Apple Ireland/UK keyboard layout where # is Option-3 (AltGr-3), and non-breaking space is Option-Space (AltGr-Space).
-messe on Hacker News
It references code without linking to it 🔗︎
Thompson’s example commit opens with a reference to some external code:
I introduced some tests in a feature branch to match the contents of
/etc/nginx/router_routes.conf
. They worked fine when run withbundle exec rake spec
orbundle exec rspec modules/router/spec
.
But the commit message never names the branch or specifies a commit hash, so the reader has no way to reproduce the developer’s findings.
Later, the commit message says:
I eventually found that removing the
.with_content(//)
matchers made the errors go away. I didn’t see any weird characters in the spec file.
Without a commit hash or link, the reader doesn’t know what matchers or which spec file the developer means.
If a commit message references external code, it should link to it explicitly so that code reviewers and future maintainers can see the exact context for the change.
My rewrite 🔗︎
Here’s my proposed revision of Thompson’s favorite git commit:
Convert routes.conf.erb template to US-ASCII 🔗︎
routes.conf.erb
has a stray UTF-8 character that seems to have been introduced by accident in 5a8607.
rake
expect US-ASCII format, so the single UTF-8 character inroutes.conf.erb
causes test failures inrake
.This change replaces the UTF-8 character with an equivalent US-ASCII character to prevent test failures in
rake
.The stray UTF-8 character 🔗︎
The issue is on line 463 of
modules/router/templates/routes.conf.erb
:$ cat modules/router/templates/routes.conf.erb \ | head -n 463 | tail -n 1 \ | xxd | head -n 1 00000000: 2020 23c2 a077 6865 7265 2063 6976 6963 #..where civic ^^ ^^
0xC2 0xA0
is not a valid US-ASCII byte sequence, but it’s the UTF-8 non-breaking space character. Any tool that reads the file expecting US-ASCII encoding will fail.How I discovered this 🔗︎
I introduced some tests in a feature branch to match the contents of
/etc/nginx/router_routes.conf
(see abcd123). They worked fine when I ran them withbundle exec rake spec
orbundle exec rspec modules/router/spec
, but when I ran the tests asbundle exec rake
, eachshould
block failed with:ArgumentError: invalid byte sequence in US-ASCII
I eventually found that removing the
.with_content(//)
matchers made the errors go away. I didn’t see any weird characters in the spec file. I could reproduce the error by requiring Puppet in the same interpreter with:rake -E 'require "puppet"' spec
That particular template appears to be the only file in our codebase that
file
identifies asutf-8
. All others areus-ascii
:$ find modules -type f -exec file --mime {} \+ | grep utf modules/router/templates/routes.conf.erb: text/plain; charset=utf-8
Attempting to convert that file back to US-ASCII identified the offending character as something that looked like a whitespace:
$ iconv -f UTF8 -t US-ASCII modules/router/templates/routes.conf.erb 2>&1 \ | tail -n5 proxy_intercept_errors off; # Set proxy timeout to 50 seconds as a quick fix for problems # iconv: modules/router/templates/routes.conf.erb:458:3: cannot convert
After I replaced the UTF-8 character (by hand),
file
identifiesroutes.conf.erb
asus-ascii
again:$ file --mime modules/router/templates/routes.conf.erb modules/router/templates/routes.conf.erb: text/plain; charset=us-ascii
Now the tests work! One hour of my life I won’t get back..
Here were my changes:
- I added a high-level summary early in the message.
- I added a more explicit explanation of the UTF-8 character and where it came from.
- I moved most of the author’s original content to a “How I found this” section to make it clear that it’s extra-credit reading.
- I made light grammatical fixes.
- I removed passive voice to reduce ambiguity.
- I simplified the terminal prompt from
dcarley-MBA:puppet dcarley $
to just$
, as the former is mostly noise.
Notably, I didn’t remove details, as the problem wasn’t verbosity but how the developer organized and presented information.
The value of defining your own principles 🔗︎
Revisiting Thompson’s post reminded me how much value there is in defining software engineering principles for yourself.
I accepted the commit as a good example because I agreed with Thompson about its strengths. It wasn’t until I sat down and defined what I think are the most important qualities in a commit message that I saw the shortcomings of Thompson’s example.
I’ve explained my perspective about several different software engineering practices over the years, and every time I do it, it makes me a better developer. It forces me to think critically about ideas I take for granted and helps me remember what my ideal looks like even if I’m not always able to achieve it.
- Reviewing code for teammates
- Sending out my code for review
- Writing unit tests
- Communicating with freelance software developers
- Writing software tutorials
Further reading 🔗︎
- “How to Write Useful Commit Messages” - My more detailed explanation of what I think makes a good commit message.
Excerpts from the govuk-puppet project are Copyright Crown Government Digital Service, used under the MIT License.
Pre-Order My Book
I'm writing a book to capture all the techniques for effective writing that I've learned in my fifteen years as a professional software developer and blogger.
My book will teach you how to:
- Create clear and pleasant software tutorials
- Attract readers and customers through blogging
- Write effective emails
- Minimize pain in writing design documents
If the book sounds interesting to you, consider pre-ordering a copy.
Be the first to know when I post cool stuff
Subscribe to get my latest posts by email.