A Taxonomy of Code and Comments

Last week we had an excellent discussion in the C++ professionals group on LinkedIn (update: and now HN, as well) about my last post on trying to comment code less and make it more self-documenting. Thank you to everyone who contributed to the discussion. I really enjoyed reading everyone's perspective and debating the merits of commenting code. One commenter even pointed out the one line of code that I was still uncomfortable about in my example. I applaud such attention to detail when reading an article!

Forming Comment Camps

Over the course of the discussion I could see three distinct camps emerge that differed in what they thought of commenting based on the experiences they have had programming.

The No Comment camp had seen too many worthless or misleading comments in their day and recommended trying to use less comments and more self-documenting code.
The Document! camp had found documentation comments extremely helpful when using APIs and stressed the importance of documenting interfaces and providing references for algorithms.
The Explain Intent camp had run into too much confusing and convoluted code in the past and either wished it had been more well explained in comments or were thankful that it had been, as the case may be.

All of these ideas have merit, and there is much overlap and nuance involved in the arguments for and against each one. However, those arguments lay mostly in the grey area between these camps where combinations of bad code and bad comments make things interesting and programmers' lives difficult.

My personal experience led me to the No Comment camp, as I showed with an example from the code base that I've been working on for the past year and a half. In it there were nine comments: Six of them were worthless, one was attached to code that I eliminated, one was a question about whether the following code was necessary, and one was the answer to that question stated poorly and in the wrong place. That's essentially 100% bad comments in a little bit more than 50 lines of code. Since I did move and reword the one poorly stated comment, I pruned the code of nearly 90% worthless comments and, I think, made it much more readable in the process.

Of course, this is a small section of code, but it is fairly representative of the code base I'm working on. I didn't have to look too hard to come up with an example. I grabbed the current code I was refactoring. Now that doesn't mean other code bases are the same. In fact, I am certain they are different, and that is where most of the differences in the opposing camps comes from. The rest could be chalked up to differences of opinion and personal style.

The Comment-Code Taxonomy

These three camps can be put into a larger collection of comment-code types that make up a taxonomy. Let's think of code as being either Good, Bad, or Ugly. Good code is clean, self-explanatory, and self-documenting with meaningful variable and function names. Bad code is buggy, wasteful, or under-performing; it's not right, and it needs work. Ugly code is confusing or convoluted; it's working, but it makes you want to tear out your hair - or your eyes.

Comments could also be categorized as Good, Bad, or Ugly, too. Good comments clearly explain the intent of the code and answer why the programmer chose to do things the way they did. They can also provide references and document interfaces for other programmers to easily use. Bad comments are the misleading or flat-out wrong comments that do more harm than good. Ugly comments are irrelevant or redundant because they restate what the code already says. We need a fourth category here for Nonexistent comments - pretty self-explanatory.

So this comment-code taxonomy can show where any particular piece of code falls within the landscape and how you could improve the code to get it to one of the three camps:

	Good Code	Bad Code	Ugly Code
Good Comment	Document! camp	Fix the code	Explain Intent camp
Bad Comment	Fix the comment	Fix the code and the comment	Fix the comment and maybe the code
Ugly Comment	Remove the comment	Fix the code and remove the comment	Maybe fix the code; remove or improve the comment
Nonexistent Comment	No Comment camp	Fix the code	Fix the code or add a comment

My previous post focused on the reasons to work towards good code without comments, but the landscape is much more vast than that. There are excellent reasons to make any one of the camps the goal for the code you're working on. For the Document! camp, if the code is an interface that needs to be documented for your users, or the ideas came from somewhere else and should be credited, then good comments should be written for that code. Header files should be well-documented, and they tend to be the main place for these types of comments. However, comments in the implementation code can be an indication that a function is needed there instead, and the comments can be moved to the header file along with the function declaration.

For the Explain Intent camp, there are numerous reasons why we would decide to put up with ugly code. The code base may be restricted to changes. We may be coding around a bug in a library. We may be working under time constraints that don't allow for significant code refactoring. Or we may not be able to find that clean, self-documenting way to express the code that would preclude a comment. In those cases the comment should be there and it should be good.

One last type of comment that falls somewhere in between ugly and nonexistent is the TODO comment. These comments are extremely useful for remembering what still needs to be done while you're in the middle of refactoring. I usually litter the code with TODO comments while initially working out what will be refactored and then make sure they're cleaned up at the end with a simple search through the code. They would be truly ugly if left in production code, so they should be nonexistent by the time you release.

Those are the special cases. As Don Norman addresses in The Design of Everyday Things, they may seem to be extremely frequent and are easy to remember because they are exceptional, but how often are we working on public interfaces or dealing with ugly code in frozen code bases? I'm sure some programmers do, but, even for them, not all the time. Otherwise, what are we doing spending all of our time staring at code we can't change? In all other cases - the common cases - I would still recommend doing what you can to make the code expressive enough to not need comments. Programming is an intricate puzzle, and we can use all the hints and guidance we can get when reading other people's code, or our own. But those hints don't need to be in ancillary comments when they can be directly imbedded in the code.

Besides, if the code is so difficult to understand that a comment is necessary, what makes you think that the comment will be any more understandable than the code? I suppose it could happen, but I've actually never seen Ugly code with Good comments. Ugly code and Ugly comments both betray a lack of understanding, and they tend to stay together, if there are comments at all. If you've managed to express the intent adequately in a comment, many times the way to make the code better becomes blatantly obvious. That knowledge should be rolled back into the code instead of left hanging in a comment.

Make no mistake, writing good comments is hard - probably as hard as writing good code - because in both cases you have to clearly understand what you are trying to do and how you are expressing that in code. But, why spend the time writing good comments when you could spend that time writing better code? It will improve both your programming skills and your code comprehension skills. It will stretch your abilities, and in the process, your mind, so that thinking in code becomes more natural over time. Self-documenting code should always be the goal. Comments are the exception when we fail to attain it.

Lucid Mesh

Search This Blog