Saturday, February 26, 2005

On Good Code

A long time ago, I believe in a back issue of Compute! magazine, I came across an article, by an author I can no longer name, about writing code. This, by the way, was in the final days when magazines would publish the source code of entire programs (usually ported to several distinct personal computers) and readers would type them in. The article described three basic criteria for good code: clarity, brevity, and speed.

Clarity refers to how easily your code can be understood. At least two factors today make this even more important. One, software projects are much bigger: as of 2001 or so, Linux consisted of about 30 million lines of code, and Windows XP weighs in at about 40 million. There should be no doubt in any programmer's mind that nobody will understand it all. Two, few engineers work for the same company, much less on the same project, for their entire careers anymore. It is a virtual certainty that any code that lasts more than three years will change hands, often on very short notice.

Brevity refers to how small your code is. On the face of it, this would seem unimportant in the age of hard disks in the hundreds of GB, and RAM in the GB range.

Speed refers to how quickly your code runs. Similarly, this seems less important today in the face of GHz processors, although there are still lots of processors running in the tens or low hundreds of MHz.

The job of the programmer, then, is to build software that strikes the right balance among the three. Principles are nice, but they are not enough to go on. How do we actually monitor and change the way we write code? Here are some concrete details:

Clarity
Code should be unsurprising. What it looks like it does should be what it actually does. You're trying to be understood, not to show off how much smarter you are than the person who inherits your code. Unsurprising code comes primarily from following the idioms of the language. We are all familiar with language features like for loops and if statements, but those are just the analogues of words in a human language. An idiom is a well-known and complete thought, such as a loop that iterates through every element of an array. Every language has its idioms, and it's important to learn to write in them. For example, in C, such a loop is written as:



for (i = 0; i < max; i++) {
    a[i] = b[i] * c[i];
}



If you change that to for (i = 1; i <= max; i++), then you've broken the idiom and opened up a possible off-by-one bug later. Nobody cares how it's done in your favorite language. If you're writing Fortran, write in Fortran. If you're writing C, write in C.

Idioms do not end at that level. A state machine, for example, is also a computer science idiom. There are several ways to design and implement such machines, but your readers will expect one of a handful: state machines are customarily expressed as nested switch statements, as an array of function pointers indexed by state and event, or as some variation of the two. These sorts of well-known code structures are sometimes referred to as "patterns". Follow rules and conventions, even when they seem arbitrary.
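
To make that concrete, here is a minimal sketch of the function-pointer form (the states, events, and names are all invented for the example):

typedef enum { STATE_IDLE, STATE_RUNNING, NUM_STATES } state_t;
typedef enum { EVENT_START, EVENT_STOP, NUM_EVENTS } event_t;

/* Each handler does the work for one (state, event) pair and
   returns the next state. */
typedef state_t (*handler_t)(void);

static state_t do_start(void)       { return STATE_RUNNING; }
static state_t do_stop(void)        { return STATE_IDLE; }
static state_t ignore_in_idle(void) { return STATE_IDLE; }
static state_t ignore_in_run(void)  { return STATE_RUNNING; }

/* The transition table, indexed by current state and incoming event. */
static handler_t handlers[NUM_STATES][NUM_EVENTS] = {
    /* STATE_IDLE    */ { do_start,      ignore_in_idle },
    /* STATE_RUNNING */ { ignore_in_run, do_stop        },
};

static state_t dispatch(state_t state, event_t event)
{
    return handlers[state][event]();
}

The nested switch form trades the table for control flow, but the reader's expectation is the same either way: one obvious place where every combination of state and event is handled.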

You should also document your code. I personally don't see the point of documenting the inputs and outputs of every last function, because that becomes a maintenance burden and may even have negative value once it goes stale and turns wrong. As a maintenance programmer, what I generally find is that I can easily understand what the code is doing, but not what it was meant to do, or why we're trying to do that at all. Well-written code is indeed self-documenting in the sense that any reader competent in the language can tell what it is doing. But it's much harder to figure out what the original programmer wanted to do and why, so that's what deserves a couple of lines of text.
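
For example (a contrived one), compare a comment that merely restates the code with one that records intent:

p += 2;    /* add 2 to p: useless, it repeats what the code already says */

p += 2;    /* skip the 2-byte length prefix to reach the payload */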

Remember always that your writing has two primary audiences: the maintenance programmer who will inevitably have to change something, and the compilers that have to translate it.

Brevity
Why is brevity still important in this age of cheap memory? For one, if you work in embedded systems, not having enough memory is still a fact of life. Even if you don't, how concisely you can express your thoughts is a very good indicator of your skill. If you didn't understand either the problem or your solution to it, your code will be redundant, full of special cases, and probably buggy. I'm not calling for terse code or short variable names at all, but for code that is direct and to the point. Redundant code is harder to optimize, and will frequently contain the same bug in several different forms, repeated all over the place.

The lesson is unsurprising, but bears repeating: separate complex tasks into small, clearly defined ones. If you find that you're having trouble thinking of an appropriate name for a function, it's probably doing too much. If you find that you're about to name it processThing(), you're in some trouble. The inability to pick a good name is a sign that you don't know what the function is supposed to do.

When I was building Nonplus, a clone of Boggle, I had to include a list of English dictionary words to score the game. With a little bit of thought, I was able to store the 74,742-word dictionary at an average of 2.6 bytes per word. That is about 10% smaller than what gzip could do, and nearly 25% smaller than bzip2, yet the algorithm I used can be described in about two paragraphs. It matters little in this particular example, but in a real project, being able to avoid large, complex algorithms or libraries can save you from plenty of bugs. Complexity and mess are your enemies.
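
I won't reproduce the two paragraphs here, but to give a taste of how simple such schemes can be, here is a sketch of front coding, a classic technique for sorted word lists (an illustration only, not the algorithm Nonplus actually uses):

#include <stdio.h>

/* Front coding (illustrative, not the Nonplus scheme): in a sorted word
   list, each word is stored as the number of leading characters it
   shares with the previous word, followed by the suffix that differs. */
static void front_encode(const char *words[], int n)
{
    const char *prev = "";
    int i;

    for (i = 0; i < n; i++) {
        int k = 0;
        while (prev[k] != '\0' && prev[k] == words[i][k])
            k++;
        printf("%d%s\n", k, words[i] + k);
        prev = words[i];
    }
}

int main(void)
{
    const char *words[] = { "bog", "boggle", "bogus", "boil" };

    front_encode(words, 4);    /* prints 0bog, 3gle, 3us, 2il */
    return 0;
}

In a sorted English word list the shared prefixes are long, so even this trivial scheme saves a surprising amount of space.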

Speed
As with brevity, writing fast code seems like an ancient concern. Indeed, we are well past the age of writing everything in assembly language. The best advice for programmers today remains to optimize only when necessary, and only after profiling to identify the "hot spots". Most projects settle for "fast enough" instead of "fastest possible", and that's a good thing: optimization introduces many bugs, and usually late in the project.

However, that doesn't mean we should simply write sloppy code. For example, this is a common beginner C error:


for (i = 0; i < strlen(s); i++) {
    s[i] = toupper(s[i]);
}


The problem here is that the compiler generally can't prove that strlen(s) is invariant across the loop, and so the length of the string s is recomputed on every iteration. What should be an O(n) algorithm becomes an O(n²) one. This is something I've seen in production code, and there is no excuse for it, because the fix is so trivially simple.
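
The fix is simply to compute the length once, before the loop:

size_t i, len = strlen(s);    /* computed once, outside the loop */

for (i = 0; i < len; i++) {
    s[i] = toupper(s[i]);
}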

I hope I've driven the point home. The three criteria of clarity, brevity, and speed remain relevant today, even after processors have become literally more than a thousand times faster since that article was written. What does change is the relative weight a programmer should place on each criterion, but what constitutes good code hasn't changed much since.

Tuesday, February 22, 2005

Unintelligent by Design?


Some political skirmishes have been launched in the US by advocates of "Intelligent Design" against the teaching of Darwin's theory of evolution. The main contention of ID is that some organisms and organs are so complex that they could not have arisen as a result of random mutation. The remaining explanation, its advocates point out, implies some form of design and some sort of designer, though they generally shy away from identifying the designer with the God of any particular religion.

I had originally thought that the fallacy of ID could not be challenged, because it was not even science: it assumes far beyond what it can even begin to prove. As far as I was concerned, it was religion dressed in pseudo-scientific terms, and so could no more be debunked by science than religion itself can be. I was wrong, as Jim Holt writes in Unintelligent Design:

While there is much that is marvelous in nature, there is also much that is flawed, sloppy and downright bizarre. Some nonfunctional oddities, like the peacock's tail or the human male's nipples, might be attributed to a sense of whimsy on the part of the designer. Others just seem grossly inefficient. In mammals, for instance, the recurrent laryngeal nerve does not go directly from the cranium to the larynx, the way any competent engineer would have arranged it. Instead, it extends down the neck to the chest, loops around a lung ligament and then runs back up the neck to the larynx. In a giraffe, that means a 20-foot length of nerve where 1 foot would have done. If this is evidence of design, it would seem to be of the unintelligent variety.


Holt then goes into further detail, citing factual evidence such as high mortality rates, the useless pains caused by terminal cancer (useless as warnings, because they come far too late), and extinction rates, all of which point to a horribly inefficient designer, if one exists at all. This is a designer who threw away more than two-thirds of conceptions, inflicted pain needlessly, and discarded perhaps 99% of the species that have ever existed.

This is a most beautiful line of argument. ID advocates assume the presence of a designer because of complexity, and therefore must answer why this supposedly intelligent designer is so bad at designing organisms that so few species are left. Holt's approach is brilliant, because once you remove the "intelligent" from ID, it's terribly difficult to argue that there was any sort of design at all, and that takes us right back to Darwin.

Monday, February 14, 2005

Barbarians here and there


What brutal times we live in, and I'm not even talking about Iraq.

A couple of days ago in Hwalien, the rural municipality of Taiwan in which my parents happen to reside, friends of a grieving father who had lost his six-month-old son in a car accident dragged the driver of the other car out of the hospital and beat him, in front of his mother, at the funeral home. The driver later died from the beating. All this despite the fact that the mother of the dead infant had been driving without a license and had not secured the child in a safety seat, as required by law.

Yesterday, the Abu Sayyaf claimed responsibility for three bombings, one of them on a bus near the high school I attended. The Makati area is probably the richest in the entire Philippines, but the wealthy would never take a bus. The likely victims are blue-collar workers and lower-level white-collar workers, people just trying to get by in a metropolis. The group's spokesperson said on the radio, "this is our Valentine's gift to the President." It's hard to find a trace of humanity in that glibness.

About 2,500 years after Confucius coined the Golden Rule, and some 2,000 years after Jesus asked his believers to love their neighbors, we remain brutal, uncompromising, and full of hate. When I was younger, I was briefly but genuinely worried that the sun would extinguish itself and humanity would be doomed unless we managed to build giant spaceships first. Today, I can't even identify with Hollywood movies that begin with the assumption that humanity is worth saving.

I don't think we'll be missed. Happy Valentine's Day.

Tuesday, February 8, 2005

Broken glass, broke and hungry


Our car (and at least two others on the lot) got broken into last night. Somebody smashed the rear window and took my gym bag (old shoes, old shorts, old shirt, a radio, and a gym card that will take US$5 to replace). Naturally, the police declined to even come out to take a look, which reminds me of the one time that somebody's car caught fire and we got "911, please hold."

We filed a police report, and our insurance company tells us we have a US$250 deductible, which is insurance-speak for "call us only when really big things break." Amusingly, I was informed that the gym bag would have been covered under homeowner's insurance, if I had any, even though it was lost while it was in my car.

This little incident cost me about half an hour of cleanup and probably a couple of hours trying to get somebody to come fix the window, not to mention about US$200 in glass repair, and it benefited the thief or thieves to the tune of maybe US$30, tops. They didn't even bother trying for the stereo. Talk about petty theft.

The lesson, of course, is not to leave anything in the car overnight.

Monday, February 7, 2005

Happy New Year


It's almost time for Chinese (lunar) New Year again. To reduce the risk of fire, the National Fire Agency in Taiwan is providing free downloadable firecracker sound effects. I hate to make fun of an innovative idea, but at times you really wish culture could change more quickly. The little island of Taiwan is tightly packed with some 23 million people, so you'd think its people would be more considerate of each other by now.

Wednesday, February 2, 2005

Torvalds v. Tanenbaum Revisited


In early 1992, Professor Andrew Tanenbaum began what later became a series of back-and-forth exchanges on the merits of the infant Linux operating system. Linus Torvalds, the creator of Linux, obliged the professor with responses, and I thought it might be interesting to revisit the old conversation and take a look at what has happened since.

Tanenbaum's criticism centered on two main points: Linux was monolithic, and it was not portable. Torvalds admitted readily that the monolithic architecture was not aesthetically pleasing, but felt that having something working was more important. On the portability issue, Torvalds basically thought that it was not important. This is evident in:

There is no idea in trying to make an operating system overly portable: adhering to a portable API is good enough. The very /idea/ of an operating system is to use the hardware features, and hide them behind a layer of high-level calls.


and in a later posting:

Simply, I'd say that porting is impossible. It's mostly in C, but most people wouldn't call what I write C. It uses every conceivable feature of the 386 I could find, as it was also a project to teach me about the 386.


As of September 2004, Linux has been ported to 19 processor families [1], including some that are very different from the 80386. The Debian distribution has released ports of Linux (along with thousands of applications written for it) to 10 processor families [2]. Today, Linux runs in everything from internet servers to small embedded systems. So modern Linux did become very portable, but that doesn't mean Tanenbaum was right: if Linux had tried to be portable from the start, it might not be where it is today. I'm not interested in divining alternate realities here, so this isn't about whose decision should have been followed at the time, but about whose opinions more accurately predicted what Linux would become and what it would need.

What about the question of monolithic architecture? Well, about three years later, circa version 1.2, loadable kernel modules [3] were introduced into Linux. Kernel modules are small chunks of code that can be connected to and disconnected from the rest of the kernel while the system is running. Initially the work centered on the more obviously modular parts, like device drivers, but by 2000 large parts of the kernel could be loaded and unloaded at run time. Linux is still not a microkernel by any definition, but its support for kernel modules gives it many of the advantages usually associated with microkernels. It would seem that Tanenbaum was right here as well: a strictly monolithic architecture wouldn't cut it.
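
To give a sense of the mechanism, here is the canonical do-nothing module (in the style of the 2.4/2.6 kernels), which can be inserted with insmod and removed with rmmod while the system keeps running:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

/* Called when the module is loaded (insmod). */
static int __init hello_init(void)
{
    printk(KERN_INFO "hello: loaded\n");
    return 0;    /* nonzero would abort the load */
}

/* Called when the module is removed (rmmod). */
static void __exit hello_exit(void)
{
    printk(KERN_INFO "hello: unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");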

But the professor wasn't always right. What Tanenbaum failed to see when he wrote:

If a group of people wants to do this, that is fine. I think co-ordinating 1000 prima donnas living all over the world will be as easy as herding cats [...] Anyone who says you can have a lot of widely dispersed people hack away on a complicated piece of code and avoid total anarchy has never managed a software project.


was the remarkable organizational skill and charisma that Torvalds was able to muster. I have no firsthand knowledge of the temperament of Linux kernel contributors, but the clear fact after all these years is that Torvalds did what Tanenbaum would not. Tanenbaum's Minix declined many contributions because he wanted to keep it simple enough for students; in the meantime, Torvalds's Linux embraced contributions and is now an industry workhorse. This cannot have happened entirely by accident. Torvalds must have been the right person for the job, even though I don't think he had ever managed a software project before Linux.

As I mentioned, I have no interest in keeping score in this old debate, but perhaps we can learn some lessons from its history. The two major criticisms offered by a professor were ultimately addressed in due time, but I suspect even the good professor could not say what would have happened if Torvalds had listened to everything he said. The world needs young people (Torvalds at the time was younger than I am now) to strike out on their own paths, but it also needs the guidance of maturity. There's also something to be said about extremes: as with the CISC v. RISC debate, the better path turned out to lie somewhere in the middle. It is clear today that a microkernel was not necessary, however irksome Linux's architecture may be to the beholder. It is also clear that while the heavy 80386 orientation was ultimately misdirected, it wasn't nearly bad enough to sink Linux.