John Carmack on Static code analysis

This is an mirror of a post from John Carmack. Recently I learned that his articles on #AltDevBlog are no longer acessible. So, in order to archive them, I am re-posting them here. These articles are definitely good reads and worth to be preserved.

The most important thing I have done as a programmer in recent years is to aggressively pursue static code analysis. Even more valuable than the hundreds of serious bugs I have prevented with it is the change in mindset about the way I view software reliability and code quality.

It is important to say right up front that quality isn’t everything, and acknowledging it isn’t some sort of moral failing. Value is what you are trying to produce, and quality is only one aspect of it, intermixed with cost, features, and other factors. There have been plenty of hugely successful and highly regarded titles that were filled with bugs and crashed a lot; pursuing a Space Shuttle style code development process for game development would be idiotic. Still, quality does matter.

I have always cared about writing good code; one of my important internal motivations is that of the craftsman, and I always want to improve. I have read piles of books with dry chapter titles like “Policies , Standards, and Quality Plans”, and my work with Armadillo Aerospace has put me in touch with the very different world of safety critical software development.

Over a decade ago, during the development of Quake 3, I bought a license for PC-Lint and tried using it – the idea of automatically pointing out flaws in my code sounded great. However, running it as a command line tool and sifting through the reams of commentary that it produced didn’t wind up winning me over, and I abandoned it fairly quickly.

Both programmer count and codebase size have grown by an order of magnitude since then, and the implementation language has moved from C to C++, all of which contribute to a much more fertile ground for software errors. A few years ago, after reading a number of research papers about modern static code analysis, I decided to see how things had changed in the decade since I had tried PC-Lint.

At this point, we had been compiling at warning level 4 with only a very few specific warnings disabled, and warnings-as-errors forced programmers to abide by it. While there were some dusty reaches of the code that had years of accumulated cruft, most of the code was fairly modern. We thought we had a pretty good codebase.

Coverity

Initially, I contacted Coverity and signed up for a demo run. This is serious software, with the licensing cost based on total lines of code, and we wound up with a quote well into five figures. When they presented their analysis, they commented that our codebase was one of the cleanest of its size they had seen (maybe they tell all customers that to make them feel good), but they presented a set of about a hundred issues that were identified. This was very different than the old PC-Lint run. It was very high signal to noise ratio – most of the issues highlighted were clearly incorrect code that could have serious consequences.

This was eye opening, but the cost was high enough that it gave us pause. Maybe we wouldn’t introduce that many new errors for it to catch before we ship.

Microsoft /analyze

I probably would have talked myself into paying Coverity eventually, but while I was still debating it, Microsoft preempted the debate by incorporating their /analyze functionality into the 360 SDK. /Analyze was previously available as part of the top-end, ridiculously expensive version of Visual Studio, but it was now available to every 360 developer at no extra charge. I read into this that Microsoft feels that game quality on the 360 impacts them more than application quality on Windows does. :-)

Technically, the Microsoft tool only performs local analysis, so it should be inferior to Coverity’s global analysis, but enabling it poured out mountains of errors, far more than Coverity reported. True, there were lots of false positives, but there was also a lot of scary, scary stuff.

I started slowly working my way through the code, fixing up first my personal code, then the rest of the system code, then the game code. I would work on it during odd bits of free time, so the entire process stretched over a couple months. One of the side benefits of having it stretch out was that it conclusively showed that it was pointing out some very important things – during that time there was an epic multi-programmer, multi-day bug hunt that wound up being traced to something that /analyze had flagged, but I hadn’t fixed yet. There were several other, less dramatic cases where debugging led directly to something already flagged by /analyze. These were real issues.

Eventually, I had all the code used to build the 360 executable compiling without warnings with /analyze enabled, so I checked it in as the default behavior for 360 builds. Every programmer working on the 360 was then getting the code analyzed every time they built, so they would notice the errors themselves as they were making them, rather than having me silently fix them at a later time. This did slow down compiles somewhat, but /analyze is by far the fastest analysis tool I have worked with, and it is oh so worth it.

We had a period where one of the projects accidentally got the static analysis option turned off for a few months, and when I noticed and re-enabled it, there were piles of new errors that had been introduced in the interim. Similarly, programmers working just on the PC or PS3 would check in faulty code and not realize it until they got a “broken 360 build” email report. These were demonstrations that the normal development operations were continuously producing these classes of errors, and /analyze was effectively shielding us from a lot of them.

Bruce Dawson has blogged about working with /analysis a number of times: http://randomascii.wordpress.com/category/code-reliability/

PVS-Studio

Because we were only using /analyze on the 360 code, we still had a lot of code not covered by analysis – the PC and PS3 specific platform code, and all the utilities that only ran on the PC.

The next tool I looked at was PVS-Studio. It has good integration with Visual Studio, and a convenient demo mode (try it!). Compared to /analyze, PVS-Studio is painfully slow, but it pointed out a number of additional important errors, even on code that was already completely clean to /analyze. In addition to pointing out things that are logically errors, PVS-Studio also points out a number of things that are common patterns of programmer error, even if it is still completely sensible code. This is almost guaranteed to produce some false positives, but damned if we didn’t have instances of those common error patterns that needed fixing.

There are a number of good articles on the PVS-Studio site, most with code examples drawn from open source projects demonstrating exactly what types of things are found. I considered adding some representative code analysis warnings to this article, but there are already better documented examples present there. Go look at them, and don’t smirk and think “I would never write that!”

PC-Lint

Finally, I went back to PC-Lint, coupled with Visual Lint for IDE integration. In the grand unix tradition, it can be configured to do just about anything, but it isn’t very friendly, and generally doesn’t “just work”. I bought a five-pack of licenses, but it has been problematic enough that I think all the other developers that tried it gave up on it. The flexibility does have benefits – I was able to configure it to analyze all of our PS3 platform specific code, but that was a tedious bit of work.

Once again, even in code that had been cleaned by both /analyze and PVS-Studio, new errors of significance were found. I made a real effort to get our codebase lint clean, but I didn’t succeed. I made it through all the system code, but I ran out of steam when faced with all the reports in the game code. I triaged it by hitting the classes of reports that I worried most about, and ignored the bulk of the reports that were more stylistic or potential concerns.

Trying to retrofit a substantial codebase to be clean at maximum levels in PC-Lint is probably futile. I did some “green field” programming where I slavishly made every picky lint comment go away, but it is more of an adjustment than most experienced C/C++ programmers are going to want to make. I still need to spend some time trying to determine the right set of warnings to enable to let us get the most benefit from PC-Lint.

Discussion

I learned a lot going through this process. I fear that some of it may not be easily transferable, that without personally going through hundreds of reports in a short amount of time and getting that sinking feeling in the pit of your stomach over and over again, “we’re doing OK” or “it’s not so bad” will be the default responses.

The first step is fully admitting that the code you write is riddled with errors. That is a bitter pill to swallow for a lot of people, but without it, most suggestions for change will be viewed with irritation or outright hostility. You have to want criticism of your code.

Automation is necessary. It is common to take a sort of smug satisfaction in reports of colossal failures of automatic systems, but for every failure of automation, the failures of humans are legion. Exhortations to “write better code” plans for more code reviews, pair programming, and so on just don’t cut it, especially in an environment with dozens of programmers under a lot of time pressure. The value in catching even the small subset of errors that are tractable to static analysis every single time is huge.

I noticed that each time PVS-Studio was updated, it found something in our codebase with the new rules. This seems to imply that if you have a large enough codebase, any class of error that is syntactically legal probably exists there. In a large project, code quality is every bit as statistical as physical material properties – flaws exist all over the place, you can only hope to minimize the impact they have on your users.

The analysis tools are working with one hand tied behind their back, being forced to infer information from languages that don’t necessarily provide what they want, and generally making very conservative assumptions. You should cooperate as much as possible – favor indexing over pointer arithmetic, try to keep your call graph inside a single source file, use explicit annotations, etc. Anything that isn’t crystal clear to a static analysis tool probably isn’t clear to your fellow programmers, either. The classic hacker disdain for “bondage and discipline languages” is short sighted – the needs of large, long-lived, multi-programmer projects are just different than the quick work you do for yourself.

NULL pointers are the biggest problem in C/C++, at least in our code. The dual use of a single value as both a flag and an address causes an incredible number of fatal issues. C++ references should be favored over pointers whenever possible; while a reference is “really” just a pointer, it has the implicit contract of being not-NULL. Perform NULL checks when pointers are turned into references, then you can ignore the issue thereafter. There are a lot of deeply ingrained game programming patterns that are just dangerous, but I’m not sure how to gently migrate away from all the NULL checking.

Printf format string errors were the second biggest issue in our codebase, heightened by the fact that passing an idStr instead of idStr::c_str() almost always results in a crash, but annotating all our variadic functions with /analyze annotations so they are properly type checked kills this problem dead. There were dozens of these hiding in informative warning messages that would turn into crashes when some odd condition triggered the code path, which is also a comment about how the code coverage of our general testing was lacking.

A lot of the serious reported errors are due to modifications of code long after it was written. An incredibly common error pattern is to have some perfectly good code that checks for NULL before doing an operation, but a later code modification changes it so that the pointer is used again without checking. Examined in isolation, this is a comment on code path complexity, but when you look back at the history, it is clear that it was more a failure to communicate preconditions clearly to the programmer modifying the code.

By definition, you can’t focus on everything, so focus on the code that is going to ship to customers, rather than the code that will be used internally. Aggressively migrate code from shipping to isolated development projects. There was a paper recently that noted that all of the various code quality metrics correlated at least as strongly with code size as error rate, making code size alone give essentially the same error predicting ability. Shrink your important code.

If you aren’t deeply frightened about all the additional issues raised by concurrency, you aren’t thinking about it hard enough.

It is impossible to do a true control test in software development, but I feel the success that we have had with code analysis has been clear enough that I will say plainly it is irresponsible to not use it. There is objective data in automatic console crash reports showing that Rage, despite being bleeding edge in many ways, is remarkably more robust than most contemporary titles. The PC launch of Rage was unfortunately tragically flawed due to driver problems — I’ll wager AMD does not use static code analysis on their graphics drivers.

The takeaway action should be: If your version of Visual Studio has /analyze available, turn it on and give it a try. If I had to pick one tool, I would choose the Microsoft option. Everyone else working in Visual Studio, at least give the PVS-Studio demo a try. If you are developing commercial software, buying static analysis tools is money well spent.

A final parting comment from twitter:

Dave Revell @dave_revell The more I push code through static analysis, the more I’m amazed that computers boot at all.