Open source software just keeps getting better, according to a new report from Coverity, a San Francisco-based maker of source code analysis tools.
Specifically, Coverity’s Scan Report on Open Source Software 2008, released last month, found a 16 percent reduction in static analysis defect density in the open source software it analyzed over the past two years, reflecting the elimination of more than 8,500 individual defects.
The study is part of the Department of Homeland Security’s (DHS’s) Open Source Code Hardening Project, and has analyzed more than 55 million lines of code on a recurring basis since March 2006. Coverity assessed more than 250 popular open source projects — including Linux, Firefox and the Apache Web server — amounting to nearly 10 billion lines of code analyzed in two years.
LinuxInsider recently caught up with David Maxwell, Coverity’s open source strategist, to learn more about quality issues and trends in the world of open source software.
LinuxInsider: Tell me a little bit about this project.
Coverity has a three-year contract with the DHS to use our tools to help harden open source code internationally. To do that, we’ve done recurring static analyses of the source code, which means we analyze it and look for errors without running the code being examined. Many of the projects we’ve included have been analyzed on a nightly basis, and we’ve made the results available to developers.
We started off focusing on the biggest open source projects, such as Apache, because if we can help fix a flaw in Apache, then all the servers that run it are less vulnerable to attack. Now we’re working on making the analysis available to more projects and conducting it more frequently. Though the DHS contract is coming to an end, we’ve had such a positive response from open source developers that it’s our intent to continue with the project in any case.
LI: How would you describe the prevalence of open source software in the enterprise?
Oh, it’s definitely increasing. Years ago, I had a manager who dismissed open source as worthless — now he has his own company and uses it regularly. The advantages have been there for a long time, but it has really reached public perception now. People have realized that open source is impossible to ignore.
LI: How is the quality of this open source software changing over time?
There are many aspects to the term “quality,” including whether it runs without crashing, whether it’s vulnerable to malicious input, whether it does what the user wants it to do. That said, over time, people keep developing new ways to attack software, and developers keep changing it to make it stronger while adding new features to address user needs. In general, experienced developers and well-tested code lead to higher-quality software, whereas with inexperienced developers and new code, we tend to find more problems.
Between our first analysis and our most recent one, covering a span of two years, some projects have regressed, but the quality of many projects has improved. The difficulty, of course, is that since open source software covers a whole range of developer experience and code maturity, it’s impossible to give one number to represent quality over time. The purpose of the project has been to raise maturity to a higher level more quickly than other methods could achieve.
LI: What’s the picture for security?
Each project tends to improve security over time or else it tends to fall into disuse: if it has a bad reputation, people will look for alternatives. The open question, of course, is how the end user knows where a particular tool is on the improvement curve.
In general, the severity and frequency of security issues are inversely related. For example, there are more defects that could lead to a denial-of-service attack than defects that allow the remote execution of malicious code. Our tool finds fundamental issues in code, and security vulnerabilities are often side effects of these. That means it’s hard for the developers to know the impact of a particular issue as well. When we point them to a root cause, the tendency is to just fix it rather than stepping back and looking at the implications. Some projects are more inclined than others to make the effort to release a security advisory about the consequences and tell people to upgrade. It’s good to fix the problem, but you have to tell users to upgrade as well.
LI: What are some best practices that come out of all this?
Projects that use many different techniques to maintain high software quality stand out from other projects. Static analysis is powerful, but it’s not a replacement for unit tests, debuggers, network simulation, system testing — all those tools help find problems more quickly and get them addressed before users find them. Having users identify problems for you is a worst practice — they can only describe the symptoms, and you have to hunt for the root cause.
A lot of the defects we find seem to be the effect of multiple programmers working on a piece of code. You can often find two lines that explicitly contradict each other, maybe 10 lines apart, so that it’s clear the same person couldn’t have written them both, at least not in one sitting. Good commenting and documentation about what the code is supposed to be doing is another best practice. That makes it easier for someone to go through and see if the code is in fact doing what it’s supposed to do. Very rarely is code looked at only once, especially in the open source community.
From a user perspective, a best practice is to look at the security history of a project, to see both the number of issues the project has had and the trend over time. Some projects start out with a heavy focus on security and maintain it forever; others start out with many issues but improve processes over time.
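To make the "two contradictory lines" idea concrete, here is a minimal sketch of one shape such a defect could take. The interview does not name a specific pattern, so the choice of a value that is used first and checked for absence a few lines later is an assumption, as are the `Request` class and both function names.

```python
from typing import Optional


class Request:
    """Hypothetical request object, used only for this illustration."""

    def __init__(self, path: str) -> None:
        self.path = path


def path_length_contradictory(req: Optional[Request]) -> int:
    # First line assumes req is never None and uses it directly.
    n = len(req.path)
    # ... imagine several unrelated lines here ...
    # Later line assumes req may be None. The two assumptions cannot
    # both hold, and the check arrives too late to protect the use above.
    if req is None:
        return 0
    return n


def path_length_consistent(req: Optional[Request]) -> int:
    # One consistent assumption: req may be None, so check before use.
    if req is None:
        return 0
    return len(req.path)
```

Calling `path_length_contradictory(None)` raises an `AttributeError` before the late check is ever reached, while the consistent version returns 0. A static analyzer can flag this kind of mismatch from the source alone, without running the code.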
LI: What would you say are some of the most important take-aways from the study?
For software developers, the report contains a number of items that should lead to water cooler discussions about best practices. For managers, there’s interesting data about defect frequencies by type and by size of code base. For users of open source software, it’s probably the importance of choosing your software well. Simply being open source doesn’t automatically mean that software is good or that its developers put a priority on security.
LI: Any predictions for where things will go from here?
People will continue to identify new types of security vulnerabilities, and static analysis will continue to improve and be able to address additional issues as well. Historically, buffer overflows were around for many years but didn’t get much public attention until the late ’90s, for example. Typically attacks are known to a small set of people, then become more widely publicized. Integer overflow attacks are now earlier on that curve — they’re harder to identify because they depend on particular values at run time, and so I think probably we’re going to see those continue to crop up for a while. I suspect there are also other types of attacks not widely known yet that will become the next critical issue a few years down the road.
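The integer overflow problem Maxwell mentions can be sketched in a few lines. Python integers never overflow, so the sketch below emulates 32-bit unsigned arithmetic with a mask. The function names, the 32-bit width and the sample values are assumptions chosen for illustration, not details from the Coverity report.

```python
UINT32_MAX = 0xFFFFFFFF  # largest value of an emulated 32-bit unsigned integer


def alloc_size_unchecked(count: int, item_size: int) -> int:
    """Compute a buffer size the way fixed-width arithmetic would.

    The product silently wraps around when it does not fit in 32 bits.
    """
    return (count * item_size) & UINT32_MAX


def alloc_size_checked(count: int, item_size: int) -> int:
    """Compute the same size, but refuse inputs whose product would wrap."""
    if count < 0 or item_size < 0:
        raise ValueError("count and item_size must be non-negative")
    # Divide instead of multiplying, so the check itself cannot wrap.
    if item_size != 0 and count > UINT32_MAX // item_size:
        raise OverflowError("count * item_size does not fit in 32 bits")
    return count * item_size
```

For ordinary inputs the two functions agree: 10 items of 16 bytes give 160 either way. Only a particular value, such as `0x10000001` items of 16 bytes, makes the unchecked version wrap around to 16, so a program would reserve a tiny buffer for a huge request. That dependence on specific run-time values is what makes this class of defect hard to spot.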