Volunteer Computing and the Search for Big Answers
Mar 4, 2008 4:00 AM PT
The Space Sciences Laboratory at the University of California at Berkeley is nestled into the Berkeley Hills just off the fire trail where runners from around the area come to take on the brutal 3.5-mile climb. The lab, though, doesn't feel like it's part of Berkeley. It's located a few miles from the iconic Sproul Hall and People's Park, Berkeley institutions that were home to numerous student rebellions during the '60s. It's easy to miss, removed as it is from the school's infrastructure, which dominates much of downtown Berkeley.
What's going on inside the lab, though, is more radical than anything the student protesters could have imagined 40 years ago. This is ground zero for one of the largest volunteer, distributed computing organizations in the world: the Berkeley Open Infrastructure for Network Computing (BOINC).
BOINC's three-person team, led by research scientist Dave Anderson, developed software tools that enable scientists to harness the power of computers from around the world in order to tackle problems that require massive computing power.
Division of Labor
The premise of BOINC's distributed computing software is simple: Scientists who need to crunch data break it into small units of work. Anyone interested in helping downloads a desktop application for a specific project; roughly 1 million people currently participate in the various projects. When a volunteer's computer sits idle, the application begins processing data in the background. Once the task is complete, the results are uploaded back to the scientist.
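The split, process and upload cycle described above can be illustrated with a short Python sketch. It is not BOINC's actual code or API: the names `ProjectServer`, `fetch_work`, `upload_result` and `volunteer_loop` are hypothetical, and summing numbers stands in for a real science application.

```python
from collections import deque


def split_into_work_units(data, unit_size):
    """Break a large dataset into small, independent chunks."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def process_work_unit(unit):
    """Stand-in for the science application; here it only sums the samples."""
    return sum(unit)


class ProjectServer:
    """Hypothetical project server: hands out work units and collects results."""

    def __init__(self, data, unit_size):
        units = split_into_work_units(data, unit_size)
        self.pending = deque(enumerate(units))  # (unit_id, unit) pairs
        self.results = {}

    def fetch_work(self):
        """Return the next (unit_id, unit) pair, or None when nothing is left."""
        return self.pending.popleft() if self.pending else None

    def upload_result(self, unit_id, result):
        """Store a finished result sent back by a volunteer."""
        self.results[unit_id] = result


def volunteer_loop(server, is_idle):
    """Hypothetical volunteer client: works only while the machine is idle."""
    while is_idle():
        work = server.fetch_work()
        if work is None:
            break
        unit_id, unit = work
        server.upload_result(unit_id, process_work_unit(unit))


# Example: ten samples split into units of four, handled by one volunteer.
server = ProjectServer(list(range(10)), unit_size=4)
volunteer_loop(server, is_idle=lambda: True)
total = sum(server.results.values())
```

In a real project many volunteers would run the loop at once, each pulling different units from the same server.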
It's a simple, elegant solution to a rather enormous problem: how to find enough computing power to solve some of science's greatest problems. Anderson's answer: volunteer computing.
"Volunteer computing is interesting because there are about a billion private computers in the world," Anderson told LinuxInsider. "Some fraction of people who own them will allow them to be used by causes that they support. Scientists who need a lot of computing power can set up a volunteer computing project and get tens of thousands of nodes working for them."
Like developers who become part of the open source movement, the people who populate these volunteer programs are rabid in their desire to get involved. There is no money to be made. There are no grand accolades, outside the generous thanks from the scientists whose work couldn't be done without them.
Yet small collectives have formed through BOINC and IBM's more formally run World Community Grid. Just because there's no money involved doesn't mean there isn't recognition for the teams that power through issues. Teams now receive medals -- in the form of icons that designate a pecking order -- based on the number of projects completed.
In other words, BOINC's software -- which now powers the World Community Grid -- has turned science social.
"There is a social component to this," said Richard Mitrich, a 66-year-old resident of Highland Park, N.J., who is currently working on six projects. "Every project has forums with topics. You get nothing for these points, though, but that doesn't matter. There are people doing science here, so if you waste six hours for them, you've wasted everyone's time. To me, this is just a great joy."
Signs of Life
The idea for Anderson's flavor of distributed computing began with his work on the SETI@home project, which is where the original BOINC software was developed in 1999. The Search for Extraterrestrial Intelligence (SETI) project had always faced the data-crunching problem. With so much information streaming to Earth, there was no way to keep up. SETI@home -- which still attracts volunteers -- was born. The application enabled people to sift through bits of radio waves, looking for signals that might indicate intelligent life.
The project itself was a smashing success in terms of volunteer computing. Tens of thousands of people joined the project. The press fell in love with the idea of the general public looking for E.T.
Anderson, though, wondered if there was a way to adapt the software into a more ubiquitous platform that other scientists could use. He received a grant from the National Science Foundation in 2002. His goal: make it possible for any scientist to create projects for volunteer computing.
The result was decentralized, open source software. Today, there are more than 20 projects powered by BOINC.
"We develop software," said Anderson. "It's open source software. Anyone who wants to create a volunteer project simply puts that software on their server and then uses the API (application programming interface) to port that information. BOINC has no control over who sets up these projects. We just make the enabling software."
Power of the Grid
Distributed computing now reaches far beyond the academic and scientific fields. In fact, distributed computing is the reason you can quickly order books from Amazon without serious lag times and the reason Google can search the Web so quickly.
The business application of distributed computing, called cloud computing, takes grid computing -- which allows one node to tap into a distributed network for a short time -- to its next logical step.
In the past, companies would have one main database of information. Every transaction would take place in the database -- either inputting data or outputting data. That model works well until the entire world is on the Web.
Today, not so much.
"Databases used to be the center of everything," said Geva Perry, chief marketing officer for Gigaspaces, a New York City-based middleware company. "You always had to go to the database, but you can't use the database for high performance computing or Web applications with many users looking for multiple functions that come with fast, real-time performance."
Cloud computing allows companies to set up a series of duplicate servers, called "middleware," between the user and the database. Each server has the entire set of business rules and data sets contained within the box. As users enter dynamic information, such as an order for a book from Amazon.com, that data is stored in the in-memory data grid on the server. Once the transaction is complete, the server writes all the data to the main database.
Limiting the times a database is hit and localizing real-time information transfer creates a more stable -- and scalable -- experience.
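The pattern described above, holding a transaction's data in memory and writing to the main database only once at the end, can be sketched in a few lines of Python. This is an illustration only, not Gigaspaces' product or API: `MainDatabase`, `MiddlewareServer` and their methods are hypothetical names, and a plain dictionary stands in for the in-memory data grid.

```python
class MainDatabase:
    """Hypothetical main database that counts how often it is hit."""

    def __init__(self):
        self.rows = {}
        self.write_calls = 0

    def bulk_write(self, rows):
        """Persist a batch of rows in a single call."""
        self.write_calls += 1
        self.rows.update(rows)


class MiddlewareServer:
    """Hypothetical middleware server sitting between user and database."""

    def __init__(self, database):
        self.database = database
        self.in_memory = {}  # stand-in for the in-memory data grid

    def update(self, key, value):
        """Record a change in memory only; the database is not touched."""
        self.in_memory[key] = value

    def read(self, key):
        """Serve from memory when possible, falling back to the database."""
        if key in self.in_memory:
            return self.in_memory[key]
        return self.database.rows.get(key)

    def complete_transaction(self):
        """Write everything held in memory to the database in one call."""
        self.database.bulk_write(self.in_memory)
        self.in_memory = {}


# Example: three user actions result in a single database write.
db = MainDatabase()
server = MiddlewareServer(db)
server.update("order:1:item", "book")
server.update("order:1:qty", 2)
server.update("order:1:qty", 3)  # the user changes the quantity
writes_before_commit = db.write_calls
server.complete_transaction()
```

The trade-off in this sketch is that data held only in memory is lost if the server fails before the transaction completes, which is one reason real deployments run duplicate servers.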
Small Project, Big Army
Anderson rejects the notion that cloud computing and grid computing are similar to volunteer computing. He's careful to lay out the benefits -- and drawbacks -- of each.
Volunteer computing is done anonymously so you're never sure the results are accurate, but you have a chance to get one billion people working for you. In cloud computing you risk CPU (central processing unit) failure because of the heat and power strains, but it's very cheap to do high-performance computing. Grid computing is nothing more than an individual computer accessing data from somewhere else, although that would allow institutions to focus on one particular topic.
For his money, though, volunteer computing is the most interesting. There are problems. Scientists by and large aren't programmers, so there is no way to get every scientist who needs help set up with a BOINC project, without going through IBM.
He and his team plug along, looking for little solutions to big problems.
"In practice, I have to get grants from the National Science Foundation to fund these projects," Anderson said. "It's a little project, me and two other computer scientists, but we have a giant army of volunteers from testing to programming to translating and writing documentation."