SETI@home and the Search for Spare Processors

Is there intelligent life in the universe? Here on Earth, despite the blunders of humanity, many of us think we qualify, and figure that if our species can do it, some other species on some other planet must have done it too. But how do you confirm the existence of intelligent life elsewhere when even the closest star, Alpha Centauri, is about 4.3 light years away? Based at the University of California, Berkeley's Space Sciences Laboratory, the SETI@home project is trying to answer that question. SETI stands for the "Search for Extraterrestrial Intelligence," and SETI@home is one of a number of searches for extraterrestrial life. (Others include the Harvard/Smithsonian Optical SETI Project, the Southern SERENDIP at Western Sydney University in Australia, the Columbus Optical SETI Project in Ohio, and the BETA Project at Harvard University.)

What sets SETI@home apart is the participation of thousands of volunteers, who are donating spare processor cycles on their PCs at home ("@home"), school and work to help process an intergalactic storm of data collected by the world's largest radio telescope. They can search for signals about ten times weaker than any previous search could detect, because the project makes use of a computationally intensive technique called coherent integration. This technique has never been used for this purpose, including by the Berkeley group's previous SERENDIP program, because researchers haven't had sufficient computer power.

The processing is done on chunks of data, called work units, which are downloaded from the SETI@home site and processed when the computer is not in use--usually when a screen saver is invoked. After the processing is complete, the work unit is uploaded and exchanged for a new one. This approach is sometimes called "cooperative computing," although Project Director David P. Anderson prefers the more general term "loosely coupled distributed computing." Either way, it's a clever answer to the problems of limited financial and computational resources. Even if no intelligent life is found, cooperative computing has already proven a success--with this project and others.

SETI's founders say that even though the chances of success in this lifetime are minuscule, a radio telescope still offers the best hope of success. Dan Werthimer, an astronomer and chief scientist on the project, argues that despite the Hollywood version of alien contact, a physical visit is unlikely. "Despite all the alleged Roswells and alien abductions and UFO sightings, there is not a single shred of scientific evidence that aliens have ever visited Earth," he writes. "Not one. The amount of energy and time necessary to travel between stars is so immense that a more 'economical' method is mandated." He estimates that a trip from Earth to the nearest star, Alpha Centauri, would cost well over $30 quadrillion. By contrast, radio signals are cheap.

But realistically, what are the chances that another life form is beaming signals our way that are so strong we can receive them? "I wouldn't hold your breath," says Werthimer in a recent interview. "But I'm optimistic in the long run. I think the universe is likely to be teeming with life--with 400 billion stars in our galaxy, and with billions of galaxies--it would be bizarre if we were the only ones. However, the search that we've been doing, even SETI@home, which is a very powerful search, is still just scratching the surface. We do talk about SETI like looking for a needle in a haystack. Right now, Earthlings are just looking at the corners of the haystack. We're just getting in the game."

Werthimer is also the first to admit that the project can't do a systematic, comprehensive search covering all the radio bands and frequencies. But the problem, he says, is limited by computing power. That's good news, because global computer power is still on the rise. According to Moore's law, new processors are still doubling in power every 18 months. Beyond that, Werthimer and his colleagues have tapped the ultimate scalable "computer." At this writing, more than 500,000 users--more than the organization expected and enough to temporarily overload its servers--were participating in SETI@home. That represents more than 60 million hours in CPU time. (These numbers are growing quickly. The latest are posted at setiathome.ssl.berkeley.edu/stats/totals.)

About two-thirds of the volunteers are from the United States, followed by the United Kingdom, Canada, Germany and Japan. "There's a huge interest in Japan," Werthimer says. "It's been in a few journals and some of the major newspapers in Japan. We've had a couple of Japanese TV crews that have come out here."

Cooperative computing works for SETI@home because there's no supercomputer big enough for the processing task. And as SETI@home has no government funding, it is hardly in a position to build one. "SETI@home is about 10 times more powerful than the biggest supercomputer on the planet--it is the biggest supercomputer on the planet," Werthimer says. "If you just had one Pentium computer, it would take you 1,000 years to do what SETI@home is doing in one day."

Data collection from the world's largest radio telescope

The SETI@home project was conceived in 1996 by David Gedye and Craig Kasnoff. The analysis code and prototypes of the client and server software were developed the following year, and after a period of fundraising, the project was launched last May after beta testing by some 7,000 volunteers. The data is collected by the National Astronomy and Ionosphere Center's 305-meter radio telescope near Arecibo, Puerto Rico--the world's largest instrument of its kind--which is operated by Cornell University under an agreement with the U.S. National Science Foundation. The dish itself is fixed, while receiving antennas are mounted on a track that SETI@home uses to collect data up to 20 degrees to either side of the zenith.

The project has been as resourceful about obtaining access to the telescope as it has about obtaining free computing power. "We do something called 'piggyback study'," says Werthimer. "The project runs 24 hours a day, all year round. We figured out a way to do these observations without interfering with the normal astrophysical research that's going on at the telescope. We have our own receiver pointing to a different place than the astronomers are pointing the telescope, and our project doesn't interfere with their work."

The telescope searches 28 percent of the sky, from one to 35 degrees north latitude. To make their task easier, the SETI researchers confine their search to a quiet region of the spectrum--from about 1 to 10 gigahertz--a region just above that used by electronic pagers and wireless phones. The researchers then narrowed the spectrum further by assuming that any intelligent civilization would need water. Because neutral hydrogen gas, H, emits radio signals at 1.42 GHz, while hydroxyl, OH, emits at about 1.64 GHz, the researchers have restricted their search to the part of the spectrum bounded by those lower and upper frequencies. That's clearly a wild hunch, but of course, the bandwidth could be widened later. Meanwhile, you've got to start somewhere.

The raw data collected by the receiver is digitized and converted to a 2.5 MHz band that is recorded continuously to tape, along with data on telescope coordinates and time. These tapes are then mailed from Puerto Rico to the University of California at Berkeley for analysis. (Arecibo does not have sufficient bandwidth to send the data via the Internet.) A complete survey of the sky requires about 1,100 35 GB tapes, which record a total of 39 terabytes of data.

Once at Berkeley, the data on the tapes is broken down into 256 sub-bands, each 9766 Hz wide, via a fast Fourier transform and 8-point inverse transforms. "This spectral analysis divides the spectrum into very fine channels, and then looks at each channel for a strong signal," Werthimer says. The resulting "work units" each consist of 107 seconds of data, which are distributed via the Internet to participants around the world.
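
The figures in this step can be checked with a little arithmetic. The Python sketch below uses only numbers quoted in the article (a 2.5 MHz band, 256 sub-bands, 107 seconds per work unit). The variable names are illustrative, and the rule of thumb that a complex-sampled band B Hz wide needs about B samples per second is an assumption made here, not something the project states.

```python
# Back-of-the-envelope check of the sub-band figures quoted above.
BAND_HZ = 2.5e6          # recorded band, from the article
SUB_BANDS = 256          # number of sub-bands, from the article
WORK_UNIT_SECONDS = 107  # length of one work unit, from the article

sub_band_hz = BAND_HZ / SUB_BANDS

# Assumption: a complex-sampled band B Hz wide needs about B samples per second.
samples_per_work_unit = sub_band_hz * WORK_UNIT_SECONDS

print(f"sub-band width: {sub_band_hz:.3f} Hz")
print(f"samples per work unit: {samples_per_work_unit:,.0f}")
```

The sub-band width comes to 9765.625 Hz, which rounds to the 9766 Hz quoted above, and under the stated assumption a work unit holds roughly a million samples.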

What would an extraterrestrial civilization's broadcast look like? SETI@home already makes the assumption that the broadcasts will use the conventional electromagnetic spectrum, although that's not a given. But even here, the bandwidth, time scale, and form of the broadcast are all unknown. Again, you've got to start somewhere, and so the client software searches for signals at 15 bandwidths, ranging from 0.075 Hz to 1220 Hz, and for time scales ranging from 0.8 ms to 13.4 seconds.
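
One way to read those numbers: 15 bandwidths spanning 0.075 Hz to 1220 Hz is what you get by doubling 14 times, with the time scale at each step roughly the reciprocal of the bandwidth. The Python sketch below assumes that doubling pattern (the article gives only the endpoints and the count) and starts from the 9766 Hz sub-band width. The function name and the choice of 8 as the shortest transform length are hypothetical.

```python
# Sketch: a ladder of 15 analysis bandwidths, assuming each step doubles.
SUB_BAND_HZ = 2.5e6 / 256   # about 9766 Hz, from the article


def resolution_ladder(steps=15, shortest_fft=8):
    """Return (fft_length, bandwidth_hz, time_scale_s) for each step.

    Assumption: FFT lengths are powers of two starting at `shortest_fft`;
    a length-N FFT over a band B Hz wide gives channels B/N Hz wide and
    needs N/B seconds of data.
    """
    ladder = []
    for k in range(steps):
        n = shortest_fft << k
        bandwidth = SUB_BAND_HZ / n
        ladder.append((n, bandwidth, 1.0 / bandwidth))
    return ladder


for n, bw, t in resolution_ladder():
    print(f"FFT length {n:>6}: {bw:10.4f} Hz channels, {t * 1000:10.2f} ms")
```

Under those assumptions the ladder runs from about 1220 Hz (0.8 milliseconds) down to about 0.075 Hz (13.4 seconds), in line with the range quoted above. The trade-off is the usual one in spectral analysis: the finer the frequency channel, the longer the stretch of data needed to resolve it.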

Part of the processing is involved with "chirping," factoring out changes caused by the Doppler Effect. "The problem is that a signal may not stay at the same frequency because a transmitter may be on a planet that's spinning around, just as we are," says Werthimer. "Because it's moving, the transmitter will introduce a Doppler shift. But because we don't know how fast that planet will be spinning, we have to look through 7,000 different Doppler shifts, or drift rates, to check for all possibilities--moving toward us, moving away from us, different speeds, etc."
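
The idea of trying many drift rates can be sketched in a few lines of Python. This is an illustrative toy, not the SETI@home client code: the signal is synthetic, the function names are made up, and the sample rate and drift values are chosen only to keep the example small. For each candidate drift rate, the data is multiplied by a counter-chirp that cancels that drift, and a Fourier transform then shows how much power piles up in a single frequency channel.

```python
import cmath
import math


def make_drifting_tone(n, sample_rate, f0, drift):
    """Complex tone starting at f0 Hz and drifting at `drift` Hz per second."""
    out = []
    for i in range(n):
        t = i / sample_rate
        phase = 2 * math.pi * (f0 * t + 0.5 * drift * t * t)
        out.append(cmath.exp(1j * phase))
    return out


def dechirp(samples, sample_rate, drift):
    """Multiply by a counter-chirp that removes a drift of `drift` Hz/s."""
    out = []
    for i, s in enumerate(samples):
        t = i / sample_rate
        out.append(s * cmath.exp(-1j * math.pi * drift * t * t))
    return out


def peak_power(samples):
    """Largest single-channel power from a plain O(n^2) discrete Fourier transform."""
    n = len(samples)
    best = 0.0
    for k in range(n):
        acc = 0j
        for i, s in enumerate(samples):
            acc += s * cmath.exp(-2j * math.pi * k * i / n)
        best = max(best, abs(acc) ** 2)
    return best


def best_drift(samples, sample_rate, candidates):
    """Return the candidate drift whose dechirped spectrum has the tallest spike."""
    return max(candidates, key=lambda d: peak_power(dechirp(samples, sample_rate, d)))


SAMPLE_RATE = 128.0   # Hz; toy value, not a project figure
signal = make_drifting_tone(128, SAMPLE_RATE, f0=20.0, drift=12.0)
candidates = [-24.0, -12.0, 0.0, 12.0, 24.0]   # Hz/s; the real search tries thousands
print(best_drift(signal, SAMPLE_RATE, candidates))
```

Only the candidate that matches the true drift collapses the signal into one channel; every other candidate leaves the energy smeared across many channels, so its tallest spike is much lower. The slow transform here is for readability; a real client would use a fast Fourier transform.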

In terms of computer time, this is an expensive process. Even the first step requires about 100 billion calculations. A typical computer with a CPU running at 233 MHz should take about 24 hours to complete one work unit, and SETI@home collects over 20,000 work units every day. If an extraterrestrial signal is actually detected, SETI researchers will independently verify the signal, and the press and governments will then be notified. If your computer is involved with the detection, you will be listed as a co-discoverer.

About the SETI@home program

To become a participant, all you need to do is download and install the client software from the SETI@home site: setiathome.ssl.berkeley.edu. SETI doesn't care where in the world you reside, only that your computer have at least 32MB of RAM, 10MB of spare disk space, and an Internet connection. The organization will even accommodate a laptop that is connected sporadically. For most users, processing takes place when the computer is not in use and the screen saver mode is invoked. You can also run it in the background. Data is uploaded back from the client every few days, about five minutes at a time.

The program itself was written in C because Java was not considered fast enough. Researchers plan to run SETI@home for two years, enough time to scan and analyze the sky three times.

While SETI@home is not the first distributed computing scheme of its kind, it has generated the most worldwide interest. Other distributed computing projects involve cracking encryption schemes, discovering new prime numbers, and computing ray-traced images. The PiHex project, which was initiated by Colin Percival, a 17-year-old mathematics major at Simon Fraser University in British Columbia, is dedicated to discovering ever more precise values for Pi. Unlike SETI@home, which runs during idle time, PiHex runs in the background at idle priority--that is, it uses processing time that no other program wants. After you download the program, Percival will send you a range of the problem to work on.

All of these projects tap what is potentially an enormous research asset. For example, beginning in 1979, a group of researchers at Cray were responsible for finding the largest known prime number, a feat they repeated more than once. Then in 1996, a single PC topped it, and another did so again the following year. The "discoverer" was a part of a cooperative computing project.

Writing in the journal American Scientist, Brian Hayes points out that cooperative computing has both advantages and disadvantages over Janus, the world's largest supercomputer, a 9,216-Pentium monster owned by Sandia National Laboratories. On the plus side, cooperative computing is cheap--it merely harvests unused processing power that would otherwise go to waste. And it is highly scalable--there are at least 20 million processors connected to the Internet, although it takes publicity and an interesting project like SETI@home to get a fraction of them working on your particular project.

On the other hand, cooperative computing has terrible bandwidth between processors. "When viewed as a massively parallel computer, the Internet has a peculiar architecture," writes Hayes. "It is extraordinarily rich in raw computing capacity, with tens of millions of processors. But the bandwidth for communication between the processors is severely constrained. The 9,216 Pentiums of the Janus computer can talk to one another at a rate of 6.4 billion bits per second; for a node connected to the Internet by modem, the channel is slower by a factor of 100,000."

That puts limits on the kinds of algorithms that can be run. The classic problem addressed by massively parallel systems is simulation. The simulation of particles in a force field, for example, typically assigns a processor to each particle in order to track the particle's path in space. "The trouble is, each processor needs to consult all the other processors to calculate the forces acting on the particle, and so the volume of communication goes up as n^2," Hayes writes. "That won't fly on the Net."
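
Hayes's two points, the slow channel and the n^2 growth in traffic, can be made concrete with rough arithmetic. The Python sketch below uses the figures he quotes (6.4 billion bits per second and a factor of 100,000). The function names, and the assumptions of one message per ordered pair of processors per step and one download plus one upload per independent work unit, are ours and are for illustration only.

```python
# Rough arithmetic behind the bandwidth and n^2 arguments.
JANUS_BITS_PER_SECOND = 6.4e9   # inter-processor rate quoted by Hayes
MODEM_SLOWDOWN = 100_000        # "slower by a factor of 100,000"

modem_bits_per_second = JANUS_BITS_PER_SECOND / MODEM_SLOWDOWN


def all_pairs_messages(n):
    """Messages per step if every processor must hear from every other one."""
    return n * (n - 1)


def independent_messages(n):
    """Messages per round if each processor only downloads and uploads once.

    Assumption: one download plus one upload per processor, as with work units.
    """
    return 2 * n


print(f"modem channel: {modem_bits_per_second:,.0f} bits per second")
for n in (10, 1_000, 100_000):
    print(f"{n:>7} processors: {all_pairs_messages(n):>14,} all-pairs messages, "
          f"{independent_messages(n):>7,} independent messages")
```

The modem figure works out to 64,000 bits per second. At a thousand processors the all-pairs scheme already needs 999,000 messages per step, against 2,000 for independent work units, which is why problems that split into separate pieces suit the Internet and tightly coupled simulations do not.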

The other big limitation is volunteerism itself. A comparative few fanatics might take the trouble to download SETI@home's software, but that's in part because the idea is new and is a genuine, interesting scientific investigation. If nothing else, you get the world's most interesting screen saver. Other projects, more arcane, may not attract a following. We will run out of volunteers way before we run out of processors. But perhaps a cash payment would work. Back in 1968, Ivan Sutherland--co-founder of Evans & Sutherland--suggested the possibility of a commodities market for computer time. At that point, the Internet hadn't been created, but the idea still holds. Processing power would be priced on a supply-and-demand basis. What's missing is a viable "e-cash" system so that credits, which are likely to be in the pennies, can be paid without having to go through a credit card. With the collective "minds" of the world's computers harnessed together, who knows what discoveries will be made? In the future, we may all be co-researchers on big science projects yet undreamt of.

An interview with David P. Anderson, director of the SETI@home project

David P. Anderson works part time for SETI@home and full time for Tunes.com, a distributor of online music (see last month's Pacific Connection), where he is chief technology officer. A former member of the Computer Science faculty at UC Berkeley, Anderson has written 65 research papers on operating systems, distributed computing, and computer graphics.

Where did the idea of using idle computers originate?: It goes back to the late 1970s at Xerox PARC [Palo Alto Research Center]. They didn't have any interesting applications, but Xerox was the first one to use computer networks; they invented the Ethernet. If you want to look back at historical sources, that's where you go. In terms of trying to get useful work done, it's pretty recent. And it depends on what you mean by useful. There's one project that is computing ever-lengthening values of Pi. And there's RC5 [RSA's decryption challenge].
At what point did you think this approach might be viable for SETI?: The idea for our project popped up back in 1995. The guy who had the idea and started the project was Dave Gedye. He got too busy to participate a couple of years ago, so I inherited it. He observed that the one thing that seems to fascinate the entire world is the idea of finding other intelligences, hence the X-Files [an American television show] phenomenon, the UFO and crop circles and abduction things, all of which are a bit misguided. They are not scientific, but they do indicate this public fascination with this area. We're leveraging that because it's real easy to get people to participate in this.
And people feel they are doing real science too.: Yes, they are. SETI is a unique problem because first of all, there's the fascination with it, so it's easy to get participants without paying them anything. Secondly, it has this property that it is easily parallelized. In fact, there's no natural limit to how far you can parallelize it. Basically, we're just collecting this giant mountain of data that you can chop up into pieces and work on each one totally separately.
So it's just one processor per work unit?: Yes. We often end up sending the same work unit out more than once just because we have so many participants right now. But basically, one work unit goes to one processor. The other nice thing is that the ratio of CPU time to communications is very high. You send somebody a quarter megabyte of data, and that can keep a fast processor busy for an entire day--24 hours continuous. It depends on the processor speed, but for a 400 MHz Pentium right now it takes about 24 hours. But we don't need fast communication between processors. Whereas in another case, you might need one second of CPU time and then need to communicate with somebody--to send your result or get their most recent result. The Internet would be a huge bottleneck for that--it wouldn't work.
Do you see a legacy for your application that goes beyond whether or not you discover intelligent life?: Yes. I spend time talking to other scientists and looking around for other types of problems that are amenable to this approach today, and there are a few, including some problems in biotechnology involving molecular simulations and drug design. As time goes by, the range of problems that the Internet and parallel computing can address will increase. That's because the Internet, instead of being mostly modems and telephone lines, will gradually transition to being optical fibers and gigabit speeds and very low communication latencies. It will end up looking more like one of these high-speed networks inside of a Connection Machine box.
And the processors will be more or less continuously connected?: By and large, yes. It will be the default that the connection to your house is on all the time.
Was the problem with getting this project online just funding, or was it difficult to do?: As a computer science problem, it's not difficult at all. The difficult part has been getting our Windows screensaver to work across all versions of Windows, as well as very mundane, boring, detail kind of stuff. The guts of it are not that complicated. There's a database. We send request messages back and forth over HTTP. That part is simple. It's the pragmatics, sweating the details, that's taking the time. The original people in the project, like me, all have other full-time jobs, so we had to do this in our spare time. The reason for the four-year delay was mostly needing to raise money so we could hire some full-time programmers.
What functions take the most processing time?: Mostly, the remote machines are doing FFTs--fast Fourier transforms. They do FFTs on copies of data that have been chirped, that is, adjusted to eliminate the shifting of frequencies from the Doppler effect. If somebody were transmitting a single frequency from a rotating planet, it would end up sounding like this [he whistles a low note that slides upward]. Chirping undoes that drift so we can end up looking for a constant frequency.
Is the drift upward as you whistled?: It could go either way, depending on if the transmitter is being accelerated towards us or away from us. Once the data has been chirped, you do the fast Fourier transforms and look for spikes, or for a lot of energy at one particular frequency.
Do the FFTs take up most of the computational time?: Yes. If you look at the screensaver, it constantly tells you what it's doing. We try to give people as much information as possible about what is actually happening in the guts of the analysis. It might not make any sense, but the information is there. About 95 percent of the time it will say "doing fast Fourier transform."
You've ported the software pretty widely, including to BeOS and Linux.: The first version we released to the public was a UNIX version, which doesn't do any graphics. It's not a screen saver. It just runs in the background at a low priority, and it has text output only. That's a very simple program that can be ported to any machine that runs the GNU open software compiler. That gave us all these platforms for free, including Linux. And it turns out that BeOS is able to run UNIX-type programs, and the same for OS/2. For Windows and the Mac versions, we built them as screensavers with fancy graphics. The reason is that if something is running at a low priority and uses a lot of memory on Windows, it can bog down the performance of your system. If you are running Windows and that's using 20 MB, and SETI@home is using 20 MB, and you only have 32 MB in your system, your machine will run like a dog because of the disk activity from paging from virtual memory. We just wanted to make sure that that didn't happen, so that's why it acts like a screensaver under Windows. You can override that if you want to.
Is this the largest loosely coupled distributed system of its kind?: In terms of cumulative CPU time, we're still behind RC5, which ran for about six months. We've only run for a few weeks, but we'll pass them pretty quick. Our rate of computations--our number of FLOPs [floating point operations] per day--is the highest, as far as we know.
So you can claim the world's largest supercomputer, at least in a virtual state.: Yes. Building its equivalent would cost about $50 million.