BitTorrent: Breaking Up (files) is Good to Do. Bram Cohen's big file download protocol proves a boon for open source distributions, a headache for the MPAA
At The Linux Mirror Project website, Linux distributions are stacked like candy at a grocery store. You can get SUSE Linux and Feather Linux, Helix Linux, Mandrake Linux, Gentoo Linux, and Damn Small Linux. There's also FreeBSD, source from Red Hat's Fedora project, and much more. These are all big files, and with ordinary FTP, the bandwidth to support timely downloading costs plenty.
And yet the website, TLMP for short, is run as a hobby by two friends: Ross Gynn and Matt Jones, both of the UK. TLMP has a combined bandwidth of 2.
With BitTorrent, files are broken into smaller fragments, typically a quarter of a megabyte each, which get distributed to the peers in random order and are then reassembled. Each peer takes advantage of the best connections to the missing pieces while providing an upload connection to the pieces it already has. This approach, sometimes called a "swarm" network because of its hive-like cooperative behavior, did not originate with BitTorrent. But the protocol has succeeded because it is open source and simple to use. You run a client on your machine and search the web for the appropriate "torrent" file. The torrent coordinates with a "tracker," which serves as the dispatch station, tracking who has what pieces of the file. Click on the torrent, and the actual file, which may be located on another server, comes your way. The process resembles conventional downloading, except that at times, you are uploading, too.
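To make the fragment idea concrete, here is a minimal Python sketch of cutting a file into quarter-megabyte pieces and putting them back together after they arrive out of order. The function names and the sample data are illustrative assumptions, not part of any BitTorrent client.

```python
import random

PIECE_SIZE = 256 * 1024  # a quarter of a megabyte, the typical size cited above


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """Cut data into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def reassemble(received: dict, piece_count: int) -> bytes:
    """Join pieces by index, regardless of the order in which they arrived."""
    missing = [i for i in range(piece_count) if i not in received]
    if missing:
        raise ValueError(f"still missing pieces: {missing}")
    return b"".join(received[i] for i in range(piece_count))


if __name__ == "__main__":
    original = bytes(range(256)) * 4096  # 1 MiB of sample data
    pieces = split_into_pieces(original)

    # Simulate pieces arriving from different peers in random order.
    arrivals = list(enumerate(pieces))
    random.shuffle(arrivals)
    received = {index: piece for index, piece in arrivals}

    assert reassemble(received, len(pieces)) == original
    print(f"{len(pieces)} pieces reassembled correctly")
```

Because each piece carries its index, the arrival order does not matter; the download is finished once every index is present.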
Ross Gynn says TLMP would not have been possible without BitTorrent. "We recently started seeding out a new version of Linux Live Game Project (a Linux Live CD distribution), and the tracker saw over 1TB of transfer in the first day for this torrent alone," he said in an email exchange. "Using FTP to provide this would have meant a very expensive pipe and a lot of bandwidth, especially when you consider this is just one torrent, and we generally have about 150 active torrents at any one time."
In conventional downloading, high demand leads to bottlenecks because many people strain the bandwidth capacity of the host server. With BitTorrent, high demand can actually speed throughput as more bandwidth and additional "seeds" of the completed file are available to the group. BitTorrent inventor Bram Cohen claims that for very popular files, BitTorrent can support about a thousand times as many downloads as HTTP. "Recent changes may change this to ten thousand."
In a study of BitTorrent performance in conjunction with the SuprNova site (which no longer serves torrent files), a team led by researcher Johan Pouwelse of Delft University of Technology in The Netherlands examined performance and reliability over a month-long period, beginning December 2003. The team measured average download speeds at an impressive 240 kbps over a two-week period, enough to download a large file in one day. BitTorrent also proved good at handling surges in demand, a phenomenon known as "flashcrowds." When a new, popular file (Lord of the Rings III) was "injected" into the network, early downloaders needed five days, December 21 to 25, before anyone had the complete file. On December 25, the number of seeds soared to around 250 and the number of downloaders plummeted because they had now downloaded the complete file.
Bram Cohen, 29, lives in Bellevue, Washington, about five miles southwest of the Microsoft campus, and is a co-founder of the cutting-edge programmer conference CodeCon. He wrote BitTorrent in Python and released it under the Massachusetts Institute of Technology's open source license. A veteran programmer in the "dot.
Piracy, BitTorrent sites, and the MPAA
BitTorrent's speed makes it ideal for downloading movies and television shows, and it has become the protocol of choice for pirating copyrighted material. A high-quality file of Matrix Reloaded was famously available on BitTorrent sites, even while it was showing in theaters. Anime fans have embraced the protocol, as have TV addicts in search of a series episode they may have missed.
And yet, BitTorrent provides a pretty skimpy cloak of anonymity. Cohen designed the protocol strictly for downloads, with no built-in means of indexing. As a result, the torrent file, the key to obtaining the actual file, usually resides in plain sight on the Web. A handful of large sites became clearing houses for torrent files, the most notable being SuprNova.
The threat of litigation closed most of the larger BitTorrent sites, much as the old Napster shut its doors. Among them was SuprNova.
Another site, Sweden's The Pirate Bay, placed two tongue-in-cheek graphs on its website: one a steepening curve showing the rise in legal threats, the other, completely flat, showing the number of torrents deleted in response to those threats: zero. The site has posted a small collection of legal threats from Microsoft, DreamWorks, Electronic Arts, and SEGA, among others, along with its responses. "We understand that you are familiar with Bit Torrent technology," it told one company. "Then you may, or may not, understand that none of the data that you hold the copyright to reside on thepiratebay.
So far, the United States legal system does too. The Ninth U.
Indeed legal precedent has gone the other way. Starting with the Sony Betamax case in 1984, U.
The pros and cons of centralization
For publishers of unrestricted material, BitTorrent's centralized operation is a blessing, not a curse, because actual human beings are involved in maintaining the components. Ross Gynn notes that "if peers are unable to get a file because of a broken seed, they are easily notified to rectify the situation. With a traditional P2P such as Kazaa, if you are downloading a file and the uploader drops out, you cannot contact anyone to get the download "fixed."
In his study, researcher Johan Pouwelse argues that BitTorrent's centralization has made the protocol less susceptible to fake files of the sort that have "polluted" other P2P systems, noting that SuprNova files were "virtually pollution free," a big difference from other peer-to-peer schemes that are now awash in fake files. His research team attempted to inject some fake files into the system, but the moderators caught all of them. Pouwelse's team was surprised that just 20 moderators could be that effective.
"Decentralization means less control and checks," Pouwelse wrote in an email exchange. "In a fully decentralized system, you can only trust yourselves and people/
But with large sites closing, torrent files may become widely dispersed. The Monkey Methods Research Group argues that such decentralization is inevitable. Monkey Methods is actually three college friends with a "passionate interest [in] translating geeky technology into cool projects that might impact peoples' daily lives," said Andrew Chen in an email exchange. Chen is a recent graduate in applied math and economics from the University of Washington. His cohorts have backgrounds in computer science and psychology. "After spending many hours in coffee shops and living rooms arguing about the Internet and where things were going, we decided to work together to define it ourselves."
Monkey Methods' current cool project is TowerSeek.
Regardless of whether peer-to-peer networks prevail in the courts, BitTorrent has proved itself a good way to distribute large files containing non-pirated material. The key is having enough downloaders willing to host a complete new version of the file. Such "reseeding" can cost bandwidth because demand for a popular file will quickly spike. But for small torrent sites, the reseeders are angels with bandwidth on their wings.
"You just can't supply enough bandwidth even using a protocol like BitTorrent to 200 downloaders at once across 20 different ISO files with a combined upstream of [just] 2.
Another site offering Linux distributions, as well as the Firefox browser and OpenOffice suite, is Solidz.
Gary Lerhaupt, a graduate student in computer science at Stanford University, launched Torrentocracy.
One problem with the Torrentocracy concept is that the people who shoot videos don't necessarily understand how to set up a BitTorrent download on the Web. Lerhaupt's answer is his next project, called Prodigem, a content hosting service still in test mode that acts as a publishing outlet primarily for filmmakers and other creators of large files. "It provides a simple way to upload a file onto my server, then creates a torrent and an RSS feed with the torrent enclosure. Your RSS aggregator sees it, grabs it, then downloads the content, just as it might an MP3 file." Lerhaupt makes it clear that Prodigem is only interested in "legally licensed material." That stance will undoubtedly keep his torrent file selection small, at least for the time being. But it will also keep him out of court.
Sidebar: BitTorrent Up Close
Like a bee colony, BitTorrent peers contribute to the group and the group gains as a result. But so-called "hive" protocols have their challenges. In his paper on BitTorrent, Bram Cohen lists four of them he has tried to address:
- Figuring out which peer has which parts without incurring large overhead.
- Dealing with the relatively short periods each peer is actually plugged into the network.
- Discouraging "leeches," people who download and so consume bandwidth without uploading to give some of it back.
- Keeping the download/upload process simple.
Of these, simplicity has proved the key ingredient to BitTorrent's success. To receive files using BitTorrent, you first acquire a client program-from the official BitTorrent site (bittorrent.
To publish a file via BitTorrent, you place a small torrent file on a Web server. The file contains four vital pieces of information about the actual file: its name, size, hashing information to confirm the authenticity of the download, and the URL to a "tracker." The tracker, which is layered on top of HTTP, serves as the matchmaker for BitTorrent, identifying downloaders to other downloaders and telling them who has which pieces of the file. The clients, in turn, report to the tracker which pieces of the file they possess, with the information checked by the hash to ensure the pieces are authentic.
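The four pieces of information can be sketched in Python as follows. Torrent files are stored in a compact format called bencoding, and per-piece hashes use SHA-1. The helper names, the sample file name, and the tracker URL below are placeholders chosen for illustration. This is a simplified single-file layout, not a complete implementation of the format.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # quarter-megabyte pieces
SHA1_SIZE = 20             # bytes in one SHA-1 digest


def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists and dicts using bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(name: str, data: bytes, tracker_url: str,
                  piece_length: int = PIECE_LENGTH) -> dict:
    """Collect name, size, piece hashes and tracker URL for one file."""
    hashes = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {
        "announce": tracker_url,           # where clients find the tracker
        "info": {
            "name": name,
            "length": len(data),           # file size in bytes
            "piece length": piece_length,
            "pieces": hashes,              # concatenated 20-byte SHA-1 digests
        },
    }


def verify_piece(metainfo: dict, index: int, piece: bytes) -> bool:
    """Check a downloaded piece against the hash stored in the torrent."""
    start = index * SHA1_SIZE
    expected = metainfo["info"]["pieces"][start:start + SHA1_SIZE]
    return hashlib.sha1(piece).digest() == expected


if __name__ == "__main__":
    payload = b"example contents " * 50000
    # Placeholder name and URL, for illustration only.
    meta = make_metainfo("example.iso", payload,
                         "http://tracker.example.org/announce")
    torrent_bytes = bencode(meta)
    print(len(torrent_bytes), "bytes of torrent data describe",
          len(payload), "bytes of content")
    print(verify_piece(meta, 0, payload[:PIECE_LENGTH]))
```

The torrent stays small because it carries only 20 bytes of hash per quarter-megabyte piece, which is why it can sit on an ordinary Web server.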
The list of available peers gathered by BitTorrent's tracking algorithm is strictly random. Cohen argues that this approach is the better way to deal with the short "churn" rates, in which people connect to the network only for short periods. Of the pieces available, BitTorrent first chooses a random piece, then subsequently chooses the "rarest" available piece (i.
BitTorrent doesn't require that a complete file or seed be present, as long as the available fragments add up to the whole. "In essence, the peers collectively work together to rebuild the broken seed," says Ross Gynn of The Linux Mirror Project. Gynn says he almost always succeeds in obtaining files over BitTorrent. "At some point the original seed returns or someone makes a new torrent for the file."
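Two ideas from the preceding paragraphs lend themselves to a short Python sketch: checking whether the fragments held across a swarm add up to the whole file, and choosing the rarest piece to fetch next. The sketch takes "rarest" to mean the piece held by the fewest known peers. The peer names and function names are hypothetical, and real clients are considerably more involved.

```python
import random
from collections import Counter
from typing import Dict, Optional, Set


def swarm_is_complete(peer_pieces: Dict[str, Set[int]], piece_count: int) -> bool:
    """True if every piece index is held by at least one peer."""
    available = set().union(*peer_pieces.values())
    return available >= set(range(piece_count))


def pick_rarest(have: Set[int], peer_pieces: Dict[str, Set[int]]) -> Optional[int]:
    """Return a needed piece held by the fewest peers, or None if none is on offer."""
    counts = Counter(
        piece
        for pieces in peer_pieces.values()
        for piece in pieces
        if piece not in have
    )
    if not counts:
        return None
    fewest = min(counts.values())
    candidates = [piece for piece, n in counts.items() if n == fewest]
    return random.choice(candidates)  # break ties at random


if __name__ == "__main__":
    # Hypothetical swarm: no single peer is a full seed.
    swarm = {
        "peer-a": {0, 1},
        "peer-b": {1, 2},
        "peer-c": {1, 3},
    }
    print(swarm_is_complete(swarm, piece_count=4))  # every index 0..3 is covered
    print(pick_rarest(have={0}, peer_pieces=swarm))  # piece 2 or 3, never 1
```

Fetching rare pieces first spreads them to more peers, which lowers the chance that a piece disappears when its only holder disconnects.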
To help ensure that files remain available, Matt Jones, Gynn's partner at TLMP, created an application called DynamicSeed that calculates which torrents on the server are most in need of help on the basis of their seed-to-peer ratios, then downloads the current files and automatically reseeds them. "Before, we had to constantly monitor the tracker and see which files need help, then switch our clients to these manually. Now we can just let DynamicSeed do the work." Gynn says that DynamicSeed is customizable. The seeder can specify the number of torrents to monitor, limit bandwidth, and recalculate the bandwidth allocation on the fly. The application is still being tested.
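DynamicSeed's code is not shown here, so the following Python sketch only illustrates the general idea of ranking torrents by seed-to-peer ratio and dividing a bandwidth budget among the neediest. Every name in it, the sample figures, and the rule of splitting bandwidth in proportion to waiting peers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TorrentStats:
    name: str
    seeds: int   # peers holding the complete file
    peers: int   # downloaders still fetching pieces


def seed_to_peer_ratio(torrent: TorrentStats) -> float:
    """Lower ratios mean fewer seeds per downloader, so more need for help."""
    if torrent.peers == 0:
        return float("inf")  # nobody is waiting, so no help is needed
    return torrent.seeds / torrent.peers


def neediest(torrents: List[TorrentStats], limit: int) -> List[TorrentStats]:
    """Pick the `limit` torrents with the lowest seed-to-peer ratio."""
    return sorted(torrents, key=seed_to_peer_ratio)[:limit]


def split_bandwidth(selected: List[TorrentStats], total_kbps: float) -> Dict[str, float]:
    """Share an upload budget in proportion to the number of waiting peers.

    This weighting is an assumption of the sketch, not DynamicSeed's rule.
    """
    waiting = sum(t.peers for t in selected)
    if waiting == 0:
        return {t.name: 0.0 for t in selected}
    return {t.name: total_kbps * t.peers / waiting for t in selected}


if __name__ == "__main__":
    # Hypothetical tracker snapshot.
    snapshot = [
        TorrentStats("distro-a.iso", seeds=10, peers=5),
        TorrentStats("distro-b.iso", seeds=1, peers=20),
        TorrentStats("distro-c.iso", seeds=3, peers=0),
    ]
    chosen = neediest(snapshot, limit=2)
    print([t.name for t in chosen])
    print(split_bandwidth(chosen, total_kbps=100))
```

Rerunning the ranking against a fresh tracker snapshot is one simple way to recalculate the allocation as conditions change.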