Pacific Connection (English)

BitTorrent: Breaking Up (files) is Good to Do. Bram Cohen's big file download protocol proves a boom for open source distributions, a headache for the MPAA

At The Linux Mirror Project website, Linux distributions are stacked like candy at a grocery store. You can get SUSE Linux, Feather Linux, Helix Linux, Mandrake Linux, Gentoo Linux, and Damn Small Linux. There's also FreeBSD, source from Red Hat's Fedora project, and much more. These are all big files, and with ordinary FTP, the bandwidth to support timely downloading costs plenty.

And yet the website, TLMP for short, is run as a hobby by two friends, Ross Gynn and Matt Jones, both of the UK. TLMP has a combined bandwidth of 2.25 Mbps (10 times more than when it started) and averages about 30,000 hits per day. What makes it economically feasible is BitTorrent, the peer-to-peer protocol in which peers share their otherwise unused upload bandwidth, resulting in faster downloads for all.

With BitTorrent, files are broken into smaller fragments, or pieces, typically a quarter of a megabyte each. The pieces are distributed among the peers in no particular order and reassembled into the complete file on each downloader's machine. Each peer fetches its missing pieces over the best connections available while uploading the pieces it already has. This approach, sometimes called a "swarm" network because of its hive-like cooperative behavior, did not originate with BitTorrent. But the protocol has succeeded because it is open source and simple to use. You run a client on your machine and search the Web for the appropriate "torrent" file. The torrent points to a "tracker," which serves as the dispatch station, keeping track of which peers are sharing the file. Click on the torrent, and the actual file, which may be spread across many other machines, comes your way. The process resembles conventional downloading, except that at times you are uploading, too.
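To make the piece mechanism concrete, here is a minimal sketch, assuming a fixed 256 KiB piece size and one SHA-1 digest per piece (the scheme used by classic .torrent files). The function names are hypothetical and this is not code from any BitTorrent client; it only shows that pieces can arrive in any order and still be put back by position and verified.

```python
import hashlib
import os
import random

PIECE_SIZE = 256 * 1024  # "a quarter of a megabyte"; real torrents choose their own size

def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Cut the file into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]

def piece_hashes(pieces: list[bytes]) -> list[bytes]:
    """One SHA-1 digest per piece, the form used in classic .torrent files."""
    return [hashlib.sha1(p).digest() for p in pieces]

def reassemble(received: dict[int, bytes], hashes: list[bytes]) -> bytes:
    """Rebuild the file from pieces that arrived in any order, verifying each one."""
    out = []
    for index, expected in enumerate(hashes):
        piece = received[index]
        if hashlib.sha1(piece).digest() != expected:
            raise ValueError(f"piece {index} failed its hash check")
        out.append(piece)
    return b"".join(out)

if __name__ == "__main__":
    original = os.urandom(1_000_000)  # stand-in for a large ISO image
    pieces = split_into_pieces(original)
    hashes = piece_hashes(pieces)

    # Simulate pieces arriving from different peers in random order.
    order = list(range(len(pieces)))
    random.shuffle(order)
    received = {i: pieces[i] for i in order}

    print(len(pieces), reassemble(received, hashes) == original)  # 4 True
```

The key design point is that each piece carries its own hash, so a downloader can accept pieces from untrusted strangers and reject any single bad piece without discarding the rest.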

Ross Gynn says TLMP would not have been possible without BitTorrent. "We recently started seeding out a new version of Linux Live Game Project (a Linux Live CD distribution), and the tracker saw over 1TB of transfer in the first day for this torrent alone," he said in an email exchange. "Using FTP to provide this would have meant a very expensive pipe and a lot of bandwidth, especially when you consider this is just one torrent, and we generally have about 150 active torrents at any one time."

In conventional downloading, high demand leads to bottlenecks because many people strain the bandwidth capacity of the host server. With BitTorrent, high demand can actually speed throughput, because more bandwidth and additional "seeds" of the completed file become available to the group. BitTorrent inventor Bram Cohen claims that for very popular files, BitTorrent can support about a thousand times as many downloads as HTTP. He adds: "Recent changes may change this to ten thousand."
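Why can more demand mean more capacity? A common textbook idealization (a "fluid" model that ignores protocol overhead and assumes perfect scheduling) makes the point. The sketch below is that idealization, not Cohen's own analysis, and it does not reproduce his thousand-fold figure. The 2.25 Mbps server matches TLMP's combined bandwidth; the 700 MB file, the 1,000 downloaders and the per-peer rates are assumptions chosen for illustration.

```python
def client_server_time(n: int, file_bits: float, server_up: float) -> float:
    """Seconds for n downloads when one server must upload every copy itself."""
    return n * file_bits / server_up

def swarm_time(n: int, file_bits: float, server_up: float,
               peer_up: float, peer_down: float) -> float:
    """Idealized lower bound (seconds) when the n downloaders also upload."""
    return max(
        file_bits / server_up,                      # the server must send one full copy
        file_bits / peer_down,                      # each peer must receive one full copy
        n * file_bits / (server_up + n * peer_up),  # n copies share all upload capacity
    )

if __name__ == "__main__":
    F = 700 * 8e6          # assumed 700 MB ISO image, in bits
    us = 2.25e6            # server upload: TLMP's combined 2.25 Mbps
    u, d = 256e3, 1.5e6    # assumed per-peer upload / download, bits per second
    n = 1000               # assumed number of simultaneous downloaders
    print(f"{client_server_time(n, F, us) / 3600:.0f} h vs "
          f"{swarm_time(n, F, us, u, d) / 3600:.1f} h")  # 691 h vs 6.0 h
```

In the client-server formula the time grows linearly with n, while in the swarm formula every new downloader also adds upload capacity to the denominator, so the time barely moves as demand rises.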

In a study of BitTorrent performance in conjunction with the SuprNova site (which no longer serves torrent files), a team led by researcher Johan Pouwelse of Delft University of Technology in the Netherlands examined performance and reliability over a month-long period beginning in December 2003. The team measured average download speeds of an impressive 240 kbps over a two-week period, enough to download a large file in one day. BitTorrent also proved good at handling sudden surges in demand, a phenomenon known as "flash crowds." When a new, popular file, Lord of the Rings III, was "injected" into the network, early downloaders needed five days (December 21 to 25) before anyone had the complete file. On December 25, the number of seeds soared to around 250 and the number of downloaders plummeted, because many of them had now downloaded the complete file.
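A quick back-of-the-envelope check on "enough to download a large file in one day," assuming "kbps" means 1,000 bits per second and that the 240 kbps average holds steadily for 24 hours:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def bytes_per_day(rate_kbps: float) -> float:
    """Bytes moved in one day at a steady rate in kilobits per second (1 kbit = 1,000 bits)."""
    return rate_kbps * 1_000 * SECONDS_PER_DAY / 8

if __name__ == "__main__":
    print(f"{bytes_per_day(240) / 1e9:.2f} GB")  # 2.59 GB
```

That is roughly 2.6 GB a day, or three or so 700 MB CD images.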

Bram Cohen, 29, lives in Bellevue, Washington, about five miles southwest of the Microsoft campus, and co-founded the cutting-edge programmer conference CodeCon. He wrote BitTorrent in Python and released it under the Massachusetts Institute of Technology's open source license (the MIT License). A programmer who worked through the dot-com era, Cohen can claim BitTorrent as his own creation. In a profile in Wired magazine, Cohen said he has accomplished more working solo than he ever did as part of a development team. His wife, Jeanna, described how Cohen would pace around the house all day, then go to his computer "and the code just comes pouring out. And you can see by the lines on the screen that it's clean code."

Piracy, BitTorrent sites, and the MPAA

BitTorrent's speed makes it ideal for downloading movies and television shows, and it has become the protocol of choice for pirating copyrighted material. A high-quality file of The Matrix Reloaded was famously available on BitTorrent sites even while the film was showing in theaters. Anime fans have embraced the protocol, as have TV addicts in search of a series episode they may have missed.

And yet, BitTorrent provides a pretty skimpy cloak of anonymity. Cohen designed the protocol strictly for downloads, with no built-in means of indexing. As a result, the torrent file, the key to obtaining the actual file, usually resides in plain sight on the Web. A handful of large sites became clearing houses for torrent files, the most notable being SuprNova.com. Each could claim to be hosting only torrent files, not the actual content. But that didn't stop the Motion Picture Association of America from filing lawsuits. In a statement, the MPAA said its member companies had filed suit against hundreds of BitTorrent servers, as well as eDonkey and DirectConnect servers, worldwide.

The threat of litigation closed most of the larger BitTorrent sites, much as it shut down the original Napster. Among them was SuprNova.com, which at this writing is promoting a closed-source alternative called eXeem from an undisclosed publisher. A few BitTorrent link sites are fighting back. LokiTorrent took up a collection from its users for legal expenses. In its FAQ, LokiTorrent's operators say the site is absolutely legal, with no actual files hosted, giving its owners "absolutely no way of checking what people are sharing." The site "merely tracks the hash ID and the IP addresses of users connected to each particular torrent. It is your responsibility to check that the content of the files which you download are legal in your locality."

Another site, Sweden's The Pirate Bay, placed two tongue-in-cheek graphs on its Website: one, a steepening curve, shows the rise in legal threats; the other, completely flat, shows the number of torrents deleted in response to those threats: zero. The site has posted a small collection of legal threats from Microsoft, DreamWorks, Electronic Arts, and SEGA, among others, along with its responses. "We understand that you are familiar with Bit Torrent technology," it told one company. "Then you may, or may not, understand that none of the data that you hold the copyright to reside on thepiratebay.org's servers....The '.torrent' files that are offered for download at the site in question contain nothing more than hash and checksum information. How this information could, in itself, possibly be an infringement of your copyright is beyond us and apparently the Swedish legal system agrees."

So far, the United States legal system does too. The U.S. Court of Appeals for the Ninth Circuit (the federal appeals court serving the West Coast, one level below the Supreme Court) has ruled in favor of peer-to-peer networks, saying that a network operator cannot necessarily be held liable for the actions of its users. The U.S. Supreme Court is scheduled to hear the appeal, which will settle the matter, in late March, with a decision expected this summer. The Electronic Frontier Foundation, which has assisted the defendants, says that the ability of U.S. technology companies to innovate is at stake. "This case has implications that go way beyond peer-to-peer," says Annalee Newitz, policy analyst for the EFF. The fear is that software developers could be held liable for what other people do with their software. "That would open the door to a whole new round of litigation, which would be terrible for U.S. companies." She says that while neither Bram Cohen nor other developers of BitTorrent clients have been sued, a Supreme Court reversal could change that. "There would then be a precedent," she says.

Indeed, existing precedent points the other way. Starting with the Sony Betamax case in 1984, U.S. courts have generally ruled that technology companies should not be held liable for the actions of their users. (Napster was a major exception.) Moreover, technologies that came under legal fire have turned out to be catalysts for growth: videotapes, and now DVDs, have become a big secondary market for film studios. Newitz notes that the American film industry had a record-breaking $9.4 billion year. Given that success, the industry would do better to invest in its own networking technology than to sue its fans, she says.

The pros and cons of centralization

For publishers of unrestricted material, BitTorrent's centralized operation is a blessing, not a curse, because actual human beings are involved in maintaining the components. Ross Gynn notes that "if peers are unable to get a file because of a broken seed, they are easily notified to rectify the situation. With a traditional P2P such as Kazaa, if you are downloading a file and the uploader drops out, you cannot contact anyone to get the download 'fixed.'"

In his study, researcher Johan Pouwelse argues that BitTorrent's centralization has made the protocol less susceptible to the kind of fake files that have "polluted" other P2P systems, noting that SuprNova files were "virtually pollution free," a big difference from other peer-to-peer schemes that are now awash in fake files. His research team attempted to inject some fake files into the system, but the moderators caught all of them. Pouwelse's team was surprised that just 20 moderators could be that effective.

"Decentralization means less control and checks," Pouwelse wrote in an email exchange. "In a fully decentralized system, you can only trust yourselves and people/friends that have behaved OK in the past. This is a big P2P design challenge in the future." He believes that some kind of social network structure, such as Orkut, will be needed, in which people can vouch for one another. But centralization has its costs. Only half of the SuprNova mirrors stayed up longer than 2.1 days, and only 39 of the 243 mirrors stayed up longer than two weeks. Torrent servers were even less reliable. The study concluded "that there is an obvious need to decentralize the global components. However, all the features that make BitTorrent/SuprNova exceptional (easy single-click-download web interface, low level of pollution, and high download performance) are heavily dependent on these global components."

But with large sites closing, torrent files may become widely dispersed. The Monkey Methods Research Group argues that such decentralization is inevitable. Monkey Methods is actually three college friends with a "passionate interest [in] translating geeky technology into cool projects that might impact people's daily lives," said Andrew Chen in an email exchange. Chen is a recent graduate in applied math and economics from the University of Washington. His cohorts have backgrounds in computer science and psychology. "After spending many hours in coffee shops and living rooms arguing about the Internet and where things were going, we decided to work together to define it ourselves."

Monkey Methods' current cool project is TowerSeek.org, a prototype torrent search engine that shows the possibilities of seeking out torrent files from widely distributed sources on the Web. Monkey Methods estimates that just four percent of torrent sites contain 80 percent of all torrents. At the same time, the group found that nearly 1,000 sites host 10 or fewer torrents. Their conclusion: hosting torrent files is easy and can be done a handful at a time, so the MPAA will be facing many small torrent sites rather than a handful of big ones.

Legal downloads

Regardless of whether peer-to-peer networks prevail in the courts, BitTorrent has proved itself a good way to distribute large files containing non-pirated material. The key is having enough downloaders willing to host a complete copy of the file. Such "reseeding" costs the reseeders bandwidth, because demand for a popular file can spike quickly. But for small torrent sites, the reseeders are angels with bandwidth on their wings.

"You just can't supply enough bandwidth even using a protocol like BitTorrent to 200 downloaders at once across 20 different ISO files with a combined upstream of [just] 2.25Mbps," says Ross Gynn. Just when it looked like his site wouldn't be able to keep up, it started seeing seeders uploading huge amounts of data on the site's behalf. "When we ran a 'whois' on their IP addresses, we found out that they were seeding from places like Lawrence Livermore National Labs, the University of California, the Virginia Polytechnic Institute and even a couple of '.gov' addresses." Gynn said these organizations saved the project. "It's a shame we can't ever track them down and get in touch; it would be nice to thank them."

Another site offering Linux distributions, as well as the Firefox browser and the OpenOffice.org suite, is Solidz.com, which gets about 15,000 hits and between 1,500 and 2,000 unique visitors a day, according to Jonathan Zeppettini, the site's BitTorrent administrator. The site was founded by university students, most of them Canadian. "SolidZ has always been a place where some friends and I could experiment with different technologies, express our ideas, and link to tools and subjects that interest us," he wrote in an email exchange. "Adding torrents has attracted the most attention to our site and we often have people email us with requests, ideas, or even snippets of code that we can use." He says they would like to include public domain literature and video as well.

Gary Lerhaupt, a graduate student in computer science at Stanford University, launched Torrentocracy.com (BitTorrent + Democracy), which has a modest list of public domain video and audio files, including MP3s of the three U.S. presidential debates. His collection includes interviews from the documentary film Outfoxed, a critical view of Rupert Murdoch's Fox News Channel. Lerhaupt got permission from the film's producers to seed the file. Lerhaupt still hosts the tracker and torrent file, but the actual MPEG "is seeded by God knows who." This ability to remove the original source file while keeping the content available is a distinct advantage of BitTorrent.

One problem with the Torrentocracy concept is that the people who shoot videos don't necessarily understand how to set up a BitTorrent download on the Web. Lerhaupt's answer is his next project, Prodigem, a content hosting service still in test mode that acts as a publishing outlet primarily for filmmakers and other creators of large files. "It provides a simple way to upload a file onto my server, then creates a torrent and an RSS feed with the torrent enclosure. Your RSS aggregator sees it, grabs it, then downloads the content, just as it might an MP3 file." Lerhaupt makes it clear that Prodigem is only interested in "legally licensed material." That stance will undoubtedly keep his torrent file selection small, at least for the time being. But it will also keep him out of court.
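What does "an RSS feed with the torrent enclosure" look like? A minimal sketch, assuming a plain RSS 2.0 feed in which each item's `enclosure` element points at a .torrent file using the commonly used `application/x-bittorrent` MIME type. This is not Prodigem's code; the function name, channel text and example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

def torrent_feed(title: str, link: str, items: list[tuple[str, str, int]]) -> str:
    """Minimal RSS 2.0 feed; each item is (title, .torrent URL, .torrent size in bytes)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Legally licensed downloads via BitTorrent"
    for item_title, torrent_url, size in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        # The enclosure is the small .torrent file; an aggregator that hands it to a
        # BitTorrent client then fetches the large media file from the swarm.
        ET.SubElement(item, "enclosure", url=torrent_url, length=str(size),
                      type="application/x-bittorrent")
    return ET.tostring(rss, encoding="unicode")

if __name__ == "__main__":
    print(torrent_feed("Example channel", "https://example.org/",
                       [("Interview, part 1", "https://example.org/interview1.torrent", 21504)]))
```

Because the enclosure is only the small torrent file, the publisher's server carries almost none of the load when many subscribers' aggregators pick up a new item at once.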

Sidebar: BitTorrent Up Close

Like a bee colony, BitTorrent peers contribute to the group and the group gains as a result. But so-called "hive" protocols have their challenges. In his paper on BitTorrent, Bram Cohen lists four that he set out to address:

  • Figuring out which peer has which parts without incurring large overhead.
  • Dealing with the relatively short periods each peer is actually plugged into the network.
  • Discouraging "leeches": people who download, consuming bandwidth, without uploading to give some bandwidth back.
  • Keeping the download/upload process simple.

Of these, simplicity has proved the key ingredient in BitTorrent's success. To receive files using BitTorrent, you first acquire a client program, either from the official BitTorrent site (bittorrent.com) or elsewhere. (Two other popular clients are BitTornado and Azureus.) You then locate a torrent file on the Web and click on it. A window appears showing both the download rate, as with a normal download, and the upload rate. Good BitTorrent etiquette calls for leaving that window open after the download finishes, rather than closing the connection as you would with a normal download. In the standard configuration, BitTorrent keeps uploading until you close the window, making your machine and its bandwidth a conduit for other users.

To publish a file via BitTorrent, you place a small torrent file on a Web server. The torrent contains four vital pieces of information about the actual file: its name, its size, hashing information to confirm the authenticity of the download, and the URL of a "tracker." The tracker, a simple service layered on top of HTTP, serves as the matchmaker for BitTorrent, introducing downloaders to one another. The peers then tell each other which pieces of the file they have, and each piece a client receives is checked against its hash to ensure it is authentic.
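Torrent files are written in a compact format called bencoding (integers as `i42e`, strings as `4:spam`, lists as `l...e`, dictionaries as `d...e` with sorted keys). The sketch below builds a minimal single-file torrent holding the four items listed above. It is a simplified illustration, not any client's implementation: the function names are hypothetical, the tracker URL is a placeholder, and real torrents often carry extra optional fields.

```python
import hashlib

def bencode(value) -> bytes:
    """Encode ints, strings, lists and dicts in BitTorrent's bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are strings, written in sorted raw-byte order.
        pairs = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        pairs.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def make_torrent(name: str, data: bytes, tracker_url: str,
                 piece_length: int = 256 * 1024) -> tuple[bytes, str]:
    """Return the bencoded .torrent and its info hash (hex)."""
    pieces = b"".join(hashlib.sha1(data[i:i + piece_length]).digest()
                      for i in range(0, len(data), piece_length))
    info = {
        "name": name,                  # suggested file name
        "length": len(data),           # size in bytes
        "piece length": piece_length,  # bytes per piece
        "pieces": pieces,              # concatenated 20-byte SHA-1 digests
    }
    metainfo = {"announce": tracker_url, "info": info}
    # Trackers and peers identify a torrent by the SHA-1 of the bencoded info dict.
    return bencode(metainfo), hashlib.sha1(bencode(info)).hexdigest()

if __name__ == "__main__":
    torrent, ih = make_torrent("demo.iso", b"\0" * 600_000,
                               "http://tracker.example.org/announce")  # placeholder tracker
    print(len(torrent), ih)
```

This also shows why a torrent file is so small: a 600 KB payload is described by three 20-byte hashes plus a few short fields.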

The list of peers that the tracker hands out is strictly random. Cohen argues that this approach is the better way to deal with high "churn," in which people connect to the network only for short periods. When choosing which piece to download next, a client first picks a random piece, then switches to the "rarest" available piece (i.e., the one possessed by the fewest peers). When downloading, peers take pieces from wherever they can get them. Uploading is more selective. A peer "unchokes" (uploads to) a fixed number of peers, typically four, choosing those that give it the best download rates as measured by a 20-second rolling average, and refuses to upload to the rest. This helps ensure that peers contribute bandwidth as well as consume it. One additional peer is unchoked regardless of its rate, in order to discover whether better rates are available, a process called "optimistic unchoking."
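The two policies described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Cohen's code: peer state is reduced to sets of piece indexes and a dictionary of precomputed download rates (the 20-second averaging is assumed to happen elsewhere), ties are broken at random, and the function names and peer names are hypothetical.

```python
import random
from collections import Counter
from typing import Optional

def pick_piece(have: set[int], peer_bitfields: list[set[int]],
               first_piece: bool, rng: random.Random) -> Optional[int]:
    """Pick the next piece: random for the very first one, rarest-first after that."""
    # How many connected peers can supply each piece we still need?
    counts = Counter(i for bf in peer_bitfields for i in bf if i not in have)
    if not counts:
        return None                        # nothing useful on offer
    if first_piece:
        return rng.choice(sorted(counts))  # get something to share quickly
    rarest = min(counts.values())
    return rng.choice(sorted(i for i, n in counts.items() if n == rarest))

def choose_unchoked(rates: dict[str, float], slots: int,
                    rng: random.Random) -> set[str]:
    """Upload to the `slots` peers giving us the best download rates, plus one
    randomly chosen "optimistic unchoke" among everyone else."""
    ranked = sorted(rates, key=rates.get, reverse=True)
    unchoked = set(ranked[:slots])
    others = ranked[slots:]
    if others:
        unchoked.add(rng.choice(others))
    return unchoked

if __name__ == "__main__":
    rng = random.Random(42)
    bitfields = [{0, 1, 2, 3}, {1, 2, 3}, {2, 3}, {3}]
    print(pick_piece({0}, bitfields, first_piece=False, rng=rng))  # 1 (held by only two peers)
    rates = {"p1": 120.0, "p2": 80.0, "p3": 60.0, "p4": 40.0, "p5": 5.0, "p6": 0.0}
    print(sorted(choose_unchoked(rates, slots=4, rng=rng)))
```

Rarest-first spreads scarce pieces through the swarm before their only holders disconnect, while the random first piece gets a newcomer something to trade as fast as possible.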

BitTorrent doesn't require that a complete copy of the file (a seed) be present, as long as the pieces held by the peers add up to the whole. "In essence, the peers collectively work together to rebuild the broken seed," says Ross Gynn of The Linux Mirror Project. Gynn says he almost always succeeds in obtaining files over BitTorrent. "At some point the original seed returns or someone makes a new torrent for the file."
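The condition "the pieces held by the peers add up to the whole" is just a set union. A minimal sketch, assuming each peer's holdings are known as a set of piece indexes (the function names are hypothetical):

```python
def missing_pieces(num_pieces: int, peer_bitfields: list[set[int]]) -> list[int]:
    """Pieces that no connected peer holds."""
    held = set().union(*peer_bitfields)
    return [i for i in range(num_pieces) if i not in held]

def swarm_can_complete(num_pieces: int, peer_bitfields: list[set[int]]) -> bool:
    """True if the peers together hold every piece, even with no single seed present."""
    return not missing_pieces(num_pieces, peer_bitfields)

if __name__ == "__main__":
    # Four pieces; no peer has the whole file, but together they do.
    print(swarm_can_complete(4, [{0, 1}, {1, 2}, {3}]))  # True
    print(missing_pieces(4, [{0, 1}, {1, 2}]))           # [3]
```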

To help ensure that files remain available, Matt Jones, Gynn's partner at TLMP, created an application called DynamicSeed that calculates which torrents on the server are most in need of help on the basis of their seed-to-peer ratio, then downloads those files and automatically reseeds them. "Before, we had to constantly monitor the tracker and see which files need help, then switch our clients to these manually. Now we can just let DynamicSeed do the work." Gynn says that DynamicSeed is customizable: the seeder can specify the number of torrents to monitor and limit bandwidth, and the application recalculates the bandwidth allocation on the fly. The application is still being tested.
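DynamicSeed's source is not described here, so the following is only a hypothetical sketch of the idea as Gynn explains it: rank torrents by seed-to-peer ratio, help the neediest few, and split a bandwidth cap among them. The class and function names, the proportional-to-peers split, and the example torrents are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class TorrentStats:
    name: str
    seeds: int   # peers holding the complete file
    peers: int   # downloaders still missing pieces

def neediest(torrents: list[TorrentStats], max_torrents: int) -> list[TorrentStats]:
    """Torrents with downloaders, ordered by seed-to-peer ratio (lowest first)."""
    wanted = [t for t in torrents if t.peers > 0]
    wanted.sort(key=lambda t: t.seeds / t.peers)
    return wanted[:max_torrents]

def allocate_upload(chosen: list[TorrentStats], limit_kbps: float) -> dict[str, float]:
    """Share an upload cap among the chosen torrents in proportion to their peers."""
    total = sum(t.peers for t in chosen)
    return {t.name: limit_kbps * t.peers / total for t in chosen} if total else {}

if __name__ == "__main__":
    stats = [TorrentStats("fedora-dvd.iso", seeds=2, peers=40),
             TorrentStats("knoppix.iso", seeds=30, peers=10),
             TorrentStats("dsl.iso", seeds=1, peers=5)]
    # Re-run on every polling interval so the allocation tracks the tracker's numbers.
    chosen = neediest(stats, max_torrents=2)
    print(allocate_upload(chosen, limit_kbps=1000))
```

With these made-up numbers, the well-seeded torrent is skipped and most of the 1,000 kbps goes to the torrent with the most waiting downloaders; recomputing on each pass is what lets the allocation change "on the fly."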
