Pacific Connection (English)

Orange Rectangles Everywhere: RSS 2.0 is Becoming the Syndication Standard of Choice

If acceptance of a specification instills a kind of immortality, David Winer is on his way to the heavens. Winer is, among other things, the godfather of RSS, Really Simple Syndication, an XML dialect that allows people to subscribe to an information source and get notified when something new gets added. Many attempts have been made to standardize on a "publish/subscribe" (or "push") technology over the Internet, one that alerts users to updates. XML syndication is the method that has triumphed, and Winer's current version of XML syndication, RSS 2.0, is by its prominence on the Web the standard to beat.

The Web is now dotted with orange rectangular buttons labeled "XML" or "RSS," each one earmarking a separate syndication "feed." They include major news sources: English-language publications such as the New York Times, the Wall Street Journal, the Boston Globe, the Christian Science Monitor, Time magazine and Newsweek; online news sources like Salon.com and CNET's News.com; and American broadcasters like NBC, CNN, and National Public Radio. Companies are starting to offer syndication on their websites. Apple provides feeds on everything from QuickTime Java development to news on the Darwin kernel. There are feeds for comic strips, political organizations, and of course, countless blogs.

RSS feeds are mostly text, but not exclusively. A new twist for syndication is "podcasting," in which audio files, predominantly MP3s, are delivered through an RSS feed to a client, to be played on a PC, Mac or MP3 player. Early users include commentators like MTV's Adam Curry, who "podcast" radio-like programming. Podcasting employs an RSS 2.0 subfeature called <enclosure>, which contains the URL for files. Downloading takes place when the client is idle, perhaps overnight.
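
To make the enclosure mechanism concrete, here is a minimal Python sketch of how a podcast client might pull the audio URLs out of a feed. The sample feed, its example.com URLs, and the function name find_enclosures are made up for illustration; only the <enclosure> element and its url, length, and type attributes come from the RSS 2.0 format. A real client would also need to schedule the downloads, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed for illustration; the example.com URLs are placeholders.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <link>http://example.com/</link>
    <description>A made-up podcast feed</description>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" length="40000000" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only post</title>
      <description>No audio here.</description>
    </item>
  </channel>
</rss>"""


def find_enclosures(feed_xml):
    """Return (title, url, length_in_bytes, mime_type) for each item that has an enclosure."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # ordinary text item, nothing to download
        found.append((
            item.findtext("title", default=""),
            enclosure.get("url"),
            int(enclosure.get("length", "0")),
            enclosure.get("type"),
        ))
    return found


if __name__ == "__main__":
    for title, url, size, mime in find_enclosures(SAMPLE_FEED):
        print(title, url, size, mime)
```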

Syndication civil war

If you click on one of the orange buttons, you get an XML document, whose top line usually says either <rss> or <feed>. Behind these two tags is a story of unusual acrimony, even for an evolving specification.

RSS's origin is a bit blurry, but people generally trace it back to scriptingNews, a syndication format created in 1997 by Winer, and RSS .90, which was developed in 1999 by Netscape for its user-customizable page. (Netscape's RSS stood either for "RDF Site Summary" or "Rich Site Summary.")

Winer, who founded Userland Software and created one of the Web's first blogs, thought RSS .90 too complex because of its use of W3C's Resource Description Framework. He moved forward with versions .91 and .92, eventually publishing the current spec, RSS 2.0. That spec is maintained by Harvard Law School's Berkman Center for Internet and Society, where Winer held a fellowship. Meanwhile, RSS 1.0 (which stands for RDF Site Summary) sprang from .90 as a separate development effort under the auspices of the World Wide Web Consortium (W3C). That specification is incompatible with RSS 2.0 and has not gotten much traction.

The more viable contender is the Atom project, a rival XML syndication protocol, currently in version 0.3. Atom's chief content backer is Google, which purchased Blogger.com in 2003 and went with Atom, while rival Yahoo stuck with RSS. Blogger.com is a free service, Google is behind it, and so, many blog feeds employ Atom syndication. "What we've done in Atom-land is adopted the bits of RSS that get used and ditched the ones that don't get used," wrote Tim Bray, who co-chairs the Atom Publishing Format and Protocol working group of the Internet Engineering Task Force. "Plus we've done some clean-ups and touch-ups here and there: markup-inclusion, date-stamping, namespacing, accessibility, a couple others."
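
Because the competing formats use different root elements, a reader can tell them apart before parsing anything else. The following Python sketch shows one way to do that. The function name detect_feed_format and the returned labels are invented here; the root element names (rss for the RSS .91/.92/2.0 family, feed for Atom, rdf:RDF for the RDF-based versions) are the ones the formats themselves use.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_feed_format(feed_xml):
    """Guess the syndication format from the document's root element."""
    root = ET.fromstring(feed_xml)
    # ElementTree spells namespaced tags as "{namespace-uri}localname".
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "rss":
        return "RSS " + root.get("version", "(unknown version)")
    if local_name == "feed":
        return "Atom " + root.get("version", "(unknown version)")
    if root.tag == "{%s}RDF" % RDF_NS:
        return "RDF-based RSS (0.90 or 1.0)"
    return "unknown"


if __name__ == "__main__":
    print(detect_feed_format('<rss version="2.0"><channel/></rss>'))
    print(detect_feed_format('<feed version="0.3" xmlns="http://purl.org/atom/ns#"/>'))
```

An aggregator that supports both camps would typically branch on this result and hand the document to a format-specific parser.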

Bray has a point in that RSS 2.0 has some fossil elements. When asked about two of the more obscure ones, Userland lead developer Jake Savin thought they went all the way back to Netscape, but wasn't certain. Even so, the people behind Atom have strained to explain why their version of XML syndication is truly superior to RSS 2.0. Meanwhile, Google aside, the sheer quantity of RSS 2.0 (and compatible versions .91 and .92) in the publishing industry, as well as RSS's head start, have made "RSS" synonymous with syndication. RSS 2.0 and RSS .91 are used by Microsoft, Apple, Sun and Oracle, and by Mozilla, making it one of the few things these organizations agree on.

"RSS is by no means a perfect format, but it is very popular and widely supported," Winer writes in the RSS 2.0 specification. "Having a settled spec is something RSS has needed for a long time. The purpose of this work is to help it become a unchanging thing, to foster growth in the market that is developing around it, and to clear the path for innovation in new syndication formats. Therefore, the RSS spec is, for all practical purposes, frozen at version 2.0.1." The .1 changes represent corrections of minor typographic errors, not extensions of the spec. RSS 2.0 is extensible, but new elements must be defined in a namespace to ensure backward-compatibility. At this writing, there is no organized effort to push the standard forward. All energy appears to be going toward adoption.

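The namespace rule is what keeps extensions from breaking older readers: a reader that only knows the core elements never sees the extra ones, while a reader that knows the namespace can ask for them explicitly. Here is a small Python sketch of that idea. The feed content and the function name read_items are hypothetical; the Dublin Core namespace is used only as a familiar example of an extension vocabulary, not as something the RSS 2.0 spec prescribes.

```python
import xml.etree.ElementTree as ET

# Dublin Core, used here purely as an example of an extension namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical feed: core RSS 2.0 elements plus one namespaced extension.
EXTENDED_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <dc:creator>A. Writer</dc:creator>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Read core fields by plain name and the extension field by namespace."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # Returns None when the feed does not carry the extension.
            "creator": item.findtext("{%s}creator" % DC_NS),
        })
    return items


if __name__ == "__main__":
    print(read_items(EXTENDED_FEED))
```

A reader that skips the "creator" lookup still gets the title and link unchanged, which is the backward compatibility the spec is protecting.
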
Is there room for a truce between RSS 2.0 and Atom? Last March, Winer suggested a merger of the formats. But the offer was declined by Atom proponents-some of whom argued that the technologies are too incompatible. "It never went anywhere," says Winer. "And I think that highlights the fact that there is no leadership in Atom. There was never anyone who could answer that question. It was worth considering at that point, but really, things have moved on since then."

RSS Reading and self-syndication

If the standard for XML syndication is becoming clear, the software supporting it has turned into a heavy competition. RSS readers (or "aggregators," as they are sometimes called) have proliferated, and which models will prevail-becoming the Google or Internet Explorer of RSS-remains to be seen. This market could fade quickly if Internet Explorer ever came out with strong RSS reading capability, as the browser is the most obvious place to read RSS feeds. But Internet Explorer's most recent upgrade in XP Service Pack 2 had no RSS support whatsoever-giving Mozilla's Firefox browser an opening.

Here are some of the forms RSS readers now take:
  • Extended bookmarks. As RSS readers eventually lead to some kind of Web page, integrating the reader with a browser makes intuitive sense. One way to go about this is simply to extend the bookmarks to include RSS feeds being tracked. Firefox, the open source browser from the Mozilla project, contains a rudimentary RSS reader called Live Bookmarks that does just that. Pluck, a more extensive reader, tightly integrates with Internet Explorer.
  • Extended mail. Some RSS readers work within a mail program, or at least resemble one-thereby putting email, Usenet feeds, and RSS feeds under one program. The three-pane, full-screen interface of many email readers makes this a good tactic for people who want to track large amounts of information. The Norwegian browser Opera takes this approach in its built-in email facility. Mozilla's Thunderbird does the same.
  • Standalone. Some readers are standalone applications. For example, Australia's Awasu runs in the background under Windows, notifying users when new material comes in, then displays the contents. You could also use Awasu on its own as your browser of choice.
  • Web-based. Some RSS readers are accessible online, thereby allowing more casual users to try their hand at RSS without having to install extra software. My Yahoo, Yahoo's customizable page, is the best known, and with a recent site overhaul, the service is very convenient. You can choose among popular RSS feeds or select your own, and they appear as news items on your customizable page. Ironically, My Yahoo finally fulfills the dream Netscape had when it created its version of RSS in the first place.

My Yahoo's RSS support puts the Yahoo portal ahead of Google, which has, at this writing, no syndication support whatsoever. And that has created an opportunity for search engines like Syndic8.com and Feedster that seek out RSS feeds. "We think that RSS is going to take over," says Scott Rafer, Feedster CEO and president. "Within a few years, every significant publisher will be publishing in XML, just as they've been in HTML. And that completely changes the nature of the search." Rafer says that Feedster was launched by two "very experienced search guys who got sick of not being able to find their friends' blogs." The search engine is essentially an XML crawler, indexing (at this writing) more than a million fully searchable feeds.

A Feedster search is not nearly as comprehensive as a Google search and tends to show more blogs than other classes of online information. A search for the New York Times, for example, does not actually come up with any New York Times RSS feeds. But the idea does make sense. You can subscribe to a feed located by a Feedster query, or even subscribe to the search itself, which enables the tracking of very specific information. That idea appeals to Ben Goodger, lead developer of the Firefox browser. "I find Feedster to be a dandy aggregation engine - and their search results pages are syndicated via RSS," he wrote in his blog. Using Firefox with Feedster is an easy way to get highly customized updates: you run a search on Feedster, subscribe to the results of that search, and add the Live Bookmark to your toolbar. "Easy aggregation - doesn't get much simpler than that."

RSS publishing software and services are also growing, and syndicating your blog has become easy using services like Blogger and FeedBurner. Jim Mahar, a professor of finance at St. Bonaventure University in New York State, used FeedBurner to syndicate two blogs he keeps: one with an international following, the other for his students. Mahar began using the Internet as a way to keep former students informed of events in his field. He began with a newsletter, emailed to about 5,000 addresses, about 800 of which bounced because of anti-spamming filters. The blog has slowly replaced the newsletter as the better medium, and RSS completes the picture by letting subscribers know when he has made updates. Mahar picked the first company that came up in Google, the FeedBurner syndication service, and got the job done in less than 40 minutes. Almost immediately, he got a spike in traffic. "This past week, I had two interviews for different radio shows who have been reading the blog for corporate finance news stories." He's convinced that syndication was key.

"Blog clog"

As RSS grows in popularity, so do fears that syndication will chew up bandwidth. That already seems to be the case for bloggers big and small. Microsoft, for example, has fed entire blog entries to participants in the Microsoft Developer Network (blogs.msdn.com). But last September, the company scaled back the text to the first 500 characters-so that RSS gives the top-line news of an update but requires individuals to go directly to the blog for more. "Microsoft's flip-flop is a red flag for large enterprises and other groups that host and syndicate bloggers," wrote Paul Festa on the CNET News.com website. "As the practice gains popularity, network administrators could face tough choices in meeting a demand that promises to put new strains on server resources."

Some bloggers objected. "In the blogosphere, there is hardly anything more irritating than an abbreviated RSS feed," wrote Steve Main on his blog. "The WHOLE PURPOSE of an RSS aggregator is so that I don't have to open my freaking web browser to 100 different pages. By having the content right there in my aggregator, I can skim an entire article in the time it takes to open up a new web browser. By not including full content in the RSS feed, you take away some of the productivity gains that RSS offers."

Microsoft responded by upping the limit to 1250 characters. MSDN head Sara Williams asked on her blog: "Why serve up 400k of content when we know that folks...don't read 400K of content on a web page. The truncation idea is borrowed straight from newspapers - read the first bit on the front page, turn to page 12 for the full story."
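
For readers curious what such a cap looks like in practice, here is a rough Python sketch of truncating an entry before it goes into a feed's description. It is not Microsoft's implementation, which the article does not describe; the function name truncate_description, the word-boundary rule, and the trailing "..." are all assumptions made for illustration. Only the 500 and 1250 character limits come from the text above.

```python
def truncate_description(text, limit=500):
    """Return text cut to at most `limit` characters (limit should be at least 3)."""
    if len(text) <= limit:
        return text
    # Reserve three characters for the "..." marker.
    cut = text[:limit - 3]
    # Back up to the last space so the excerpt does not end mid-word.
    # (This can drop a final word that happened to fit exactly.)
    if " " in cut:
        cut = cut[:cut.rindex(" ")]
    return cut.rstrip() + "..."


if __name__ == "__main__":
    entry = "word " * 300  # stand-in for a 1,500-character blog entry
    print(len(truncate_description(entry, 500)))
    print(len(truncate_description(entry, 1250)))
```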

Her point is well taken, says Gary Lawrence Murphy, who runs a personal blog and a few websites from his home in Sauble Beach, Ontario. Murphy attracts only about 4,000 to 6,500 unique visitors a day. But after he began syndicating, he started getting notices from his carrier that his paid-for network capacity would be exhausted for the day. Murphy says that there are two ways to look at RSS. Either it delivers a notice that the content you are following has changed, leaving it to you to go to the Website; or it delivers the entire content itself. The former, he says, was the original idea of RSS. "The idea was 'microcontent'-the stories should be brief enough where you could get the idea on a cellphone or PDA," he says.

Murphy contends that many RSS readers compound the problem by not correctly implementing the conditional GET that is part of the HTTP specification. "The original idea was for proxy servers to be able to cache content locally," he says. "If you send a date and the content does not agree, go and fetch the material again. Otherwise, the server just returns a 200-byte notification that you are current. The problem was that the dates must match, and most aggregators don't consider this. They look at the field being called date, and if it's even a few seconds different, the data gets sent again. And the time is often the local server time, rather than the client request time." Consequently, the server is constantly sending out "fresh" data, regardless of whether it is refreshed or not. Do the math, says Murphy, and even 100 blog subscribers can tax a system, each querying the data at least 24 times a day. But that's the minimum. "It's human nature to be the first to have the news, so people reset their reader to query every 10 minutes." That results in a lot of hits.
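
The mismatch Murphy describes can be shown without any network traffic. The Python sketch below simulates a server that, as in his account, answers "not modified" only when the client sends back exactly the date string the server issued, and compares a reader that echoes that value with one that sends its own clock time. Everything here (serve_feed, PoliteReader, CarelessReader, the sample date and body) is hypothetical. Real HTTP servers may compare dates more leniently and commonly support ETag validators as well, so treat this as an illustration of the failure mode rather than a description of any particular server or aggregator.

```python
from email.utils import formatdate

LAST_MODIFIED = "Sat, 01 Jan 2005 12:00:00 GMT"  # made-up timestamp of the feed
FEED_BODY = b"<rss version='2.0'>...</rss>"      # stand-in for the full feed


def serve_feed(request_headers, last_modified=LAST_MODIFIED, body=FEED_BODY):
    """Hypothetical server: 304 only if the client echoes the exact date string."""
    if request_headers.get("If-Modified-Since") == last_modified:
        return 304, {}, b""  # short "you are current" reply, no body
    return 200, {"Last-Modified": last_modified}, body


class PoliteReader:
    """Stores the server's Last-Modified value verbatim and sends it back."""

    def __init__(self):
        self.validator = None
        self.cached_body = None

    def poll(self, server):
        headers = {}
        if self.validator is not None:
            headers["If-Modified-Since"] = self.validator
        status, response_headers, body = server(headers)
        if status == 200:
            self.validator = response_headers.get("Last-Modified")
            self.cached_body = body
        return status


class CarelessReader:
    """Sends its own current clock time, which never matches the server's string."""

    def poll(self, server):
        headers = {"If-Modified-Since": formatdate(usegmt=True)}
        status, _headers, _body = server(headers)
        return status


if __name__ == "__main__":
    polite, careless = PoliteReader(), CarelessReader()
    print([polite.poll(serve_feed) for _ in range(3)])
    print([careless.poll(serve_feed) for _ in range(3)])
```

The polite reader pays for the full feed once and gets the short reply afterward; the careless one downloads the whole feed on every poll even though nothing has changed.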

But others think that network bandwidth is so inexpensive that "blog clog" won't really happen, at least for those people serving text. "Bandwidth is getting cheaper every year," says David Winer. "I've learned as a software developer not to evaluate systems on today's deployment. You should always be thinking two or three years out." Winer says he's a strong believer in delivering enough content of the article in the feed so that a reader has a good understanding of what the full article says. Good examples include feeds from The New York Times and the BBC in which "the descriptions are written very competently, and they know that's all the reader needs. If I want more information, then I click on the link, get the story, and also get an advertisement, which pays for the feed. But with blogs, I'd prefer to see the entire text because I might be reading it on an airplane or commuting where I don't have a net connection to click on a link.

"Where you really have to worry is with podcasting, with these huge MP3 files slogging around. That gets interesting from a bandwidth standpoint: podcasts are sometimes 40MB. If you have a thousand subscribers to that, you can exhaust your allocated bandwidth in one day." Winer says that if bandwidth ever becomes a significant problem for podcasts, BitTorrent, the increasingly popular peer-to-peer file technology in which bandwidth is shared among participants, could be the solution.

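The "do the math" arguments in this section are easy to check. The short Python sketch below works through the figures quoted by Murphy and Winer: 100 subscribers polling hourly or every 10 minutes, and a 40MB podcast with a thousand subscribers. The function names are invented for illustration, and the podcast figure assumes each subscriber downloads one episode once per day, which the article does not state.

```python
def polls_per_day(interval_minutes):
    """Number of times one reader polls in 24 hours at a fixed interval."""
    return (24 * 60) // interval_minutes


def daily_requests(subscribers, interval_minutes):
    """Total feed requests per day across all subscribers."""
    return subscribers * polls_per_day(interval_minutes)


def daily_transfer_mb(subscribers, downloads_per_day, size_mb):
    """Total megabytes served per day for a file of the given size."""
    return subscribers * downloads_per_day * size_mb


if __name__ == "__main__":
    print(daily_requests(100, 60))           # hourly polling, Murphy's minimum
    print(daily_requests(100, 10))           # readers reset to every 10 minutes
    print(daily_transfer_mb(1000, 1, 40))    # Winer's podcast scenario, in MB
```

Under these assumptions the hourly case comes to 2,400 requests a day, the 10-minute case to 14,400, and the podcast case to 40,000MB, or roughly 40GB, in a single day.
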
Sidebar: An Interview with David Winer

David Winer has long since achieved wizard status. In addition to co-authoring RSS .91 with Netscape and authoring RSS .92 and 2.0 on his own, he is the co-author of SOAP 1.1 (with Microsoft and IBM) and XML-RPC (with Microsoft). His blog, Scripting News, is the Web's longest running weblog. Winer founded Userland Software in 1988 and was its CEO until 2002, when he took a year-long fellowship at Harvard Law School's Berkman Center for Internet & Society. Userland's flagship product, the Radio UserLand weblog tool, incorporates features from MyUserland.Com, the first RSS aggregator. Winer was also behind the ThinkTank outliner, an early and invaluable tool for organizing projects and thoughts. Winer was born in the Bronx and lives in Seattle. I spoke with him by phone just before he set off on a short trip to Vancouver.

Are you surprised by the success of RSS?
Not really. I'm actually surprised it took so long. It's a pretty rational idea. I remember the moment in 1999 when I realized that this was going to be the way I was going to read news on the Web forever. And I'm sure a lot of other people have had that moment. RSS has automated a part of the drudgery of using the Internet, and that's how computers evolved: by automating things that human beings do. We did have to wait until there was a critical mass in terms of support from news providers. In 1997 and 1998, that certainly wasn't true. But in 1999, it started taking off with early adopters like Red Herring, Wired, Salon, and News.com, along with lots of blogs.
The way I used to get news was to look at the sites and try to figure out what's new. You do a lot of clicking that way without finding that much new. Today, every hour, my aggregator finds hundreds of things that might be of interest to me. I could never go back.
Is the popularity of RSS driven by news or blogs?
What's the difference between them? We could get started on a real long discussion about that. News organizations are collections of people. And if you take one of those people and put him in his own blog, nothing really changes. I subscribe to about 300 different feeds, and while I've never done the count, my guess is that about half are blogs.
What was your involvement with The New York Times?
Userland did their feed-that's how The New York Times got into RSS. I was having dinner with Martin Nisenholtz, the CEO of New York Times Digital. I wanted them to do blogs and support blogs, and I got one-half of what I was looking for. We wound up producing their feeds for the first few years.
Was that some kind of benchmark for RSS?
Of course: The New York Times is The New York Times. I don't like feeding that: it drives their arrogance. But I grew up in New York reading The New York Times and modeled my writing after a number of New York Times writers. But yeah, they are one of a small number of publications that can validate a concept. Maybe them more than anybody else. They've been very good at jumping on the Web in many different levels. They were one of the first publications to have a website, as well as to have RSS feeds. Now there is basically universal coverage from the top-tier publications worldwide. Among them, RSS is now more conspicuous by its absence than its presence.
Regarding the RSS 2.0 specification, how did you decide to at least temporarily freeze it?
There's nothing temporary about it. If there had been a cooperative process in the developer community where breakage [i.e. breaking backwards compatibility] was considered an important issue, we could have left it unfrozen and kept going. But that was not the case. People kept arguing about whether or not we should throw the whole thing out and start over again. These weren't people with large, installed bases, they didn't have a lot at stake, but they were getting listened to. And working with the publishers and bloggers, I felt they had no interest in upheaval, in changing the way it worked.
Ever since early 1999, there really hasn't been room for change. Once something like that is deployed, you can talk all you want-it isn't going to change. This has been a major misunderstanding, that somehow I have some power over whether the spec is frozen or not. But I don't have that power-it is what it is. RSS is extensible. I have yet to hear of a single thing that anyone wants to do with it through its extensibility.
Once you get adoption as RSS has experienced, why would you want to change it? The whole idea of having XML formats is to get these large and small organizations all playing together on the same field. That's beyond anyone's dream, so why would you want to screw around with that?
What's the legal status of the specification?
RSS isn't owned by anybody; the format isn't copyrighted. But the specification which I wrote is copyrighted; there seemed no way to avoid that. So I put a very liberal copyright on it patterned after the IETF [Internet Engineering Task Force] copyright, that basically says that anybody can use the spec for any purpose as long as you give attribution. Creative Commons [an organization that allows the copying and distribution of work while allowing the author to retain the copyright] puts a legal stamp on that. Being at Harvard, I had access to some good lawyers that are interested in being expansive in what people can do.
One of the things people were looking for at that time was that RSS be independent of a company and person. Harvard's a great brand name and one that had not appeared in technology until then. I thought it appropriate for RSS. The MIT "brand" was all over lots of XML stuff, which is appropriate. The idea here was to put a humanities stamp on RSS, because RSS is largely about human beings, literature, journalism and the values that Harvard stands for. What better way to say that? That's why I insisted RSS be kept simple because this is one of the tricks engineers use to keep users from having any power. By making the simple things appear too complicated, they scare users off.
Is there anything else you'd like to say to Software Design readers?
Ganbatte!
