Pacific Connection (English)

Apache: Building the Open Source Infrastructure

When Rob McCool left the National Center for Supercomputing Applications back in 1995, he inadvertently changed the course of open source development. As a student, McCool had created an open source, public domain Web server called NCSA httpd (HTTP daemon). But his departure created a gap among the webmasters who had used, debugged, and extended the code. This small community got the idea to create a common code distribution of their own. Two of them, Brian Behlendorf and Cliff Skolnick, assembled a mailing list, and by February 1995, the Apache Group was formed by eight core contributors. They released Apache 1.0 that December. Less than a year later, it surpassed NCSA httpd in popularity. Today, the Netcraft Web Server Survey shows Apache still well ahead of its competitors with around 60 percent market share. Microsoft is a distant second at around 30 percent, with Zeus and iPlanet down in the low single digits.

These days, the server is more a maintenance than a development project, accounting for less than a quarter of the total developer effort on the Apache website. Meanwhile, other projects have become more intense centers of activity-all under the umbrella of the Apache Software Foundation-the successor to the Apache Group. "What makes Apache Software Foundation interesting and useful is that, like a rising tide, open source is working its way up the software stack," said Behlendorf.

If an open source software stack were an ocean, it would have bottom feeders like Linux and BSD, and surface swimmers like Mozilla and Open Office. But most of the activity of the Apache Software Foundation is concentrated in between. For the outside world, such projects are often difficult to understand. Everyday users have hands-on experience with operating systems and applications. Programmers have similar experience with Web design tools and compilers. But Apache projects are neither, and as a result, ASF sometimes gets criticized for being too opaque. Even for developers, Apache is not the sort of place you can necessarily go to make your job easier. There are no easy-to-use shortcuts. ("People want something they can pull down, and, bam, they've got their e-commerce site," says Behlendorf.) Instead, there are libraries for manipulating XML data, Tcl-Apache integration efforts, and content management frameworks. There are a lot of "sub-projects" whose names sound like military code: Avalon, Jetspeed, JMeter, Struts, Log4J, WatchDog.

"They are largely components," Behlendorf says, "things that are designed to be plugged in and talk to other things." But while overlooked in the press and even by many developers, Apache components are vital to open source's future. Behlendorf, who is also chief technology officer for CollabNet, says that open source applications are comparatively scarce because they are too far out in front. The real push for applications will occur only when there's an infrastructure to support it. "Open source developers tend to build on top of the open source infrastructure of libraries and tools," he says. "If that infrastructure isn't there, they don't do a whole lot of that development. You haven't seen really powerful things at the top of that stack because the infrastructure-think of it as a rising tide-isn't high enough, yet."

Behlendorf's use of the term "component" to describe Apache Software Foundation projects is not accidental. He believes that the open source movement has finally delivered component-style software development-that access to the source was the missing link. "We were all told we should build our applications as reusable components because in the future, as software developers, we'll just plug them together like Lego blocks to build more complex applications. But that really didn't work because people didn't buy and sell just individual components, they bought libraries. People didn't reuse components because components suffer from 'bit rot'-new bugs were always found-or the underlying platforms changed too quickly. When that happened, the developer who was the recipient of that component couldn't fix the bug, because he didn't have the source code."

If Apache were a software company, marketing people would complain that the "brand" was too confusing. Apache XML, Apache Java servlets, Apache HTTP servers, Apache Perl projects-what do they have in common? For Behlendorf, the Apache "brand" is more about how the software is developed than what it does. "If a piece of software uses the word 'Apache,' we want people to know that it was built using the principles of collaborative development that reinforce some open standards. Doing so makes that software approachable, from a programming standpoint, and helps ensure high quality."

XML and Jakarta

For Behlendorf, the two most significant areas of work under Apache are Jakarta and XML, both launched at the same time the foundation was created: 1999. Jakarta is broadly defined as "commercial-quality, open-source, server-side solutions for the Java platform." In retrospect, Java has been more successful on the server than on the client, where it was originally intended. "But in 1999, it was still a toss-up," he says. "People were still talking about Java applets and applications. That still happens, but we felt that there wasn't any coordination going on in the server side. Jakarta was launched for people who wanted to run on open source application infrastructures." Jakarta has since become a sprawling framework for a wide variety of server-side Java developments. Jakarta sub-projects include Apache Ant, a Java-based build tool; Cactus, a test tool for server-side Java code; and Jetspeed, an information portal development tool.

Apache's XML project was born under similar circumstances. "We saw the opportunity to build tools to create, manage, parse and transform XML, as well as create libraries of that code that anyone could use," Behlendorf explains. The group believes that the best way to endorse and perpetuate a standard is to get programmers to do something useful with it. Such was the case with HTTP in its younger days. When companies implemented Apache software inside their products, they helped promote the HTTP protocol specification, and not some proprietary hybrid. "Those of us who were writing the HTTP protocol spec wanted the big companies to use the software we were creating. That way, they couldn't claim that it was hard to implement HTTP, or hard to be compliant with."

Similarly, back in 1999, people were concerned that Microsoft, which had recently become an XML convert, would start tinkering with the standard. The best way to prevent that was to get the tools in place so that people could do something useful. "We wondered if there were people out there who would want to work on XML parsers, XSLT translation engines, and other fundamental tools required to build an XML application." The answer was yes, and the work is still ongoing.

As the Apache Software Foundation has become a natural gathering place for these kinds of development projects, new proposals keep flowing in. The criterion for acceptance is not just open source, but the possibility that a community of developers will form around the code, one that understands how to do distributed development. As much as possible, Apache is aiming for developers who are comfortable not being name-brand superstars. That way, development continues even when the developers move on.

"One thing that distinguishes us from many open source projects is we don't necessarily want a body of code to be tied to a specific individual," says Behlendorf. "We don't want a Linus Torvalds. We love all the personalities in the open source movement-that's part of what makes it fun. But our model doesn't have that central personality." Apache relies instead on development teams, whose members are not necessarily permanent. "You need to have continuity even though some of those people will move on, because they change jobs, or get exhausted, or simply want to work on something else. You want a structure in which, say, a half dozen people can work together, and when some of them drop off and new people join, the work keeps getting done."

Apache's structure encourages this model in a Japanese-like way, by emphasizing the group over the individual, as well as adhering to a hierarchical structure. Becoming an Apache Software Foundation member is like getting admitted to an exclusive club. You begin by volunteering your services to one or more projects, and based on the value of that contribution, you may get voted in. Total membership is currently fewer than 80. By contrast, some 600 people have commit privileges-that is, they are granted write access to the source code repository. The members, in turn, select Apache's Board of Directors, which has nine members. But there is no god-like figure at the top. There is no Mr. Apache. There is not even much sense of where this is all supposed to be headed, except, of course, upward on Behlendorf's rising tide. If the system works as intended, the current members and Board will retire to the golf course, replaced by younger men (and perhaps even some women) and Apache will keep moving upward.

I asked Behlendorf if he had anything he wanted to say to Japanese software designers. It was perhaps a touchy subject, as Japanese participation is less evident on Apache than on some other international projects. "Admittedly, open source development is overwhelmingly conducted in the English language," he said. "The unfortunate side effect is that those who don't speak it feel left out, especially since so much of the development discussions are verbal. There are so many debates that it can be tough to be involved. In fact, even English speakers who lack communication skills sometimes don't make for good open source developers."

Behlendorf cited the Gnome community as doing a good job of trying to build an international team, but admitted that Apache needs to do more. "Apache Software Foundation would be very interested in hearing from software developers from other parts of the world about how we can better integrate developers from other countries into our activities. We're more interested in doing that than in a suggested alternative, which is to create a regionalized Apache Software Foundation. This is a worldwide network. We should be able to communicate and build an organization that links together individuals, wherever they are. That's what we'd like to do."

Sidebar: A Conversation with Randy Terbush

Another of the eight original core contributors to the Apache Group, Randy Terbush launched a career around the server technology. In 1998, he founded Covalent Technologies, which sells Apache services and enhancements. Last October, Terbush raised eyebrows by leaving Covalent (along with Apache Board president Dirk-Willem van Gulik) to found a consulting group, Tribal Knowledge Group. He has since stepped down from the Apache board.

Terbush moved to San Francisco from Nebraska in the flat mid-section of the country. Declaring Tribal Knowledge a "distributed company," he then moved about 700 miles northeast to the edge of the spectacular Teton range of the Rocky Mountains. I spoke with him by phone from his house, in Teton Canyon, in Wyoming near the Idaho border.

Let's start with some history. How did Rob McCool's Web server wind up creating Apache?

Rob was initially working at Illinois Urbana with the NCSA [National Center for Supercomputing Applications] and wrote the world's second HTTP server. The first was at CERN. With the Mosaic Project going on at Illinois Urbana with NCSA, I expect that Rob was asked to put together a server. I think the motivation was for it to have a smaller profile, better performance, and simpler code. The CERN server was fairly complicated.

Those of us who came together in the Apache group had some experience with both the CERN and NCSA servers, and there was even a lesser known server written in Perl called Plexus. That was the state of the art in terms of HTTP servers. We came together around the NCSA server mainly because that was our server of choice. The eight of us had exchanged patches for it.

The other side of the history is that Rob was on his way out of Illinois and later went on to Netscape. The future of the NCSA server was in question, as there were some proposed licensing changes. So the eight of us decided that it was important that we pick up the ball and carry it forward.

At what stage did NCSA become an open source project?

Very early, if not at the outset. It was licensed, if I remember correctly, under Illinois Urbana's open source license, and CERN was open source as well. Most of those types of utilities and services that were part of the Internet infrastructure were all delivered as open source.

Were the eight core contributors self-selected?

Yes, very much so. And that's really the nature of any kind of open source effort. People self-select and it's a meritocracy.

Does the NCSA server still exist?

[Terbush looks on the Web] The site says it is no longer under development. "It is an unsupported product, we recommend you check out the Apache server...."

Linux was motivated by building something that wasn't Windows. What about you?

A key motivator was that all of us were involved in serving high traffic websites-the Wired website, the MIT website, the Internet Movie Database, Sesame Street, etc. So we were driven to bring together a software platform that didn't wake us up at night. Apache was very much born in a highly demanding, in-the-fire environment.

When did that environment invite commercial competitors?

Certainly Netscape was the first commercial offering, although Spyglass may have released something in that same timeframe as well. Netscape was highly motivated and the first commercial offering next to IIS and Microsoft.

Was Apache's continued popularity a reflection of its price-free-or its technological prowess?

Commercial products were perhaps a more comfortable fit for people unfamiliar with the open source community. But Apache had the reputation as the most robust server, the server of choice, regardless of price-though price certainly powered it through the next several years.

In terms of commercial implementations of Apache, is it fair to equate Covalent with Red Hat?

Covalent's business model was a little different in that we were selling a licensed product with support options. Red Hat's model is in providing a distribution at minimal cost and selling services around that.

How do most people obtain Apache?

Today most get it with their upgraded system distribution. Solaris comes with it, as do all the Linux distributions. Covalent has added to that as well as IBM, with WebSphere.

Is support typically handled by the vendor?

You can get support from Red Hat for Apache on their platform. Sun and IBM are a little different in that they provide limited support, but they separate it out as being the open source component that we supply. Sun's pitch over the years has been if you really want support, you need to run Netscape.

How much interest and contributions have you gotten from Japan?

Japan is an interesting market. There are some cultural differences that have made the contribution less obvious at times, but they have a very robust open source following there. I'm aware of that as well through my participation in the OSDL [Open Source Development Lab] Board of Directors. It's a consortium created by Intel, IBM, NEC, CA [Computer Associates], and HP, to name a few. It's a lab environment with locations in Oregon and Japan, founded to create more enterprise-class testing and developing environments, to help move forward the development of Linux in these environments.

So it's no coincidence that the second lab is in Japan.

Not at all. There's quite a following in Japan around Linux and open source in general. This is true even with FreeBSD and Apache. The Japanese contingent has done some translations of documentation for Apache, as well as some server enhancements, although those are less obvious.

Are there differences between the Japanese user base and that of the U.S.?

Nothing major. Open source solutions seem to be a little more acceptable for some parts of the IT infrastructure in Japan, although I have also heard that there are more conservative parts of the business, as well. The biggest thing, speaking from Apache Software Foundation's point of view, is that we have not communicated as well between the two countries to fully integrate Japanese participation in the process. That's important.

Is anybody doing anything about it?

There have been a few attempts to communicate. I don't know if the ASF is less approachable from the Japanese public, or what, but we have extended that a number of times.

What should my readers know about the foundation?

I would like them to know that their participation in Apache or any Apache Software Foundation technology would be very welcome. We would view that as a real improvement in the product itself, to be able to deal with some of the unique issues that deployment in Asian cultures requires.

Let's talk about Tribal Knowledge. It sounds like you are moving from developing Apache extensions to customization.

That's close. Since I left Covalent a few months back, I've been thinking about what it will take for true acceptance of open source in the enterprise. I believe these enterprises will take full advantage of the reduced costs by eliminating the need to license software, and that they will adopt software that comes from the Apache project. That's where Tribal Knowledge Group comes in. We can help companies integrate these technologies into their own deploy and test environments. That's in contrast to the software license model, where you purchase a licensed solution that may be derived from open source.

We serve as an integrator and a high level architect. We can help a company design a solution, help them integrate the software releases from the community, maybe do some custom programming or design that would help them integrate with legacy systems, and help them understand and mitigate security issues.

Do you have any early customers that embody what you are trying to do?

One of our early customers was ADP [Automatic Data Processing, Inc.]. It's a perfect example of a company taking its Web services to the next generation, deploying more high value offerings to their customers. We have helped them scale things a little more, bring a higher level of security, and leverage all of these disconnected technologies that are coming out of Apache.

Where do you see the company going?

We expect to be about 50 people within another year, and we have a serious focus on Europe as well as the United States. We see a big demand for what we're doing. Quite a few companies are coming out of this recession saying they need to be conservative about the dollars they spend.

What are you doing in the wilds of Wyoming?

I founded Covalent in Nebraska and moved it to San Francisco as part of the initial funding and getting on the treadmill. After leaving that, my experience coming out of Covalent was that we spent more time either on site or working remotely for a customer. So there really wasn't a need to be in the Bay Area and suffer the high prices.

Do you have a nice view?

Definitely. I'm close to skiing and hiking and some of the things that help balance out the other things in life.