Pacific Connection (English)

The Social Science Candy Shop: A Conversation with Yahoo! Research's Prabhakar Raghavan

A while back, I was looking for a way to connect my MP3 player directly to the radio of my Toyota ist, known here as a Scion xA. The information didn't seem to be on the Web; at least I couldn't find it. Then on a conference board, I read a post from a guy halfway across the country, who was trying to do the same thing. So I emailed him. I didn't expect he'd reply to a complete stranger, but a few hours later, he did just that. Erik, a Wisconsin college student with a Scion tC, provided links to an equipment site and to a step-by-step guide, complete with photos, for retrofitting a Scion xA radio with an auxiliary input converter. He even recommended some music for driving: a compilation from the BBC series Top Gear.

Such examples of altruistic behavior, which happen every day on the Internet, are a core interest of Prabhakar Raghavan, who heads Yahoo! Research. Both Google and Yahoo! have entered into an intellectual "arms race" to attract smart researchers to their respective companies. But whereas Google has been known for its technology, Yahoo! has placed more emphasis on human behavior. What do people do on the Web? What motivates them? What happens to that behavior if you change the rules? Raghavan sees the Web not just in terms of text and images, but as a human network: a linkage of people and the resources they bring.

Accordingly, Yahoo! Research hires two to three scientists for each developer. "That may seem like a steep ratio, but we want scientists to spend large chunks of their time thinking before they rush off to build something," Raghavan says. "If you think back to the classical research organization, they would employ a lot of lab scientists, who would in turn rely on lab assistants to do their bidding. Here, we wait until the idea is fleshed out among colleagues, and only then do we assign developers to build it." Raghavan attracts researchers, whom he calls the "best minds on the planet," with some 10 terabytes of data generated daily: the accumulated record of every user action on Yahoo!

Raghavan, 45, has an unlikely background for such a people-oriented research group. After earning a PhD in electrical engineering and computer science from Berkeley, he looked for university teaching jobs before IBM convinced him to join the Thomas J. Watson Research Center in Yorktown Heights, New York. "I was lucky to go there at the beginning of my career because it gave me the freedom to explore a wide range of interesting areas: all the way from wiring for IBM chips to crew scheduling for airlines." He later became senior vice president and chief technology officer at Verity before it was acquired by Autonomy Corp. In July 2005 he joined Yahoo! Research, which now has four U.S. offices plus offices in Spain and Chile. Raghavan is also a consulting professor of computer science at Stanford University and editor-in-chief of the Journal of the ACM, published by the Association for Computing Machinery. Back in the 1990s, he spent several months at the IBM Tokyo Research Laboratory (Nihon IBM Yamato Kenkyushou), and he speaks Japanese.

Let's start with the basics. What is Yahoo! Research's philosophy and approach to search engines?
We are less preoccupied with the delivery medium and more concerned with what it fundamentally means to connect people with the right information. There are a number of interesting challenges that come in here. For example, Yahoo! is very engaged in what we call the "social search space." The fundamental idea is that the information people want isn't necessarily in a document on the Web. It could well reside in the head of someone you might not even know. That means that our charter to get at all the information potentially available on the Internet is broader than we thought. It includes not just crawling the Web for content, not just sucking in RSS feeds, but getting at information that is in people's heads: both expertise and opinion.
Where this information resides varies. In the case of a restaurant review, for example, people often write it up with the expectation that others will find it useful. But there's a lot of information that people are seeking that nobody bothers to record carefully on the Internet. For example, if you are restoring the transmission on a 1957 Volkswagen, you can't count on finding that explicitly on the Web. No well-phrased query will locate it. But there are people out there who are willing to share that information. The question is: how do you find them?
Is that what you mean by "social search"?
Yes. The idea of social search is to use the power of the network to bring you information that isn't explicitly documented, and therefore isn't crawlable. This is especially true as information gets more arcane. If you are interested in cryptography, you can read something by Ronald Rivest at MIT. But if you are interested in a narrower sphere of information, you need to dig deeper. This becomes more possible as the network of users grows.
But with growth comes a problem: how do you tell who is an expert and who is not? Who is giving you an accurate response and who is just giving you speculation? Those questions lead to deeper ones: how do you assign a reputation to people? How do you set a value for a level of trust? And if there's someone you trust, how much trust do you assign to someone that they trust?
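That last question, how much to trust someone whom a trusted person trusts, can be made concrete with a small sketch. It assumes a hypothetical trust graph in which each edge weight in [0, 1] says how much one person trusts another, and it treats the trust along a chain as the product of its weights, so trust decays with every hop. The function name `inferred_trust` and the multiplicative rule are illustrative assumptions, not Yahoo!'s method; the function keeps the strongest chain using a Dijkstra-style search.

```python
import heapq

# Hypothetical trust graph: graph[a][b] is how much a trusts b, in [0, 1].
TrustGraph = dict[str, dict[str, float]]


def inferred_trust(graph: TrustGraph, source: str, target: str) -> float:
    """Strength of the strongest trust chain from source to target.

    A chain's trust is the product of its edge weights (an assumed rule),
    so it can only shrink with each extra hop.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap emulated with negated trust values
    while heap:
        neg_trust, node = heapq.heappop(heap)
        trust = -neg_trust
        if node == target:
            return trust
        if trust < best.get(node, 0.0):
            continue  # stale entry: a stronger chain to this node was found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = trust * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return 0.0  # no chain of trust reaches the target
```

For example, if I trust Alice at 0.9, Alice trusts Bob at 0.8, and I trust Bob directly at only 0.5, the indirect chain wins with roughly 0.72. Because every weight is at most 1, extending a chain never increases its trust, which is what makes the greedy search valid.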
Is this problem similar to that of link analysis, where you are evaluating the value and trustworthiness of Web links?
Link analysis is a special case. If you think about the "society" of Web pages, the links between them represent the relationship between those pages. The world we live in is a much richer society with a more complex set of relationships. For instance, I might trust you on the subject of software. I might not trust you on the subject of impressionist painting. So we need to develop far more nuanced versions of link analysis. And to do that, we need to ingest a tremendous amount of data in order to make the right inferences. For example, the quality of your writing will tell us something. The fact that you are IM buddies with a whole bunch of people might tell us something. And if these people are themselves experts in impressionism, that would tell us something more. It's these kinds of cues that we have to distill when determining whom you can trust and where to find definitive sources of information.
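One way to picture a more nuanced, topic-aware link analysis is to compute a separate PageRank-style score per topic, counting only endorsements made on that topic. The sketch below assumes hypothetical (endorser, endorsed, topic) records and a helper named `topic_rank`; it is a plain power-iteration PageRank restricted to one topic, not a description of how Yahoo! actually scores expertise.

```python
from collections import defaultdict

# Hypothetical endorsement record: (endorser, endorsed, topic).
Edge = tuple[str, str, str]


def topic_rank(edges: list[Edge], topic: str,
               damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """PageRank-style scores computed only from endorsements on one topic."""
    out_links: dict[str, list[str]] = defaultdict(list)
    nodes: set[str] = set()
    for src, dst, t in edges:
        if t == topic:
            out_links[src].append(dst)
            nodes.update((src, dst))
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # People who endorse no one on this topic spread their score evenly.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, dsts in out_links.items():
            share = damping * rank[src] / len(dsts)
            for dst in dsts:
                new[dst] += share
        rank = new
    return rank
```

With this setup the same person can rank first for software and near the bottom for impressionist painting, which is the distinction Raghavan draws. Signals such as writing quality or IM connections would have to be folded in as extra edges or weights, which this sketch leaves out.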
The amazing thing with a service like Yahoo! Answers is that people are extremely forthcoming, sometimes answering a question within seconds of it being posted. The interaction actually resembles that on multi-player video games. Just as some players like the recognition that comes with seeing their name on a "leader board," some people like the recognition that comes with being the first to answer a question. And as respondents are rated, answering itself can become a friendly competition. Other people are less interested in recognition, more interested in the details: they want to dig deeper and learn more.
If we use the definition in the book The Tipping Point, are these people typically "connectors" or are they "mavens"? [Meaning: do they know a lot of experts, or are they experts themselves?]
That's a great question and one that we are trying to understand even as I speak. It looks like some of the participants are both: they are mavens who are extremely well connected. In participating, they become heavy influencers.
Are you also tapping into what makes Wikipedia work?
Exactly right. Both Wikipedia and Yahoo! Answers are examples of intellectual commons where people are willing to contribute and feel good about doing so.
This philosophy of sharing also applies to us in the research world. When we set up Yahoo! Research, we decided right off the bat to participate in the intellectual commons, where we openly publish our research in the academic model. There are several advantages. Obviously there's a recruiting advantage: academics see our research and are interested in joining us. But beyond that, open research works on the assumption that no one company or group has all the answers, that someone from outside can build on top of what we've done, and when that happens, business opportunities arise that couldn't have been created otherwise. If you don't want to work with the outside world, then you are going to fall behind, because insular technical communities aren't subject to peer review.
How does this philosophy translate into the people you hire?
We do research in five areas that dovetail very well with the strategic direction of Yahoo! Therefore, the strong people we end up hiring are ones who have an immediate interest in jumping in and making a difference here. This place is like a candy shop for many scientists. We hire people who hold prestigious, tenured positions. So what can they get here that they can't get at a great university? There are obvious trade-offs, but one thing that weighs heavily in our favor is that we provide access to 10 to 12 terabytes of data every day: that's every action of every user on our property. This data is a microscope into human activity.
So you aren't so much researching technologies as studying people.
If you look at our five focus areas, only two or three of them are what you'd recognize as pure computer science: information retrieval and machine learning.
Is machine learning a subset of AI?
Machine learning is the confluence of AI and statistics. It's a huge field because it applies to just about everything-from bomb detection on airplanes to ranking search results.
Another focus area is community systems. In the early 1980s, database development was driven by the need for a unified data platform underneath enterprise applications. But in a community platform, the needs are somewhat different. You need support for friendships and replication of ratings, and you need to do this across tens or hundreds of thousands of machines. To help further this, we hired Raghu Ramakrishnan, a leading database expert from the University of Wisconsin.
We also focus on microeconomics. Part of this is to analyze the bidding process and the ordering of the advertising, which is the core of how we generate revenues. But microeconomics as a discipline also has broader implications for us. For example, consider how software design and delivery are done today. A bunch of engineers and architects take a requirements document and build a bunch of software, then hand it over to the people in advertising and sales. That turns out to be the wrong model for our world, because the monetization, how you make money, is an afterthought. Accordingly, we want to avoid a situation where an engineer makes what they think is an engineering decision, but is actually a marketplace decision with serious revenue consequences.
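For a concrete picture of what "the bidding process and the ordering of the advertising" can mean, here is a minimal sketch of a generalized second-price (GSP) auction, a textbook sponsored-search mechanism. The interview does not say which mechanism Yahoo! used; the `Bid` type, the `gsp_allocate` function and the reserve price are illustrative assumptions. Ads are ranked by bid, and each winner pays the bid of the advertiser ranked just below it.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    amount: float  # maximum price per click the advertiser will pay


def gsp_allocate(bids: list[Bid], slots: int,
                 reserve: float = 0.0) -> list[tuple[str, float]]:
    """Generalized second-price sketch: rank eligible bids from highest to
    lowest; each winner pays the next-highest eligible bid, and the lowest
    winner with no one below it pays the reserve price."""
    eligible = sorted((b for b in bids if b.amount >= reserve),
                      key=lambda b: b.amount, reverse=True)
    results = []
    for i, bid in enumerate(eligible[:slots]):
        price = eligible[i + 1].amount if i + 1 < len(eligible) else reserve
        results.append((bid.advertiser, price))
    return results
```

With bids of 3.0, 2.0 and 1.0 for two slots, the top two advertisers win and pay 2.0 and 1.0 respectively. Deployed systems commonly also weight bids by predicted click-through rate, which is omitted here. Note that a winner's price depends on the bid below it rather than its own, a rule choice with direct revenue consequences of the kind Raghavan describes.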
So how do you design incentive mechanisms that make good sense for advertisers but also for Yahoo! and its users? That's key to how we have to design products going forward. Economists can't come in after the product is built; they have to be there right from the beginning. Our product managers are very cognizant of this and want people on their staff with a background in economics and optimization. We want economic thinking to move upstream, to the beginning of the planning cycle, not the end.
And microeconomics affects much of what our users do. The range of transactions on Yahoo! is surprisingly broad. People are, in effect, bidding on jobs, on dates, on vacations: on a whole spectrum of activities. All of these transactions involve people making real decisions about their lives, even when no money changes hands.
Economics comes into play here in unexpected ways. Let's consider a conventional dating site, in which every male can contact as many females as he likes. So they do. And what happens is that each recipient gets multiple emails all expressing interest, but most are nothing more than mass mailings. So let's change the rules. Let's give every male a "budget": say, five invitations to start with. If he gets favorable responses, we'll increase his quota. Note that this is about changing the design of the marketplace itself, not the underlying technologies, with the goal of increasing participant engagement and the sense of value. And it is a market, even though no money is exchanged. Economists can pose the right questions and establish the theoretical underpinnings of the system. Economic theory is grounded in the notion of rational participants. Behavioral economists take this one step further, factoring in human behavior and emotion. They can help predict what happens if you implement a particular twist in the market.
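The invitation budget can be sketched as a small piece of state. The starting allowance of five comes from the example above; the rule that each favorable response earns exactly one more invitation is a hypothetical choice, as are the class and method names.

```python
from dataclasses import dataclass


@dataclass
class InvitationBudget:
    """Hypothetical quota rule: each member starts with a fixed allowance
    and earns more invitations only when earlier ones are well received."""
    remaining: int = 5             # starting budget from the example
    bonus_per_favorable: int = 1   # assumed reward size, not from the source

    def send(self) -> bool:
        """Spend one invitation; refuse once the budget is exhausted."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True

    def record_response(self, favorable: bool) -> None:
        """Replenish the budget only for favorable responses."""
        if favorable:
            self.remaining += self.bonus_per_favorable
```

A member who tries to blast out seven invitations gets only five through; after that, the only way to send more is to receive favorable replies. The design lever is the market rule itself, not any new technology.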
And that brings us to our fifth area of focus: the design of media experience. Thirty years ago, Xerox PARC assembled an eclectic cast of characters, with backgrounds in cognitive psychology, ethnography, anthropology, sociology, and behavioral economics. Doing so gave the world two big things: the personal computer, of course, but also a scientific discipline of human-computer interaction. Until then, this wasn't a codified science. Now it is taught at universities and is the subject of conferences. For us, that's the departure point. At Yahoo! Research, we are bringing together a similarly eclectic group. But we aren't really that concerned with how people interact with a computer. The deeper Yahoo! context is humans interacting with humans. We want to be the "platform" on which people gather together as communities and conduct a whole range of online activities.
Metcalfe's Law says that the value of a network of people grows with the square of the number of people. That's what we are about here. We want to understand why a hundred million teenagers in China who cannot afford PCs, but have cellphones, would find reasons to hang out with us. But we don't have to send an army of interviewers there to find out. The data that we already have in hand gives us the means to do the research. Ultimately, this is all about finding the value of networks and communities, why people should form them, what user experiences can be synthesized, and, as our goal, what media experiences can be designed to build audiences that add value.
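As a rough illustration of why value is said to grow with the square: among n members there are n(n-1)/2 possible pairwise connections, so doubling the membership roughly quadruples the number of possible links. The helper below, a hypothetical name, is just that arithmetic; it says nothing about how much any individual link is actually worth.

```python
def possible_connections(n: int) -> int:
    """Distinct pairs among n members: n * (n - 1) / 2, which grows like n**2."""
    return n * (n - 1) // 2


for n in (10, 100, 1000):
    print(n, possible_connections(n))
```

This prints 45, 4950 and 499500 pairs for 10, 100 and 1,000 members; going from 1,000 to 2,000 members raises the count to 1,999,000, about four times as many.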
I notice that building the actual technology comes last.
Right. Our philosophy is to do the thinking up front, understand the audience, and then get to the machines, not the other way around. What you will not see from Yahoo! Research is a bunch of fast prototypes that we keep putting out in the hope that people will catch on. We are trying to do the hard thinking ahead.
So what do your scientists do all day? Do they pore over the terabytes of data and come up with theories?
Yeah, absolutely. For example, they might ask: what is the "health" of a given user community within Yahoo!? Is it growing? Is it dying? What are the attributes that lead to one or the other? How do the demographics affect things? These are critical questions for us, which anthropologists and sociologists can help us answer. Forty years ago, sociologists could ask these questions, but before the Internet, you really couldn't ask them on this scale, to the level of millions of people. That is a luxury we have here. The social scientists who get most excited about this are the ones who realize they can actually test some of their theories on this scale of usage.
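As a toy version of the "health" question, the sketch below labels a community from weekly counts of active members using the compound weekly growth rate between the first and last week. The metric, the function name `community_trend` and the 2 percent tolerance are all hypothetical; real community-health research would look at many more attributes, including the demographics mentioned above.

```python
def community_trend(weekly_active: list[int], tolerance: float = 0.02) -> str:
    """Label a community 'growing', 'dying' or 'stable' from weekly
    active-member counts (hypothetical metric and threshold)."""
    if len(weekly_active) < 2 or weekly_active[0] == 0:
        raise ValueError("need at least two weeks of data and a nonzero start")
    weeks = len(weekly_active) - 1
    # Compound weekly growth rate between the first and last week.
    rate = (weekly_active[-1] / weekly_active[0]) ** (1 / weeks) - 1
    if rate > tolerance:
        return "growing"
    if rate < -tolerance:
        return "dying"
    return "stable"
```

A community that goes from 100 to 110 to 121 active members grows about 10 percent a week and is labeled growing; one that goes from 100 to 90 to 81 is labeled dying.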
Why is Yahoo! Research so geographically dispersed?
Because it really isn't feasible to tell the best minds on the planet to move to Silicon Valley. There is a synergy from having as many scientists as possible close to the bulk of the engineering. But we are an Internet company, and if we cannot make this work, who can?
I understand you have a "video wall" for bringing people together.
This is a matrix of screens located in lounges. People appear slightly smaller than life-size, and whatever is visible there is visible to people in other locations. What's nice about this setup is that encounters feel very spontaneous, as opposed to organizing a formal meeting. People hang around these lounges, and when they see somebody on the monitors, they extend the conversation. Many times, you'll have a colleague who is an expert in an area you are interested in, but if they are out of sight, they are out of mind, so you wouldn't even think of initiating a phone conversation.
Why is the search engine sector in particular hiring so many researchers?
First of all, the industry is immature. Despite the amazing growth we've had over the last 10 years, we are still barely scratching the surface in terms of the new online interactive media. Within Yahoo!, that sense goes to the very top: our CEO Terry Semel is asking questions about the underlying science. When an industry is this immature, there's a feeling that a breakthrough can completely reroute its direction, and nobody wants to lose out on that prospect. For that reason alone, we're passionate about going after the very best people.
I'd also argue that this sector is more complex, because it is a confluence of art and media on the one hand and commercial influences on the other. This is so much more than just data plumbing. All the companies in this sector believe there are new markets and revenue models to be tapped. When you are fortunate enough to be growing, that's the time to invest. The best research efforts happen when you get two things: high growth and high margins.
Your background is electrical engineering. How did you make the transition from a matrix of wiring to a matrix of people?
In some ways, it's all pretty much the same thing. In the early nineties, we were all hung up on massively parallel computers, all trying to connect computers to solve huge numerical computations. What was interesting, though, was the transition from massively parallel, highly synchronous computers to massively distributed, asynchronous computing, where it was OK if the server never got back to you. The latter model more closely resembles the network of people we are now considering.
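The asynchronous model, where a server may never answer, can be illustrated with Python's `asyncio`: the caller queries several peers at once, waits a bounded time, and carries on with whatever replies arrived instead of blocking on the slowest one. The peers are simulated with `asyncio.sleep`, and the names `ask` and `gather_replies`, the delays and the timeout are all hypothetical.

```python
import asyncio


async def ask(name: str, delay: float) -> str:
    """Stand-in for a remote peer; 'delay' simulates how long it takes to reply."""
    await asyncio.sleep(delay)
    return f"{name} replied"


async def gather_replies(peers: dict[str, float], timeout: float) -> list[str]:
    """Query every peer concurrently and keep whatever answers arrive in time.

    A peer that never replies is simply left out rather than holding up the rest.
    """
    tasks = [asyncio.create_task(ask(name, delay)) for name, delay in peers.items()]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # give up on peers that did not answer in time
    return sorted(task.result() for task in done)


if __name__ == "__main__":
    print(asyncio.run(gather_replies({"a": 0.01, "b": 0.02, "c": 10.0}, timeout=0.5)))
```

Here peer `c` takes far longer than the half-second budget, so the call returns only the replies from `a` and `b`. A synchronous, lockstep design would instead have to wait for every participant before moving on.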
When I decided I wanted to move back to California, IBM was kind enough to send me to the Almaden Research Center in San Jose. When I got there, I heard about something called AltaVista, whose CTO, Andrei Broder, is now with us. AltaVista had scaled the art of library science to something like 20 million documents. I suggested at a conference in January 1997 that even if AltaVista had unlimited computation cycles, that alone wouldn't improve it. For search, the key is not just to do things faster, but to do them better. These problems weren't computationally bound; they were bound by the question: what is a better answer? At Stanford, Larry Page and Sergey Brin were working on PageRank, and we would run into them at conferences. Our approaches differed somewhat, but for a while, two research teams 30 miles apart were working on the same kinds of problems.