Build your Own Search Engine: Yahoo! BOSS
On the Hakia search engine, the term "Riza Berkan" pulls up some highly organized results. There's a biography section: "Dr. Berkan is a nuclear scientist with a specialization in artificial intelligence and fuzzy logic." There are also separate sections for images, news and interviews, awards and accomplishments, and bibliography, each with its own links. Not every query on Hakia produces results this crisp (it helps that Riza Berkan is Hakia's founder and CTO). But when I plugged in famous names from the U.
Berkan calls Hakia a general-purpose "semantic" search engine, but to create it, he didn't have to start from scratch. He used an API called Yahoo! Search BOSS, as in "Build your Own Search Service." With BOSS, anyone with decent Web programming skills can create a search engine of their own. Launched last July, BOSS is the successor to the Yahoo! Search API. The difference is as much conceptual as technical. "The initial API was a way to distribute our search product," said Bill Michels, senior director and general manager of the Open Search Platform. BOSS, he said, represents a philosophical shift. Its purpose is not to drive traffic to Yahoo! search, but to let you use that engine to build a search platform of your own. "The API gives you more flexibility: you don't have to take our page rankings, our branding, or our presentation. Anything you want to change can be changed. We even allow you to re-rank the search results based on what's relevant to you. Nobody else does that."
Michels said that search APIs have traditionally been about distribution, not about enabling third parties to build their own search products. "That was true with Yahoo! Search API, as well. Whatever you build, it's got to say 'Powered by Yahoo! Search' or whichever API you are using. In addition, you are not always getting the full search index. So you wind up spreading the word about someone else's product and driving people back to the 'real' product." With BOSS, he said, not just the API but the business proposition itself allows you to build a search engine of your own. This "white label" approach can be seen on both Hakia and another early BOSS adopter, Me.dium.
Yahoo! has also removed the limit of 5,000 searches per day. Instead, developers agree that in the future, their site will either feature Yahoo! advertising or pay a fee. It would appear, then, that Yahoo! sees BOSS as a way to bring in additional revenue by featuring its advertising in search engines other than its own: Yahoo! provides the API, you use the API to build a search engine, users click on the featured ads, and Yahoo! gives you a share of that revenue. In the competition with Google, BOSS will probably not change the balance; so far, few search engines actually use the technology. But Yahoo! spokespeople express high hopes. "Over the course of years, we want this to be much more than a blip," said Prabhakar Raghavan, head of research and search strategy at Yahoo!, in an interview with the New York Times. The newspaper said that as of last May, Yahoo! had 20.
BOSS's ability to change that 1:3 ratio with its chief competitor remains to be seen. But for developers, the benefits of the API are clearer, especially when it comes to branding. "It's important to point out just how radical the BOSS concept is," said Peter Newcomb, founder and CTO of Me.dium.
Newcomb's point about adding value to other sites is worth repeating. Yahoo! has promoted BOSS by pointing to websites that present themselves largely as alternative search engines. But the API is flexible enough that BOSS would appear to have value for existing websites that want to feature a more specialized search engine, in terms of presentation, ranking of results, or both. In other words, a search engine built with BOSS does not need to be at the center of your site to make a difference. You can offer search without calling yourself a new search engine.
For developers, the biggest challenge posed by BOSS, and the key to BOSS's success, lies in coming up with an idea that sets your search engine apart. The mainstream search engines may not be all things to all people, but they are what most people call home, if not via an actual home page, then via the search box in the browser. Standing out means giving people something they can't quite get anywhere else. With the BOSS API, said Michels, "you can innovate with rankings, with presentation, and in blending in your own content. You can bring in your technology, data, insights into your user base, and metadata associated with other URLs." Yahoo! also offers an experimental BOSS Mashup Framework, in which SQL-like commands can mash up the BOSS API with third-party data sources. Hence, depending on your creativity and approach, you may be able to attract people with specific interests, whether sports or politics, with a search engine more tailored to their needs. Or you might build a search engine tailored to the users of an existing site, so that the searches they conduct bring in results that match both their interests and your website's focus.
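To make "innovating with rankings" and "blending in your own content" concrete, here is a minimal Python sketch. It assumes the BOSS results have already been fetched and reduced to dictionaries with a `url` key; the `rerank` function, the per-domain boost table, and the blocklist are hypothetical illustrations, not part of the BOSS API or the Mashup Framework.

```python
from urllib.parse import urlparse


def _host(result):
    """Domain of a result's URL, or an empty string if none can be parsed."""
    return urlparse(result["url"]).hostname or ""


def rerank(results, boosts, blocked=frozenset(), own_results=()):
    """Hypothetical re-ranker for results already returned by BOSS.

    results     : BOSS results in Yahoo!'s order, each a dict with a "url" key
    boosts      : domain -> score; higher scores move a result up
    blocked     : domains to drop entirely
    own_results : the site's own content, placed ahead of everything else
    """
    # Eliminate results the site does not want to show at all.
    kept = [r for r in results if _host(r) not in blocked]
    # Sort by boost (highest first); Yahoo!'s original position breaks ties.
    ranked = sorted(
        enumerate(kept),
        key=lambda pair: (-boosts.get(_host(pair[1]), 0.0), pair[0]),
    )
    # Blend: the site's own items first, then the re-ranked Yahoo! results.
    return list(own_results) + [r for _, r in ranked]
```

Using the original position as the tie-breaker means unboosted results stay in the order Yahoo! returned them; a real site might interleave its own content rather than always putting it first.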
Hakia: weighing toward credibility
For Berkan, the opportunity to build a search engine came out of his background. A nuclear physicist involved with information processing, he co-authored the 1997 book Fuzzy Systems Design Principles: Building Fuzzy If-Then Rule Bases, which was published by IEEE. "You can't operate a nuclear system with junk information," he said. And "junk information" is how Berkan characterizes the typical results of a Web search. "The information being pushed today is popularity-based," he said, as opposed to at least aspiring to some academic standard. "With all the search engines today, including Google and Yahoo!, it's very much like getting up in the morning and turning on CNN. What is pushed to you is whatever is popular. That's the perspective, and there really has been no other perspective available."
With a semantic search, he said, the driving force is not popularity but credibility: results come from academic sources that are less biased and more verifiable. "For instance, if you search on the benefits of aspirin, a conventional search engine has a mixture of sites. As a consumer, you don't know which are credible and which are not. With Hakia, we are trying to bring you results that are more credible." Berkan thinks Hakia will first be attractive to professional users doing what he calls "knowledge-intensive" searches in the areas of medicine, finance and law, "where the quality of information can be critical." That difference is not always apparent, at least not yet. When I searched on "aspirin" and the cholesterol-lowering drug "Lipitor" in both Yahoo! and Hakia, the sites provided by each engine largely overlapped. The biggest apparent difference was that Hakia's were better categorized: "Basic information and FAQ," "Diseases treated by this drug," "Side effects," "Clinical Trials," "News," "Research and statistics." Hakia is in beta, and Berkan doesn't minimize the challenge. "We are trying to finish the site this year, but a semantic search is a difficult thing to build, and we expect it could take years." On the other hand, he said, the BOSS API is comparatively easy to use. "Developers who are considering it shouldn't think twice. Yahoo! has provided a lot of resources, and there's no point in re-inventing the wheel."
Me.dium: social browsing
If Hakia emphasizes credibility over popularity, Me.
Newcomb said that the challenge with BOSS has more to do with user expectations than with implementation. "Users expect that search results are whatever Google or Yahoo! gives them. Our results can be somewhat different, and changing those expectations can be difficult." Newcomb maintains that when a query is on a particularly hot topic, Me.
A related challenge, said Newcomb, is that people also expect search results to be fast: within half a second, or even a tenth of a second. "Yahoo! by itself does a pretty good job on that. But BOSS represents an extra hop, and therefore, the response is slightly longer. There's really nothing major you can do about that, short of building your own infrastructure. But it's worth noting. In our case, it's important that when we get a search term, we don't first run our search, and only then search Yahoo! We do both in parallel."
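Newcomb's point about running both searches at once can be sketched with Python's asyncio. The two lookup functions here are hypothetical stand-ins that simply sleep to simulate network latency; in a real service one would query the site's own index and the other would call the BOSS API.

```python
import asyncio
import time


async def search_own_index(term):
    # Hypothetical stand-in for the site's own search (e.g. its social data).
    await asyncio.sleep(0.2)
    return [f"own:{term}"]


async def search_boss(term):
    # Hypothetical stand-in for the extra HTTP hop to the BOSS API.
    await asyncio.sleep(0.2)
    return [f"boss:{term}"]


async def search(term):
    # Start both lookups together instead of waiting for one before the other.
    own, boss = await asyncio.gather(search_own_index(term), search_boss(term))
    return own + boss


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(search("olympics")))
    print(f"{time.perf_counter() - start:.2f}s elapsed")
```

With each simulated lookup taking about 0.2 seconds, the combined call finishes in roughly 0.2 seconds rather than 0.4; the extra BOSS hop still costs time, but it no longer adds to the site's own search time.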
I asked Newcomb how he can compete in one of the most entrenched markets imaginable, in which "Google" has become a verb ("let's Google it"), and Yahoo! and Microsoft are both trying their best to catch up. How does a small search engine with no marketing budget and no big development team attract eyeballs? "Getting people to try our service is absolutely a hard thing. In fact, Yahoo! and Microsoft are having a hard time getting people to use their engines instead of Google's. Google's brand is so incredibly strong, and anyone competing against them will have a huge challenge." Newcomb said that for smaller search engine companies, there's really no choice: you can't go head-on, but must think of yourself as filling a niche, finding a need the big search engines have overlooked. "Me.
Newcomb said that Me.
To use BOSS, developers first obtain a BOSS App ID. Registration requires some basic information about the developer, the company, and the application being built. From there, developers can use BOSS to access Yahoo! search services (Web, News, and Image) as well as Spelling Suggestions, which is becoming an expected feature on all English-language search engines, helping ensure that a misspelled term will not run into a dead end. Yahoo! promises additional search "verticals," as these selected searches are called, as well as additional data sources. The BOSS API, like other Yahoo! Web Services, is "REST-like" (REST stands for representational state transfer), with parameters encoded into the request URL. Results are returned in XML or JSON, as the programmer chooses; the programmer can then change the result order, eliminate any unwanted results, and blend in their own data. Yahoo! says it is also releasing "an experimental Python library called the Boss Mashup Framework, which provides simplified interfaces for retrieving search results via the Boss API. The framework also provides functions for remixing the results with other data sources."
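Because BOSS is REST-like, a request is just a URL. The Python sketch below builds one. The path shape (`/ysearch/<vertical>/v1/<query>`) and the `format`, `start`, and `count` parameters mirror the `<nextpage>` value in Yahoo!'s sample news response; the `boss.yahooapis.com` host and the `appid` parameter name are assumptions, and the function only constructs the string without contacting any server.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint host; the path layout follows Yahoo!'s sample <nextpage> value.
BASE = "http://boss.yahooapis.com/ysearch"


def boss_url(vertical, query, appid, fmt="xml", start=0, count=10, **extra):
    """Build a BOSS-style request URL (hypothetical helper).

    vertical : e.g. "web" or "news"
    query    : the search terms; percent-encoded into the path
    appid    : the developer's BOSS App ID (assumed parameter name "appid")
    extra    : any further arguments, appended as query parameters
    """
    path = f"{BASE}/{vertical}/v1/{quote(query, safe='')}"
    params = {"appid": appid, "format": fmt, "start": start, "count": count}
    params.update(extra)
    return f"{path}?{urlencode(params)}"


if __name__ == "__main__":
    print(boss_url("news", "soccer", "MY_APP_ID"))
```

Switching `fmt` to `"json"` is all it takes to request the other response format; everything else about the request stays in the URL.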
As with the Yahoo! search engine itself, multiple languages are supported, including Japanese, with language and region set using universal BOSS API arguments that apply to Web, Image and News searches. Other universal arguments include the number of results to return (10 is the default, 50 the maximum), the XML/
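The universal arguments, and the paging they imply, can be sketched in Python. The default of 10 results and the ceiling of 50 come from the article; the helper names and the `lang`/`region` argument names are assumptions for illustration, not confirmed BOSS parameter names.

```python
DEFAULT_COUNT = 10  # default number of results, per the article
MAX_COUNT = 50      # maximum results per request, per the article


def universal_args(count=DEFAULT_COUNT, start=0, fmt="xml", lang=None, region=None):
    """Arguments shared by Web, Image and News searches (hypothetical helper).

    The "lang" and "region" keys are assumed names for the language and
    region settings.
    """
    if not 1 <= count <= MAX_COUNT:
        raise ValueError(f"count must be between 1 and {MAX_COUNT}")
    if fmt not in ("xml", "json"):
        raise ValueError("format must be 'xml' or 'json'")
    args = {"start": start, "count": count, "format": fmt}
    if lang is not None:
        args["lang"] = lang
    if region is not None:
        args["region"] = region
    return args


def next_start(start, count, totalhits):
    """Offset of the next page, or None once the result set is exhausted."""
    nxt = start + count
    return nxt if nxt < totalhits else None
```

For a result set of 8,775 hits fetched 10 at a time, `next_start(0, 10, 8775)` gives 10, the same offset that appears in the `<nextpage>` element of Yahoo!'s sample response.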
A set of API query operators can tailor the search. Quotes produce searches on the exact phrase. A minus (-) operator excludes keywords, and a site: operator includes or excludes documents based on their domain. Other arguments work specifically on Web, Image or News searches. For Web searches, you can filter out adult and hate content (the filter argument takes -porn and -hate values); the filter works on content in 14 different languages, including Japanese. And you can use type= to specify what types of documents to return: HTML, text, PDF, and so on. XML response fields also vary by the type of search. For example, response fields for a News search include the total number of hits, the summary abstract of the story, its headline, language, date of publication, and URLs for the story and the publication itself. Developers are free to use any and all of these XML fields to create their own results layout.
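Because the query operators are plain text inside the search string, composing them is simple string building. This Python sketch is hypothetical: the helper names are invented, and the comma-joined `filter` value and the lowercase `type` values are assumed encodings rather than documented BOSS syntax.

```python
def build_query(terms=(), phrase=None, exclude=(), site=None):
    """Compose a query from the operators the article describes (hypothetical helper)."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')         # quotes: exact-phrase match
    parts.extend(f"-{w}" for w in exclude)  # minus: exclude a keyword
    if site:
        parts.append(f"site:{site}")        # site: restrict results to a domain
    return " ".join(parts)


def web_only_args(no_porn=False, no_hate=False, doc_type=None):
    """Web-search-only arguments (hypothetical helper).

    Assumed encoding: the -porn/-hate filter values joined with a comma,
    and document types given as lowercase strings such as "html" or "pdf".
    """
    values = [v for flag, v in ((no_porn, "-porn"), (no_hate, "-hate")) if flag]
    args = {}
    if values:
        args["filter"] = ",".join(values)
    if doc_type:
        args["type"] = doc_type
    return args
```

For example, `build_query(["aspirin"], phrase="side effects", exclude=["dosage"], site="nih.gov")` yields `aspirin "side effects" -dosage site:nih.gov`, which would then be percent-encoded into the request URL.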
Yahoo! provides the following XML example of a news search for "soccer," with abstract, click URL, article URL, title, language, date, time, source, and publication URL all shown. The total number of hits is 8,775, returned 10 at a time (count="10") starting from the beginning of the results (start="0"); the nextpage element gives the request for the following 10.
<ysearchresponse responsecode="200">
  <nextpage>/ysearch/news/v1/soccer?format=xml&amp;start=10&amp;count=10</nextpage>
  <resultset_news count="10" start="0" totalhits="8775" deephits="8775">
    <result>
      <abstract>June 16 (Bloomberg) -- Adidas AG, the world's second-largest sporting-goods maker, will ``clearly exceed'' its full-year sales target for soccer-related goods and gain share in all major markets, Chief Executive Officer Herbert Hainer said. </abstract>
      <clickurl>http://www.bloomberg.com/apps/news?pid=20601100&amp;sid=aSSf0jMZtvBU</clickurl>
      <title>Adidas Will `Clearly Exceed' Soccer Sales Target, Hainer Says</title>
      <language>en english</language>
      <date>2008/06/16</date>
      <time>14:21:15</time>
      <source>Bloomberg.com</source>
      <sourceurl>http://www.bloomberg.com/</sourceurl>
      <url>http://www.bloomberg.com/apps/news?pid=20601100&amp;sid=aSSf0jMZtvBU</url>
    </result>
  </resultset_news>
</ysearchresponse>
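Since the response is ordinary XML, Python's standard `xml.etree.ElementTree` module is enough to pull out the fields a custom results page would need. The sketch uses a trimmed copy of Yahoo!'s sample (one result, fewer fields, with `&` written as `&amp;` so the document parses); `parse_news` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Yahoo!'s sample news response.
SAMPLE = """<ysearchresponse responsecode="200">
  <nextpage>/ysearch/news/v1/soccer?format=xml&amp;start=10&amp;count=10</nextpage>
  <resultset_news count="10" start="0" totalhits="8775" deephits="8775">
    <result>
      <title>Adidas Will `Clearly Exceed' Soccer Sales Target, Hainer Says</title>
      <date>2008/06/16</date>
      <source>Bloomberg.com</source>
      <url>http://www.bloomberg.com/apps/news?pid=20601100&amp;sid=aSSf0jMZtvBU</url>
    </result>
  </resultset_news>
</ysearchresponse>"""


def parse_news(xml_text):
    """Extract hit count, next-page request and per-story fields (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rs = root.find("resultset_news")
    if rs is None:
        raise ValueError("no resultset_news element in response")
    stories = [
        {field: r.findtext(field, default="") for field in ("title", "date", "source", "url")}
        for r in rs.findall("result")
    ]
    return {
        "total": int(rs.get("totalhits")),
        "next": root.findtext("nextpage"),  # entities such as &amp; are decoded
        "results": stories,
    }


if __name__ == "__main__":
    for story in parse_news(SAMPLE)["results"]:
        print(story["date"], story["source"], story["title"])
```

Each story comes back as a plain dictionary, which is the point at which a developer could re-order, drop, or restyle results for their own layout.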