LinuxInsider.com

What's New in Open Source Search?



Yahoo (Nasdaq: YHOO), Google (Nasdaq: GOOG) and MSN hold a huge lead in search engine technology over open source alternatives. These search giants are battling one another to become the computer user's default search site.

Where can a computer user go to find an adequate open source alternative to mainstream engines? Choices appear to be limited. A few established open source projects provide corporate IT managers some additional choices; however, a new offering from the founder of Wikipedia may soon change the search engine landscape.

Finding essential information with the fewest keyword refinements is a challenge for both the searcher and the search engine company. Searching for information online and on local storage drives is an integral part of the workflow.

An open search engine tool that can catalog and retrieve data stored within the user's network, as well as find information on the Internet, is an opportunity for innovation by open source projects. However, few such alternatives exist today in open source search engine technology.

"The difference in using Google or Yahoo is the ability for searching inside my firewall or searching privately. You can buy a proprietary product [for intranet searching], but very few open source search engines are in use," David Christian, chief technical officer of Mindbridge, told LinuxInsider. Mindbridge is a provider of business process outsourcing (BPO) services.

Unrest Grows

Some critics of existing search engine products say there is a growing need for alternatives to the proprietary search companies and the big business associated with sponsored information and ad revenue from search results. A few innovators are conducting a quest for new search engines and an alternative to the influences of ranking done by proprietary search platforms.

Take the experience of Matt Burkhardt, chief executive officer of Impari Systems, as an example of the growing user need for new search engine options. Impari Systems is a startup focusing on bringing open source software to schools.

Burkhardt is unhappy with the results of his efforts to distribute his information through Google news feeds. He put out two press releases only to find that they disappeared soon after posting. Even worse, his notices seemed to be replaced with competing information that was two years old.

That experience and others convinced Burkhardt that search is broken on the Internet. He is hoping that something better comes along.

"Existing open source caters to [a] vertical market. We need something more mainstream," he told LinuxInsider.

Different Strokes

Search engines such as Google, Yahoo and MSN differ in their methodologies and search algorithms. Their technology is mostly secret, given the proprietary nature of the platforms.

Preferences for one search engine over another sometimes reach fanatical levels, as users rely on a favorite search platform to find content. One of the leading search product alternatives, according to Mindbridge's Christian, is Apache Lucene.

Most open source searching involves a component embedded into a larger project, he noted. Similarly, most of the open source projects using full text search are built with Lucene as the basis.

These alternative open source search projects include both desktop technologies and server-side technologies, alone or in combination, he explained.

The Lucene Model

Apache Lucene is an open source, full-featured text search engine library written in Java and usable across platforms. It is available for free download.

Its June update adds a payloads package for query mechanisms. The new version can boost a search term's relevancy score based on the value of the payload located at that term.
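The idea of payload-based boosting can be illustrated with a small sketch. This is not Lucene's API or its scoring formula; it is a toy index written for illustration, and the names build_index and search, the sample documents and the payload values are all assumptions. Each term occurrence carries a numeric payload, and that payload multiplies the occurrence's contribution to the document's score.

```python
from collections import defaultdict

# Hypothetical toy index: term -> list of (doc_id, payload) postings.
# The payload is an arbitrary per-occurrence weight, e.g. 2.0 for a title hit.
def build_index(docs):
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for term, payload in tokens:
            index[term].append((doc_id, payload))
    return index

def search(index, term):
    # Each occurrence contributes a base score of 1.0 times its payload.
    scores = defaultdict(float)
    for doc_id, payload in index.get(term, []):
        scores[doc_id] += 1.0 * payload
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {
    "a": [("lucene", 2.0), ("search", 1.0)],  # "lucene" occurs once, in the title
    "b": [("lucene", 1.0), ("lucene", 0.5)],  # two lower-weight body occurrences
}
index = build_index(docs)
results = search(index, "lucene")
```

In this sketch, document "a" outranks document "b" even though it contains the term only once, because the single occurrence carries a larger payload.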

Lucene is now able to use "point-in-time" searching over NFS (network file system) structures. It also has a new API (application programming interface) for pre-analyzed fields.

A Starting Point

Using the Lucene platform as a basis for new open source search products may offer more choices. It is capable of integrating current technology.

"From a programmer's perspective, Apache Lucene has a robust API and .NET and Java compatibility. Lucene is the basis for a number of search platforms," said Christian.

The .NET Framework is a software component developed by Microsoft (Nasdaq: MSFT) that is included in the Microsoft Windows operating system. It provides a large library of pre-coded instructions. Java is a programming language developed by Sun Microsystems (Nasdaq: JAVA).

Inherent Problems

Developing new search engine strategies, for both Internet and intranet use, runs the risk of other problems for potential users, warned Christian.

For example, one problem with using an alternative search product is that components may not talk to all data containers. Another problem is that most people are not good at managing metadata (mechanisms that help define the structure of various document types).

"We need to search multiple indexes and return results in a cohesive fashion. We see some companies just beginning to explore this. We need a search vehicle that will pull everything together," Christian said.
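One way to picture what Christian describes is a thin layer that queries several indexes and merges their hits into one ranked list. The sketch below is only an illustration under stated assumptions: the two index functions, their document names and their scores are made up, and dividing by each index's top score is just one simple way to make incompatible score scales comparable.

```python
def normalize(hits):
    # Scale one index's scores to [0, 1] so different indexes are comparable.
    if not hits:
        return []
    top = max(score for _, score in hits)
    if top <= 0:
        return [(doc, 0.0) for doc, _ in hits]
    return [(doc, score / top) for doc, score in hits]

def federated_search(indexes, query):
    merged = {}
    for name, search_fn in indexes.items():
        for doc, score in normalize(search_fn(query)):
            # Keep the best score if the same document is in several indexes.
            if doc not in merged or score > merged[doc][0]:
                merged[doc] = (score, name)
    return sorted(
        ((doc, score, name) for doc, (score, name) in merged.items()),
        key=lambda row: (-row[1], row[0]),
    )

# Two stand-in indexes with incompatible score scales (hypothetical data).
def intranet(query):
    return [("wiki/setup", 12.0), ("wiki/faq", 3.0)]

def mail_archive(query):
    return [("msg-417", 0.9), ("wiki/faq", 0.6)]

results = federated_search({"intranet": intranet, "mail": mail_archive}, "vpn")
```

Each merged row records which index supplied the winning score, so a front end could show the user where a result came from.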

New Approach

Perhaps one of the most promising new open source search offerings will become available by the end of this year from Wikia, which recently completed its purchase of the Grub Web crawler tool from LookSmart.

Jimmy Wales, Wikia chairman and Wikipedia founder, told LinuxInsider he will release the code for Grub, until now a proprietary tool, as open source.

Grub is a Web crawler that creates an index of the World Wide Web by borrowing the processing power donated by volunteer computers, similar to the SETI@home project, which looks for extraterrestrial life. This will allow Wales to jumpstart his new search product without having to develop his own computer network to crawl the Web and to build and maintain a catalog of content.
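The volunteer-computing approach can be sketched as a coordinator that hands out URLs and collects what the volunteers report back. This is not Grub's actual protocol or code; the Coordinator class, the volunteer names and the in-memory FAKE_WEB table are assumptions made so the example runs without a network.

```python
from collections import deque

# Stand-in for the Web: URL -> (text, outgoing links). Purely illustrative.
FAKE_WEB = {
    "http://example.org/": ("home page", ["http://example.org/a", "http://example.org/b"]),
    "http://example.org/a": ("page a", ["http://example.org/b"]),
    "http://example.org/b": ("page b", ["http://example.org/"]),
}

class Coordinator:
    """Hands out URLs to volunteers and collects what they report back."""

    def __init__(self, seeds):
        self.pending = deque(seeds)
        self.seen = set(seeds)
        self.index = {}

    def next_url(self):
        return self.pending.popleft() if self.pending else None

    def report(self, url, text, links):
        self.index[url] = text
        for link in links:
            if link not in self.seen:  # queue each URL only once
                self.seen.add(link)
                self.pending.append(link)

def volunteer_fetch(url):
    # A real volunteer client would download the page over HTTP here.
    return FAKE_WEB.get(url, ("", []))

def run(coordinator, volunteers):
    work_done = {name: 0 for name in volunteers}
    while True:
        progressed = False
        for name in volunteers:  # volunteers take turns asking for work
            url = coordinator.next_url()
            if url is None:
                continue
            text, links = volunteer_fetch(url)
            coordinator.report(url, text, links)
            work_done[name] += 1
            progressed = True
        if not progressed:
            return work_done

coordinator = Coordinator(["http://example.org/"])
work_done = run(coordinator, ["alice", "bob"])
```

The point of the design is that the coordinator only tracks which URLs are pending and which have been seen, while the fetching itself is spread across whatever machines volunteer.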

"We plan to build all the software needed for free licensing for searching. I want to make all content available license free. Nothing like this exists today," Wales said.

Wikia Search

Wales' plan for a new open source-based search engine calls for an expansion of previous open source efforts begun by projects such as Lucene. His goal is to create an open and transparent search tool that does not mask its methodologies and search algorithms.

"There were several open source search projects. They were a start. Some of the pieces have existed. Now we are trying to give it full support," he said.

Wales plans to release some form of a very rough first cut of his new search offering by the first of the year. He will use an ad-based model for the Web site but is not sure about the rest of the business model yet.


More by Jack M. Germain