LinuxInsider.com

Yahoo's Hadoop to Run Free in the Wild


Yahoo is letting the source code behind its version of Hadoop run free. Hadoop is an open source distributed file system, and Yahoo's own version of the technology represents the largest implementation of Hadoop in the world, according to the search company.



Although it's struggling against both giant rivals like Google (Nasdaq: GOOG) and smaller ones like Microsoft's (Nasdaq: MSFT) new search venture Bing, Yahoo (Nasdaq: YHOO) is handing over the source code for its version of Hadoop to the community.

Hadoop, a top-level Apache project, is an open source distributed file system and parallel execution environment that lets its users process massive amounts of data.

Yahoo, which uses Hadoop extensively in its Web search and advertising businesses, has been a major contributor to the project.

A Leap of Faith

Yahoo announced the release of its Hadoop source code at the second annual Hadoop summit, held in Sunnyvale, Calif.

With Microsoft launching Bing so recently, and Google still king of the search mountain, is it wise for Yahoo to release the source code behind its search technology and other properties?

Well, it can't hurt, at least. "I don't think this could land Yahoo in more trouble," Jeff Rogers, founder of Open Source Analysts Group, told LinuxInsider.

About Hadoop

Hadoop is a free Java software framework that supports data-intensive distributed applications.
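To make "data-intensive distributed applications" more concrete, here is a minimal sketch of the MapReduce-style programming model that Hadoop implements. It is written in plain Python for illustration and does not use Hadoop's actual Java API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = []
    for doc in documents:
        # On a real cluster, each document could be mapped on a different machine.
        pairs.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Yahoo runs Hadoop", "Hadoop runs free"]))
```

Because each call to map_phase depends only on its own input, and each call to reduce_phase depends only on one key's values, a framework is free to spread that work over many servers.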

Yahoo, which says Yahoo Search is the world's largest Hadoop implementation, says the application runs on more than 25,000 servers at the company and analyzes tens of billions of Web pages, multiple petabytes of storage and billions of new records every day.

Inspired by Google's research papers on MapReduce and the Google File System, Hadoop replicates data aggressively so that the data is not affected even when physical machines go down.
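The idea behind replication can be sketched in a few lines. The example below is an illustration only, not Hadoop's placement algorithm: it assumes a simple round-robin layout and a replication factor of three, and the names place_replicas and readable_blocks, along with the node and block labels, are hypothetical. Real deployments weigh factors such as rack location when choosing where copies go.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (an assumed layout)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one copy on a live node."""
    failed = set(failed_nodes)
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4", "n5"]
    blocks = ["b0", "b1", "b2", "b3"]
    placement = place_replicas(blocks, nodes)
    # With three copies of each block, losing any two machines loses no data.
    print(sorted(readable_blocks(placement, ["n1", "n2"])))
```

A block becomes unreadable only when every machine holding one of its copies is down at the same time, which is why more copies buy more tolerance for hardware failure at the cost of extra storage.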

Hadoop's Uses

Users include the Tennessee Valley Authority (TVA), which employs a commercial version of the application from Cloudera, a company founded to provide enterprise-level support to Hadoop users.

The TVA uses Hadoop to analyze data collected from phasor measurement units (PMUs) throughout the eastern United States. A PMU measures the electrical waves on an electricity grid. The TVA has about 15 TB of data on file and expects this to grow to about 40 TB by the end of next year.

Cloudera's distribution of Hadoop includes the Yahoo source code, lending credence to Yahoo's claim that it's releasing the code to the community to speed up open, collaborative research and development.

The Hadoop distributed file system is part of the Hadoop Core, the flagship sub-project of the Apache Hadoop project.

The project also includes Chukwa, a data collection system for managing large distributed systems; HBase, which provides a scalable, distributed database; Hive, a data warehouse infrastructure; Pig, a high-level data flow language and execution framework for parallel computation; and ZooKeeper, a highly available and reliable coordination system.


By Richard Adhikari

