Yahoo's Hadoop to Run Free in the Wild

Yahoo is letting the source code behind its version of Hadoop run free. Hadoop is an open source distributed file system, and Yahoo's own version of the technology represents the largest implementation of Hadoop in the world, according to the search company.


Although it's struggling against both giant rivals like Google (Nasdaq: GOOG) and smaller ones like Microsoft's (Nasdaq: MSFT) new search venture Bing, Yahoo (Nasdaq: YHOO) is handing over the source code for its version of Hadoop to the community.

Hadoop, a top-level Apache project, is an open source distributed file system and parallel execution environment that lets its users process massive amounts of data.
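
To give a feel for what "parallel execution" means here, the following is a minimal single-machine sketch of the map, shuffle and reduce steps that MapReduce-style systems are built around. It is plain Python, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for illustration, and a real cluster would run the map and reduce steps on many machines at once.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    pairs = []
    for document in documents:
        # On a cluster, each document (or split) could be mapped on a different node.
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop runs free", "Yahoo runs Hadoop"]))
```

Because each map call looks at only one document and each reduce call at only one key, the work can be split across machines without the pieces needing to talk to each other.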

Yahoo, which uses Hadoop extensively in its Web search and advertising businesses, has been a major contributor to the project.

A Leap of Faith

Yahoo announced the release of its Hadoop source code at the second annual Hadoop summit, held in Sunnyvale, Calif.

With Microsoft launching Bing so recently, and Google still the king of the search mountain, is it wise for Yahoo to release the source code behind its search technology and other properties?

Well, it can't hurt, at least. "I don't think this could land Yahoo in more trouble," Jeff Rogers, founder of Open Source Analysts Group, told LinuxInsider.

About Hadoop

Hadoop is a free Java software framework that supports data-intensive distributed applications.

Yahoo says Yahoo Search is the world's largest Hadoop implementation. According to the company, the application runs on more than 25,000 servers and analyzes tens of billions of Web pages, multiple petabytes of stored data and billions of new records every day.

Inspired by research papers about Google's MapReduce and Google File System (GFS), Hadoop replicates data aggressively so that the data remains available even when physical machines go down.
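
The idea behind that replication can be sketched in a few lines. The example below is an illustration only, not Hadoop's actual placement policy: it assumes a simple round-robin layout, a replication factor of 3 chosen for the example, and hypothetical helper names (place_replicas, readable_blocks).

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: round-robin placement is used only to keep the sketch
    deterministic; real systems use more elaborate placement rules.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for index, block in enumerate(block_ids):
        placement[block] = {
            nodes[(index + offset) % len(nodes)] for offset in range(replication)
        }
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one copy on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3", "n4"]
    blocks = ["b0", "b1", "b2", "b3"]
    placement = place_replicas(blocks, nodes)
    # With three copies of each block, losing two machines leaves every block readable.
    print(sorted(readable_blocks(placement, ["n1", "n2"])))
```

With three copies of every block, any two machines can fail without losing data; a block becomes unreadable only if all three machines holding it are down at the same time.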

Hadoop's Uses

Users include the Tennessee Valley Authority (TVA), which employs a commercial version of the application from Cloudera, a company founded to provide enterprise-level support to Hadoop users.

The TVA uses Hadoop to analyze phasor measurement unit (PMU) data collected throughout the eastern United States. A PMU measures the electrical waves on an electricity grid. The TVA has about 15 TB of data on file and expects this to grow to about 40 TB by the end of next year.

Cloudera's distribution of Hadoop includes the Yahoo source code, lending credence to Yahoo's claim that it's releasing the code to the community to speed up open, collaborative research and development.

The Hadoop distributed file system is part of the Hadoop Core, the flagship sub-project of the Apache Hadoop project.

The project also includes Chukwa, a data collection system for managing large distributed systems; HBase, which provides a scalable, distributed database; Hive, a data warehouse infrastructure; Pig, a high-level data flow language and execution framework for parallel computation; and ZooKeeper, a highly available and reliable coordination system.


By Richard Adhikari
