LinuxInsider.com

Yahoo's Hadoop to Run Free in the Wild


Yahoo is letting the source code behind its version of Hadoop run free. Hadoop is an open source distributed file system, and Yahoo's own version of the technology represents the largest implementation of Hadoop in the world, according to the search company.


Although it's struggling against both giant rivals like Google (Nasdaq: GOOG) and smaller ones like Microsoft's (Nasdaq: MSFT) new search venture Bing, Yahoo (Nasdaq: YHOO) is handing over the source code for its version of Hadoop to the community.

Hadoop, a top-level Apache project, is an open source distributed file system and parallel execution environment that lets its users process massive amounts of data.
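
To give a rough sense of what a "parallel execution environment" means in practice, here is a minimal, single-machine sketch of the map, shuffle and reduce pattern that Hadoop's processing model is based on. It is written in plain Python rather than against Hadoop's actual Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster, each document (or block of one) could be mapped
    # on a different machine; here everything runs in one process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["yahoo runs hadoop", "hadoop runs on many servers"])
```

Because each call to `map_phase` looks at only one document and each call to `reduce_phase` looks at only one key, the work can in principle be spread across many machines, which is the property that lets this style of processing scale to very large data sets.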

Yahoo, which uses Hadoop extensively in its Web search and advertising businesses, has been a major contributor to the project.

A Leap of Faith

Yahoo announced the release of its Hadoop source code at the second annual Hadoop summit, held in Sunnyvale, Calif.

With Microsoft launching Bing so recently, and Google still king of the search mountain, is it wise for Yahoo to release the source code behind its search technology and other properties?

Well, it can't hurt, at least. "I don't think this could land Yahoo in more trouble," Jeff Rogers, founder of Open Source Analysts Group, told LinuxInsider.

About Hadoop

Hadoop is a free Java software framework that supports data-intensive distributed applications.

Yahoo, which calls Yahoo Search the world's largest Hadoop implementation, says the application runs on more than 25,000 servers at the company and analyzes tens of billions of Web pages, multiple petabytes of storage and billions of new records every day.

Inspired by Google's research papers on MapReduce and the Google File System, Hadoop replicates data aggressively so that it remains available even when physical machines go down.
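
The idea behind that replication can be sketched in a few lines. The following is a toy model, not Hadoop's actual placement logic: the replication factor of three, the node names and the helper functions `place_blocks` and `readable_blocks` are all assumptions made for illustration.

```python
import random

REPLICATION_FACTOR = 3  # assumed value, used here only for illustration


def place_blocks(block_ids, nodes, replication=REPLICATION_FACTOR, seed=0):
    """Assign each block to `replication` distinct nodes, chosen at random."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in block_ids}


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one copy on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


nodes = ["node-%d" % i for i in range(6)]
placement = place_blocks(["block-a", "block-b", "block-c"], nodes)

# Simulate two machines going down at the same time.
survivors = readable_blocks(placement, failed_nodes=["node-0", "node-1"])
```

With three copies of every block on three different machines, losing any two machines still leaves at least one copy of each block, so `survivors` contains all three blocks. A real system also has to notice the failure and re-create the lost copies elsewhere, which this sketch leaves out.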

Hadoop's Uses

Users include the Tennessee Valley Authority (TVA), which employs a commercial version of the application from Cloudera, a company founded to provide enterprise-level support to Hadoop users.

The TVA uses Hadoop to analyze phasor measurement unit (PMU) data collected throughout the eastern United States. A PMU measures the electrical waves on an electricity grid. The TVA has about 15 TB of data on file and expects this to grow to about 40 TB by the end of next year.

Cloudera's distribution of Hadoop includes the Yahoo source code, adding credence to Yahoo's claim that it's releasing the code to the community to speed up open, collaborative research and development.

The Hadoop distributed file system is part of the Hadoop Core, the flagship sub-project of the Apache Hadoop project.

The project also includes Chukwa, a data collection system for managing large distributed systems; HBase, which provides a scalable, distributed database; Hive, a data warehouse infrastructure; Pig, a high-level data flow language and execution framework for parallel computation; and ZooKeeper, a highly available and reliable coordination system.


By Richard Adhikari
