VMware Hatches Spring Hadoop Cross-Breed for Big Data

Virtualization giant VMware has unveiled Spring Hadoop, which integrates its Spring Framework with the Apache Hadoop platform.

Spring provides a comprehensive, lightweight framework that will make it easier for devs to build solutions around the Hadoop platform, according to the company.

Spring Hadoop is available under the open source Apache 2.0 license and can be downloaded free.

“Spring is the most popular development framework for enterprise Java, and this release makes the power of Apache Hadoop available to the vast Spring community of developers,” VMware executive Adam FitzGerald told LinuxInsider.

“Because of the association of Java with the Web’s request-response approach to transactions, Spring in some ways merely simplifies a problematic split in which a [Java] Bean back-ends a series of responses and combines them into a persistent data transaction,” Wayne Kernochan, president of Infostructure Associates, said.

Java Beans are reusable software components, or objects, for Java. They provide all the benefits of Java’s write once, run anywhere paradigm.

What’s Hadoop? What’s Big Data?

Apache Hadoop is an open source software framework that lets distributed applications work with thousands of nodes and petabytes of data.

Big data consists of data sets so large that they are difficult to capture, store, search, share, analyze or visualize with conventional data management tools.

Big data is taking off because larger data sets allow analysts to more clearly spot trends and wring more value out of the data. For example, the U.S. healthcare system could create more than US$300 billion in value every year by harnessing big data, according to the McKinsey Global Institute.

More Info on Spring Hadoop

Spring Hadoop supports the configuration, creation and execution of MapReduce jobs. MapReduce is a software framework Google introduced in 2004 to support distributed computing on large data sets across clusters of computers.
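To make the MapReduce model concrete, here is a minimal, single-machine sketch in plain Java of the classic word-count job. It is not Spring Hadoop or Hadoop code: the class and method names (MiniMapReduce, map, reduce, run) are illustrative assumptions, and the "shuffle" step that a real cluster performs across many machines is simulated here with an in-memory TreeMap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the MapReduce pattern (not Hadoop's API).
public class MiniMapReduce {

    // Map phase: turn one input record (a line of text) into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every record, group ("shuffle") the pairs by key,
    // then call reduce once per key. Everything runs sequentially here.
    static Map<String, Integer> run(List<String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            for (Map.Entry<String, Integer> pair : map(record)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big clusters", "Big jobs");
        System.out.println(run(input)); // {big=3, clusters=1, data=1, jobs=1}
    }
}
```

On a real cluster, many map and reduce tasks run in parallel on different nodes, which is what lets the same two small functions scale to very large inputs.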

Developers using Spring Hadoop don’t have to rewrite MapReduce jobs in Java. They can run non-Java streaming jobs seamlessly, because Spring Hadoop treats jobs as beans that are created, configured, wired and managed like any other object.
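Hadoop Streaming, the mechanism that lets non-Java jobs plug in, works by passing plain text: the mapper writes key/value lines separated by a tab, the framework sorts them by key, and the reducer reads the sorted lines back. The Java sketch below imitates that contract in memory so it can run on its own. In a real streaming job the mapper and reducer would be separate executables (a Python or shell script, for instance) reading standard input and writing standard output; the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical imitation of the Hadoop Streaming text contract (not Hadoop's API).
public class StreamingSketch {

    // Streaming-style mapper: reads raw text lines, writes "key<TAB>value" lines.
    static List<String> mapper(List<String> inputLines) {
        List<String> out = new ArrayList<>();
        for (String line : inputLines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(word + "\t1");
                }
            }
        }
        return out;
    }

    // Streaming-style reducer: receives lines sorted by key, so all lines for
    // one key arrive together; it sums the values of each run of equal keys.
    static List<String> reducer(List<String> sortedLines) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        int sum = 0;
        for (String line : sortedLines) {
            String[] kv = line.split("\t", 2);
            if (currentKey != null && !currentKey.equals(kv[0])) {
                out.add(currentKey + "\t" + sum); // key changed: flush the finished group
                sum = 0;
            }
            currentKey = kv[0];
            sum += Integer.parseInt(kv[1]);
        }
        if (currentKey != null) {
            out.add(currentKey + "\t" + sum); // flush the last group
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> mapped = mapper(List.of("to be or", "not to be"));
        // The framework sorts mapper output by key before the reduce phase.
        // Sorting whole lines keeps equal keys adjacent because keys contain no tabs.
        Collections.sort(mapped);
        reducer(mapped).forEach(System.out::println);
    }
}
```

Because the only interface between the stages is sorted, tab-separated text, any language that can read and write lines can supply the mapper or reducer.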

Spring Hadoop supports existing Hadoop Tool implementations. Instead of specifying custom Hadoop properties through the command line, devs can simply inject them.
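The difference between passing properties on the command line and injecting them can be sketched without any Hadoop dependency. In the example below, ReportTool is a hypothetical stand-in for a Hadoop Tool implementation, parseDashD roughly imitates how "-D name=value" options are picked up at startup, and java.util.Properties substitutes for Hadoop's own configuration class. The property name is used only as an example.

```java
import java.util.Properties;

// Hypothetical sketch: command-line properties vs. injected properties.
public class ToolInjectionSketch {

    // Stand-in for a Hadoop Tool: it reads its settings from a Properties
    // object handed to it through the constructor (constructor injection).
    static class ReportTool {
        private final Properties conf;

        ReportTool(Properties conf) {
            this.conf = conf;
        }

        String run() {
            // Fall back to a default when the property is absent.
            String reducers = conf.getProperty("mapred.reduce.tasks", "1");
            return "running with " + reducers + " reducer(s)";
        }
    }

    // Command-line style: pull "-D name=value" pairs out of the argument list,
    // a rough imitation of generic-options parsing; other arguments are ignored.
    static Properties parseDashD(String[] args) {
        Properties props = new Properties();
        for (int i = 0; i < args.length - 1; i++) {
            if (args[i].equals("-D")) {
                String[] kv = args[i + 1].split("=", 2);
                if (kv.length == 2) {
                    props.setProperty(kv[0], kv[1]);
                }
                i++; // skip the consumed name=value argument
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // 1) Properties typed on the command line and parsed at startup.
        Properties fromCli = parseDashD(new String[] {"-D", "mapred.reduce.tasks=4"});
        System.out.println(new ReportTool(fromCli).run());

        // 2) The same property built in code (or by a container such as Spring)
        //    and injected directly, with no argument parsing involved.
        Properties injected = new Properties();
        injected.setProperty("mapred.reduce.tasks", "4");
        System.out.println(new ReportTool(injected).run());
    }
}
```

Both calls print the same result; the point of injection is that the tool's code does not change, only where its settings come from.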

Further, Spring Hadoop provides integration for higher-level abstractions such as HBase, Hive and Pig. This makes it easy to configure and consume these data sources inside a Spring app.

HBase is the Hadoop database, intended for cases where users need random, real-time read/write access to their big data. HBase can host very large tables consisting of billions of rows and millions of columns on clusters of commodity hardware.
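HBase's "billions of rows, millions of columns" design rests on a sparse, sorted map: each row stores only the columns it actually has, and rows are ordered by key, so single-row reads and range scans are cheap. The toy Java class below models that layout in memory to show the idea. It is not the HBase client API; the class and method names are assumptions, and it ignores distribution, persistence and cell versioning.

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of HBase's sparse, sorted table layout.
public class SparseTableSketch {

    // row key -> ("family:qualifier" -> value). Rows stay sorted by key, and
    // each row holds only the columns it actually has, so empty columns cost nothing.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Random write: set one cell in one row.
    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Random read: fetch one cell directly by row key and column (null if absent).
    String get(String row, String column) {
        NavigableMap<String, String> cells = rows.get(row);
        return cells == null ? null : cells.get(column);
    }

    // Range scan: rows with keys in [startRow, stopRow), in sorted order.
    SortedMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("user#0001", "info:name", "Ada");
        table.put("user#0002", "info:name", "Linus");
        table.put("user#0002", "stats:logins", "42");
        System.out.println(table.get("user#0002", "stats:logins")); // 42
        System.out.println(table.scan("user#0001", "user#0002").keySet()); // [user#0001]
    }
}
```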

Hive is a data warehouse system for Hadoop.

Apache Pig is a platform for analyzing large data sets. Pig programs can be parallelized, and this lets them handle very large data sets.

Gotta Cuppa Java?

Most of Spring’s benefits “are localized to those who find the Java philosophy the best way to attack data processing,” Infostructure’s Kernochan told LinuxInsider. However, “most enterprise programming” is done in Java.

“For the typical enterprise doing quick or cheap Hadoop per-project implementation, this nicely translates the old, familiar Spring model to Hadoop data access,” Kernochan remarked. “While it can be of particular help for developers … it also helps administrators trying to optimize the operation of multiple data-accessing Java Beans.”

Spring Hadoop and Virtualization

Spring Hadoop provides connectivity and integration with Apache Hadoop systems whether or not they’re virtualized, VMware’s FitzGerald said.

However, VMware’s vFabric product family has an execution environment that’s optimized for Spring workloads, “so as Spring Hadoop becomes more widely used, we expect to see customer interest in vFabric increase in comparison to platforms offered from other vendors,” FitzGerald remarked.

Although Spring Hadoop is “more about tuning developers’ favorite Java development platform for Hadoop in a virtualization-agnostic way,” it may boost VMware’s profile, Kernochan speculated. “In that respect, VMware can gain in the battle of perception with Oracle, which has been criticized for not offering full-throated support for open source and open standard Java.”

Overall, Spring Hadoop is “good news for dyed-in-the-wool Java fans, especially given VMware’s strong presence in the clouds on which Big Data resides,” Kernochan concluded.
