Hadoop gets a makeover: moving from batch processing to real time

This week Dell will be announcing support for the Cloudera Enterprise 5 Reference Architecture, built on award-winning Dell PowerEdge servers and networking. Customers value this expert guidance, which starts with the bare-metal install and leads to a highly tuned Hadoop cluster. The big news with Cloudera Enterprise 5 is the introduction of Apache YARN (Yet Another Resource Negotiator) as the new resource manager for Hadoop 2.0. YARN provides a resource management framework for implementing distributed applications without being tied to MapReduce. With Apache Hadoop 2.0, MapReduce has been overhauled and re-architected as an application that runs on YARN: MapReduce version 2 (MRv2).
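To make the idea of a resource manager that is not tied to MapReduce more concrete, here is a deliberately simplified Python model. It is a hypothetical sketch, not the YARN API: the class names (`ToyResourceManager`, `Node`, `ContainerRequest`), the first-fit placement rule, and the memory and core figures are all illustrative assumptions. What it tries to show is the shape of the design: the resource manager only hands out containers of memory and CPU, and MapReduce is just one application asking for them, alongside other frameworks such as Spark.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker host; loosely plays the role of a YARN NodeManager (illustrative)."""
    name: str
    free_mb: int
    free_vcores: int


@dataclass
class ContainerRequest:
    """A request for one container of memory and CPU from some application."""
    app: str
    memory_mb: int
    vcores: int


@dataclass
class ToyResourceManager:
    """Grants containers to any application; it has no MapReduce-specific logic."""
    nodes: list[Node]
    allocations: list[tuple[str, str]] = field(default_factory=list)

    def allocate(self, req: ContainerRequest) -> str | None:
        # First-fit placement (an assumption for this sketch): use the first
        # node with enough spare memory and vcores.
        for node in self.nodes:
            if node.free_mb >= req.memory_mb and node.free_vcores >= req.vcores:
                node.free_mb -= req.memory_mb
                node.free_vcores -= req.vcores
                self.allocations.append((req.app, node.name))
                return node.name
        return None  # cluster is full; a real scheduler would queue the request


rm = ToyResourceManager(nodes=[Node("node1", 8192, 4), Node("node2", 8192, 4)])

# MapReduce and Spark are peers here: both are just applications asking for containers.
requests = [
    ContainerRequest("mapreduce-job", 2048, 1),
    ContainerRequest("mapreduce-job", 2048, 1),
    ContainerRequest("spark-app", 4096, 2),
    ContainerRequest("spark-app", 4096, 2),
    ContainerRequest("spark-app", 4096, 2),
    ContainerRequest("spark-app", 1024, 1),
]
placements = [rm.allocate(req) for req in requests]
for req, node in zip(requests, placements):
    print(f"{req.app}: {node}")
```

Real YARN adds much more, such as a per-application ApplicationMaster, pluggable schedulers (the Capacity Scheduler and Fair Scheduler), queues, and data locality, but the separation is the same: scheduling lives in YARN, and the processing logic lives in each application.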

This is exciting for people like me who live, breathe, and build Hadoop solutions. But what does it really mean for customers?

This is the big makeover: moving from batch to real-time processing. Apache YARN is the kind of evolution customers want to see from the open source community. Customers see the value in Hadoop, but over the last five years their needs have grown beyond batch processing. Today it is about real-time analytics and faster data processing with new computing frameworks.

The market has anticipated YARN for the last two years. Customers are ready to build new solutions on top of HDFS that enable real-time analytics, and YARN allows new computing frameworks to work with data in HDFS. Bottom line? YARN opens Hadoop to a whole new set of users, and to applications that were previously impossible on Hadoop.

Apache Spark is a prime example of how YARN enables customers to build a real-time analytics platform.

Spark can run applications in Hadoop clusters up to 100x faster than MapReduce in memory, and up to 10x faster even when running on disk. The other huge benefit: Spark supports SQL queries, streaming data, and complex analytics such as machine learning and graph algorithms out of the box, and can combine all of these capabilities in a single workflow. This matters because customers can use a single platform instead of a separate specialized system for each type of analysis. Customers can now do iterative, interactive and streaming data analysis with one tool, simplifying their environments.
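One reason Spark does well on iterative workloads is that it can keep a working dataset in memory across passes (for example by calling `cache()` on an RDD or DataFrame), while a chain of MapReduce jobs typically rereads its input from HDFS on every pass. The toy Python sketch below only counts full scans of a stand-in storage layer to illustrate that difference. `ToyStorage` and `iterate` are hypothetical names, the mean-estimation loop is an arbitrary example workload, and the sketch says nothing about actual speedups, which depend on the cluster and the job.

```python
class ToyStorage:
    """Stand-in for HDFS (hypothetical): counts how often the full dataset is read."""

    def __init__(self, records):
        self._records = records
        self.full_scans = 0

    def read_all(self):
        self.full_scans += 1
        return list(self._records)


def iterate(storage, iterations, cache):
    """Iteratively refine an estimate of the mean of the stored records."""
    # With caching, read the data once and keep it in memory for every pass.
    data = storage.read_all() if cache else None
    estimate = 0.0
    for _ in range(iterations):
        # Without caching, each pass rereads the input, MapReduce-style.
        batch = data if cache else storage.read_all()
        gradient = sum(x - estimate for x in batch) / len(batch)
        estimate += 0.5 * gradient
    return estimate


records = [1.0, 2.0, 3.0, 4.0, 5.0]
reread = ToyStorage(records)
cached = ToyStorage(records)
est_reread = iterate(reread, iterations=10, cache=False)
est_cached = iterate(cached, iterations=10, cache=True)
print(est_reread, reread.full_scans)
print(est_cached, cached.full_scans)
```

Both runs reach the same estimate, but the cached run touches storage once instead of once per iteration; on a real cluster, those avoided rereads from disk are where in-memory processing saves time.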

Customers are excited about the new possibilities. I'm excited to help customers start building these new platforms with Dell's best-of-breed technologies: Boomi, SharePlex for Hadoop, Toad BI Suite, and Kitenga.

Come learn more at Hadoop Summit 2014

We hope you can join us at Hadoop Summit 2014 to learn more about Dell Big Data Solutions. If you're at the Summit, don't miss the Dell, Cloudera and Intel Fireside Chat on Thursday, June 5, from 12:35 to 1:20 PM, on Hadoop tuning and real-world benchmarking.

About the Author: Armando Acosta

Armando Acosta has been in the IT industry for the last 15 years, with experience in architecting IT solutions and in product marketing, management, planning, and strategy. In his latest role, Armando focuses on Big Data and Hadoop solutions that build new capabilities for emerging customer needs, and he helps shape the roadmap for new products and features. Armando is a graduate of the University of Texas at Austin and resides in Austin, TX.