A New Paradigm for Hadoop

A new ESG Lab Review is a “must read” for any organization looking to consolidate unstructured data, eliminate infrastructure ‘silos’ and leverage Hadoop analytics to gain insight (and who isn’t these days?). The ESG Lab Review, VCE Vblock Systems with EMC Isilon for Enterprise Hadoop [1], documents lab testing of a converged infrastructure (CI) solution based on VCE Vblock Systems and VCE technology extension for EMC Isilon storage. It also describes how VCE Vblock Systems and EMC Isilon storage can be combined with VMware vSphere Big Data Extensions (BDE) to provide a fully integrated platform that supports growing big data and analytics requirements. This platform can also be extended to a wide range of traditional and next-generation workloads.

Figure 1. Enterprise Hadoop with VCE Vblock Systems and VCE technology extension for EMC Isilon storage

As shown in Figure 1, Hadoop compute resources are provided by the VCE Vblock Systems, while EMC Isilon shared storage is used to store the unstructured data and provide Hadoop storage functionality. In addition to streamlining the analytic workflow, this approach provides breakthrough efficiency and cost savings relative to a “traditional” DAS-based approach. With Isilon, there is no need to create a separate environment to ingest data into a Hadoop cluster, because the data can be written directly to Isilon using NFS, SMB, HTTP, or FTP and read by the Hadoop cluster using HDFS. Isilon allows Hadoop analytics to be run on data in place, while eliminating the 3x replication required with traditional direct-attached storage (DAS). This lowers costs and simplifies management, which is especially important for organizations looking to expand from R&D or POC environments to full production. (To get an idea of how much your organization can save with in-place Hadoop analytics on Isilon, be sure to check out the online TCO analysis tool here).
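To make the replication point concrete, here is a rough back-of-the-envelope sketch in Python. The 3x factor comes from the paragraph above; the shared-storage overhead factor and the 100 TB data set are purely hypothetical placeholders chosen for illustration, not figures from the ESG Lab Review or from Isilon documentation.

```python
def raw_capacity_tb(usable_tb: float, overhead_factor: float) -> float:
    """Raw capacity (TB) needed to hold usable_tb of data at a given overhead factor."""
    if usable_tb < 0 or overhead_factor < 1:
        raise ValueError("usable_tb must be >= 0 and overhead_factor must be >= 1")
    return usable_tb * overhead_factor


# From the text: traditional DAS-based Hadoop keeps three copies of every block.
HDFS_REPLICATION_FACTOR = 3.0

# ASSUMPTION: hypothetical protection overhead for shared storage.
# Substitute the real figure for your own configuration.
ASSUMED_SHARED_OVERHEAD = 1.25

# ASSUMPTION: hypothetical data set size.
data_tb = 100.0

das_raw = raw_capacity_tb(data_tb, HDFS_REPLICATION_FACTOR)
shared_raw = raw_capacity_tb(data_tb, ASSUMED_SHARED_OVERHEAD)

print(f"DAS with 3x replication: {das_raw:.0f} TB raw")
print(f"Shared storage (assumed overhead): {shared_raw:.0f} TB raw")
```

The actual savings depend on the protection level configured on the storage side, so treat the second number as a stand-in until you plug in your own overhead.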

Along with very appealing money-saving prospects, in-place analytics with Isilon scale-out storage provides a number of other important advantages relative to a “traditional” Hadoop infrastructure utilizing direct-attached storage. As described in the new ESG Lab Review as well as in a previous EMC white paper, EMC Isilon Scale-Out NAS for In-Place Hadoop Analytics, these include increased resiliency, improved data protection and security. In its analysis, ESG found that with VCE Vblock Systems and EMC Isilon, security and compliance were robust, enabling multi-tenancy and read-only access to data when needed. ESG Lab also validated that vSphere Big Data Extensions (BDE) allow the automatic provisioning of Hadoop nodes, as needed, both for virtual Hadoop clusters and as virtualized node additions to existing bare-metal clusters. This enables Hadoop clusters to be expanded quickly and easily.

This is all great, but for me, the most exciting finding in the ESG Lab Review is in its analysis of Hadoop performance. ESG Lab ran three different tests with the Hadoop TeraSort suite to validate the HDFS and MapReduce layers of a joint VCE Vblock Systems and EMC Isilon solution. In the testing process, the data set size was scaled from 100GB to 1TB and job completion time was monitored in each test case. The results from these tests were then compared to the performance of a traditional Hadoop cluster consisting of commodity servers and DAS. An example of these test results is summarized in Table 2.

Table 2. Performance Comparison: Traditional Hadoop versus Hadoop on VCE Vblock Systems with EMC Isilon Scale-out NAS

These results show that VCE Vblock Systems with EMC Isilon are well suited to deliver levels of virtualized Hadoop performance comparable to bare-metal installations in a scalable, flexible package. In ESG Lab’s TeraSort suite tests, the VCE Vblock Systems and EMC Isilon solution delivered significant performance benefits, completing Hadoop jobs in as little as half the time taken by a traditional Hadoop configuration. Obviously, this is one of those “your mileage may vary” things, but the evidence is compelling.
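For readers who want to size a similar TeraSort run themselves, the sketch below converts the data set sizes mentioned above (100GB and 1TB) into the row counts that TeraGen expects. TeraGen writes fixed 100-byte rows; the use of decimal units (1GB = 10^9 bytes) is an assumption, and the examples jar path and output directory are installation-specific, so they are left as placeholders. This is not the exact procedure ESG Lab followed.

```python
# TeraGen generates fixed-size rows of 100 bytes each.
TERAGEN_ROW_BYTES = 100

# ASSUMPTION: decimal units (1 GB = 10**9 bytes); the review does not say which it used.
GB = 10**9


def teragen_rows(size_bytes: int) -> int:
    """Number of TeraGen rows needed to produce a data set of size_bytes."""
    if size_bytes <= 0 or size_bytes % TERAGEN_ROW_BYTES != 0:
        raise ValueError("size_bytes must be a positive multiple of 100")
    return size_bytes // TERAGEN_ROW_BYTES


# The two end points of the range described in the text.
sizes = {"100GB": 100 * GB, "1TB": 1000 * GB}
rows = {label: teragen_rows(num_bytes) for label, num_bytes in sizes.items()}

for label, count in rows.items():
    # Arguments for: hadoop jar <examples-jar> teragen <rows> <output-dir>
    print(f"{label}: teragen {count} <output-dir>")
```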

Interestingly, the ESG Lab Review also describes how ESG Lab measured the performance impact of the loss of a node in the Isilon cluster by intentionally powering down one of the eight Isilon nodes in the tested configuration. They observed Isilon’s data resiliency (attributable to the built-in data protection of Isilon’s OneFS operating system) and confirmed a performance difference of only about 12 percent, which left roughly seven-eighths of the performance of the healthy eight-node cluster. This is what you would expect if performance scales linearly with node count, so the result also demonstrates the linear scalability of Isilon with respect to capacity and performance.
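The arithmetic behind the seven-eighths figure is easy to check. The short sketch below assumes that performance scales linearly with the number of healthy nodes, which is the behavior the node-loss test is said to demonstrate; the function name is illustrative only.

```python
def remaining_fraction(total_nodes: int, failed_nodes: int) -> float:
    """Fraction of cluster performance left, ASSUMING linear scaling with node count."""
    if total_nodes <= 0 or not 0 <= failed_nodes <= total_nodes:
        raise ValueError("need total_nodes > 0 and 0 <= failed_nodes <= total_nodes")
    return (total_nodes - failed_nodes) / total_nodes


# Tested configuration from the text: eight nodes, one powered down.
fraction = remaining_fraction(total_nodes=8, failed_nodes=1)
drop_percent = (1 - fraction) * 100

print(f"Remaining performance: {fraction:.3f} of the healthy cluster")
print(f"Expected drop: {drop_percent:.1f} percent")
```

Under that assumption the expected drop is 12.5 percent, which is in line with the roughly 12 percent difference reported.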

Taking all of these factors into account (efficiency, cost savings, management simplicity, data protection, security and performance), it seems clear that organizations interested in consolidating their Big Data and using Hadoop analytics to accelerate time to insight should check out this ESG Lab Review and learn more about how VCE Vblock Systems with EMC Isilon can benefit their business and transform the way they run their analytics.

Welcome to the new paradigm for Hadoop!

Source:

[1] ESG Lab Review, VCE Vblock Systems with EMC Isilon for Enterprise Hadoop, November 2014

About the Author: Michael Noble