Cloudera Enterprise and EMC Isilon: Filling In The Hadoop Gaps


As Hadoop becomes a central component of enterprise data architectures, the open source community and technology vendors have built a large Big Data ecosystem of Hadoop platform capabilities to fill the gaps in enterprise application requirements. For data processing, MapReduce batch processing has been supplemented with additional tools such as Apache Hive, Apache Solr, and Apache Spark, which fill the gaps for SQL access, search, and streaming. For data storage, direct attached storage (DAS) has been the common deployment configuration for Hadoop; however, the market is now looking to supplement DAS deployments with enterprise storage. Why take this approach? Organizations can HDFS-enable valuable data already managed in enterprise storage without having to copy or move it to a separate Hadoop DAS environment.

As a leader in enterprise storage, EMC has partnered with Hadoop vendors such as Cloudera so that customers can fill the Hadoop gaps through HDFS-enabled storage such as EMC Isilon. In addition to providing data protection, efficient storage utilization, and easy import/export through multi-protocol support, EMC Isilon and Cloudera together allow organizations to quickly and easily take on new analytic workloads. With the announcement of Cloudera Enterprise certified with EMC Isilon for HDFS storage, I took the opportunity to speak with Cloudera's Chief Strategy Officer, Mike Olson, about the partnership and how he sees the Hadoop ecosystem evolving over the next several years.

1.  The industry uses different terms for enterprise data architectures centered around Hadoop. EMC refers to this next-generation data architecture as a Data Lake, and Cloudera refers to it as an Enterprise Data Hub. What is the common thread?

The two are closely related. At Cloudera, we think of a data hub as an engineered system designed to analyze and process data in place, so it needn’t be moved to be used. The most common use of the “data lake” term is around existing large repositories (and Isilon is an excellent example), where data is collected and managed at scale, but where historically it’s had to be piped out of the lake to be used. By layering Cloudera Enterprise right on top of Isilon as a storage substrate, we layer a hub on the lake – we let you keep your data where it lives, and put the processing where you need it.

2.  Cloudera leads the Hadoop market. What does EMC Isilon bring to the table for your customers?

Best-of-breed engineered storage solutions, of course; manageability, operability, credibility and a tremendous record of success in the enterprise as well. And, of course, a substantial market presence. The data stored in Isilon systems today is more valuable if we can deliver big data analytics and processing on it, without requiring it to be migrated to separate big data infrastructure.

3.  What are the ideal use cases for a Cloudera-Isilon deployment?

We don’t see any practical difference in the use cases that matter. The processing and analytic workloads for big data apply whether data is in native HDFS managed by Apache Hadoop, or in Isilon. The real question is what the enterprise’s requirements and standards around its storage infrastructure are. Companies that choose the benefits of Isilon now get the benefits of Cloudera as well.

4.  SMB and NFS are examples of protocols that have been around for generations. Will HDFS stand the test of time, or will it be replaced by another protocol to support, for example, real-time applications or applications for the Internet of Things?

Software evolves continually, but HDFS is a long-term player. SMB and NFS are more scalable and more performant today than they were ten or twenty years ago, and I’m confident that you’ll see HDFS evolve as well.

5.  MapReduce provides an excellent alternative for traditional data warehouse batch processing requirements. Other open source data processing tools for Hadoop, such as Hive, Spark, and Apache HBase, provide additional capabilities to meet enterprise application requirements. How do you see this data processing ecosystem evolving over the next five years?

It’ll be faster, more powerful, more capable and more real-time. The pace of innovation in the last ten years has been breathtaking, in terms of data analysis and transformation. The open source ecosystem and traditional vendors are doing amazing things. That’ll continue – there is so much value in the data that there’s a huge reward for that innovation.
