New Integrated Hadoop Solutions from Dell EMC

The Promise

Back in August 2016, prior to Dell’s acquisition of EMC, the largest in the technology industry’s history, Michael Dell promised new Engineered Systems from the combined company.

The analyst community envisioned that the combined company would be a powerhouse in Data Analytics, a one-stop shop for Big Data Platforms with a broad portfolio of solutions spanning the Enterprise Data Center and the Cloud. Analysts also predicted that, as with any merger of giants, there would be hurdles in how quickly the two large portfolios of products and solutions could come together.

A lot under the sun

Granted, data analytics is a broad domain. To make sense of it, we at Dell EMC use a three-layer taxonomy of data analytics platforms:

  1. Infrastructure Layer, consisting of basic building blocks: Storage, Compute, Networking and Analytics Software.
  2. Integration Layer, where these building blocks are pre-integrated so companies can focus on higher-level applications and use cases.
  3. Analytics Layer, where pre-integrated “layered-cake” stacks with unified operational capabilities can bootstrap analytics initiatives at large enterprises.

Furthermore, data analytics software platforms are quite diverse, ranging from Hadoop and NoSQL to proprietary search, business intelligence, data discovery, and visualization tools.

Industry’s most comprehensive Integrated Hadoop Solutions

So, coming back to what our CEO promised back in August, what Engineered Solutions have we built so far?

For this article, let’s focus on a subset of solutions: Hadoop Solutions at the Integration Layer. Even this subset can be quite diverse, so we used two guiding principles in building the solution portfolio:

  • The portfolio should support companies of every size and stage of maturity in using Hadoop solutions to solve business problems.
  • The platform configuration for any given Hadoop solution should be driven by business requirements. A one-size-fits-all approach does not deliver optimal return on investment for companies.

Let’s look at these two elements in turn.

The Portfolio

The figure below shows the new Dell EMC solution offerings, organized by an enterprise’s maturity level in leveraging Hadoop solutions.

QuickStart offerings are for companies that have no current investments in big data solutions but are looking to solve business problems using data-driven approaches. Data Warehouse Optimization offerings enable companies to gain quantifiable cost savings, which can help bootstrap investment into a broader set of analytics use cases. Those use cases can deliver further cost savings, improve customer experience, mitigate business risk and generate revenue, which leads to the Dell EMC Data Lake offerings.

The Platform Configurations

We have converged on four primary platform configurations, all driven by business requirements.

For companies looking for high-performance HBase (random real-time access in Hadoop), a shared storage configuration that separates Compute from Storage, using DSSD (Dell EMC’s rack-scale flash array) for high-performance storage, is ideal. Examples of application areas for such a configuration include fraud and anomaly detection, real-time marketing, genomics research, etc.

The Dell EMC localized storage configuration, also known as Direct Attached Storage, is ideal when high performance is needed but there is no requirement for enterprise-grade file management (data protection, disaster recovery, data tiering, encryption, etc.) or for in-place analytics (i.e., consolidating Hadoop and enterprise file data in one place).

The Dell EMC shared storage configuration with Isilon is ideal for companies looking to scale storage faster than compute. A typical case is when 80% of the data is older, must be stored efficiently and is subject to regulatory compliance, but is queried less frequently than newer data; this is referred to as “Active Archive”. This configuration is also a great fit for consolidating Hadoop and generic IT workloads, so that a single large-scale file server can serve both and minimize data movement in and out of Hadoop; this is referred to as “In-Place Analytics”.

Companies looking for a geo-scale Hadoop cluster under a single logical namespace are best served by a shared storage configuration with ECS (Elastic Cloud Storage).

These solutions will come pre-integrated with industry-leading Hadoop distributions, so companies can focus on implementing use cases rather than integrating the Hadoop stack.
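The mapping from business requirements to the four configurations described above can be summarized as a small decision function. The following Python sketch is purely illustrative: the `Requirements` fields, the `suggest_configuration` function and the order in which requirements are checked are assumptions made for this example, not a Dell EMC sizing tool or official guidance.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical business requirements; field names are illustrative only."""
    needs_high_performance_hbase: bool = False  # random real-time access in Hadoop
    needs_geo_scale_namespace: bool = False     # one logical cluster across sites
    needs_enterprise_file_mgmt: bool = False    # data protection, DR, tiering, encryption
    needs_in_place_analytics: bool = False      # Hadoop and enterprise file data in one place
    storage_outgrows_compute: bool = False      # e.g. an "Active Archive" of older data


def suggest_configuration(req: Requirements) -> str:
    """Return the configuration the article associates with the given needs.

    The check order below is an assumption of this sketch.
    """
    if req.needs_high_performance_hbase:
        return "Shared storage with DSSD"
    if req.needs_geo_scale_namespace:
        return "Shared storage with ECS"
    if (req.needs_enterprise_file_mgmt
            or req.needs_in_place_analytics
            or req.storage_outgrows_compute):
        return "Shared storage with Isilon"
    # High performance without enterprise file management or in-place analytics
    return "Direct Attached Storage"


# Example: an "Active Archive" style workload
print(suggest_configuration(Requirements(storage_outgrows_compute=True)))
# prints "Shared storage with Isilon"
```

A real engagement would weigh these requirements together rather than returning on the first match; the sketch only makes explicit that the configuration follows from the requirements, not the other way around.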

Are there other supported configurations for the Hadoop Data Lake offered by Dell EMC besides the ones shown here?

There certainly are. A good example is Dell EMC Vblock for Hadoop compute, in lieu of the Engineered System for Hadoop, for companies that are focused on virtualized Hadoop and need tools for large-scale operational management of the compute cluster.

Wrap

We are very excited to launch this comprehensive portfolio of Hadoop offerings in early 2017, a testament to how cross-functional teams with the singular goal of solving customers’ problems can deliver quickly, even in the largest merger in the technology industry’s history.

Please reach out to your local Dell EMC Sales Executive, so we can help you solve your business problems with these new Dell EMC offerings for Hadoop.

About the Author: Sai Devulapalli

Sai Devulapalli is an accomplished leader with 21 years of broad experience in building, launching and managing B2B product and solution portfolios and services lines of business, forging strategic partnerships and aligning portfolio offerings in major acquisitions. His domain expertise is in Data Analytics, Internet of Things and Enterprise PaaS in the IT and Telecom industries. Devulapalli is currently responsible for the Analytics and Hadoop lines of business for the emerging storage portfolio at Dell EMC, where he manages the global business, portfolio and partnerships, and the integration of former Dell and EMC product offerings with industry-leading Analytics and Hadoop stacks into a cohesive solution portfolio.