Fast-track Data Strategies: ETL Offload Hadoop Reference Architecture

In 2010, Gartner predicted that big data would grow by an impressive 650 percent over the following five years. The analysts’ prediction proved more than correct. According to an IDC report, the digital universe is now doubling in size every two years. By 2020, that universe is expected to reach 44 trillion gigabytes, a 10-fold increase from 2013. This growth is forcing customers to look for data platforms that let them process, store, transform, and analyze data without vendor lock-in or a high total cost of ownership (TCO). Customers understand that data analysis is not optional; it is necessary to stay competitive in their markets.

All this data can provide invaluable insights to business decision makers. However, unlocking the value of the data can prove cumbersome and expensive, and data transformation workloads are a prime example. To meet the growing demand for a more effective Extract, Transform and Load (ETL) strategy, Dell has partnered with Intel, Cloudera and Syncsort to offer a first-of-its-kind Reference Architecture (RA) for data warehouse optimization – ETL offload. The Dell | Cloudera | Syncsort Data Warehouse Optimization – ETL Offload Reference Architecture is a cost-saving blueprint for building an environment that allows organizations to augment their Enterprise Data Warehouse (EDW).

With Syncsort’s DMX-h, users work in a drag-and-drop interface rather than learning additional complex technologies, so they can begin developing Hadoop ETL jobs within hours and become fully productive within days.

Adding to this convenience, the SILQ offload utility gives users detailed, drill-down information about each step within the data flow, including tables and data transformations. This can reduce expert analysis time from more than 20 hours to less than 30 minutes.

This new RA for ETL offload allows companies to reduce Hadoop deployment times, develop ETL jobs within hours, and become fully productive within days. In turn, this can lower data transformation costs and provide operational efficiencies that lay a strong, cost-effective, secure and scalable foundation for managing data on an ongoing basis.

This new ETL Offload solution includes:

  • A use case-driven Hadoop RA to lower data transformation costs
  • PowerEdge™ R730 and R730xd servers, Dell Networking, Intel® Xeon® E5-2600 v3 processors, Cloudera Enterprise software, and Syncsort DMX-h and SILQ
  • A tested, validated, and certified design for building a Hadoop data warehouse optimization solution from bare metal
  • A flexible configuration that scales to business needs
  • Greater processing power and capacity
  • Professional services and support

As data becomes an increasingly important part of doing business, an easy-to-use, cost-effective ETL offload solution becomes an essential tool in any big data arsenal. The industry’s first and only solution to this challenge, the Dell | Cloudera | Syncsort Data Warehouse Optimization – ETL Offload, will be available in July.

You can learn more about the new Dell | Cloudera | Syncsort Data Warehouse Optimization – ETL Offload RA at our “Optimizing Your Hadoop Infrastructure” panel presentation at the Hadoop Summit on Wednesday, June 10, 2015, at 5:25 pm.

About the Author: Armando Acosta

Armando Acosta has been involved in the IT industry for the last 15 years, with experience in architecting IT solutions and in product marketing, management, planning, and strategy. In his latest role, Armando has focused on Big Data and Hadoop solutions that build new capabilities for emerging customer needs, and he assists with the roadmap for new products and features. Armando is a graduate of the University of Texas at Austin and resides in Austin, TX.