Data Lakes Redefine Storage Infrastructure & Hybrid Clouds Come of Age


More than ever, I’m excited by the changes taking place in the IT landscape. This year, I see two major technology inflection points arriving at once: data lakes and hybrid clouds.

I believe this because of the vast amount of data that has been, and continues to be, generated – driving demand across all types of storage. Specifically, storage for unstructured data is more than doubling every two years, while structured and semi-structured data is growing at 20%+ per year.
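The two growth rates quoted above compound very differently. As a rough, illustrative comparison, the Python sketch below treats “more than double every two years” and “20%+ annual growth” as lower bounds and converts both into growth multiples over a planning horizon. The five-year horizon and the function names are assumptions chosen for illustration, not figures from this post.

```python
"""Compare the two storage-growth rates quoted in the post (illustrative arithmetic only)."""


def factor_from_doubling(doubling_period_years: float, years: float) -> float:
    """Growth multiple for data that doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def factor_from_annual_rate(annual_rate: float, years: float) -> float:
    """Growth multiple for data growing by `annual_rate` per year (0.20 means 20%)."""
    return (1.0 + annual_rate) ** years


def implied_annual_rate(doubling_period_years: float) -> float:
    """Annual growth rate equivalent to doubling every `doubling_period_years`."""
    return 2.0 ** (1.0 / doubling_period_years) - 1.0


if __name__ == "__main__":
    horizon = 5  # hypothetical planning horizon in years, chosen only for illustration
    unstructured = factor_from_doubling(2, horizon)   # "more than double every two years" (lower bound)
    structured = factor_from_annual_rate(0.20, horizon)  # "20%+ annual growth" (lower bound)
    print(f"Doubling every 2 years ~= {implied_annual_rate(2):.1%} per year")
    print(f"After {horizon} years: unstructured x{unstructured:.2f}, structured x{structured:.2f}")
```

Run as a script, it reports that doubling every two years corresponds to roughly 41% annual growth, so over five years unstructured data grows about 5.7x versus about 2.5x for structured data at 20% per year. Because both quoted rates are minimums, the real multiples would be at least this large.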

This growth is being experienced across all industries, from financial services and life sciences to healthcare and manufacturing. And organizations’ increasing reliance on data for intelligence-based decisions makes storage the most essential component of the infrastructure stack.

This level of data growth, combined with data’s importance as a corporate asset, is leading businesses to look for ways to reduce the aggregate cost and complexity of storage without compromising on capacity scaling or on the performance that current and future applications demand of that storage.

The need to balance cost and performance will lead enterprises to seek innovation in their storage resource management and in a storage infrastructure that spans cloud, file, transactional and analytics workflows.


Data Lakes Replace Silos

From my vantage point, I have little doubt that the transition from traditional silo-based storage infrastructures to consolidated data lakes, managed through intelligent software and able to scale with massive data growth and performance demands, will become commonplace.

Data lakes, with the support of Hadoop, will enable organizations to extract value from the vast volumes of data stored in them. They will also drive workflow optimization within the enterprise and provide an economical means of managing massive amounts of data.

But the transition to data lakes isn’t one to take lightly. It requires planning and analysis to ensure that the foundational data lake architecture is aligned with the organization’s data types and workflows.

Some organizations may run multiple types of data lakes: a data lake for ultra-high-performance transactional and analytics workflows; an exabyte-scale, geo-dispersed object data lake; a file-based multiprotocol data lake; or even a hot edge, cold core data lake that combines ultra-high-performance rack-scale flash architectures at the edge with high-capacity geo-scale platforms at the core.

From a vendor’s perspective, this means we’ll need to give you choice and flexibility in scale-out data lake architectures and products, and to deliver solutions that span block, file, object and analytics workflows. It’s an exciting time to be overseeing the Emerging Technologies Division of EMC!

From the Data Lake to the Cloud

Most enterprises have embarked on a path to a cloud infrastructure for compute, storage, or both. In parallel with data lakes, 2015 will be the year that the hybrid cloud emerges as the dominant enterprise cloud storage strategy, using external providers for bursting and as archival repositories for data from primary on-premises storage.

What’s been limiting the hybrid cloud approach is the lack of an intelligent software management layer to seamlessly orchestrate and integrate the enterprise with the cloud. We’re eagerly investing our engineering resources in this area for a simple reason: with this resource management layer, the reductions in management personnel and floor space, and the decreases in data center power and cooling, are finally realized, with substantial cost savings that justify the shift to a hybrid model.

The hybrid model is not just about storage; it is about enabling our customers to build their own cloud computing footprint and deliver infrastructure-as-a-service that is fully interoperable – not just compatible – with the leading cloud services. The future of these extensible cloud solutions will be based on OpenStack technology, the fastest-growing open source cloud platform on the planet.

So there you have my top storage infrastructure predictions for 2015. Have comments or 2015 predictions of your own? I’d love to hear from you.


  • Sankar Prabhukumar

    Hi CJ, Couldn’t agree with you more. Just one comment: leveraging commodity solutions where it makes sense will be critical to getting a better handle on costs and scale, so there will be an increasing shift towards commodity. However, the world is not black and white; critical applications will continue to run on enterprise platforms for the foreseeable future. Companies that can seamlessly manage both enterprise AND commodity solutions will be in a much better position to lead customers towards data lakes and hybrid clouds.

    • I couldn’t agree more. There will always be a need for specialized systems for critical applications and for those requiring ultra-high performance, such as real-time analytics. But one item of note here: commodity hardware does not mean hardware that is not enterprise grade. A great example is the ECS Appliance, which is built on enterprise-grade commodity hardware so customers get the best quality product at an optimal price point. They also have the option to buy their own commodity hardware. That said, I believe an enterprise environment will always be a mix of commodity and specialized hardware tailored to meet varying business needs.

  • Sumit Nigam

    Hi CJ – Agree with you completely. I have a few observations –

    1. You mentioned the missing intelligent software management layer needed to integrate the enterprise with a public cloud. Shouldn’t VMware vCloud Connector be addressing a lot of those gaps? You also mentioned EMC investing its resources in this area, and I am missing the point somewhere. What would EMC provide in addition to the connector from VMware?

    2. Hybrid clouds always raise an interesting concern: applications need to be architected to handle cloud-level latencies and other issues that are not really a problem within a LAN, because that network is within the enterprise’s control and its characteristics are well known. The ability to partition workloads remains a real challenge for enterprises. Cloud bursting may have gained prominence as a term, but in essence it is just intelligent load balancing.

    3. Data lakes are the future, no doubt. A very common point I hear from enterprises that would like to explore data lakes is about understanding their true merit. The idea of keeping the entire data/transformation lineage makes sense to most, but beyond that people are left wondering whether companies such as EMC are building software stacks that would make data lakes configurable with a few button clicks. The confusion centers on what, in addition to their existing ML/big data setups, they need to do to turn them into a data lake.

  • Vinay Marwaha

    Hi CJ, Adopting a hybrid cloud model has been a challenge for enterprises because of regulations and slow internet connections. How do you think this will change in 2015, given the ever-increasing threat from hackers, increased regulation in different industries, and skepticism around net neutrality?