Demystifying Software-Defined Storage and Hyperconvergence

If you read the storage news these days you simply can’t miss a story about hyper-converged storage or yet another vendor releasing a software version of its platform. If you believe Gartner, by 2019 about 70% of existing storage array products will be available in “software-only” versions. The industry is quickly waking up to the fact that the thing that turns a cheap white-box server into a branded product commanding high margins is the software. Increasingly, end users are looking to standardize on low-cost servers to reduce operational costs and gain purchasing leverage for better pricing. Some web-scale customers take this to the extreme, and from that the Open Compute Project was born.

To participate in this market, data storage technology companies have pursued different strategies, and the borders between software-defined, hyper-converged, and commodity hardware have blurred.

Before I delve into what’s out there, let’s define terms. Software-defined storage has been around for a long time. A software-defined storage solution provides hardware-agnostic data management and provisioning based on storage virtualization. Said more plainly, software-defined storage takes a bunch of disks and processors and turns them into the functional equivalent of a storage appliance, whether object, block, or file based. Hyperconverged refers to the ability to run both your software-defined storage services (a virtualized storage appliance) and your applications on the same servers. This could be a cluster of servers where direct-attached hard disks and flash drives are virtualized and made available to applications running (potentially virtualized) on the same physical infrastructure.
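To make that abstraction concrete, here is a minimal sketch (a toy model, not how ScaleIO or any particular product is implemented) of the idea behind storage virtualization: direct-attached disks on several servers are pooled, and a virtual block volume is spread across all of them.

```python
# Toy model of storage virtualization: pooling direct-attached disks from
# several servers into one virtual block volume. Illustrative only.

class Disk:
    def __init__(self, node, name, capacity_gb):
        self.node, self.name, self.capacity_gb = node, name, capacity_gb

class StoragePool:
    """Aggregates disks across nodes and carves virtual volumes out of them."""
    def __init__(self, disks):
        self.disks = disks

    @property
    def raw_capacity_gb(self):
        return sum(d.capacity_gb for d in self.disks)

    def map_block(self, volume_block):
        # Toy placement policy: round-robin striping across all disks,
        # so reads and writes are spread over every node in the cluster.
        disk = self.disks[volume_block % len(self.disks)]
        return disk.node, disk.name

pool = StoragePool([Disk("server-1", "sda", 800),
                    Disk("server-2", "sda", 800),
                    Disk("server-3", "sda", 800)])
print(pool.raw_capacity_gb)   # 2400 GB of pooled raw capacity
print(pool.map_block(7))      # block 7 lands on ('server-2', 'sda')
```

The point of the sketch is the placement function: a volume’s blocks have no fixed home on any one server, which is what lets capacity and performance scale with the pool.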

“Commodity hardware” refers to servers built from interchangeable, standards-based, high-volume components. Data storage companies are bringing all of these aspects together to build low-cost, highly customizable alternatives to legacy storage architectures.

In EMC’s portfolio there are several unique and powerful software-defined storage offerings for object, block, and (soon) file-based storage. For today, I am focusing on the EMC® ScaleIO® product, which enables a “Software-defined Scale-out SAN” by virtualizing servers with DAS to provide block storage for applications running either hyper-converged or on separate sets of servers dedicated to storage and applications (the “two-layer” approach). The EMC ScaleIO product was designed from day one as a software-defined storage offering that takes any server hardware and pools its storage in scale-out fashion. What does it mean to be scale-out? Scale-out (as opposed to scale-up) means that the design center of the product is optimized for incrementally adding capacity and compute. Scale-out storage products allow end users to start small, often with only a few nodes (another term for servers), and grow incrementally as their business demands increase.
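A back-of-the-envelope model shows why the scale-out property matters; the per-node numbers below are illustrative assumptions, not ScaleIO specifications.

```python
# Toy model of scale-out growth: each added node contributes capacity and
# I/O in parallel, so the pool grows incrementally. Numbers are illustrative.

NODE_CAPACITY_TB = 10   # assumed raw capacity per node
NODE_IOPS = 50_000      # assumed aggregate IOPS from one node's local media

def cluster_profile(nodes):
    return {"nodes": nodes,
            "raw_capacity_tb": nodes * NODE_CAPACITY_TB,
            "approx_iops": nodes * NODE_IOPS}

# Start small, then grow as demand increases -- no forklift upgrade needed.
for n in (3, 4, 8, 16):
    print(cluster_profile(n))
```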

One of the advantages that EMC ScaleIO has over some other approaches to software-defined block storage is that it was designed for scale, performance, and flexibility out of the gate. ScaleIO is first and foremost a software product. As such, it can be applied to a wide variety of commodity servers, allowing customers to avoid vendor lock-in, maximize their existing server vendor relationships, and pick and choose the storage media that meets their performance requirements. The ScaleIO product was also designed exclusively as a high-performance block storage virtualization product, so it does not suffer from the performance overhead that comes with trying to take on “multiple storage personalities”, which I will explain later. Finally, the ScaleIO team recognized the importance of platform choice and implemented support for a wide range of hypervisors and operating systems, including integration with cloud management products like OpenStack.
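As an illustration of that cloud-management integration, block storage in OpenStack is consumed through Cinder regardless of the backend. The sketch below uses the python-cinderclient library and assumes an administrator has already registered a ScaleIO backend and exposed it as a volume type named “scaleio”; the credentials and the type name are placeholders, not fixed ScaleIO values.

```python
# Hedged sketch: provisioning a volume through OpenStack Cinder, which is
# how ScaleIO-backed block storage would typically be consumed in an
# OpenStack cloud. Assumes an admin has configured a ScaleIO backend and
# created a volume type named "scaleio" -- that name is illustrative.
from cinderclient import client

cinder = client.Client("2",                    # Cinder API version
                       "demo-user", "secret",  # placeholder credentials
                       "demo-project",
                       "http://keystone.example.com:5000/v2.0")

vol = cinder.volumes.create(size=100,          # size in GB
                            name="app-data-01",
                            volume_type="scaleio")
print(vol.id, vol.status)                      # e.g. "creating"
```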

Why the SDS Approach for Hyperconverged Infrastructure Conquers All
With the recent shift in thinking towards taking advantage of commoditization and convergence, many vendors are now competing in the hyper-converged storage market. There are several approaches they have taken: an appliance model, a layered model, or a hypervisor model.

Appliance Model:
The first approach, where vendors have taken an appliance model to the solution, has had moderate success. However, in the rush to market, these vendors made rigid assumptions around hardware choices and rules. Such rules help when you are trying to force a quick solution into a new market, but ultimately they lead to pain for end users. Rigid rules around how to grow your hyper-converged appliances, which components you have to use, flash-to-spinning-disk ratios, and other constraints that solve engineering problems rather than customer problems are forcing these vendors to rethink their approach. Many of them are now looking at how to take their embedded software and reengineer it to run on a wider variety of hardware vendor platforms. Ultimately, what they are finding is that what customers really want is the software, not the vendor lock-in. Unfortunately, systems designed around hardware shortcuts aren’t so easily repurposed for a hardware-vendor-neutral world. Fortunately, EMC ScaleIO was built as a software product from inception. This means it can easily be adapted to hardware-delivered solutions later, but will never have to struggle to become a software-only product.

Layered Model:
The second approach is to take a layered model, building software-defined block storage services on top of an object storage architecture. Now, there is nothing wrong with using abstractions in systems design; abstractions help to simplify things. The problem comes when a system is designed to optimize around the underlying abstraction rather than the service layered on top. It is really hard to do a good job of building one data paradigm on top of another when the two are optimized for totally different parameters. For example, a block storage system should be optimized for maximum uptime, minimal resource utilization, and maximum performance, even if that means using more expensive media like flash for persistence. An object store, on the other hand, should be optimized for billions or even trillions of objects, geographic dispersion of data, and low-cost, relatively static data. Layering a block storage system optimized for uptime and performance on top of a system optimized for object sprawl and low cost puts the two at odds with one another! That’s exactly what we see in practice: software-defined block storage built on object stores tends to be slow, consume a lot of resources, and require a lot of care and feeding of the underlying storage paradigm to keep it operational. These offerings have been successful primarily because their freemium business model allows end users to download and use the product without a support contract; the performance penalties and reliability issues have certainly not played in their favor. To make sure that end users have choices beyond the current cumbersome freemium offerings, this summer EMC ScaleIO will be releasing the first “Free and Frictionless” versions of its product, designed to give anyone the ability to download and operate a software-defined SAN storage cluster, for an unlimited time and capacity, for non-production workloads.
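The cost of the mismatch is easy to see with a little arithmetic. Suppose small block writes are layered on large immutable objects: every 4 KB update forces the object layer to read and rewrite an entire object. The sizes below are illustrative assumptions, and real layered systems mitigate this with caching and log-structured writes, but the impedance mismatch the paragraph describes remains.

```python
# Toy illustration of why layering block semantics on an object store is
# costly: small in-place block writes against large immutable objects force
# read-modify-write cycles. Sizes are illustrative assumptions.

BLOCK_SIZE = 4 * 1024            # 4 KB block write from the application
OBJECT_SIZE = 4 * 1024 * 1024    # 4 MB immutable object in the object layer

def backend_bytes_moved(block_writes):
    # Each block write reads the enclosing object, patches 4 KB in it,
    # and writes the whole object back out.
    return block_writes * (OBJECT_SIZE + OBJECT_SIZE)  # read + rewrite

writes = 1_000
amplification = backend_bytes_moved(writes) / (writes * BLOCK_SIZE)
print(f"write amplification: ~{amplification:.0f}x")   # ~2048x in this toy model
```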

Hypervisor Model:
Finally, hypervisor vendors (of which there are only a few) have also jumped on the commodity bandwagon. The advantage of these solutions is that they are generally built into the hypervisor platform, which means that if you have the hypervisor platform deployed, you have a block storage virtualization product ready to go. Hypervisor clusters of servers tend to be small, though, so while this can provide a quick and easy way to get going with block storage, they tend not to be scalable, high-performance solutions and, like solutions designed for a specific hardware platform, they come with a level of rigidity. End users that have a mix of Windows and Linux platforms, or that may be looking to take advantage of less expensive virtualization platforms like KVM and OpenStack, will find themselves limited by solutions built into a single vendor’s hypervisor. Once again, EMC ScaleIO addresses the needs of end users looking for choice of platforms, high performance, and massive scale, while in some cases plugging directly into the hypervisor for optimal performance. While EMC ScaleIO can be deployed in conjunction with hypervisor platforms in a hyper-converged fashion, it differs from the hypervisor vendor solutions in that you aren’t forced to run hyper-converged. You can choose to deploy your storage servers and your virtualized application servers separately if that suits your organization.
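A small sketch of that deployment flexibility, with purely illustrative role and node names: the same storage software is assigned either to every node alongside applications (hyper-converged) or to a dedicated storage tier (two-layer).

```python
# Sketch of the deployment flexibility described above: the same storage
# software can run hyper-converged (storage + apps on every node) or
# two-layer (dedicated storage nodes, separate application nodes).
# Role and node names are illustrative.

def plan_hyperconverged(nodes):
    return {n: {"storage_server", "application"} for n in nodes}

def plan_two_layer(storage_nodes, app_nodes):
    plan = {n: {"storage_server"} for n in storage_nodes}
    plan.update({n: {"application"} for n in app_nodes})
    return plan

print(plan_hyperconverged(["node-1", "node-2", "node-3"]))
print(plan_two_layer(["stor-1", "stor-2"], ["app-1", "app-2", "app-3"]))
```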

It’s no surprise, given the rapid growth of the software-defined, commodity storage market, that every large vendor and many more startups are introducing or tailoring their products for this new world. But the approach matters. Products designed with hardware constraints early on will have a real challenge disentangling themselves from the assumptions they have made. Products built with dual personalities that attempt to imitate one storage type on top of another will find themselves optimized for one thing while trying to deliver another, leaving end users dissatisfied. And finally, hypervisor-based solutions, while simple to set up and integrated into the hypervisor, may work for some small deployments but will lack the flexibility and scale of a true software-defined storage solution for the enterprise. Fortunately for end users, the EMC ScaleIO software block storage solution avoids these limitations since it was born and raised in the software-defined world.

About the Author: David Noy

David Noy brings 25 years of experience in the storage and data management industry. He spent nearly a decade leading engineering and product management teams for numerous companies, including Dell Technologies, NetApp, Veritas and Cohesity. Today, David leads two industry-leading technology divisions at Dell Technologies, Unstructured Data Storage and Data Protection, where he is helping to embolden innovation around data management and hybrid cloud; and driving advancement of holistic solutions to help heighten business success for customers worldwide.