Breakfast with ECS: Searching for Your Competitive Advantage

Welcome to another edition of Breakfast with ECS, a series where we take a look at issues related to cloud storage and ECS (Elastic Cloud Storage), EMC’s cloud-scale storage platform.

The ECS storage system is designed to hold billions of objects, each with a unique name that is meaningful to the application and is often used to organize the stored information.  Along with its name, each object can carry arbitrary metadata tags, making the object fully self-describing and usable not just by the application that wrote it but also by other value-added applications that can access it.  However, with such a large volume of data, there are typically multiple ways of organizing it depending on the use case, and because each object has only one unique name, applications cannot expose and leverage these multiple views of the data through naming alone.

What’s unique about metadata with ECS?

ECS is now the industry’s first object storage platform with an integrated metadata index and search capability, providing applications a way to present multiple views of the same data set and allowing similar data objects to be grouped together under a common identifier.

For instance, in the case of a medical image repository, the unique name of the medical image may be an opaque identifier, with additional tags specifying the name of the patient, the date/time of the image, the name of the test, the doctor who ordered the test, the ID of the machine that performed the test, notes from the test review, the diagnosis, the location where the test was performed, and more.  With ECS, specific metadata can be automatically indexed, allowing for quick searches based on any of the indexed attributes.  In the situation mentioned above, a user deciding whether to expand a site could quickly enumerate all tests performed at that location, rather than having to laboriously scan every object and filter out the results from other locations.

With ECS, data organization becomes fluid, able to respond to the needs of different audiences.  Doctors can view information based on the patient name, specialists in a particular area can view information based on a particular test, and an epidemiologist can view information from particular locations to look for environmental influences on health, all based on the same underlying data but with a view tailored for each audience.

Visit the landing page for EMC Elastic Cloud Storage (ECS) for additional information and detailed content.

How does this work?

In ECS, the metadata index is configured at the bucket level, allowing different applications to establish indices that are specific to the individual data set being stored.  Up to five different indexable metadata keys may be specified per bucket.  Objects stored in the bucket are automatically scanned and the appropriate indices updated, based on the presence (or absence) of each indexed metadata key on the object.  It is important to note that the application does not need to change to populate these indices: the scanning and indexing are performed automatically by ECS.  As long as the system administrator knows which metadata items to index, the metadata index and search feature brings value immediately, even to existing applications.
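To make the idea concrete, here is a minimal in-memory sketch of a bucket that keeps an index per configured metadata key and updates it on every write.  This is a toy model for illustration only, not ECS code: the class name IndexedBucket, its methods, and the sample object names are all hypothetical.  The only detail taken from the description above is the limit of five indexed keys per bucket.

```python
from collections import defaultdict

MAX_INDEXED_KEYS = 5  # per-bucket limit described in the article


class IndexedBucket:
    """Toy model of a bucket with metadata indices (hypothetical, not ECS code)."""

    def __init__(self, indexed_keys):
        if len(indexed_keys) > MAX_INDEXED_KEYS:
            raise ValueError(f"at most {MAX_INDEXED_KEYS} indexed keys per bucket")
        self.objects = {}
        # One index per configured key: metadata value -> set of object names.
        self.indices = {key: defaultdict(set) for key in indexed_keys}

    def put(self, name, metadata):
        """Store an object's metadata and update every index it belongs to."""
        old = self.objects.get(name)
        if old is not None:
            # Overwrite: drop the stale index entries first.
            for key, index in self.indices.items():
                if key in old:
                    index[old[key]].discard(name)
        self.objects[name] = dict(metadata)
        for key, index in self.indices.items():
            # Objects without the key are simply absent from that index.
            if key in metadata:
                index[metadata[key]].add(name)

    def find(self, key, value):
        """Return the names of objects whose indexed key equals value."""
        return sorted(self.indices[key].get(value, set()))


bucket = IndexedBucket(["location", "test"])
bucket.put("img-001", {"location": "boston", "test": "glucose_level"})
bucket.put("img-002", {"location": "austin", "test": "glucose_level"})
bucket.put("img-003", {"location": "boston"})  # no "test" tag on this object
print(bucket.find("location", "boston"))  # ['img-001', 'img-003']
```

The writer never touches the index directly; it only calls put, which mirrors the point that applications do not need to change for indexing to take effect.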

Querying the index

At any point, an application can query the index by doing a GET on the bucket with the ?query operator, specifying the indices to search and one or more conditions that must be satisfied, e.g. “machine_model==PS56923 and date>=20150101 and date<=20150131” to quickly find all tests run on a particular machine model during January 2015.
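As a rough sketch, the request URL for such a query could be assembled as follows.  The endpoint host, the bucket name, the path-style bucket addressing, and passing the expression as the value of the query parameter are all assumptions made for illustration; check the ECS API documentation for the exact request format.  The helper build_query_url is hypothetical, and a real request would also need whatever authentication the deployment requires.

```python
from urllib.parse import quote


def build_query_url(endpoint, bucket, expression):
    """Build the URL for a GET on a bucket with a ?query expression.

    Hypothetical helper: the URL layout is an assumption, not the documented API.
    """
    # Percent-encode the expression so spaces and comparison operators
    # (==, >=, <=) survive as a single query-string value.
    return f"{endpoint}/{bucket}/?query={quote(expression, safe='')}"


expression = "machine_model==PS56923 and date>=20150101 and date<=20150131"
url = build_query_url("https://ecs.example.com", "medical-images", expression)
print(url)
```

Encoding the whole expression as one value keeps characters such as = and > from being misread as query-string syntax.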

The searches can range from a simple, single-attribute query, such as “test==glucose_level”, to a more complex version using “and” or “or” operators to query multiple attributes simultaneously (though “and” and “or” cannot be intermixed in a single query).  For large data sets, the results support pagination and sorting, so a large result set can be returned in a manageable format.
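One simple way for a client to respect the rule against mixing “and” and “or” is to build each expression from a list of conditions and a single joining operator, so a mixed expression cannot be produced by construction.  The helper below is a hypothetical client-side sketch, not part of any ECS SDK; the condition strings reuse the examples from this article.

```python
def build_expression(conditions, operator="and"):
    """Join conditions with one operator so "and" and "or" are never mixed.

    Hypothetical client-side helper, not part of any ECS SDK.
    """
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    if not conditions:
        raise ValueError("at least one condition is required")
    for condition in conditions:
        # Reject conditions that smuggle in their own and/or keyword.
        if any(token in ("and", "or") for token in condition.split()):
            raise ValueError(f"condition must be a single comparison: {condition!r}")
    return f" {operator} ".join(conditions)


single = build_expression(["test==glucose_level"])
january = build_expression(
    ["machine_model==PS56923", "date>=20150101", "date<=20150131"], "and"
)
print(single)   # test==glucose_level
print(january)  # machine_model==PS56923 and date>=20150101 and date<=20150131
```

A query that genuinely needs both operators would have to be split into separate requests whose results the application combines itself.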

Combined with the other features of ECS, metadata index and search provide a powerful mechanism to derive the most value from the data.  Applications can leverage this feature to provide search capabilities to their end users with a shortened time to market, without an external database to manage the indices and without additional hardware or storage for that database.  Analytics users, meanwhile, can avoid transforming the data to organize it by a different attribute, instead using a metadata search as the first step in identifying an analytics data set.

As data volumes continue to grow, native index and search capabilities will become a hard requirement for an enterprise storage product.  ECS’s industry-leading technology and architecture were created to handle such next-generation use cases, and the new metadata index and search feature is a clear demonstration of the flexibility of the technology.

Want more ECS? Try the latest version of ECS for FREE for non-production use by visiting www.emc.com/getecs.

About the Author: Mark O'Connell