When Six Nines Availability is Simply Not Good Enough


In this new world of the Internet of Things (IoT), where social media and mobile are king, data is expected to be always available and data centers always on. Procuring the right products is an arduous task for IT departments. It is even harder to create the right infrastructure and offer the highest availability while adapting to constant change. When we in Engineering meet customers and talk about their use cases, we frequently respond to two key questions:

  • How do we stress and validate the vast portfolio of EMC products as a single system before it reaches customers’ hands?
  • Do we understand the customers’ use-cases and challenges?

On the first question, we anticipated these challenges a few years back and invested in the Mission Critical Center, or MCC.

A truly unique initiative, MCC is a fully operational, customer-like environment within the Engineering walls, running real enterprise applications across three data centers and subject to highly accelerated stress and fault injection.

This EMC competency center works just like any other customer data center, following accepted IT business practices for capacity planning, change control and maintenance windows. It has round-the-clock monitoring, is globally managed, and escalates issues through EMC’s Customer Support. The data centers currently host four enterprise applications with 1.5 PB of data serving more than 1,000 simulated users. The environment is never stagnant: it grows through incremental hardware and software additions, is updated through tech refreshes and non-disruptive upgrades (NDUs), and is replicated through migrations and restores. You can listen to more details on the infrastructure and configuration in this video:

https://youtu.be/cFr7jTLyzIY

Now on to the second question: Customer use cases
We meet with customers regularly to understand their business and IT challenges. We also work closely with EMC’s Global Services and Support teams to understand new changes and requirements in the field. Our own IT organization is also a great learning source. These real-life customer scenarios are adopted into the MCC Lab and accelerated to simulate decades of customer-like operations. We shake them out using home-grown and industry-standard, off-the-shelf tools, orchestrate degraded operations, and inject hardware and environmental faults. In short, MCC is where EMC best practices get the worst-case treatment.

During these projects, we track availability across the entire infrastructure as one system. To our knowledge, no one else in the industry measures availability at this level from the application perspective. This helps us look at problems holistically, not only product by product, but across the complete solution that the customer deploys. It also teaches us how our support processes affect the customer.
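
As a rough sketch of why tracking the whole infrastructure as one system matters, consider how availability compounds when an application depends on several layers at once. The layer names and the six-nines figures below are assumptions chosen purely for illustration, not MCC measurements, and the calculation assumes a simple serial chain with independent failures.

```python
from typing import Iterable

# Hypothetical illustration only: the layers and their availability figures
# are assumed values, not data from the MCC.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def downtime_seconds_per_year(availability: float) -> float:
    """Expected downtime per year implied by a fractional availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def serial_availability(component_availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent, so the individual
    availabilities multiply.
    """
    result = 1.0
    for availability in component_availabilities:
        result *= availability
    return result


SIX_NINES = 0.999999

# Assumed four-layer stack, each layer individually at six nines.
stack = {
    "storage": SIX_NINES,
    "network": SIX_NINES,
    "servers": SIX_NINES,
    "application": SIX_NINES,
}

end_to_end = serial_availability(stack.values())

print(f"per layer:  {downtime_seconds_per_year(SIX_NINES):.1f} s/year")
print(f"end to end: {downtime_seconds_per_year(end_to_end):.1f} s/year")
```

With these assumed inputs, each layer on its own allows roughly 31.5 seconds of downtime a year, while the application that depends on all four sees about four times that. Under this simplified model, a product-by-product view understates what the customer actually experiences.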

And last, but most important: How do we measure success?
Success is measured through our customers’ eyes by maintaining uptime, maximizing run hours, and preventing customer outages. MCC feeds its findings and discoveries back to the Engineering and Support teams for early prevention and correction. Keeping our customers’ data centers uninterrupted, with no Data Unavailable or Data Loss outages, is our true metric.

The Mission Critical Center reduces risk and cost while increasing trust in, and the availability of, EMC’s portfolio. We have put together an infrastructure that will help customers run their mission-critical applications on EMC solutions for many years to come.

About the Author: Ramesh Balan
