Garbage Collection & XtremIO – Fact and Fiction

One of the eye-opening claims we made during our launch on November 14th was that the XtremIO array doesn’t have any system-level garbage collection processes. In the coverage and chatter that followed our launch, we noticed that some people interpreted this to mean that the flash in our arrays was somehow impervious to the need for garbage collection, which of course is impossible. To be clear, all flash requires garbage collection. What matters is where and how it is performed. With XtremIO, performance is always consistent and predictable because garbage collection is handled in a novel way that is only possible with XtremIO’s unique architecture.

So let’s discuss the “dirty” issue of garbage collection – and how XtremIO is the only all-flash array that requires no system-level garbage collection yet maintains consistent and predictable performance.  But before we begin, let’s first define what garbage collection is and does.

One of the ways flash differs from disk is that with disk, new data can literally be written right on top of existing data. The HDD head just seeks to the location to be overwritten and re-magnetizes the media. With flash, existing data must first be erased (a very slow operation) before new data can be programmed into those flash cells. Exacerbating the problem is that you can’t erase just the data you want to. Flash is erased in units called “erase blocks”. Imagine an erase block that is 256KB in size. To replace only 8KB of that erase block in place, the entire 256KB must be read and buffered, the erase block erased, and then a revised 256KB written back. The ratio of data physically written to the flash to data the host actually asked to write is called write amplification, and because flash can endure only a limited number of program/erase cycles before it wears out, write amplification is bad. Garbage collection is the process that avoids paying this price on every write: new data goes to already-erased space, the old copy is merely marked invalid, and later the still-valid data in mostly-invalid erase blocks is copied elsewhere so those blocks can be erased and reused.

For a more detailed explanation, check out Wikipedia.
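To make the arithmetic of the 256KB/8KB example concrete, here is a minimal Python sketch of the naive in-place read-modify-write case. It assumes the erase block and update sizes from the example and treats every update as rewriting its whole erase block; real SSDs remap writes through a flash translation layer instead, so this is a worst-case illustration rather than how any particular drive behaves.

```python
# Worst-case write amplification for an in-place overwrite inside one erase block.
# Assumption: the 256KB erase block and 8KB update sizes from the example above.

KB = 1024

def naive_write_amplification(erase_block_bytes: int, update_bytes: int) -> float:
    """Bytes programmed to flash per byte the host asked to write,
    if every update rewrites its entire erase block."""
    if update_bytes <= 0 or update_bytes > erase_block_bytes:
        raise ValueError("update must be between 1 byte and one erase block")
    # The whole erase block is reprogrammed even though only update_bytes changed.
    return erase_block_bytes / update_bytes

def extra_bytes_moved(erase_block_bytes: int, update_bytes: int) -> int:
    """Unchanged data that had to be read, buffered, and written back."""
    return erase_block_bytes - update_bytes

if __name__ == "__main__":
    wa = naive_write_amplification(256 * KB, 8 * KB)
    extra = extra_bytes_moved(256 * KB, 8 * KB)
    print(f"write amplification: {wa:.0f}x")              # 32x
    print(f"unchanged data rewritten: {extra // KB}KB")   # 248KB
```

Every 8KB host write in this scenario costs 256KB of programming, which is why drives defer and batch this work through garbage collection instead.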

So we’ve established that garbage collection does need to take place in XtremIO arrays, as it must in all flash devices. But as we’ve said, it’s where and how garbage collection is performed that matters. And to understand why, we have to go back for a history lesson.

Back in 2007-2008, SSDs were tiny in capacity, expensive, error prone, and subject to all kinds of performance issues when their internal garbage collection processes kicked in. Specifically, performance might suddenly drop by wide margins, response times could fluctuate wildly, and sometimes the entire SSD would stop responding for several seconds. SSD controller technology of the day was relatively immature, and if you set out to build an enterprise flash array at that time, you might conclude that SSDs were not to be trusted and that you needed to prevent them from doing their own garbage collection at any cost. So how do you do this? By writing to the SSD as if it were an HDD: in large sequential streams, enabled by log structuring in the array controllers. Log structuring has been used in storage arrays for decades to gather random writes that would otherwise cause HDD heads to seek wildly and reorder them into sequential I/Os that can be streamed to the drive without large head seeks. You get a huge performance boost when the array is empty and free space is easy to find. But as the array fills, free sequential space becomes harder and harder to locate. And hosts eventually begin to overwrite existing data, whose old copies must be invalidated and cleaned out of the array. This is where garbage collection comes in. That’s right: even disk arrays have garbage collection processes. And it’s why many arrays work great out of the box and slow down substantially as they fill up. It’s a well-known effect that any storage admin can vouch for.
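The log-structuring-plus-cleaning cycle described above can be sketched as a toy Python simulation. It is not any vendor’s implementation: the segment count and size, the two-segment reserve, the greedy “fewest live blocks” victim choice, and the uniformly random overwrite pattern are all illustrative assumptions. Every write is appended at the log head, an overwrite only invalidates the old copy, and when erased segments run low the cleaner relocates the live blocks of a victim segment so it can be reused. The function reports write amplification (flash writes per host write) during the overwrite phase.

```python
import random

class LogStructuredStore:
    """Toy log-structured block store with a greedy segment cleaner (illustrative only)."""

    def __init__(self, num_segments: int, seg_size: int, reserve: int = 2):
        assert reserve >= 2, "the cleaner needs a spare segment to copy into"
        self.seg_size = seg_size
        self.reserve = reserve
        self.live = [set() for _ in range(num_segments)]  # live logical blocks per segment
        self.where = {}                                   # logical block -> segment index
        self.free = list(range(num_segments))             # erased segments
        self.head = self.free.pop()                       # segment currently being filled
        self.head_used = 0
        self.host_writes = 0
        self.flash_writes = 0

    def _append(self, lba: int) -> None:
        if self.head_used == self.seg_size:               # head full: seal it, open a new one
            self.head = self.free.pop()
            self.head_used = 0
        old = self.where.get(lba)
        if old is not None:
            self.live[old].discard(lba)                   # the old copy becomes garbage
        self.live[self.head].add(lba)
        self.where[lba] = self.head
        self.head_used += 1
        self.flash_writes += 1

    def _clean_one(self) -> None:
        sealed = [s for s in range(len(self.live)) if s != self.head and s not in self.free]
        victim = min(sealed, key=lambda s: len(self.live[s]))  # greedy: fewest live blocks
        for lba in list(self.live[victim]):               # relocate still-valid data
            self._append(lba)
        self.free.append(victim)                          # the segment can now be erased

    def write(self, lba: int) -> None:
        self.host_writes += 1
        self._append(lba)
        while len(self.free) < self.reserve:              # keep spare segments for the cleaner
            self._clean_one()

def overwrite_write_amplification(fill: float, num_segments: int = 64, seg_size: int = 32,
                                  overwrites: int = 20_000, seed: int = 1) -> float:
    store = LogStructuredStore(num_segments, seg_size)
    logical_blocks = int(fill * num_segments * seg_size)
    # The cleaner can only make progress if live data fits outside the reserve.
    assert logical_blocks < (num_segments - store.reserve) * seg_size
    for lba in range(logical_blocks):                     # initial sequential fill
        store.write(lba)
    store.host_writes = store.flash_writes = 0            # measure only the overwrite phase
    rng = random.Random(seed)
    for _ in range(overwrites):                           # hosts overwrite random blocks
        store.write(rng.randrange(logical_blocks))
    return store.flash_writes / store.host_writes

if __name__ == "__main__":
    for fill in (0.5, 0.7, 0.9):
        print(f"{fill:.0%} full: write amplification {overwrite_write_amplification(fill):.2f}")
```

Running it for a few fill levels should show the cleaner copying more live data per host write as the store fills, which is the “fast when empty, slower when full” effect described above.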

If you choose a log-structured approach for your flash array, you can effectively stream data to the SSDs, but eventually you have to garbage collect to reclaim free space for incoming writes. Given how poorly SSDs behaved circa 2007-2008 (and here we use “SSDs” to mean both SSDs like those in the XtremIO array and the flash controller ASICs and FPGAs used with proprietary flash form factors), many vendors decided to write their own garbage collection algorithms, figuring they could do a better job.

XtremIO was based in Israel, and we were incredibly lucky to have a tight engineering relationship with an Israeli SSD supplier well ahead of its time. Their SSD was vastly superior to others on the market. It exhibited completely consistent performance, no latency variation, and no “hiccups” in its response time. XtremIO made a bet that turned out to be wise. We bet that by the time our product came to market, other SSD vendors would catch up and several SSDs with similar characteristics would be available to us. Our gamble paid off. Rather than invest time trying to outsmart the garbage collection in the SSD, we bet that we should trust it. We bet that an industry full of SSD suppliers, each with an army of engineers working on ever more sophisticated garbage collection algorithms and each with specific knowledge of the flash inside its own SSDs, could design better garbage collection algorithms than we could. This freed XtremIO’s design from log structuring and system-level garbage collection and let us truly leverage the random-access nature of flash. In other words, why sequentialize accesses to flash when it is a random-access medium?

The array controllers in an XtremIO system do not garbage collect at all. With 25 SSDs per X-Brick each garbage collecting as needed (and transparently to the XtremIO controllers), they do not need to. After all, the SSD manufacturers know their flash best, and the drive controller inside the SSD is best equipped to garbage collect the media. Relying on this spares the array controllers from spending premium processing cycles (and back-end I/O on the array) on garbage collection. Over tens of thousands of hours of rigorous testing, our SSDs rarely “hiccup” (and when one does, our dual-stage metadata engine and XDP handle it efficiently), giving XtremIO users the industry’s most consistent performance over years of heavy use.

By contrast, every company that invested heavily in system-level garbage collection continues to solve yesterday’s problem, one that has ceased to exist, much as the heavy CRT television and the immobile rotary telephone have become relics of a bygone era.

This historical anecdote is not widely known outside of XtremIO. Much better known is the tale of an adopted boy from California who, similarly inspired by a peek at the future during a Xerox PARC lab demonstration, went on to change computing with the best user interface the world had ever known and, coincidentally, also acquired the very SSD company to which XtremIO is so indebted. Could it be that EMC also glimpsed the future when acquiring XtremIO?

Now consider a couple of advantages of our unique architectural choice:

  1. We can easily swap out our SSDs for the latest and greatest SSDs from virtually any manufacturer that meets our reliability and performance metrics. We have no platform-dependent code because we do not need to customize garbage collection for the specific flash architecture or firmware within an SSD. We simply qualify the industry’s best SSDs and bring them to market quickly, while others must adapt their garbage collection for every new SSD they consider. Hence, while others focus on garbage collection (and make no mistake, once you’re down the system-level garbage collection path, it is woven into your architecture and cannot be removed, just as XtremIO couldn’t add it), we perfect the all-flash enterprise array.
  2. XtremIO leverages 25 ASIC-based garbage collection engines in every X-Brick, courtesy of the SSDs. We don’t try to perform the work of these 25 ASICs in our array controllers. Our controllers, relieved of unnecessary garbage collection overhead, are dedicated to serving host I/Os 100% of the time. Whether the XtremIO array is 1% full or 99% full, whether we’re five months into production or five years in, our controllers perform identical operations for host I/Os. There is never a garbage collection penalty, nor any unpredictability about when system-level garbage collection will need to run or what effect it will have. If you visualize this when testing array performance, XtremIO has no wide, swerving “S” curve like the one below, where IOPS suddenly drop and latency suddenly increases:

Performance Snake
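The shape of that curve follows from simple accounting. Suppose a system-level cleaner must reclaim space from a segment in which a fraction u of the blocks is still live. Cleaning it frees (1 - u) of a segment while rewriting u of a segment, so each host write that lands in reclaimed space drags along u / (1 - u) relocation writes, for a total of 1 / (1 - u) flash writes. The Python helper below simply evaluates that expression; u is an assumed input, since in practice it depends on the workload and the cleaner’s victim policy and only loosely tracks how full the array is.

```python
def cleaning_write_amplification(victim_live_fraction: float) -> float:
    """Flash writes per host write when free space must be reclaimed from
    segments whose live fraction is victim_live_fraction (a hypothetical input)."""
    u = victim_live_fraction
    if not 0.0 <= u < 1.0:
        raise ValueError("live fraction must be in [0, 1)")
    # Reclaiming (1 - u) of a segment costs u of a segment in copies,
    # so every host write carries u / (1 - u) relocated writes with it.
    return 1.0 + u / (1.0 - u)   # algebraically equal to 1 / (1 - u)

if __name__ == "__main__":
    for u in (0.0, 0.5, 0.8, 0.9, 0.95):
        print(f"victim {u:.0%} live -> {cleaning_write_amplification(u):.1f} flash writes per host write")
```

The cost stays modest at low utilization and climbs steeply as u approaches 1, which is why the extra back-end work of system-level cleaning grows as an array fills.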

With enough “soak time” under constant I/O load in typical enterprise data centers, other all-flash arrays degrade substantially in performance. How would you feel about your virtual servers slowing down after a few months of use? Or your OLTP workload suddenly dropping in transactions per second? IDC has compiled a great set of recommendations on how to test an all-flash array so that these effects are visible upfront during the evaluation phase.

Does this sound too good to be true? After all, if it’s as simple as letting modern SSDs do their job, why can’t other products perform like XtremIO? The answer is that modern SSDs are necessary but not sufficient for consistent and predictable performance. You also need the right physical architecture and metadata model (to allow data to be placed into any available free space in the SSDs) and the right data protection model (optimized for the partial stripe updates that dominate as arrays fill up). Only XtremIO has these. You can learn more about them here:

XtremIO Content Addressing

XtremIO Dual-Stage Metadata

XtremIO Data Protection (XDP)

XtremIO In-Memory Metadata
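To illustrate why partial stripe updates matter for the data protection point above, here is the textbook single-parity (RAID-5 style) read-modify-write in Python. This is not XDP, whose algorithm is not described here; it only shows the generic cost a parity scheme faces when one block of an existing stripe changes: read the old data and old parity, XOR both with the new data, then write the new data and new parity. The 8-byte blocks and four-block stripe are arbitrary choices for the example.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_parity(data_blocks) -> bytes:
    """Parity for a full stripe: the XOR of every data block."""
    return xor_blocks(*data_blocks)

def partial_stripe_update(old_data: bytes, new_data: bytes, old_parity: bytes):
    """Update one block of an existing stripe without reading the other blocks.

    Returns the new parity and the back-end I/O this small write costs.
    """
    # Read-modify-write: XOR out the old contents, XOR in the new contents.
    new_parity = xor_blocks(old_parity, old_data, new_data)
    io = {"reads": 2,    # old data block + old parity block
          "writes": 2}   # new data block + new parity block
    return new_parity, io

if __name__ == "__main__":
    stripe = [bytes([v]) * 8 for v in (1, 2, 3, 4)]   # four 8-byte data blocks
    parity = stripe_parity(stripe)
    new_block = bytes([9]) * 8
    parity, io = partial_stripe_update(stripe[1], new_block, parity)
    stripe[1] = new_block
    print(parity == stripe_parity(stripe), io)         # True {'reads': 2, 'writes': 2}
```

One host write becomes four back-end I/Os here (six with two parity blocks per stripe), and as an array fills, more writes land in partly occupied stripes and pay this cost.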

XtremIO controllers are freed from the task of garbage collection, and everything else in our architecture (true inline deduplication, in-memory metadata, XDP) relieves the SSDs of a huge percentage of writes and, by extension, a lot of garbage collection too. Through elegant engineering, we strike a fine balance that others cannot achieve.
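How inline deduplication can spare the SSDs writes is easy to sketch with a toy content-addressed store in Python. This is a sketch under stated assumptions, not XtremIO’s implementation: the SHA-256 fingerprint, the in-memory dictionaries standing in for metadata and flash, and the DedupStore name are all hypothetical. Each logical address maps to a fingerprint of its block’s content; only content not already stored costs a flash write, while duplicates cost just a metadata update and a reference count.

```python
import hashlib

class DedupStore:
    """Hypothetical content-addressed block store with inline deduplication."""

    def __init__(self):
        self.lba_to_fp = {}    # logical address -> content fingerprint (metadata)
        self.blocks = {}       # fingerprint -> data (stands in for flash)
        self.refs = {}         # fingerprint -> number of logical addresses using it
        self.host_writes = 0
        self.flash_writes = 0

    def write(self, lba: int, data: bytes) -> None:
        self.host_writes += 1
        fp = hashlib.sha256(data).hexdigest()
        old = self.lba_to_fp.get(lba)
        if old == fp:
            return                       # identical overwrite: nothing changes
        if fp in self.blocks:
            self.refs[fp] += 1           # duplicate content: metadata update only
        else:
            self.blocks[fp] = data       # unique content: the only case that hits flash
            self.refs[fp] = 1
            self.flash_writes += 1
        self.lba_to_fp[lba] = fp
        if old is not None:
            self.refs[old] -= 1
            if self.refs[old] == 0:      # last reference gone: the space can be reclaimed
                del self.refs[old]
                del self.blocks[old]

    def read(self, lba: int) -> bytes:
        return self.blocks[self.lba_to_fp[lba]]

if __name__ == "__main__":
    store = DedupStore()
    for lba in range(100):               # 100 writes, only 10 distinct 4KB contents
        store.write(lba, bytes([lba % 10]) * 4096)
    print(store.host_writes, store.flash_writes)   # 100 10
```

With highly duplicated data, such as many cloned virtual machine images, most host writes never reach the flash, so the SSDs have less to garbage collect as well.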

In summary, we hope you’ll agree that our historical decision to avoid system-level garbage collection was a wise one. It saved us from developing and maintaining redundant technology. It allowed us to innovate in several other areas of the array, such as eliminating log structuring, developing superior metadata models, and improving on decades-old RAID. And it makes XtremIO arrays the most consistent and predictable performers on the market. So while garbage collection is a necessity with flash, it’s not a necessity for the array controllers with the right architecture…XtremIO.

About the Author: Dell Technologies