Many of us fondly remember the traditional “15 squares puzzle.” With just one empty square, shifting pieces to new locations requires a fair amount of effort because there is so little free space. Having the puzzle fifteen-sixteenths full makes for challenging and entertaining gameplay, but it’s certainly not the kind of performance limitation you want in your solid state drive (SSD). Imagine if the same puzzle were only half-full with eight pieces. It could be solved almost instantly. More free space enables faster piece movement and task (game) completion.
SSDs work on a very similar principle. Visualize the NAND flash memory inside of an SSD as a puzzle, except that the amount of free space in a drive is not fixed. Manufacturers utilize various tactics to improve performance, and one of these is to allocate more free space, a process known as over-provisioning.
The minimum amount of over-provisioning for an SSD is set at the factory, but users can allocate more space for better performance. Either way, a moderate understanding of over-provisioning is necessary in order to make better SSD purchasing decisions and to configure drives in the most advantageous way possible for each unique environment and use case.
Background: The Nature of HDD vs. SSD Writes
Here’s another visualization exercise: Imagine having a 3,000-page encyclopedia that has been completely randomized. None of the entries are in order. The only feasible way of finding any given piece of information is through the table of contents, which keeps track of each entry’s location. Without the table of contents, the book becomes essentially unusable.
This model explains how information is generally written to hard disk drives (HDDs). Data gets placed wherever it will fit and often in a way that best assists read/write performance. The physical location of the data is not important because the master file table (or MFT, effectively the drive’s table of contents) keeps track of every chunk of data. Physically erasing a file’s data from an HDD is unnecessary: the system simply erases the file’s entry in the MFT, and from the host’s point of view the file is gone. Only when new data physically overwrites the old is it truly gone, which is why forensic software can often recover “deleted” files from systems. The key point, though, is that the hard drive doesn’t care if there is data in sectors or not. The host only sees sectors in terms of occupied or available for writing.
SSDs work very differently. The fundamental unit of NAND flash memory is typically a 4 kilobyte (4KB) page, and there are usually 128 pages in a block. Writes can happen one page at a time, but only on blank (or erased) pages. Pages cannot be directly overwritten. Rather, they must first be erased. However, erasing a page is complicated by the fact that entire blocks of pages must be erased at one time. When the host wants to rewrite to an address, the SSD actually writes to a different, blank page and then updates the logical block address (LBA) table (much like the MFT of an HDD). Inside the LBA table, the original page is marked as “invalid” and the new page is marked as the current location for the new data.
Of course, SSDs must erase these invalid pages of data at some point, or the usable space on the SSD would eventually fill up. SSDs periodically go through a process called garbage collection to clear out invalid pages of data. During this process, the SSD controller, or flash controller, that manages NAND flash memory in an SSD, reads all the good pages of a block (skipping the invalid pages) and writes them to a new erased block. Then the original block is erased, thus preparing it for new data.
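The write and garbage collection behavior described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name TinyFlash, its methods, and the four-page block size are all hypothetical and chosen for readability, and no real controller firmware is implied.

```python
PAGES_PER_BLOCK = 4  # kept tiny for readability; the text cites 128 pages per block


class TinyFlash:
    """Hypothetical toy model of out-of-place writes and garbage collection."""

    def __init__(self, num_blocks):
        # Each page slot is None when erased, otherwise (lba, is_valid).
        self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.lba_table = {}  # logical address -> (block index, page index)
        self.erase_count = 0

    def _find_blank_page(self, skip_block=None):
        for b, block in enumerate(self.blocks):
            if b == skip_block:
                continue
            for p, slot in enumerate(block):
                if slot is None:
                    return b, p
        raise RuntimeError("no blank pages left; garbage collection needed")

    def write(self, lba, skip_block=None):
        # Pages cannot be overwritten in place: mark the old copy invalid,
        # program a blank page elsewhere, then update the LBA table.
        if lba in self.lba_table:
            old_b, old_p = self.lba_table[lba]
            self.blocks[old_b][old_p] = (lba, False)
        b, p = self._find_blank_page(skip_block)
        self.blocks[b][p] = (lba, True)
        self.lba_table[lba] = (b, p)

    def garbage_collect(self, victim):
        # Copy the still-valid pages out of the victim block,
        # then erase the whole block at once.
        moved = 0
        for slot in list(self.blocks[victim]):
            if slot is not None and slot[1]:
                self.write(slot[0], skip_block=victim)
                moved += 1
        self.blocks[victim] = [None] * PAGES_PER_BLOCK
        self.erase_count += 1
        return moved


ssd = TinyFlash(num_blocks=2)
for lba in range(4):       # fill block 0 with logical addresses 0..3
    ssd.write(lba)
ssd.write(0)               # rewrites land in block 1; old copies become invalid
ssd.write(1)
moved = ssd.garbage_collect(0)
print(moved)               # pages that had to be relocated before the erase
```

In this run, two of the four pages in block 0 are still valid when the block is collected, so two extra page writes are needed before the erase can happen.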
Amount of Over-provisioning
All SSDs reserve some amount of space for these extra write operations, as well as for the controller firmware, failed block replacements, and other unique features that vary by SSD controller manufacturer. The minimum reserve is simply the difference between binary and decimal naming conventions. Many people are blissfully unaware that one gigabyte (GB) is precisely 1,000,000,000 bytes, and one gibibyte (GiB) is precisely 2^30 = 1,073,741,824 bytes, or about 7.37% more than a GB. Many are equally unaware that storage is properly measured in gigabytes, whereas memory is properly measured in gibibytes. Even though SSDs are built from NAND flash memory chips, they are marketed as storage devices, and SSD manufacturers reserve the extra 7.37% of memory space as a provision for background activities such as garbage collection. For example, a 128GB SSD will inherently include 128 * 73,741,824 = 9,438,953,472 bytes (about 9.4 billion bytes) of built-in over-provisioning.
| ||A “Billion” Bytes of Storage||A “Billion” Bytes of Memory|
|Unit||Gigabyte (GB)||Gibibyte (GiB)|
|# of Bytes||1,000,000,000||1,073,741,824|
So even if an SSD appears to be full, it will still have 7.37% of available space with which to keep functioning and performing writes. Most likely, though, write performance will suffer at this level. (Think in terms of the 15 squares puzzle with just one free square.)
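The binary versus decimal arithmetic behind the built-in reserve is easy to check. The short sketch below assumes, as the text does, that a drive sold with a given number of gigabytes is built from the same number of gibibytes of flash; the function name is illustrative only.

```python
GB = 10**9    # gigabyte (decimal), used for storage capacity
GiB = 2**30   # gibibyte (binary), used for memory capacity


def builtin_reserve_bytes(nominal_capacity):
    """Bytes left over when N GiB of flash is sold as N GB of storage."""
    return nominal_capacity * (GiB - GB)


reserve = builtin_reserve_bytes(128)
percent = (GiB - GB) / GB * 100
print(f"{reserve:,} bytes reserved ({percent:.2f}% of user capacity)")
# prints: 9,438,953,472 bytes reserved (7.37% of user capacity)
```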
In practice, an SSD’s performance begins to decline after it reaches about 50% full. This is why some manufacturers reduce the amount of capacity available to the user and set it aside as additional over-provisioning. For example, a manufacturer might reserve 28 out of 128GB and market the resulting configuration as a 100GB SSD with 28% over-provisioning. In actuality, this 28% is in addition to the built-in 7.37%, so it’s good to be aware of how vendors toss these terms around. Users should also consider that an SSD in service is rarely completely full. SSDs take advantage of this unused capacity, dynamically using it as additional over-provisioning.
|True Physical OP*||7%||15%||25%||37%|
|SSD Physical Cap||Resulting SSD User Capacity|
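To see how marketed and physical over-provisioning differ, the sketch below computes the reserve as a fraction of user capacity. It assumes over-provisioning is defined as (physical capacity minus user capacity) divided by user capacity, and that physical capacity is counted in gibibytes while user capacity is counted in gigabytes. The 120GB and 110GB user capacities are hypothetical examples, not figures taken from the source.

```python
GB = 10**9
GiB = 2**30


def true_op_percent(physical_gib, user_gb):
    """Physical over-provisioning as a percentage of user capacity."""
    physical_bytes = physical_gib * GiB
    user_bytes = user_gb * GB
    return (physical_bytes - user_bytes) / user_bytes * 100


# 128 GiB of flash exposed at several user capacities (120 and 110 are
# illustrative values chosen for this sketch).
for user_gb in (128, 120, 110, 100):
    print(f"{user_gb}GB user capacity: {true_op_percent(128, user_gb):.1f}% OP")
```

Under these assumptions, the 128GB configuration works out to about 7.4%, and the 100GB configuration to about 37.4% rather than the marketed 28%, because the binary/decimal difference is counted as well.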
Some SSD manufacturers provide software tools to allow for over-provisioning of drives by the user. Actually, even without special software, any user can set aside a portion of the SSD when first setting it up in the system by creating a partition that does not use the drive’s full capacity. This unclaimed space will automatically be used by the controller as dynamic over-provisioning.
There is one obvious drawback to over-provisioning: the more unused capacity one reserves to increase writing speeds, the less capacity there is for storage. With hard drives, the somewhat similar practice of short stroking, which confines reads and writes to the fastest outer tracks of the drive platters, is less penalizing because the cost per gigabyte is lower with magnetic media. At $1 to $3 per gigabyte for enterprise-class SSDs, the decision to give up 25% or more of a drive’s capacity becomes more difficult. There had better be some real benefit when giving up those expensive bytes!
In fact, there are plenty of benefits—under the right circumstances.
Over-provisioning, Random vs. Sequential Writes, and Entropy
The above graph represents testing conducted by Seagate using an SSD based on Toshiba 24nm MLC NAND flash memory and a Seagate® SandForce® SF-2281 Flash Controller with DuraWrite™ data reduction technology. There is more to consider here than may be initially apparent. Let’s explain several key elements about sequential vs. random data patterns.
When an SSD arrives new from the factory, writes will gradually fill the drive in a progressive, linear pattern until the addressable storage space has been entirely written. Essentially, this reflects an ideal sequential writing condition. No garbage collection has swung into play at this point, and the little pockets of invalid data caused by deletions have yet to impact performance because there has been no need to write to those pockets with new data.
However, once garbage collection begins, the method by which the data is written – sequentially vs. randomly – begins to affect the performance. Sequentially written data from the host will constantly fill whole flash memory blocks, and when the data is replaced it generally replaces the entire block of pages. Then during garbage collection all pages in that block are invalid and nothing needs to be moved to another block. This is the fastest possible garbage collection – i.e., no garbage to collect. The horizontal lines in the “sequential write graph” show how sequential write performance stays relatively constant regardless of how much over-provisioning is applied.
What does affect performance is the entropy of the data, provided the SSD is using a flash controller that supports a data reduction technology, such as a SandForce Flash Controller. The entropy of data is the measure of the randomness of that data, not to be confused with the data being written randomly vs. sequentially. For example, a completely encrypted data file, an MPEG movie, or a compressed ZIP file will have the highest entropy, while database, executable, and other file types will have lower entropy. As the entropy of the data decreases, the write reduction-capable flash controller will take advantage of the lower entropy and provide higher performance. However, the performance remains constant with a given over-provisioning level when written sequentially.
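One common way to put a number on the entropy of a data stream is the Shannon entropy of its byte values, measured in bits per byte. The sketch below is a rough proxy only: it assumes byte frequencies are a reasonable stand-in for compressibility, and it says nothing about how any particular flash controller evaluates data. The function name and sample buffers are illustrative.

```python
import math
import os
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Byte-frequency entropy: 0.0 = fully predictable, 8.0 = maximally random."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return sum(c / n * math.log2(n / c) for c in counts.values())


low = b"A" * 4096                # one repeated value
medium = bytes(range(16)) * 256  # sixteen equally likely values
high = os.urandom(1 << 16)       # stands in for encrypted or compressed data

for name, sample in (("low", low), ("medium", medium), ("high", high)):
    print(name, round(shannon_entropy_bits_per_byte(sample), 2))
```

The repeated-byte buffer scores 0, the sixteen-value buffer scores 4 bits per byte, and the random buffer lands close to the 8-bit maximum, which mirrors the spread between low-entropy and high-entropy files described above.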
In contrast, when data is written randomly to the SSD, the data that is marked invalid is scattered throughout the entire SSD, creating many small holes in every block. Then when garbage collection acts on a block containing randomly written data, more data must be moved to new blocks before the block can be erased. The red line of the Random Writes graph (above) shows how most SSDs would operate. Note that in this case, as the amount of over-provisioning increases, the gain in performance is quite significant. Just moving from 0% over-provisioning (OP) to 7% OP improves performance by nearly 30%. With flash controllers that use a data reduction technology, the performance gains are not as significant, but the performance is already significantly higher for any given level of OP.
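The contrast between sequential and random writes can be explored with a small simulation. The sketch below is a toy model, not the test setup behind the graphs: it assumes a greedy garbage collector that always reclaims the block with the fewest valid pages, uniformly random addresses, and a tiny drive of 64 blocks with 32 pages each. All names and parameters are hypothetical.

```python
import random


def write_amplification(op_percent, pattern, num_blocks=64,
                        pages_per_block=32, passes=4, seed=0):
    """Toy model: flash page writes divided by host page writes."""
    rng = random.Random(seed)
    total_pages = num_blocks * pages_per_block
    user_pages = total_pages * 100 // (100 + op_percent)
    where = {}                                  # lba -> (block, slot) of valid copy
    contents = [[] for _ in range(num_blocks)]  # lbas programmed into each block
    valid = [0] * num_blocks                    # valid-page count per block
    free = list(range(1, num_blocks))           # blocks that are still erased
    active = 0                                  # block currently receiving writes
    flash_writes = 0

    def collect():
        # Greedy garbage collection: erase the block with the fewest valid
        # pages and rewrite its still-valid pages into the erased block.
        nonlocal active
        victim = min(range(num_blocks), key=valid.__getitem__)
        survivors = [lba for slot, lba in enumerate(contents[victim])
                     if where.get(lba) == (victim, slot)]
        contents[victim] = []
        valid[victim] = 0
        active = victim
        for lba in survivors:
            program(lba)

    def program(lba):
        nonlocal active, flash_writes
        while len(contents[active]) == pages_per_block:
            if free:
                active = free.pop()
            else:
                collect()
        where[lba] = (active, len(contents[active]))
        contents[active].append(lba)
        valid[active] += 1
        flash_writes += 1

    def host_write(lba):
        # Out-of-place update: the old copy becomes invalid first.
        if lba in where:
            block, _ = where.pop(lba)
            valid[block] -= 1
        program(lba)

    for lba in range(user_pages):      # precondition: fill the drive once
        host_write(lba)
    flash_writes = 0                   # measure only the rewrite phase
    host_writes = passes * user_pages
    for i in range(host_writes):
        if pattern == "sequential":
            host_write(i % user_pages)
        else:
            host_write(rng.randrange(user_pages))
    return flash_writes / host_writes


for op in (7, 28):
    print(op, write_amplification(op, "sequential"),
          round(write_amplification(op, "random"), 2))
```

In this model the sequential figure stays at 1.0 for both settings, because every reclaimed block is already fully invalid, while the random figure should fall as op_percent rises. The absolute numbers depend entirely on the toy assumptions and should not be read as drive measurements.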
Over-provisioning and Write Amplification
As mentioned earlier, SSD writes generally involve writing data more than once: initially when saving the data the first time and later when moving valid data during multiple garbage collection cycles. As a result, it’s common for more data to be written to an SSD’s flash memory than was originally issued by the host system. This disparity is known as write amplification, and it is generally expressed as a multiple. For instance, if 2MB of data is written to flash while only 1MB was issued from the host, this would indicate a write amplification of 2.0. Obviously, write amplification is undesirable because it means that more data is being written to the media, increasing wear and negatively impacting performance by consuming precious bandwidth to the flash memory. Several factors can contribute to write amplification, chief among these being the percentage of data written randomly vs. sequentially.
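The ratio itself is simple to express. The following sketch uses hypothetical function and parameter names and treats the data written to flash as the host data plus whatever garbage collection had to relocate, which is a simplification of the description above.

```python
def write_amplification(host_bytes, relocated_bytes):
    """Bytes written to flash divided by bytes issued by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    flash_bytes = host_bytes + relocated_bytes
    return flash_bytes / host_bytes


MB = 10**6
# The example from the text: 1MB from the host ends up as 2MB written to flash.
print(write_amplification(host_bytes=1 * MB, relocated_bytes=1 * MB))  # prints: 2.0
```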
Surprisingly, it is also possible to write less data to flash than was issued by the host. (This would be expressed as a write amplification of, say, 0.5 or 0.7.) DuraWrite data reduction technology is probably today’s best-known method of accomplishing this through real-time data manipulation. Only SSDs with a similar data reduction technology can create a write amplification of less than one. As the entropy of the data from the host goes down, DuraWrite technology results in less and less data being written to the flash memory, leaving more space for over-provisioning. Without a similar data reduction technology, an SSD would be stuck with higher write amplification.
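A write amplification below one can be illustrated with any lossless compressor. The sketch below uses Python's zlib purely as a stand-in: the source does not describe how DuraWrite works internally, so nothing here should be read as its actual method, and the model ignores garbage collection entirely. The function name and sample data are illustrative.

```python
import os
import zlib


def data_reduction_ratio(payload: bytes) -> float:
    """Bytes that would reach flash divided by bytes issued by the host."""
    stored = zlib.compress(payload)
    return len(stored) / len(payload)


low_entropy = b"customer_id,status,region\n" * 5000   # repetitive, database-like
high_entropy = os.urandom(len(low_entropy))           # like encrypted data

print(f"low entropy:  {data_reduction_ratio(low_entropy):.3f}")
print(f"high entropy: {data_reduction_ratio(high_entropy):.3f}")
```

The repetitive buffer shrinks to a small fraction of its original size, while the random buffer does not shrink at all, which matches the point that the benefit grows as the entropy of the host data goes down.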
Note that additional over-provisioning and a data reduction technique such as DuraWrite technology can achieve similar write amplification results with different trade-offs. Benchmarking reveals that only drives with DuraWrite data reduction technology or something similar are able to take advantage of entropy-related write amplification reduction and the resulting performance improvements. Conventional SSDs without a similar technology are limited to the write amplification from a given over-provisioning level. As an example, a conventional SSD with 28% over-provisioning will exhibit the same write amplification (3.0) as an SSD with DuraWrite technology writing a 75% entropy stream with 0% over-provisioning, all other factors being equal. In other words, this scenario shows how an SSD equipped with DuraWrite technology could display the same level of write amplification as a standard SSD while reclaiming 28% of the storage capacity.
The Next Efficiency Level
An SSD does not natively know which blocks of data are invalid and available for replacing with new data. Only when the operating system (OS) tries to store new data in a previously used location does the SSD know that a particular location contains invalid data. All free space not consumed by the user becomes available to hold whatever the SSD believes is valid data. This is why the storage industry created the TRIM command. TRIM enables the OS to alert the SSD about pages that now contain unneeded data so they can be tagged as invalid. When this is done, the pages do not need to be copied during garbage collection and wear leveling. This reduces write amplification and improves performance. The graphic below shows how much of a difference TRIM can make in allowing more capacity to be available for over-provisioning.
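The effect of TRIM on garbage collection can be shown with a very small model. The sketch assumes a single eight-page block and a set of logical addresses that the OS has deleted; the names are hypothetical, and real TRIM handling is considerably more involved.

```python
def pages_copied_during_gc(block_lbas, deleted_by_os, trim_enabled):
    """Count the pages the controller must relocate before erasing one block."""
    # Without TRIM the controller never learns about OS-level deletions,
    # so it keeps treating those pages as valid data.
    known_invalid = set(deleted_by_os) if trim_enabled else set()
    return sum(1 for lba in block_lbas if lba not in known_invalid)


block = list(range(8))       # logical addresses stored in one block
deleted = {2, 3, 4, 5, 6}    # pages belonging to files the OS has deleted

print(pages_copied_during_gc(block, deleted, trim_enabled=False))  # prints: 8
print(pages_copied_during_gc(block, deleted, trim_enabled=True))   # prints: 3
```

With TRIM, five of the eight pages no longer need to be copied, which is the reduction in write amplification that the paragraph above describes.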
TRIM is yet another method that vendors can employ to boost over-provisioning, thereby increasing performance and drive longevity. It offers a preferable way to reclaim SSD capacity for acceleration compared to forcing drives to permanently surrender large swaths of their capacity. Using TRIM with DuraWrite technology, or similar combinations of complementary technologies, can yield even more impressive results.
Buyers should take a close look at their workloads, assess the typical entropy levels of their data sets, and consider which SSD technologies will provide the greatest benefits for their invested dollars. By choosing drives that reduce write amplification and employ technologies that make SSD operation ever more efficient, buyers will not only get more storage for each dollar, but that storage will perform faster and last longer than other options could possibly provide.