Tiering is the ability to move data between different classes of storage to optimise user investment in performance. The original tiering solution was probably an IBM version of hierarchical storage management (HSM), dating back to the 1980s, which offered the ability to move files between disc and tape depending on activity levels. Later, more companies developed HSM products, but none was particularly successful. More recently, thin provisioning brought a refinement, in which data was managed at the block level rather than the file level. This block-oriented functionality provided a context in which much more fine-grained movement of performance-critical disc data could occur.
The introduction of solid state drives (SSD) has brought an imperative need to dynamically place data. But the first approach to SSD sales was simply to tout how fast they were and expect users to purchase them and be suitably impressed. This approach was not as successful as SSD suppliers had hoped.
The SSD Challenge
SSDs bring two unique characteristics to enterprise storage: unprecedentedly high performance, and unprecedentedly high cost. Users want to use them to accelerate access to their most critical data, but cannot afford to relocate a significant percentage of their capacity to SSDs. They must decide on the most beneficial strategy for improving their workload service and move only that data to SSDs.
In trying to deploy SSDs, users face three immediate problems:
- They lack tools that can adequately identify the most active data.
- Even when they can determine the busiest data, they cannot always segregate it (such as the busiest database records from the rest of the database, or file system metadata from the rest of the file system).
- As conditions change over time, the data that is the best candidate for SSD placement at one moment may not remain the best.
Tiering offers the prospect of solving these problems, and of giving OEM customers the opportunity to add more value than simply packaging storage devices and sticking their brand on the box.
The Tiering Solution
Assume a subsystem with a number of hard drives and several SSDs. In such a configuration, tiering software will monitor the activity to the drives and dynamically stage the busiest blocks to the SSDs, so that most disc activity will be serviced from the SSDs. Periodically, the tiering function will review the activity levels on the subsystem, move data that has become less active back to the magnetic drives, then replace it with what is now the most active data.
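The periodic review described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any vendor's implementation: the function name `plan_retier`, the per-block access counts, and the fixed number of SSD slots are all hypothetical.

```python
from collections import Counter

def plan_retier(access_counts, on_ssd, ssd_slots):
    """Return (promote, demote) sets for one re-evaluation pass.

    access_counts: recent I/O count per block id (hypothetical statistic)
    on_ssd:        set of block ids currently held on the SSDs
    ssd_slots:     how many blocks the SSD tier can hold (assumed fixed)
    """
    # Rank blocks by recent activity, busiest first; ties broken by block id.
    ranked = sorted(access_counts, key=lambda b: (-access_counts[b], b))
    target = set(ranked[:ssd_slots])   # blocks that should be on SSD now
    promote = target - on_ssd          # busy blocks not yet on SSD
    demote = on_ssd - target           # SSD blocks that have cooled off
    return promote, demote

# Example: blocks 0 and 1 are on SSD, but block 3 is now busier than block 0.
counts = Counter({0: 5, 1: 120, 2: 3, 3: 90, 4: 40})
promote, demote = plan_retier(counts, on_ssd={0, 1}, ssd_slots=2)
print("promote:", promote, "demote:", demote)
```

In this example, block 3 replaces block 0 on the SSD tier while block 1 stays where it is, so only the data whose activity ranking has changed is moved.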
This addresses all three problems of the SSD deployment challenge. 1) Tiering relieves the user of having to measure disc activity and adjust the placement of busy data. 2) Since tiering works at the block level, movement is not restricted to file or database granularity. 3) Tiering is a subsystem function that runs constantly and dynamically readjusts the SSD/HDD location of data based on recent activity.
The Devil is in the Details
In theory, a tiering function as described above should enable the user to attain the maximum performance improvement from an investment in SSDs. In practice, it is not quite that simple. It is not feasible to manage every block in a subsystem individually: doing so would require too much memory in the subsystem controller and impose excessive overhead.
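A rough calculation shows why per-block tracking is impractical. The figures below are illustrative assumptions only (a 100 TiB subsystem, 16 bytes of statistics per tracked unit, 4 KiB blocks compared with a coarser 1 MiB unit); none of them comes from a specific product.

```python
KIB = 2**10
MIB = 2**20
GIB = 2**30
TIB = 2**40

def stats_memory_bytes(capacity_bytes, unit_bytes, stats_bytes_per_unit):
    """Controller memory needed to keep one statistics record per tracked unit."""
    units = capacity_bytes // unit_bytes
    return units * stats_bytes_per_unit

capacity = 100 * TIB   # assumed subsystem capacity
record = 16            # assumed bytes of statistics per tracked unit

per_block = stats_memory_bytes(capacity, 4 * KIB, record)   # track every 4 KiB block
per_chunk = stats_memory_bytes(capacity, 1 * MIB, record)   # track 1 MiB units instead

print(f"4 KiB units: {per_block / GIB:.1f} GiB of statistics")
print(f"1 MiB units: {per_chunk / GIB:.2f} GiB of statistics")
```

Under these assumed figures, per-block tracking would need about 400 GiB of controller memory, against roughly 1.6 GiB when the tracked unit is 1 MiB, a 256-fold reduction.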
To achieve a practical balance between overhead and improvement, the tiering function must optimise itself based primarily on two variables: 1) how frequently to re-evaluate the staging of data between tiers, and 2) how much data to include in the minimum chunk of storage that is monitored and moved. A reasonable value for the latter might be somewhere between 1MB and 0.5GB. With a 1MB chunk, say, the tiering service would keep statistics on the activity of each 1MB of disc space in the subsystem. Periodically, it would move the chunks that showed the highest activity to SSD storage, de-staging back to HDDs the least active of the chunks already on the SSDs. (A chunk that had seen no write activity while on the SSDs could simply be discarded rather than copied back, of course.)
However, if this re-evaluation is done too frequently, the overhead of moving data back and forth between tiers could negate the performance benefit of having the data on the SSDs. A frequency policy may therefore have to involve dynamic decision making based on the overall activity level and the rate at which I/O peaks move from some chunks to others.
Another difficulty is the complexity that a tiering subsystem poses. Even though it relieves the user of the need to manage the placement of data, there is still a lot to oversee. Tuning the two policies above, as well as others related to getting the most out of a tiering solution, demands significant operator and administrator training. Experienced management is needed to set up a tiering subsystem and keep it achieving the best possible performance.
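To illustrate the de-staging step described in this section, the sketch below separates chunks leaving the SSD tier into those that must be copied back to the HDDs and those that can simply be dropped. It assumes that the HDD copy of a chunk is retained while the chunk is staged on SSD, and that the subsystem keeps a per-chunk dirty flag; the `SsdChunk` type and `destage` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

CHUNK_MB = 1  # assumed chunk size, at the low end of the range discussed

@dataclass
class SsdChunk:
    chunk_id: int
    dirty: bool  # True if the chunk was written while staged on SSD

def destage(chunks):
    """Split chunks leaving the SSD tier into write-back and discard lists."""
    # Dirty chunks hold newer data than the HDD copy and must be copied back.
    write_back = [c.chunk_id for c in chunks if c.dirty]
    # Clean chunks still match the (assumed retained) HDD copy: just drop them.
    discard = [c.chunk_id for c in chunks if not c.dirty]
    return write_back, discard

leaving = [SsdChunk(7, dirty=True), SsdChunk(9, dirty=False), SsdChunk(12, dirty=False)]
write_back, discard = destage(leaving)
mb_copied = len(write_back) * CHUNK_MB  # data actually moved back to HDD

print("write back:", write_back, "discard:", discard, "MB copied:", mb_copied)
```

Here only one of the three departing chunks has to be copied, which suggests why the cost of a re-evaluation pass depends on the write activity of the data as well as on how often the pass runs.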
The Need for a Third Tier
Tiering proponents once claimed that, with the improvement SSDs offered, storage could be simplified to an SSD layer (often called Tier 0) and a low-cost, high-capacity layer (Tier 2), eliminating the need for mission-critical drives altogether. Recent research and customer comments make the case for keeping a third layer (Tier 1) of high-performance hard drives to provide optimal performance on the data patterns not well serviced by SSDs: sequential reading and heavy write activity, whether sequential or random. In fact, the problem is more general. A reasonable investment in SSDs leaves too much potential performance improvement unrealised (or, more specifically, left on hard drives). There is now solid evidence that some of our customers are quoting tiering subsystems in exactly this way, with three tiers to achieve optimal performance. In one case, when an IT manager asked for a tiering solution to solve a critical manufacturing system performance issue, the vendor quoted a three-tier system with over half the aggregate capacity on 15K RPM drives, not on the Tier 2 drives!
Tiering has evolved from fairly simplistic first efforts to something far more sophisticated and effective. While it has not eliminated all the management challenges associated with multiple performance levels of storage, tiering is proving to be an invaluable ingredient in a storage subsystem, giving the end user the best chance of attaining the performance benefit of investing in SSDs.
However, there is no reason to think that this evolution is complete. Technology developments will continue to refine and improve users' ability to attain maximum performance from a storage investment.