The World of Storage Has Changed: Data Types, Access Models, Demand and Use Cases
Over the past decade, the explosive growth of large-scale, data-driven applications has begun to shift the nature of enterprise storage infrastructure fundamentally. The traditional paradigm of hardware-centric, file-based systems is moving aside to make way for new software-defined, object-based approaches.
The new paradigm is an object-oriented one: a world of pictures, movies, ecommerce and Web data, search, and games, and archives of all of these. In this world, objects (information) are written, read and deleted but never modified. Increasingly, systems and data centers are designed for capacity rather than performance. Tiering and distribution have become imperative. Analytics have become both routine and essential—on demand, in real time, across any dimension and any heat of data.
In addition, because Ethernet is the undisputed fabric of data centers and application traffic, it has unsurprisingly emerged as the backbone of storage infrastructure. The phenomenal growth of virtual server and desktop infrastructure, as well as the rapid adoption of Amazon Web Services, validates this point.
This growth is driven by the increasing prominence of mobile, social, cloud computing and big data. These applications rely on data that is primarily unstructured (or semi-structured), and easy and inexpensive to create. As a result, the value of analytics, regulation and personal expectations for data preservation will fuel the growth in storage infrastructure now and in the foreseeable future.
The combination of these factors redefines storage devices, interfaces and full-stack architectures. For the industry to deliver the growth needed to meet these storage demands, we must strip out layers of inefficiency from legacy architectures and introduce a new approach optimized for scale-out application and data center needs.
The Seagate Kinetic Open Storage developer tools do exactly this. They implement and enable the most efficient stack (devices, protocol, interface, software, systems) to optimize for current and future application demands, uniquely enable object-oriented applications to take direct advantage of storage, and fuel scale-out data center innovation. In doing so, they enable significant gains in performance, manageability and total cost of ownership (TCO).
The Seagate Kinetic Open Storage Data Center vs. the Traditional Model: Disintermediation, Disaggregation, Security
The Seagate Kinetic Storage platform represents an opportunity to substantially address the inefficiencies of traditional data centers whose legacy architectures are not well-adapted to highly distributed and capacity-optimized workloads of exploding unstructured data and applications.
Current data centers are characterized by multiple layers of software and hardware stacked together in order to enable a data path between two poorly compatible systems: an object-oriented application layer and a hardware layer (spanning HDDs, SSDs and tape) based on block-storage. The transit path from application to storage requires multiple layers of manipulation from databases, down through POSIX interfaces, file systems, volume managers and drivers. Information passes over Ethernet, through Fibre Channel, into RAID controllers, SAS expanders and SATA host bus adapters.
For example, a traditional stack might look something like Figure 1, below:
Figure 1: Model of the Traditional Storage Stack: Server, Storage Server and Devices
Beyond the obvious inefficiency of having to move through multiple layers, this model relies on an outdated assumption: that local storage is organized close to, and based on, the physical attributes of a device. This has all changed. However, the software stack has not adequately evolved.
The majority of today’s mass-scale object applications need neither file semantics nor a file system to determine and maintain the best strategy for space management on a device. Modern applications need only object semantics (e.g., write the whole thing, read the whole thing, delete the whole thing, refer to it by a handle chosen by the client and cluster manager); they do not need to know where data resides on a given device.
In order to manage this complexity, an entire ecosystem of storage server technology providers (both hardware and software) has developed purely to abstract it from both the device and the application layers. Not only is this inefficient, it also introduces additional barriers between the two sides that can impede surfacing of storage features and functionality.
What if we could start over and restructure the stack from the bottom up? What would it look like if object-oriented applications could connect directly to the storage device, in the language of the storage device? Well, it would look like the Seagate Kinetic Storage platform.
The Seagate Kinetic Open Storage Platform
Seagate Kinetic Storage is:
- A new class of key/value Ethernet drives plus developer tools that include an open application programming interface (API) and associated libraries
- Designed to provide the simplest semantic abstraction and enable the broadest set of applications through easy-to-use APIs
- An efficient platform to maximize innovation
Together, these pieces enable applications to target storage devices directly and take best advantage of storage features. Drives communicate in keys and values: they perform gets, puts and deletes. They allow applications to distribute objects and manage clusters while letting the drive efficiently manage functionality such as:
- Managing key (object) ordering
- Quality of service
- Policy-based drive-to-drive data migration
- Handling of partial device failures and other management
- Data-at-rest security
So, in contrast to the traditional stack described above, the Seagate Kinetic Storage stack might look like Figure 2.
Figure 2. The Seagate Kinetic Storage Stack
The Seagate Kinetic Storage model has a number of significant and exciting implications. For example:
- Superfluous layers of legacy software and hardware are removed.
- The need for the traditional storage server tier is obviated.
- Storage can truly be disaggregated from compute.
- Racks can be more dense.
- Fans are minimized.
- Data traffic leverages the existing data center transit fabric (Ethernet).
- Data center operational management is simplified, with lower cost and lower risk.
Scale-out is simplified, cost-effective and unconstrained by legacy architectures and infrastructure. Information is now just an IP address away.
Seagate Kinetic Open Storage Platform APIs
APIs are designed to provide developers with direct access to essential and optimized storage features and functionality in an open and extensible way. Developers working within the data center architecture and using multiple software stacks, both open source and proprietary, can build upon the Seagate Kinetic Storage foundation to tackle the most difficult storage challenges. The design is intended to fuel open innovation in software, enabling unique problems to be solved flexibly and optimally.
Furthermore, many long-desired capabilities of hard disk drives are now possible. For example:
- Data Sharing: Sharing data on drives across applications has traditionally been very difficult. With Kinetic Storage APIs, data can be easily shared between applications from multiple sources. One application can write a key and value to a drive, while another can read that data.
- Drive-to-Drive Data Transit: Traditionally, moving data from one drive to another required routing it through expensive storage servers. With Kinetic Storage APIs, data can now be moved directly between drives using peer-to-peer copy commands that move ranges of keys from one drive to another.
- Data Integrity: Silent data corruption is a fact of life. With Kinetic Storage, data can be stored with comprehensive end-to-end integrity checks that confirm the data was received at the drive correctly and allow both the drive and the ultimate recipient to verify that the data is still correct.
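The end-to-end integrity idea in the last item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function names (`make_record`, `drive_accepts`, `reader_verifies`) and the choice of SHA-256 are hypothetical and are not taken from the Kinetic Storage libraries. The writer attaches a digest to the value, the drive checks it on receipt, and the reader checks it again after retrieval.

```python
import hashlib

def make_record(value: bytes) -> dict:
    """Writer side: attach a digest computed over the whole value.
    (Hypothetical record layout; SHA-256 is an assumption.)"""
    return {"value": value, "tag": hashlib.sha256(value).hexdigest()}

def _digest_matches(record: dict) -> bool:
    return hashlib.sha256(record["value"]).hexdigest() == record["tag"]

def drive_accepts(record: dict) -> bool:
    """Drive side: refuse to store a value that was damaged in transit."""
    return _digest_matches(record)

def reader_verifies(record: dict) -> bool:
    """Reader side: repeat the same check on the data that came back."""
    return _digest_matches(record)

record = make_record(b"whole object contents")
stored_ok = drive_accepts(record)

# Simulate silent corruption of the stored value; the tag is unchanged.
corrupted = {"value": b"whole object c0ntents", "tag": record["tag"]}
corruption_detected = not reader_verifies(corrupted)
```

Because the digest travels with the value, any party holding the record can repeat the check without trusting the layers in between.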
In addition to these values, the Seagate Kinetic Storage key/value semantic abstraction enables drive innovation (e.g., in media technology, sector size) to advance in parallel with, and independently of, software innovation above the Seagate Kinetic Storage layer. Developers no longer have to implement changes in software in order to benefit from underlying drive technology advances. The application simply operates on defined keys and values, and the drive executes seamlessly and optimally behind Seagate Kinetic Storage.
The Seagate Kinetic Open Storage Value Proposition: Performance, Scale, Simplicity, TCO and Security
The Seagate Kinetic Storage platform is architected to enable simple, flexible storage performance and scaling. It delivers optimal TCO for data center storage, providing savings in both capital outlays and operational expenses.
By design, Seagate Kinetic Storage-supported drives are native key/value stores. This shifts the burden of maintaining the space mapping of a device from a file system to the drive itself. Applications can put and get objects; they no longer need to guess at LBA layout or prescribe data location. This shift eliminates a significant amount of drive I/O that moves no data but instead represents metadata- and file system-related overhead.
There is also incremental benefit here for scaling: As both device manufacturers and cloud data center operators ramp up device capacity as aggressively as possible, the increased I/O efficiency—and resulting net I/O utilization—enables a more balanced scaling of I/O and capacity, in addition to absolute performance on a given device and across a Seagate Kinetic Storage cluster.
Incremental downstream performance gains come from the improved manageability enabled by the key/value semantic abstraction. For example, this abstraction allows for graceful handling of device failures, including partial failures, in some cases without the corresponding extensive rebuild times characteristic of large-capacity drives.
The Seagate Kinetic Storage platform is uniquely optimized for explosive-growth, scale-out data centers. The Seagate Kinetic Storage architecture with its disaggregation of storage from compute enables cloud data center operators to simply add storage as the need for capacity grows. Additionally, the combined impact of Ethernet connectivity and the key/value API command structure enables incremental capacity to be scaled in a highly distributed manner with the replication of data directed from drive to drive with minimal incremental system and capex cost.
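To make drive-to-drive replication concrete, here is a minimal in-memory sketch in Python. The `Drive` class, its `push_range` method and the addresses are all hypothetical stand-ins, not the Kinetic Storage API; the sketch only shows the shape of the operation, in which one drive copies an inclusive range of keys to a peer without the data passing through a storage server.

```python
from typing import Dict

class Drive:
    """Hypothetical stand-in for a key/value drive reachable at an IP address."""

    def __init__(self, address: str) -> None:
        self.address = address
        self.objects: Dict[bytes, bytes] = {}

    def push_range(self, peer: "Drive", start: bytes, end: bytes) -> int:
        """Copy every key in the inclusive range [start, end] to the peer.
        Returns the number of keys copied."""
        copied = 0
        for key in sorted(self.objects):
            if start <= key <= end:
                peer.objects[key] = self.objects[key]
                copied += 1
        return copied

source = Drive("10.0.0.1")   # addresses are illustrative only
target = Drive("10.0.0.2")
source.objects = {b"a1": b"alpha", b"a2": b"beta", b"b1": b"gamma"}

# Replicate only the keys that start with "a".
copied = source.push_range(target, b"a", b"a\xff")
```

In this model the cluster manager decides which range goes where, and the drives carry out the transfer themselves.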
Simplicity, Ease of Use and Adoption
Customers can build their own management applications or call the drive directly using the Seagate Kinetic Storage APIs. The APIs are designed to enable rapid integration into a wide variety of storage software applications. Additionally, Seagate Kinetic Storage devices can be discovered dynamically, which eases adoption into existing data centers and supports the gradual evolution of heterogeneous environments.
Specific Seagate drives are provided with a comprehensive user-space library that allows applications to access the drive directly. This library provides the complete interface to access the data and to manage the drive. It bypasses the normal operating system storage stack and lets the application talk directly to the drive as if it were talking to another service in the data center. This process utilizes a typical application remote procedure call (RPC). This Kinetic Storage API platform currently provides libraries for Java, C++, C, Python, and Erlang, and other languages will be provided over time.
The Seagate Kinetic Storage API allows applications to interact with the drive as if it were a typical key/value service on the network; it allows applications to put data in the form of keys and values to the drive and to get this data back by specifying just the key. As one would expect, keys and their values can be deleted. Additionally, the keys are ordered so that searching of the keys within ranges and finding the next and previous keys are possible.
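The put, get, delete and ordered-key operations described above can be modeled with a small in-memory class. This is a sketch of the semantics only: the class name `OrderedKeyValueStore` and its method names are hypothetical and do not reproduce the actual Kinetic Storage library interfaces.

```python
import bisect
from typing import Dict, List, Optional

class OrderedKeyValueStore:
    """In-memory model of a drive that stores whole values under ordered keys."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []          # always kept sorted
        self._values: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._values.get(key)

    def delete(self, key: bytes) -> bool:
        if key not in self._values:
            return False
        del self._values[key]
        self._keys.pop(bisect.bisect_left(self._keys, key))
        return True

    def get_next(self, key: bytes) -> Optional[bytes]:
        """Smallest stored key strictly greater than `key`, if any."""
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None

    def get_previous(self, key: bytes) -> Optional[bytes]:
        """Largest stored key strictly less than `key`, if any."""
        i = bisect.bisect_left(self._keys, key)
        return self._keys[i - 1] if i > 0 else None

    def key_range(self, start: bytes, end: bytes) -> List[bytes]:
        """All stored keys in the inclusive range [start, end], in order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return self._keys[lo:hi]

store = OrderedKeyValueStore()
for name in (b"photo/002", b"video/001", b"photo/001"):
    store.put(name, b"contents of " + name)
photos = store.key_range(b"photo/", b"photo/\xff")
```

Keeping the keys sorted is what makes range queries and next/previous lookups cheap, which is why applications can use key prefixes to group related objects.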
The schematic below shows the basic architecture.
Initial software integrations with the Seagate Kinetic Storage API include Swift and Riak CS, with others in process. These systems allow thousands of drives to be managed as a single, reliable storage cluster. With such third-party management software, not only is the data stored reliably (using replication and/or erasure coding), but failed drives are also recovered transparently to the applications.
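As a rough illustration of how management software might spread replicas across many drives, the following Python sketch uses rendezvous (highest-random-weight) hashing. This technique is swapped in for illustration only; the source does not describe how Swift or Riak CS actually place data, and the function name `choose_drives`, the drive addresses and the replica count are assumptions.

```python
import hashlib
from typing import List

def choose_drives(key: bytes, drives: List[str], replicas: int = 3) -> List[str]:
    """Pick `replicas` distinct drives for a key by rendezvous hashing.

    Each drive gets a pseudo-random score derived from (drive, key);
    the highest-scoring drives hold the replicas.
    """
    def score(drive: str) -> bytes:
        return hashlib.sha256(drive.encode("utf-8") + b"|" + key).digest()

    return sorted(drives, key=score, reverse=True)[:replicas]

# Hypothetical cluster of eight drives, each addressed by IP.
drives = ["10.0.0.%d" % i for i in range(1, 9)]
placement = choose_drives(b"photo/001", drives)
```

A useful property of this scheme is that removing a drive that holds none of a key's replicas leaves that key's placement unchanged, so only the data on a failed drive needs to be re-replicated.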
There are also extensive drive management commands that allow the drive to report its health and to manage who is allowed to communicate with the drive.
The Seagate Kinetic Storage platform allows implementation of new data center architectures. This is because Seagate Kinetic Storage drives can interface directly with the applications, thereby eliminating an entire tier of hardware.
This technological advantage allows denser storage racks, which impacts TCO in a number of different areas:
- Lower Capital Expenditure: Seagate Kinetic Storage architectures allow the removal of storage servers from the data center. This translates directly into lower capital expenditure in building out data centers. Alternatively, with a comparable level of capital expenditure, customers can allocate much greater storage capacity in the same physical space.
- Labor: Because the Seagate Kinetic Storage architecture removes the need for storage servers, it reduces the number of technicians required to maintain them. In addition, the denser storage enabled by the Seagate Kinetic Storage architecture could potentially reduce the number of technicians a data center needs to employ in general, leading to significant labor savings.
- Power consumption: The elimination of the storage server tier and more efficient rack density allow fewer racks to support the same volume of storage. This reduces energy consumption.
- Uptime/technician error: The greater reliability of the Seagate Kinetic Storage architecture with regard to automatic replication and failover reduces the number of errors related to managing the storage data center. In addition, in-drive error management promises to reduce major technician incidents to the level of routine maintenance.
The increase in rack density provides another strong cost benefit for cases where physical real estate is a significant consideration, for example, data centers located in co-location facilities. The greater rack density means a significantly lower physical footprint for the data center, which translates directly into cost savings.
Exact impact on TCO will vary according to a number of factors (number of HDDs behind each storage server, real estate characteristics, etc.) specific to a given data center.
Securing storage services within the cloud data center is a difficult task. The interface library supports:
- Authentication: A full cryptographic authentication of servers that have access permission to the drive
- Integrity: Full integrity check of the command and the data
- Authorization: A clear set of roles by server as to what the application is allowed to do. Typical roles are read, read/write, management of the drive and management of the security in the drive.
- Transport Layer Security (TLS): For the security of very sensitive data and/or management commands, a full industry-standard TLS suite is also provided.
This is in marked contrast to other distributed storage systems where, inside the data center, traffic between services is not only unsecured but also unauthenticated. This gives anyone who has access to the data center complete and unfettered access to the storage to read, modify and even delete all the data. In these situations, security becomes the responsibility of the networking infrastructure, requiring higher-cost networking, separate network islands or complicated VLANs. By contrast, the Seagate Kinetic Storage security architecture allows low-cost and flexible data center networking architectures.
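The authentication, integrity and authorization checks listed above can be sketched with a keyed hash and a role table. Everything named here is an assumption made for illustration: the identities, secrets, role names, the `sign` and `authorize` functions, and the use of HMAC-SHA-256 are hypothetical and are not drawn from the Kinetic Storage interface library.

```python
import hashlib
import hmac

# Hypothetical per-identity shared secrets and granted roles.
ACL = {
    "app-reader": {"secret": b"reader-secret", "roles": {"read"}},
    "app-writer": {"secret": b"writer-secret", "roles": {"read", "write"}},
}

# Hypothetical mapping from operation to the role it requires.
REQUIRED_ROLE = {"get": "read", "put": "write", "delete": "write"}

def sign(secret: bytes, command: bytes) -> bytes:
    """Client side: compute a keyed digest over the full command bytes."""
    return hmac.new(secret, command, hashlib.sha256).digest()

def authorize(identity: str, operation: str, command: bytes, tag: bytes) -> bool:
    """Drive side: check who sent the command, that it is intact,
    and that the sender's roles permit the operation."""
    entry = ACL.get(identity)
    if entry is None:
        return False                      # unknown sender
    expected = sign(entry["secret"], command)
    if not hmac.compare_digest(expected, tag):
        return False                      # forged or altered command
    return REQUIRED_ROLE.get(operation) in entry["roles"]

command = b"put photo/001"
writer_allowed = authorize(
    "app-writer", "put", command, sign(b"writer-secret", command))
reader_allowed = authorize(
    "app-reader", "put", command, sign(b"reader-secret", command))
```

Because the drive itself rejects unauthenticated or unauthorized commands, the network between application and drive does not have to be the only line of defense.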
The Seagate Kinetic Storage platform represents a fundamental and important leap forward in storage architectures. The demands of our always-connected, mobile and online world—and the corresponding massive cloud storage infrastructure required to support it—mandate true re-envisioning of best practices and technology. The Seagate Kinetic Storage platform delivers the new paradigm necessary to enable us collectively, as an industry, not only to meet this mandate but to do so optimally and in the most cost-efficient manner required.