
By Praveen Viraraghavan

Mar 03, 2025

AI

The evolving storage needs for AI infrastructure.


AI storage solutions must evolve to handle massive data growth. Innovations like dual actuator drives, NVMe-based storage, and energy-efficient solutions are shaping the future of AI infrastructure.


As artificial intelligence (AI) continues to advance, the infrastructure supporting it must evolve to handle increasing demands for data storage and processing. Data storage plays a critical role in the AI infrastructure lifecycle, and solutions must be able to withstand current and future AI challenges.

The amount of data generated daily is staggering. From smart cities producing 143 petabytes of data per day, according to a Kaleido Intelligence report1, to autonomous vehicles generating terabytes (TBs) of data, the need for efficient data storage solutions is more pressing than ever. Autonomous car companies upload vast amounts of data to cloud service providers, where it’s processed and used to improve AI models. This continuous flow of data necessitates robust storage solutions that can handle both the volume and speed required for AI applications.

Performance vs. power.

Despite the focus on cutting-edge technologies like graphics processing units (GPUs), hard drives remain a critical component of AI infrastructure. They provide the necessary storage capacity for the massive data sets used in AI training and inference. While GPUs handle the heavy lifting of data processing, hard drives store the data that feeds these processes. This symbiotic relationship ensures AI systems can operate efficiently without being bottlenecked by storage limitations.

One of the biggest challenges in AI infrastructure is balancing performance with power consumption. As GPU clusters grow, the power required to run them increases significantly. For instance, large deployments at AI leaders involve thousands of GPUs, each consuming substantial amounts of power. This creates a need for storage solutions that not only offer high performance but also operate efficiently in terms of power usage. To put this into perspective, a single GPU can consume up to 700 watts, and large-scale deployments can involve up to 100,000 GPUs, resulting in a power requirement of 70 megawatts, roughly the total power allocation of a large data center. Within that budget, storage solutions must be designed to minimize power consumption while maximizing performance in order to fit alongside the GPUs.
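The arithmetic behind that 70-megawatt figure is simple enough to sketch. The wattage and GPU count below are the round numbers quoted above, used for illustration rather than measurements of any particular deployment:

```python
# Back-of-the-envelope power math for a large GPU deployment.
# Figures are the illustrative round numbers from the text, not specs.
GPU_POWER_WATTS = 700    # peak draw of a single high-end training GPU
GPU_COUNT = 100_000      # GPUs in a very large-scale deployment

total_watts = GPU_POWER_WATTS * GPU_COUNT
total_megawatts = total_watts / 1_000_000

print(f"GPU power budget: {total_megawatts:.0f} MW")  # → GPU power budget: 70 MW
```

Every watt the storage tier draws comes out of the same facility budget, which is why power efficiency is a first-order design constraint for AI storage, not an afterthought.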

The importance of checkpoints.

In AI training, checkpoints are critical to prevent lost progress in case of system failures. These checkpoints save the state of the AI model at regular intervals (say, every few minutes), allowing the training process to resume from the last saved state rather than starting over. This is particularly important for long-running training sessions that can span weeks or even months. Efficient checkpointing requires fast storage solutions that can quickly save and retrieve large amounts of data.

For example, some large training platforms perform checkpoints every minute during training, saving data to solid-state drives (SSDs) and then transferring it to hard drives. This process ensures that even if a failure occurs, training can resume with minimal data loss. The size of these checkpoints can be substantial, with some models requiring up to 12TB of storage per checkpoint.

Hard drives are essential for AI checkpointing due to their scalability, cost efficiency, power efficiency, sustainability, and longevity.

Future trends and innovations.

Looking ahead, the demand for AI storage is expected to grow exponentially. According to data from Bloomberg Intelligence, IDC, eMarketer, and Statista2, by 2032, the AI storage market is projected to reach $92 billion. This growth will be driven by the increasing complexity of AI models and expanding use of AI across various industries. To meet these demands, storage solutions will need to become more sophisticated, offering higher capacities, faster speeds, and better power efficiency.

Several technical innovations are being explored to address the storage needs of AI infrastructure:

  • Areal density growth. Continued innovation in hard drive heads and media allows for greater capacity in the same form factor. Seagate Mozaic-enabled hard drives are the world’s most efficient hard drive storage, capable of lowering acquisition and operational costs while increasing productivity. With the increased areal density of Mozaic drives, customers can store more data without increasing consumption of space, power, or natural resources. Mozaic 3+ helps customers achieve sustainability goals, a top priority for large-scale data centers, by offering a 55% reduction in embodied carbon per terabyte3.
  • Dual actuator drives. These drives offer increased performance by using two actuators to read and write data simultaneously. This can significantly improve data throughput, making it easier to handle the large volumes of data generated by AI applications.
  • NVMe-based hard drives. Non-volatile memory express (NVMe) technology provides faster data access compared to traditional SATA (serial advanced technology attachment) or SAS interfaces (serial-attached SCSI [small computer system interface]). By adopting NVMe-based hard drives, data centers can achieve higher performance and lower latency, which is crucial for AI workloads.
  • Optical interconnects. As data transfer rates increase, traditional copper interconnects can become a bottleneck. Optical interconnects offer higher bandwidth and lower latency, enabling faster data movement between storage devices and processing units.
  • Energy-efficient storage solutions. With the growing power demands of AI infrastructure, storage solutions need to be more energy efficient. This includes developing drives that consume less power while maintaining high performance, as well as exploring new cooling technologies to manage the heat generated by large-scale deployments.
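To see why the doubled throughput of a dual actuator drive matters at checkpoint scale, consider the time to drain a 12TB checkpoint from SSD to hard drive. The 12TB figure comes from the checkpointing discussion above; the sequential throughput values are assumed round numbers for illustration, not product specifications:

```python
# Rough illustration of checkpoint drain time vs. drive throughput.
# 12 TB checkpoint size is from the text; throughput figures are assumed.
CHECKPOINT_TB = 12
BYTES_PER_TB = 10**12


def drain_hours(throughput_mb_s: float) -> float:
    """Hours to write one checkpoint at a sustained sequential rate."""
    seconds = CHECKPOINT_TB * BYTES_PER_TB / (throughput_mb_s * 10**6)
    return seconds / 3600


single = drain_hours(270)  # assumed single-actuator sequential rate (MB/s)
dual = drain_hours(540)    # two actuators writing in parallel

print(f"single actuator: {single:.1f} h, dual actuator: {dual:.1f} h")
# → single actuator: 12.3 h, dual actuator: 6.2 h
```

Halving the drain time means checkpoints spend less time in flight, freeing fast SSD tiers sooner and narrowing the window in which a failure forces a longer rollback.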

Evolving AI storage demands.

The storage needs for AI infrastructure are evolving rapidly, driven by the exponential growth of data and the increasing complexity of AI models. As we move forward, it will be essential to develop storage solutions that can keep pace with these demands, so AI systems can continue to advance and deliver on their promise of transforming industries and improving lives.


  1. Cellular IoT Connectivity Series: Smart Cities Opportunities & Forecasts, Kaleido Intelligence, 2023, https://kaleidointelligence.com/smart-cities-2027/
  2. Generative AI to Become a $1.3 Trillion Market by 2032, Research Finds, Bloomberg Intelligence, 2023, https://www.bloomberg.com/company/press/generative-ai-to-become-a-1-3-trillion-market-by-2032-research-finds/
  3. 30TB Mozaic 3+ drive compared to a 16TB conventional PMR drive. Embodied carbon includes emissions generated during raw material extraction, product manufacturing/assembly, and all transportation of materials from extraction through manufacturing and from manufacturing to customers.

Praveen Viraraghavan

Praveen Viraraghavan is a Technologist in the Products and Markets organization at Seagate Technology.