The evolving storage needs for AI infrastructure
Mar 03, 2025
AI storage solutions must evolve to handle massive data growth. Innovations like dual actuator drives, NVMe-based storage, and energy-efficient solutions are shaping the future of AI infrastructure.
As artificial intelligence (AI) continues to advance, the infrastructure supporting it must evolve to handle increasing demands for data storage and processing. Data storage plays a critical role in the AI infrastructure lifecycle, and solutions must be able to withstand current and future AI challenges.
The amount of data generated daily is staggering. From smart cities producing 143 petabytes of data per day, according to a Kaleido Intelligence report1, to autonomous vehicles generating terabytes (TBs) of data, the need for efficient data storage solutions is more pressing than ever. Autonomous car companies upload vast amounts of data to cloud service providers, where it’s processed and used to improve AI models. This continuous flow of data necessitates robust storage solutions that can handle both the volume and speed required for AI applications.
Despite the focus on cutting-edge technologies like graphics processing units (GPUs), hard drives remain a critical component of AI infrastructure. They provide the necessary storage capacity for the massive data sets used in AI training and inference. While GPUs handle the heavy lifting of data processing, hard drives store the data that feeds these processes. This symbiotic relationship ensures AI systems can operate efficiently without being bottlenecked by storage limitations.
One of the biggest challenges in AI infrastructure is balancing performance with power consumption. As GPU clusters grow, the power required to run them increases significantly. To put this into perspective, a single GPU can consume up to 700 watts, and large-scale deployments at AI leaders can involve up to 100,000 GPUs, resulting in a power requirement of 70 megawatts. That figure is equivalent to the total power allocation of a large data center. Storage solutions must therefore be designed to minimize power consumption while maximizing performance if they are to fit within the same power budget as the GPUs.
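The arithmetic behind that 70-megawatt figure is a simple back-of-envelope calculation using the numbers cited above (the constants below are the article's figures, not measured values):

```python
# Back-of-envelope estimate of GPU cluster power draw,
# using the figures cited in the article.
GPU_POWER_WATTS = 700      # peak draw of a single training GPU
GPU_COUNT = 100_000        # size of a large-scale deployment

total_megawatts = GPU_POWER_WATTS * GPU_COUNT / 1_000_000
print(f"Cluster GPU power: {total_megawatts:.0f} MW")  # → 70 MW
```

Note that this counts only the GPUs themselves; cooling, networking, and storage add further overhead on top of this baseline.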
In AI training, checkpoints are critical to prevent lost progress in case of system failures. These checkpoints save the state of the AI model at regular intervals (say, every few minutes), allowing the training process to resume from the last saved state rather than starting over. This is particularly important for long-running training sessions that can span weeks or even months. Efficient checkpointing requires fast storage solutions that can quickly save and retrieve large amounts of data.
For example, some large training platforms perform checkpoints every minute during training, saving data to solid-state drives (SSDs) and then transferring it to hard drives. This process ensures that even if a failure occurs, training can resume with minimal data loss. The size of these checkpoints can be substantial, with some models requiring up to 12TB of storage per checkpoint.
Hard drives are essential for AI checkpointing because of their scalability, cost efficiency, power efficiency, sustainability, and longevity.
Looking ahead, the demand for AI storage is expected to grow exponentially. According to data from Bloomberg Intelligence, IDC, eMarketer, and Statista2, the AI storage market is projected to reach $92 billion by 2032. This growth will be driven by the increasing complexity of AI models and the expanding use of AI across various industries. To meet these demands, storage solutions will need to become more sophisticated, offering higher capacities, faster speeds, and better power efficiency.
Several technical innovations are being explored to address the storage needs of AI infrastructure, including dual actuator drives, NVMe-based hard drive storage, and more energy-efficient drive designs.
The storage needs for AI infrastructure are evolving rapidly, driven by the exponential growth of data and the increasing complexity of AI models. As we move forward, it will be essential to develop storage solutions that can keep pace with these demands, so AI systems can continue to advance and deliver on their promise of transforming industries and improving lives.
Praveen Viraraghavan
Praveen Viraraghavan is a Technologist in the Products and Markets organization at Seagate Technology.