Jan. 27, 2025
AI workloads generate massive volumes of structured and unstructured data. To support training, inference and retraining cycles, enterprises need scalable, high-capacity storage that can handle continuous data growth.
There is no AI success without data — heaps of it.
And there are no massive datasets without ample, efficient data storage. AI workloads create continuous data streams — from training datasets and inference logs to metadata, embeddings, and model outputs. As generative AI and large language models (LLMs) expand, the volume and variety of enterprise data grow exponentially. This rapid scaling demands storage architectures that can handle constant ingestion, high-speed access and reliable preservation over time.
Data upholds AI and mass-capacity hard drives uphold data.
These insights are brought into clear view by a 2025 survey from the research firm Recon Analytics.
The global survey provides details of how enterprises across multiple industries are adapting their infrastructure to support AI. Respondents represent organizations already using or planning to use AI, offering insight into storage demands, scaling challenges and the future of enterprise data infrastructure.
The Seagate-commissioned global survey queried 1,062 respondents: IT storage buyers and decision-makers who work in storage infrastructure roles for companies that report over $10 million in annual revenue, manage over 50 terabytes (TB) of storage, and have adopted AI or plan to adopt it within the next three years. Respondents are located in the United States, China, the United Kingdom, South Korea, Singapore, France, India, Japan, Taiwan and Germany.
The survey focused on the effects of AI adoption on infrastructure priorities, data retention and data management. Results shed light on how AI will impact infrastructure needs over the next three years.
The latest Recon Analytics survey reveals a pivotal shift in how enterprises are planning their data ecosystems for the AI era. Rather than treating AI as an isolated initiative, organizations are now reevaluating storage strategies, resource allocation and long-term infrastructure design in response to accelerating AI adoption. The survey captures how global IT leaders are preparing for a future where data growth, retention requirements and performance expectations will rise faster than ever before.
First and foremost, the survey demonstrated that AI adoption is driving exponential growth in data storage demand through 2028.
As many as 61% of respondents from companies that predominantly use cloud storage said their companies’ cloud-based storage would have to increase by over 100% — that is, it would have to double — over the next three years.
Figure 1. 61% of respondents whose companies primarily use cloud storage for their AI data management expect to increase their storage requirements by 100% or more.
As AI applications drive unprecedented data creation, the more data organizations save, the better they can validate that AI is behaving as expected. With access to behavioral data — like training datasets, model checkpoints, prompts and answers — companies can scrutinize algorithms, and better understand and refine AI decision-making. Without the scale and efficiency of data centers, AI’s potential would be limited, as the ability to store and retrieve massive datasets is central to AI’s success.
It’s not just the amount of storage that drives AI success. Duration of data storage matters, too.
Industries such as finance, healthcare, manufacturing and government operations depend on long-term retention to meet compliance requirements and audit needs. Retaining historical data strengthens governance frameworks, supports regulatory reporting and makes AI outputs more accurate over time.
Of the survey respondents employed by businesses that have adopted AI technology, 90% believe longer data retention improves the quality of AI outcomes.
Figure 2. 90% of companies that use AI today believe retaining more historical data improves model accuracy.
This finding points to a correlation between preserving data for longer periods and more reliable AI insights. This may be underpinned by several factors. First, constant iterative processing is intrinsic to how AI algorithms work. Content outputs feed back into the model, improving its accuracy and enabling new models. Raw datasets and outcomes become sources for further development and new workflows.
But holding onto datasets for longer serves other business-critical functions, too, because it protects a company’s intellectual property. It keeps ‘receipts’ of the model’s original datasets and processes, providing an explanation of results when required (say, as part of a legal process).
These receipts establish data lineage, outlining a clear record of the journey data takes from input to output. Data lineage allows organizations to trace the origin and usage of datasets, so AI models are built on accurate data. It enables AI systems to be fully auditable, and supports both regulatory compliance and internal accountability.
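The idea behind data lineage can be sketched in a few lines of Python. This is a minimal illustration, not a reference to any particular lineage tool; the `LineageRecord` structure, field names and the `s3://raw/claims` path are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    """One step in a dataset's journey from input to output."""
    dataset: str
    operation: str          # e.g. "ingest", "clean", "train"
    source: str             # where this version of the data came from
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def trace(log, dataset):
    """Return the ordered history of operations applied to a dataset."""
    return [r for r in log if r.dataset == dataset]

# An append-only log like this lets auditors replay exactly how a training
# set was produced, supporting both compliance and internal accountability.
log = [
    LineageRecord("claims_2024", "ingest", "s3://raw/claims"),   # hypothetical source
    LineageRecord("claims_2024", "clean", "claims_2024@ingest"),
    LineageRecord("claims_2024", "train", "claims_2024@clean"),
]
history = [r.operation for r in trace(log, "claims_2024")]
print(history)  # ['ingest', 'clean', 'train']
```

Because each record points at the output of the previous step, the log forms the traceable chain from raw input to model output that the survey respondents value.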
Additionally, companies may choose to store more data for longer because they realize that they can’t know today what new, valuable insights the algorithms of tomorrow might uncover from yesterday’s data. Longer data retention allows the processing of old data by yet-undeveloped AI models. For these reasons, longer data retention boosts the business value AI can provide.
In a related finding, infrastructure decision-makers view extended data retention as essential for building trust — a critical foundation without which AI insights hold little value.
88% of respondents whose companies use AI today believe adoption of trustworthy AI increases the need to store more data for longer periods of time.
Figure 3. 88% of respondents whose companies use AI today said the adoption of trustworthy AI increases the need to store more data for longer periods of time.
Seagate defines trustworthy AI as AI data workflows and models that use dependable inputs and generate reliable insights.
Scalable storage infrastructure supports trustworthy AI because it properly manages, stores and secures vast amounts of data used by AI systems.
As part of building trustworthy AI, 80% of survey respondents stressed the importance of checkpointing.
Checkpointing is the process of saving the state of an AI model at specific, short intervals during its training. AI models are trained on large datasets through iterative processes, which can take anywhere from minutes to months. The duration of a model’s training depends on the complexity of the model, the size of the dataset and the computational power available. During this time, models are fed data, parameters are adjusted and the system learns how to predict outcomes based on the information it processes.
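The mechanics of checkpointing can be shown with a toy training loop. This is a minimal sketch using only the Python standard library; the `save_checkpoint` and `load_latest_checkpoint` helpers, the checkpoint interval, and the stand-in weight update are all illustrative, and real frameworks provide their own serialization formats.

```python
import os
import pickle
import tempfile

def save_checkpoint(state, directory, step):
    """Persist the full training state so a run can resume after interruption."""
    path = os.path.join(directory, f"checkpoint_{step:06d}.pkl")
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:   # write to a temp file first so a crash
        pickle.dump(state, f)    # mid-write never corrupts a checkpoint
    os.replace(tmp, path)        # atomic rename
    return path

def load_latest_checkpoint(directory):
    """Return the most recent checkpoint state, or None if none exist."""
    files = sorted(f for f in os.listdir(directory) if f.endswith(".pkl"))
    if not files:
        return None
    with open(os.path.join(directory, files[-1]), "rb") as f:
        return pickle.load(f)

# Toy training loop: save a checkpoint every `interval` steps.
with tempfile.TemporaryDirectory() as ckpt_dir:
    weights, interval = 0.0, 100
    for step in range(1, 501):
        weights += 0.01                  # stand-in for a gradient update
        if step % interval == 0:
            save_checkpoint({"step": step, "weights": weights}, ckpt_dir, step)
    resumed = load_latest_checkpoint(ckpt_dir)
    print(resumed["step"])  # prints 500
```

Each checkpoint write is a burst of sequential I/O whose size scales with the model state, which is why frequent checkpointing at enterprise scale translates directly into sustained storage write demand.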
According to the survey, companies using 100+ PB of storage are saving and backing up checkpoints on a daily-to-weekly basis, with 87% of them storing these checkpoints in the cloud or in a mix of hard drives and SSDs.
To support checkpointing at this scale, enterprises need storage systems capable of sustaining constant write activity without disrupting model progress. High-capacity hard drives and hybrid-cloud architectures provide the reliability and cost efficiency required to maintain these rapid snapshot cycles. By consistently capturing and protecting checkpoints, organizations can safeguard training progress, accelerate recovery from interruptions and maintain stable, predictable AI development workflows.
Compute and energy are popular themes in discussions of AI adoption. But the Recon Analytics survey highlights storage as the critical driver.
Figure 4. 66% of infrastructure decision-makers ranked storage as the second most important component among their top four AI enablers. They also ranked storage as the fourth most important barrier to AI deployment.
Recon Founder and Lead Analyst Roger Entner describes the main takeaway as follows:

“The survey results generally point to a coming surge in demand for data storage, with hard drives emerging as the clear winner. When you consider the business leaders we surveyed intend to store more and more of this AI-driven data in the cloud, cloud services are well-positioned to ride a second growth wave.”
To get the most value from AI, enterprises must prepare with scalable, efficient data storage. Whether they build infrastructure directly or rely on cloud services, AI’s dependence on data makes hard drives, with their unmatched capacity, cost efficiency and sustainability, the backbone of trustworthy AI.
Hard drives deliver unmatched cost-per-TB advantages for large-scale AI storage. Mass-capacity hard drives offer the optimal balance of scalability, energy efficiency and sustainability, allowing enterprises to expand storage footprints without exceeding budget or power constraints.