Jan. 27, 2025
AI workloads generate massive volumes of structured and unstructured data. To support training, inference and retraining cycles, enterprises need scalable, high-capacity storage that can handle continuous data growth.
There is no AI success without data — heaps of it.
And there are no massive datasets without ample, efficient data storage. AI workloads create continuous data streams — from training datasets and inference logs to metadata, embeddings, and model outputs. As generative AI and large language models (LLMs) expand, the volume and variety of enterprise data grow exponentially. This rapid scaling demands storage architectures that can handle constant ingestion, high-speed access and reliable preservation over time.
Data upholds AI and mass-capacity hard drives uphold data.
These insights are brought into clear view by a 2025 survey from the research firm Recon Analytics.
The global survey provides details of how enterprises across multiple industries are adapting their infrastructure to support AI. Respondents represent organizations already using or planning to use AI, offering insight into storage demands, scaling challenges and the future of enterprise data infrastructure.
The Seagate-commissioned global survey queried 1,062 respondents: IT storage buyers and decision-makers who work in storage infrastructure roles for companies that report over $10 million in annual revenue, manage over 50 terabytes (TB) of storage, and have adopted AI or plan to adopt it within the next three years. Respondents are located in the United States, China, the United Kingdom, South Korea, Singapore, France, India, Japan, Taiwan and Germany.
The survey focused on the effects of AI adoption on infrastructure priorities, data retention and data management. Results shed light on how AI will impact infrastructure needs over the next three years.
The latest Recon Analytics survey reveals a pivotal shift in how enterprises are planning their data ecosystems for the AI era. Rather than treating AI as an isolated initiative, organizations are now reevaluating storage strategies, resource allocation and long-term infrastructure design in response to accelerating AI adoption. The survey captures how global IT leaders are preparing for a future where data growth, retention requirements and performance expectations will rise faster than ever before.
First and foremost, the survey demonstrated that AI adoption is driving exponential growth in data storage demand through 2028.
As many as 61% of respondents from companies that predominantly use cloud storage said their companies’ cloud-based storage would have to increase by over 100% — that is, it would have to double — over the next three years.
Figure 1. 61% of respondents whose companies primarily use cloud storage for their AI data management expect to increase their storage requirements by 100% or more.
As AI applications drive unprecedented data creation, the more data organizations save, the better they can validate that AI is behaving as expected. With access to behavioral data — like training datasets, model checkpoints, prompts and answers — companies can scrutinize algorithms, and better understand and refine AI decision-making. Without the scale and efficiency of data centers, AI’s potential would be limited, as the ability to store and retrieve massive datasets is central to AI’s success.
It’s not just the amount of storage that drives AI success. Duration of data storage matters, too.
Industries such as finance, healthcare, manufacturing and government operations depend on long-term retention to meet compliance requirements and audit needs. Retaining historical data strengthens governance frameworks, supports regulatory reporting and makes AI outputs more accurate over time.
Of the survey respondents employed by businesses that have adopted AI technology, 90% believe longer data retention improves the quality of AI outcomes.
Figure 2. 90% of companies that use AI today believe retaining more historical data improves model accuracy.
This finding points to a correlation between preserving data for longer periods and more reliable AI insights. This may be underpinned by several factors. First, constant iterative processing is intrinsic to how AI algorithms work. Content outputs feed back into the model, improving its accuracy and enabling new models. Raw datasets and outcomes become sources for further development and new workflows.
But holding onto datasets for longer serves other business-critical functions, too, because it protects a company’s intellectual property. It keeps ‘receipts’ of the model’s original datasets and processes, providing an explanation of results when required (say, as part of a legal process).
These receipts establish data lineage, outlining a clear record of the journey data takes from input to output. Data lineage allows organizations to trace the origin and usage of datasets, so AI models are built on accurate data. It enables AI systems to be fully auditable, and supports both regulatory compliance and internal accountability.
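The idea behind data lineage can be sketched in a few lines of Python. This is a minimal illustration, not a reference to any particular lineage tool; the `LineageRecord` structure, field names and the `s3://raw/claims` path are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    """One step in a dataset's journey from input to output."""
    dataset: str
    operation: str          # e.g. "ingest", "clean", "train"
    source: str             # where this version of the data came from
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def trace(log, dataset):
    """Return the ordered history of operations applied to a dataset."""
    return [r for r in log if r.dataset == dataset]

# An append-only log like this lets auditors replay exactly how a training
# set was produced, supporting both compliance and internal accountability.
log = [
    LineageRecord("claims_2024", "ingest", "s3://raw/claims"),   # hypothetical source
    LineageRecord("claims_2024", "clean", "claims_2024@ingest"),
    LineageRecord("claims_2024", "train", "claims_2024@clean"),
]
history = [r.operation for r in trace(log, "claims_2024")]
print(history)  # ['ingest', 'clean', 'train']
```

Because each record points at the output of the previous step, the log forms the traceable chain from raw input to model output that the survey respondents value.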
Additionally, companies may choose to store more data for longer because they realize that they can’t know today what new, valuable insights the algorithms of tomorrow might uncover from yesterday’s data. Longer data retention allows the processing of old data by yet-undeveloped AI models. For these reasons, longer data retention boosts the business value AI can provide.
In a related finding, infrastructure decision-makers view extended data retention as essential for building trust — a critical foundation without which AI insights hold little value.
88% of respondents whose companies use AI today believe adoption of trustworthy AI increases the need to store more data for longer periods of time.
Figure 3. 88% of respondents whose companies use AI today said the adoption of trustworthy AI increases the need to store more data for longer periods of time.
Seagate defines trustworthy AI as AI data workflows and models that use dependable inputs and generate reliable insights.
Scalable storage infrastructure supports trustworthy AI because it properly manages, stores and secures vast amounts of data used by AI systems.
As part of building trustworthy AI, 80% of survey respondents stressed the importance of checkpointing.
Checkpointing is the process of saving the state of an AI model at specific, short intervals during its training. AI models are trained on large datasets through iterative processes, which can take anywhere from minutes to months. The duration of a model’s training depends on the complexity of the model, the size of the dataset and the computational power available. During this time, models are fed data, parameters are adjusted and the system learns how to predict outcomes based on the information it processes.
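The mechanics of checkpointing can be shown with a toy training loop. This is a minimal sketch using only the Python standard library; the `save_checkpoint` and `load_latest_checkpoint` helpers, the checkpoint interval, and the stand-in weight update are all illustrative, and real frameworks provide their own serialization formats.

```python
import os
import pickle
import tempfile

def save_checkpoint(state, directory, step):
    """Persist the full training state so a run can resume after interruption."""
    path = os.path.join(directory, f"checkpoint_{step:06d}.pkl")
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:   # write to a temp file first so a crash
        pickle.dump(state, f)    # mid-write never corrupts a checkpoint
    os.replace(tmp, path)        # atomic rename
    return path

def load_latest_checkpoint(directory):
    """Return the most recent checkpoint state, or None if none exist."""
    files = sorted(f for f in os.listdir(directory) if f.endswith(".pkl"))
    if not files:
        return None
    with open(os.path.join(directory, files[-1]), "rb") as f:
        return pickle.load(f)

# Toy training loop: save a checkpoint every `interval` steps.
with tempfile.TemporaryDirectory() as ckpt_dir:
    weights, interval = 0.0, 100
    for step in range(1, 501):
        weights += 0.01                  # stand-in for a gradient update
        if step % interval == 0:
            save_checkpoint({"step": step, "weights": weights}, ckpt_dir, step)
    resumed = load_latest_checkpoint(ckpt_dir)
    print(resumed["step"])  # prints 500
```

Each checkpoint write is a burst of sequential I/O whose size scales with the model state, which is why frequent checkpointing at enterprise scale translates directly into sustained storage write demand.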
According to the survey, companies using 100+ PB of storage are saving and backing up checkpoints on a daily-to-weekly basis, with 87% of them storing these checkpoints in the cloud or in a mix of hard drives and SSDs.
To support checkpointing at this scale, enterprises need storage systems capable of sustaining constant write activity without disrupting model progress. High-capacity hard drives and hybrid-cloud architectures provide the reliability and cost efficiency required to maintain these rapid snapshot cycles. By consistently capturing and protecting checkpoints, organizations can safeguard training progress, accelerate recovery from interruptions and maintain stable, predictable AI development workflows.
Compute and energy are popular themes in discussions of AI adoption. But the Recon Analytics survey highlights storage as the critical driver.
Figure 4. 66% of infrastructure decision-makers ranked storage as the second most important component among their top four AI enablers. They also ranked storage as the fourth most important barrier to AI deployment.
Recon Founder and Lead Analyst Roger Entner describes the main takeaway as follows:

“The survey results generally point to a coming surge in demand for data storage, with hard drives emerging as the clear winner. When you consider the business leaders we surveyed intend to store more and more of this AI-driven data in the cloud, cloud services are well-positioned to ride a second growth wave.”
To get the most value from AI, enterprises must prepare with scalable, efficient data storage. Whether they build infrastructure directly or rely on cloud services, AI’s dependence on data makes hard drives, with their unmatched capacity, cost efficiency and sustainability, the backbone of trustworthy AI.
Hard drives deliver unmatched cost-per-TB advantages for large-scale AI storage. Mass-capacity hard drives offer the optimal balance of scalability, energy efficiency and sustainability, allowing enterprises to expand storage footprints without exceeding budget or power constraints.