Enterprise Storage Insights

The Growth and Sprawl of Data

Enterprises are up against big challenges with the continued proliferation of data. Managing data sprawl at the edge offers businesses a chance to mine more value from their ever-increasing data sets.

What Business Owners Need to Know About Data Growth and Sprawl

With an unprecedented increase in data creation, enterprises are being challenged to find new ways to manage increasing volumes of information while using it to improve their business outcomes. This constant stream of data comes from sources such as factory sensors, consumer smartphones, and Internet of Things (IoT) devices at the edge. How businesses cope with the increasing growth and sprawl of data will have a huge impact on their success moving forward.

As Seagate's Rethink Data report notes, data volume, sources, and traffic are expanding faster than many enterprises can handle. But managing the increasing flow of data requires an understanding of how networks are evolving. Businesses must understand how data at the edge fits into today's computing ecosystem.

Defining Data Growth and Sprawl

Data growth is the percentage by which the overall datasphere increases over time, counting every source of data. Data sprawl, by contrast, describes how many data centers and processing locations hold that data and how widely it is spread geographically. Sprawl exists across many configurations, from endpoint devices through the edge to public and private clouds.
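To make the two definitions concrete, here is a minimal Python sketch. It assumes an organization can measure its stored data in terabytes at two points in time and per location. The function names and all figures are hypothetical illustrations and do not come from the Rethink Data report.

```python
def growth_percent(previous_tb: float, current_tb: float) -> float:
    """Data growth: percentage increase in stored data between two measurements."""
    if previous_tb <= 0:
        raise ValueError("previous_tb must be positive")
    return (current_tb - previous_tb) * 100.0 / previous_tb


def location_shares(tb_by_location: dict) -> dict:
    """Data sprawl: percentage of total data held at each location."""
    total = sum(tb_by_location.values())
    if total <= 0:
        raise ValueError("total stored data must be positive")
    return {loc: tb * 100.0 / total for loc, tb in tb_by_location.items()}


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(growth_percent(previous_tb=100.0, current_tb=125.0))
    print(location_shares({"internal": 60.0, "edge": 20.0, "cloud": 20.0}))
```

Tracking both numbers side by side shows whether data is simply increasing or is also spreading across more locations.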

Three factors will be primarily responsible for data growth and sprawl over the next few years. First is the increasing use of analytics. Business analytics and artificial intelligence (AI) applications are just two examples of enterprise analytics tools that require more data in different locations. Second, the proliferation of IoT devices is increasing the number of data sources and increasing data traffic to core infrastructure such as on-premises and cloud servers. Finally, cloud migration initiatives are taking information that would otherwise exist on local devices or drives into centralized public cloud and private cloud data center servers for accessibility and analytics purposes.

Increasing demands on enterprise IT infrastructure reflect how this growth in analytics, IoT, and data in motion naturally leads to greater sprawl. Seagate's Rethink Data report shows that both sprawl and fragmentation are on the rise.

Today, approximately 30% of data storage takes place in internal data centers, 20% in third-party data centers, and 19% at the edge. Data storage also takes place in cloud repositories or other locations, representing another 30%. This distribution isn't likely to change over the next two years, meaning that enterprise storage environments will remain dispersed for the foreseeable future.

Once enterprises gain clear insights into how information quantity and locations are multiplying, they can begin developing management strategies that encompass all data sources—including the edge.

Edge Data's Contribution to Growth

The edge isn't a thing; it's a location. The edge is the outer boundary of the network, where real-time decision making takes place. It sits as close to the actual data source as possible, often hundreds or thousands of miles away from the nearest enterprise or cloud data center.

The Rethink Data report notes that as edge data sources proliferate, devices and sensors are found everywhere, from manufacturing production lines to office buildings. Edge computing was initially seen as "a decentralized swing of the pendulum," Bob Gill, research vice president at Gartner, noted in a 2018 paper. According to Gill, decentralization via the edge solved two critical cloud challenges: cost and latency. Edge processing can be faster when data doesn't have to travel to and from a cloud server, and in many instances it can also be cheaper. This means enterprises can unlock some of the analytics value of edge data at the edge, for real-time decision making, before sending it on to core or cloud data centers to unlock further value.

Billions of IoT devices in the field are greatly expanding data collection capabilities. Meanwhile, software and hardware advances have made AI more practical, cost-efficient, and accessible to the average enterprise. Innovations in edge data center facilities also allow businesses to unlock substantial value at the edge.

But to capture data's full value, businesses need to be able not only to collect, store, and process edge data, but also to transfer more data from the edge to core data centers.

As data growth and sprawl outside the traditional data center increase, the cloud will begin to merge with the edge. The Rethink Data report notes the expectation that edge data will be stored only briefly, until it is analyzed or processed and the relevant data moves to the core. That doesn't mean the future is the cloud versus the edge. Rather, it's the cloud and edge working as one.

Managing Data Sprawl at the Edge

Edge data storage has been growing at a faster rate than core data storage. At the same time, however, the share of data that organizations transfer from the edge to the core is set to double, from 8% to 16%, over the next two years.

To manage this increased processing of edge data—both at the edge and later in core data centers—information management plans must enable faster and easier data transmission from start to finish. Data mobility should be facilitated across endpoints, the edge, and private, public, or industry clouds.

To keep edge data from becoming siloed and inaccessible to the larger enterprise data infrastructure, enterprises must manage and organize data storage at the edge. The edge can be particularly susceptible to silos if traffic from endpoint devices isn't properly coordinated.

But the benefits of data and computing at the edge are profound. In particular, more information can be collected and curated for in-depth analysis by AI and business analytics software than under a model dedicated solely to on-premises or cloud data center infrastructure.

To manage edge growth and sprawl more effectively, businesses will need an edge architecture that can store and analyze latency-sensitive information in real time, while also enabling distributed computing to analyze streaming data from the edge.
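One way to picture such an architecture is the minimal Python sketch below. It rests on several assumptions: each reading is flagged as latency-sensitive or not, urgent readings are checked at the edge against a simple threshold, and every reading is then batched for transfer to the core. The class names, threshold, and batch size are hypothetical and are not drawn from the Rethink Data report or any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    sensor_id: str
    value: float
    latency_sensitive: bool  # hypothetical flag set by the device or gateway


@dataclass
class EdgeNode:
    alert_threshold: float
    batch_size: int = 3
    alerts: list = field(default_factory=list)        # decisions made at the edge
    pending: list = field(default_factory=list)       # readings awaiting transfer
    sent_to_core: list = field(default_factory=list)  # stand-in for a network upload

    def ingest(self, reading: Reading) -> None:
        # Latency-sensitive data is evaluated immediately, at the edge.
        if reading.latency_sensitive and reading.value > self.alert_threshold:
            self.alerts.append(reading.sensor_id)
        # Every reading is also queued for deeper analysis in the core.
        self.pending.append(reading)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Move the queued readings to the core as one batch.
        if self.pending:
            self.sent_to_core.append(list(self.pending))
            self.pending.clear()
```

In this sketch the edge acts on urgent readings right away, while the core still receives the complete stream for longer-running analytics, so the data does not end up siloed at the edge.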

As the Rethink Data report underscores, innovation isn't driven by trends. Creating value under constraints is what drives new solutions. And that's precisely what growth and sprawl at the edge are doing for businesses and their IT partners. Enterprises can expect to see unprecedented data growth due to the massive uptick in IoT devices and the increased use of business analytics and AI tools. To begin managing and profiting from growth and sprawl, enterprises need a solid data management plan and a cost-efficient technology stack. Together, they must enable data to be moved easily between edge and core, at the right time, depending on what value is to be extracted from the data.

Read more about how enterprises can put more of their available business data to work in the full Rethink Data report from Seagate.