
Without an enterprise-wide view (a situation common to many organizations), teams cannot determine which data is valuable, which is redundant, and which poses a risk. Metadata in particular remains underutilized, even though basic indicators such as creation date, last access date, ownership and activity levels can immediately reveal security risks, duplication, orphaned content and stale data.
Visibility begins with building a thorough understanding of the existing data landscape, typically using tools that scan storage platforms across multi-vendor, multi-location environments, collect metadata at scale and generate virtual views of datasets. These views let teams see the size, age, usage and ownership of their data, so they can identify duplicate, forgotten or orphaned files.
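To make this concrete, here is a minimal sketch in Python of the kind of metadata sweep such tools perform at much larger scale. The /data root, the one-year staleness threshold and the report format are illustrative assumptions, not any specific product's behaviour:

```python
import os
import hashlib
import time
from collections import defaultdict

STALE_DAYS = 365  # hypothetical threshold: untouched for a year counts as "stale"

def scan_tree(root):
    """Walk a directory tree and collect basic metadata for every file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip unreadable or vanished files
            yield {
                "path": path,
                "size": st.st_size,
                "owner_uid": st.st_uid,  # numeric owner on POSIX; map to a name via pwd
                "last_access": st.st_atime,
                "last_modified": st.st_mtime,
            }

def report(root):
    """Flag stale files and confirm duplicate candidates with a content hash."""
    now = time.time()
    stale, by_size = [], defaultdict(list)
    for meta in scan_tree(root):
        if (now - meta["last_access"]) > STALE_DAYS * 86400:
            stale.append(meta["path"])
        by_size[meta["size"]].append(meta["path"])
    duplicates = []
    for size, paths in by_size.items():
        if len(paths) < 2 or size == 0:
            continue  # equal size is only a duplicate *candidate*
        digests = defaultdict(list)
        for p in paths:
            h = hashlib.sha256()
            with open(p, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[h.hexdigest()].append(p)
        duplicates.extend(group for group in digests.values() if len(group) > 1)
    print(f"{len(stale)} stale files; {len(duplicates)} duplicate groups")

if __name__ == "__main__":
    report("/data")  # hypothetical mount point
```

Even this toy version shows the principle: the filesystem already records the indicators; the hard part is collecting them consistently across every platform in the estate.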
It’s a complex challenge. In most cases, some data is stored as files and some as objects (in services such as Amazon S3 or Azure Blob Storage), and either kind may live on-premises or in the cloud. In these circumstances, the multi-vendor infrastructure approach adopted by many organizations is sound: it facilitates data redundancy and replication while also protecting against increasingly common cloud outages, such as those seen at Amazon and Cloudflare.
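Object stores expose the same kinds of indicators through their APIs, so the inventory can extend beyond filesystems. Below is a hedged sketch using boto3, the AWS SDK for Python; the bucket name and staleness cutoff are hypothetical, and only metadata the S3 listing API already returns (key, size, last-modified time, owner) is used:

```python
import boto3  # AWS SDK for Python
from datetime import datetime, timedelta, timezone

STALE_CUTOFF = datetime.now(timezone.utc) - timedelta(days=365)  # hypothetical threshold

def scan_bucket(bucket):
    """List stale objects in an S3 bucket using metadata the service already exposes."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    stale = []
    for page in paginator.paginate(Bucket=bucket, FetchOwner=True):
        for obj in page.get("Contents", []):
            if obj["LastModified"] < STALE_CUTOFF:
                owner = obj.get("Owner", {}).get("DisplayName", "unknown")
                stale.append((obj["Key"], obj["Size"], owner))
    return stale

if __name__ == "__main__":
    for key, size, owner in scan_bucket("example-archive-bucket"):  # hypothetical bucket
        print(f"{key}\t{size}\t{owner}")
```

The same pattern applies, with different client libraries, to Azure Blob Storage or any other object store, which is what makes a unified, multi-vendor metadata view feasible in practice.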

