Not knowing your Dark Data is a bad strategy

Erich van der Hoogen | 09 January, 2018

Erich van der Hoogen, Information Management Specialist, Infoprotect.

We humans have a tendency to hoard. Even if we lead organised lives, things still gather in our homes. After all, who hasn’t moved house and noted with disbelief how much stuff they actually own?

Generally speaking, we rarely look through what we store, because those are items we don’t need often, or haven’t needed in a long time. Yet we keep it all, anticipating the day we might need a specific thing. This is why throwing certain items away can be an agonising decision: we don’t know whether we will need them, and in that moment we don’t have the luxury of weighing some future need.

Companies do this too, though not necessarily with physical assets. If you want to find the business world at the height of hoarding tendencies, look no further than data. Every business sits on some sort of ‘databerg,’ a structure of digital information where only the tip is visible. Such databergs are growing: in 2017 data expanded by 49% year on year, up from 23% in 2016. With such quantities, it may be tempting to disregard all but the visible portion. Yet what lies beneath can represent great value to a company, says Veritas Distribution Account Manager, Julie Noizeux:

“You’re storing all that data and at some point you may need it back. But what are you doing with that data while you are storing it? Even if you don’t use it, is the data ready for regulations such as POPI and GDPR? Can you conveniently check your data to ensure ongoing compliance? There are many solutions that can help add value to your business, generate revenue, save costs or avoid costly brushes with regulations.”

You don’t know what you don’t know

Most companies have little idea what data they are actually storing and where it resides. This happens all too easily, says Erich van der Hoogen, Information Management Specialist at Infoprotect:

“Such a situation is often due to a lack of in-house structured storage policies, or to everyone’s ability to freely create, copy and move data around between company shared and personal storage… and then sometimes forget about it! Another example would be an accidental ‘drag and drop’ of a folder, whose original then gets restored from backup, creating one active and one inert copy. During the next backup run, both folders have to be ‘processed’ for backup, and both consume space on production storage.”
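The drag-and-drop scenario leaves two byte-identical copies that both get backed up. As an illustration only (not a description of Infoprotect’s or Veritas’s tooling), here is a minimal Python sketch, assuming local file-system access, that spots such redundant copies by grouping files first by size and then by SHA-256 content hash.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list[list[str]]:
    """Return groups (two or more paths each) of byte-identical files under root."""
    # Pass 1: group by size, which is cheap metadata and needs no file reads.
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with at least one other file.
    groups: list[list[str]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a byte-identical twin
        by_hash: dict[str, list[str]] = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Filtering by size before hashing means most files are never read at all, which matters when a scan covers millions of files.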

Not only is it easy to arrive at this situation, but the amount of data falling through the cracks is astounding. Returning to the ‘databerg’ analogy, only roughly 15% of company data makes up the tip. This is data that is active, regularly accessed by the company and of apparent immediate value. Yet right beneath that surface are two distinct layers.

The first is called ROT (redundant, obsolete or trivial). This is often the layer companies engage with when they attempt to make sense of their data stockpile. But beneath that lies the Dark Data layer, and this is the one you should be very interested in. It may mean nothing, but chances are it could mean a lot. Regardless, not knowing is costing you, in both business intelligence and actual money.

Unknown data is a barrier

Two other forces are adding pressure to assess your dark data. The first is regulation such as POPI and GDPR, and the second is the growing role of cloud in the business world.

“We’re moving away from talking about private and hybrid cloud,” says Noizeux. “We know the majority of our customers and organisations use multiple cloud services from different providers. Managing your data between those clouds is key.”

Yet you can’t simply migrate it all to the cloud: that would make little sense and cost a lot. Nor, though, is it wise to keep dark data on primary storage, which both costs more and inhibits overall performance. It is therefore tempting to do what we do at home: throw it in a closet and leave it for another day. Yet in terms of risk this is a low-road strategy, only slightly better than destroying all that data.

Every business should have its dark data assessed. This can be done unobtrusively by hunting so-called metadata. Such an approach is less concerned with the contents of the data, says Van der Hoogen:

“With millions upon millions of files distributed across many file servers, you don’t necessarily need to know what the data comprises, but rather ‘what type’ of data the file is. We make use of metadata to compile a slightly ‘bigger picture’ around the actual file that will help us determine actionable information.”
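To make the metadata-only idea concrete, here is a minimal Python sketch using only the standard library. It reads file-system metadata via `os.stat` and never opens a file, then tallies file counts and bytes per file type. Treating the file extension as “what type” of data the file is, and the `summarise_by_type` and `TypeSummary` names, are simplifying assumptions for illustration; real assessment tools draw on much richer metadata.

```python
import os
import stat
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TypeSummary:
    files: int = 0
    total_bytes: int = 0


def summarise_by_type(root: str) -> dict[str, TypeSummary]:
    """Walk root and aggregate file counts and sizes per (lower-cased) extension.

    Only stat metadata is read; file contents are never opened.
    """
    summary: dict[str, TypeSummary] = defaultdict(TypeSummary)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets and other non-regular files
            ext = os.path.splitext(name)[1].lower() or "(none)"
            summary[ext].files += 1
            summary[ext].total_bytes += st.st_size
    return dict(summary)
```

The same `os.stat` result also carries owner and access-time fields, so one pass over the file servers can feed several kinds of analysis without touching file contents.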

Visibility is power

A recent assessment Infoprotect carried out for a client offers ample motivation to consider such assessments:

Seventy percent of the business’ data hadn’t been accessed in three years, while a third of data was owned by people who had left the company. If the first point doesn’t give pause, the second should. Even worse, this data was being kept on primary storage and costing a lot of money.
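Both headline figures from that assessment can be derived from file metadata. The sketch below is a minimal Python illustration, assuming each file has already been reduced to a hypothetical `FileRecord` (size, last-access time, owner) and that a set of current staff is available. Weighting by bytes rather than file count is also an assumption, since the figures above don’t say which was used. Note that last-access times are only trustworthy where the file system actually records them; some file systems and mount options disable access-time updates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file metadata record produced by an earlier scan."""
    size: int
    last_access: datetime
    owner: str


def dark_data_shares(
    records: list[FileRecord],
    active_users: set[str],
    now: datetime,
    stale_after: timedelta = timedelta(days=3 * 365),
) -> tuple[float, float]:
    """Return (stale_share, orphaned_share) as fractions of total bytes.

    stale:    not accessed within `stale_after` of `now`
    orphaned: owner is no longer in `active_users` (e.g. has left the company)
    """
    total = sum(r.size for r in records)
    if total == 0:
        return 0.0, 0.0
    cutoff = now - stale_after
    stale = sum(r.size for r in records if r.last_access < cutoff)
    orphaned = sum(r.size for r in records if r.owner not in active_users)
    return stale / total, orphaned / total
```

Because it weights by bytes, the result speaks directly to the storage-cost point; switching to a file count means summing 1 per record instead of `r.size`.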

Now, it may be tempting to argue that old data is old and of diminishing value. That may be true, but can you confidently say so about all of your dark data? The only way to know, and then to act effectively on it, is to have the dark data assessed. Nor is this a once-off task. Data is created every single day, so you need to repeat the assessment regularly to prevent the next silo of dark data from forming. Establish a process of classification to gain insight into your data. This will help you better understand your market and your customers, not to mention reduce risk, including compliance risk.

“Data typically has a current and a future value. A customer’s financial or policy data might not be of interest at the moment, but in three to four years, when there’s a possible dispute, that data’s value will become relevant. That doesn’t change the fact that the data hasn’t been accessed in the past two years, so we need to consider archiving it to remove its impact on the production environment. Insight into this detailed information helps businesses make informed decisions: archiving stale yet important data and possibly deleting frivolous data frees up expensive production storage and reduces the front-end data footprint. That means less data to back up and secure, and ultimately lower onsite and offsite storage costs. None of this is possible without detailed insight into your data.”

While the stuff we hoard at home is probably not valuable at all, data is the currency of modern companies. Does dark data offer hidden opportunities for the business? Can it be used to generate revenue? Unless you assess it, you’ll never find out.

Learn more about how you can future-proof your ICT strategy by assessing your dark data here.

This article was originally published on IT Web.