Digging into Dark Data

Most organizations are looking to extract more value from their data. They are investing in Big Data technologies, hiring data scientists, deploying analytics platforms and Hadoop clusters—all in search of that nugget of information that could give them a competitive advantage. The list of puns and cliches that come to mind is staggering, so please consider this:

  • Gartner coined the phrase dark data to describe the masses of unstructured information (around 80% of the total by volume) that organizations retain but have no meaningful way of analyzing.
  • The text mining industry has been around for years. It is continuously improving how people can scour text for information, basically looking for nuggets of gold.
  • Big Data has promised a veritable gold rush of insights to enable you to mine your data for those hidden nuggets.
  • However, these technologies provide only picks and shovels to make it happen because they can handle only plain text, HTML and basic documents. With unstructured data, the motherlode is buried deeper underground. So as with all gold rushes, most of the prospectors come home empty-handed.
[Image: Caterpillar D9 bulldozer]

If you’re mining unstructured data, you’ll need more than picks and shovels. Photo: Zachi Evenor

The Nuix Engine is basically the Caterpillar D9 bulldozer of dark data: powerful, robust, and able to get through a ton of data in a hurry. However, I would be the first to admit that it isn’t all things to all people. In this woeful analogy, Nuix is just one piece of the mining operation.

What is the raw material for all analytics platforms? Text. In fact, entire disciplines are focused on “text analytics” and “text mining.”

But have you ever wondered where they get the text? Hadoop may be a powerful and versatile analytics platform, but try getting it to work with information stored in an email archive. If you have tried, my guess is you struggled to provide your platform with usable text.
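To give a feel for what "usable text" means here, below is a minimal sketch in Python that pulls the subject and plain-text body out of a single raw email message using only the standard library. It is an illustration, not how the Nuix Engine works: the `extract_text` function and the `SAMPLE` message are hypothetical names made up for this example, and the sketch assumes a message that is already available as standard RFC 5322 text. Real archives often use proprietary container formats, with attachments and nested items, which this does not attempt to handle.

```python
from email import message_from_string, policy

# Hypothetical sample message, invented for illustration only.
SAMPLE = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Q3 forecast\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Demand looks soft in the west region.\n"
)


def extract_text(raw_message: str) -> str:
    """Return the subject and plain-text body of one raw email message."""
    # policy.default yields an EmailMessage, which provides get_body().
    msg = message_from_string(raw_message, policy=policy.default)

    # Ask only for a text/plain part; HTML-only messages yield no body here.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""

    subject = msg["subject"] or ""
    return f"{subject}\n{text}".strip()
```

Calling `extract_text(SAMPLE)` returns the subject line followed by the body as one plain string, which is the kind of input a text analytics platform expects. Everything that makes real archives hard (containers, attachments, encodings, scale) sits outside this sketch.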

Obviously I am not proposing that Nuix can replace your analytics platform, but I do want to get you thinking about what you could do if you had more text.

Think of the insights you could glean from your old emails or maybe terabytes of user documents. Maybe you’d look for fraud; or try to find those groups of people who are like canaries in the coal mine when it comes to market trends; or dig around for great new product ideas that got lost in the shuffle.

I have been fascinated by the promise of IBM’s Watson. The advances we have made in machine learning over the past few years are staggering. I can’t help but wonder what a technology like Watson could do if you fed it the text from all of your organization’s dark data, the roughly 80% of information that goes unanalyzed today.

If you’re wondering the same thing, you may be interested in our new OEM program. There might be gold in them thar hills!

As Chief Technology Officer at Nuix, I’m responsible for leading the evolution of the company’s software. Currently I’m driving the development of Nuix’s information governance and big data solutions. I joined Nuix in 2008, and I’ve worked for more than 15 years with public and private sector organizations, designing and providing solutions for their email, file, document management and archiving systems.
