The technology industry talks a lot about structured data. When you hear about topics like Big Data or business intelligence, the power almost always comes from manipulating, accelerating or integrating databases. In other words, rows and columns; structured data.
I used to think these structured data companies would join forces with unstructured data experts to deliver insights across an entire organization’s data. But now I wonder if that would be like a railroad company merging with a car tire manufacturer; they’re both in the transport industry but there are very few synergies.
Structured and unstructured data companies are trying to answer very different questions. Structured data companies want to analyze as much data as possible to show what happened. They build models based on historical data to predict what might happen in the future.
By contrast, unstructured data gives you the reasons why something happened, or at least someone’s opinion as to why it happened. Unstructured data technologies can provide context and clarification around a particular event or issue, helping you make up your mind or gain greater insight.
I can think of many use cases for marrying the facts of structured data with the reasons and context of unstructured data. One recent example was in healthcare. IBM announced it was using content analysis software to understand unstructured data elements such as doctors’ notes alongside the structured parts of patients’ health records such as treatment dates and test results.
Another good example is information security.
Infosec and unstructured data
A very large proportion of external attacks now come through “spear-phishing”: typically a website link or poisoned attachment emailed to people at a particular company. Once a user has been tricked into opening one of these, it lodges a file on their PC, which then replicates itself across the organization’s network. Really clever malware also morphs as it copies itself, to thwart efforts to detect and clean it.
Identifying and remediating these attacks requires a technology that can analyze when files were created, copied, accessed or stored, and where duplicates and similar files exist across the network. It’s easy to envisage a technology based around the Nuix Engine that would create and maintain a light metadata index across many computers to hunt down rogue .exe files and SQLite injections, and then analyze and remediate these files.
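To make the idea of a light metadata index more concrete, here is a minimal sketch in Python. It is not the Nuix Engine or its API; the function names build_metadata_index and find_duplicates are hypothetical, and it assumes a single directory tree rather than many computers. It records each file's size, timestamps and SHA-256 hash, then groups files with identical content.

```python
import hashlib
import os
from collections import defaultdict


def build_metadata_index(root):
    """Walk `root` and record light metadata plus a content hash per file."""
    index = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                stat = os.stat(path)
                with open(path, "rb") as handle:
                    digest = hashlib.sha256(handle.read()).hexdigest()
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            index.append({
                "path": path,
                "size": stat.st_size,
                "modified": stat.st_mtime,
                "accessed": stat.st_atime,
                "sha256": digest,
            })
    return index


def find_duplicates(index):
    """Group indexed files by content hash; keep hashes seen more than once."""
    groups = defaultdict(list)
    for entry in index:
        groups[entry["sha256"]].append(entry["path"])
    return {
        digest: sorted(paths)
        for digest, paths in groups.items()
        if len(paths) > 1
    }
```

An exact hash only catches byte-for-byte copies. Malware that morphs as it replicates would need some form of similarity matching instead, which this sketch does not attempt.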
Infosec companies are used to dealing with masses of semi-structured data such as log files. These are plain text but come in a predictable and sequential format. So perhaps the jump into fully unstructured data isn’t so great in this case.
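As a small illustration of why log files count as semi-structured, the sketch below turns one plain-text line into named fields. The line layout (timestamp, host, process, message) is a hypothetical format chosen for the example, not one taken from any particular product, and parse_log_line is an illustrative name.

```python
import re

# Hypothetical layout: "<timestamp> <host> <process>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<host>\S+)\s+(?P<process>[^:\s]+):\s+(?P<message>.*)$"
)


def parse_log_line(line):
    """Return the line's fields as a dict, or None if it does not fit the layout."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None
```

The first three fields fall out of the predictable format. The message field is still free text, which is where unstructured data techniques would take over.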
Let’s get together
We are very interested in partnering with organizations that are seeking to add unstructured data capabilities to their tools and workflows. That’s why we launched the Nuix OEM program. If that interests you, please get in touch!