Information Governance: A Framework, Not a Tool

At this year’s ARMA Live! conference in San Diego, I was amazed and excited by the number of companies on the show floor that were establishing beachheads on the island of information governance (IG). There is a lot of excitement about this concept, evidenced by the number of training programs and certifications supporting it.

Brian Tuemmler speaks at the 2014 ARMA Live! conference in San Diego.

Lately, however, I’ve become a bit worried about “information governance” turning into a marketing catchphrase rather than a sound business imperative. IG is, in fact, a program that is enabled by tools; it is not a tool in and of itself. I see two problems that result:

  • A number of technical capabilities fit under the umbrella of information governance, yet many are completely separate from, and mutually exclusive of, one another. This makes technology buying decisions confusing for corporations.
  • Some of these capabilities (or tools) portray themselves as the exclusive owners of this domain. For example, removing unneeded business content (aka defensible disposition) and regulatory risk avoidance are very good information governance use cases. But neither is the whole story.

Point-forward vs Data at Rest

Let’s start with the first problem: I see a very distinct separation in the market between the tools and technologies that address “point-forward” governance issues and the “data at rest” crowd.

The first group is interested in knowing the five Ws of content at the moment it is created, so that we can leverage the greater context of that information down the road. These tools include:

  • Enterprise content and records management (ECRM), such as SharePoint and other records management systems
  • Data loss prevention systems that monitor emails as they are sent and received
  • Retention management systems that track content in place to support litigation response and records management.

The limitation of any of these systems is that they provide no value until there is actual content under management.

Managing data at rest, on the other hand, focuses on taking the unstructured content that has already been created—80% of your corporate knowledge—and using it to add value to your organization. This is where Nuix sits. Nuix does not compete with the point-forward solutions I mentioned previously; the two are complementary technologies, and both have a place in the IG ecosystem.

You Can’t Buy Information Governance

This brings us to the second problem: IG is often defined so narrowly that it sounds as if a single tool could deliver it. But IG is not something you can buy. It is anything that helps you assign responsibility for, and mitigate, information management issues.

As an example, Nuix tools, looking at your data at rest, can help your organization figure out how to:

In the end it comes down to you and how you define information governance for your organization. Don’t let a vendor or tool do it for you! Rather, solicit participation from a number of stakeholders inside your organization to ensure you maximize potential benefits. To achieve these IG goals, the technical solution may involve a number of tools from different vendors.

Posted in Information Governance

Identifying Anti-forensics: Timestomping

In the fields of law enforcement and cybersecurity, there’s an endless, escalating technology battle between criminals who try to avoid getting caught and investigators who try to catch them. So while we have all sorts of clever forensic technologies to identify evidence of a crime, there are also very clever anti-forensic techniques designed to frustrate and mislead investigators, and ultimately to evade detection.

Some common anti-forensic techniques include:

  • Securely wiping files or metadata, which generally involves overwriting the data stored on a disk with other information, to ensure there’s no remaining trace of the original
  • Encrypting or encoding information so it can’t easily be read by investigators
  • Using steganography—hiding the data inside other innocuous-looking data
  • Masking or modifying forensic artefacts to make the information less obvious to people looking for it.

(There are, of course, legitimate uses for anti-forensic techniques. For example, if you sold or gave away your laptop to someone else, you’d be foolish not to wipe over any private data stored on the hard drive.)

Timestomping is an example of that last category. It involves changing the created, modified, or accessed date of files within the file system of a hard drive, USB stick, flash memory card, or other storage device.

To understand why someone would do this, you have to realize that because investigative data sets are so big, investigators can’t possibly examine every file on a storage device in the required depth and detail. A common and very sensible analysis workflow is to narrow down the search to a particular timeframe—such as immediately before and after the data breach or other suspected incident occurred. The bad guys know this. So they often change the dates of a few vital files to make them much harder to detect in a standard investigative process.

For example, say someone inadvertently downloads a virus from a website. The internet history will show exactly what time they downloaded that file. An investigator would typically examine all the files created or modified immediately after that time. However, if the virus has timestomped itself, or the files it created or modified, these files will be harder to find.

A common timestomping technique is to change the dates of the file you want to hide to match the dates of other existing files. For example, when you install an application, this creates huge numbers of files with the same date and time. An additional file with the same date and time could easily get lost or overlooked by an analyst.

How it works

File systems store metadata about each file they contain. In most cases this includes the date each file was created, last modified, and last accessed. But file systems allow operating systems and applications to update and change these dates very easily, for valid or illegitimate reasons. That means timestomping isn’t at all difficult; there are plenty of command-line or even graphical tools that will let you change dates.
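
To give a sense of how little effort is involved, here is a minimal Python sketch (the file paths are hypothetical) that copies another file’s modified and accessed dates onto a file someone wants to hide. Note that Python’s standard library can’t rewrite the NTFS created date; dedicated timestomping tools call Windows APIs such as SetFileTime to change all three timestamps.

```python
import os

# Hypothetical paths, for illustration only.
target = r"C:\Users\demo\Documents\secret-notes.xlsx"  # the file to hide
donor = r"C:\Windows\System32\cmd.exe"                 # the file whose dates we copy

donor_stat = os.stat(donor)

# Copy the donor's last-accessed and last-modified timestamps onto the target.
# Nanosecond precision keeps the values identical to the donor's.
os.utime(target, ns=(donor_stat.st_atime_ns, donor_stat.st_mtime_ns))
```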

In the following two screenshots, you’ll see I have used a graphical tool to timestomp a file so its dates are the same as another file on the file system—in this case cmd.exe, a file that is created at the same time as hundreds of other files.

The file Sample-Details.xslx before timestomping.

The file Sample-Details.xslx after timestomping with the dates of another file.

That was fun! Now my file looks exactly like a bunch of other system files, but who knows what I’ve hidden in it.

Detecting timestomping

Fortunately, it is possible to detect timestomping. I’ll show you how it’s done within NTFS, the file system used on almost all Windows machines, using Nuix Investigator.

NTFS stores date information about each file within a central file called the Master File Table (MFT), in two different locations: $Standard_Information and $File_Name. $Standard_Information is the location most often used by the operating system and applications, so when a file is timestomped, these date values are changed. However, the operating system and installed applications generally don’t access the dates in $File_Name. So if a file has been timestomped, the dates will be different between these locations. That’s because the operating system has changed one set of dates, but the file system hasn’t recorded the same change in the second set of dates.

Nuix Investigator extracts date metadata for every file from both locations in the MFT. As you can see in the screenshot below, the $Standard_Information “File Accessed” date and the $File_Name “NTFS Win32 File Accessed” date are different. So are the other dates. That means timestomping.

File system metadata shows discrepancies between the two sets of dates, an obvious sign that someone has been timestomping.

It would be a long and tedious process to compare the dates on each file to detect mismatches. But there are a couple of things you can do.

First, you can choose which set of dates to apply in a timeline view. Using the $File_Name date, which timestomping doesn’t affect, ensures you see things in the order they happened.

Second, you can easily run a script that does all that work of comparing dates and detecting anomalies for you. From the investigator’s perspective, the results of that script will be very interesting. If someone is trying to cover their tracks, they’ll probably only try to hide the files that will give them away. So if you can press a button to get a list of files someone has tried to hide from you, that’s a great place to start your investigation.
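
The comparison behind such a script is simple. Here is a minimal Python sketch of the idea, assuming the MFT metadata has already been exported into records that carry both sets of dates; the field names and values are illustrative, not Nuix’s.

```python
from datetime import datetime

# Illustrative records only; in practice these would come from parsed MFT metadata.
records = [
    {
        "name": "Sample-Details.xslx",
        "si_modified": datetime(2010, 11, 20, 21, 29, 5),   # $Standard_Information date
        "fn_modified": datetime(2014, 10, 30, 14, 2, 41),   # $File_Name date
    },
    {
        "name": "notes.txt",
        "si_modified": datetime(2014, 9, 1, 8, 0, 0),
        "fn_modified": datetime(2014, 9, 1, 8, 0, 0),
    },
]

def looks_timestomped(record):
    """Flag files whose two MFT date sets disagree.

    Timestomping tools normally rewrite only $Standard_Information, so a
    mismatch with $File_Name is the tell-tale sign described above."""
    return record["si_modified"] != record["fn_modified"]

suspicious = [r["name"] for r in records if looks_timestomped(r)]
print("Possible timestomping:", suspicious)
```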

Posted in Cybersecurity, Digital Investigation

Nuix demonstrates AWS migration tool at AWS:ReInvent

AWS:ReInvent 2014 is a technical conference and trade show for anyone who wants to learn a lot more about the latest developments and applications of Amazon Web Services.

As an AWS partner, Nuix was proud to exhibit our latest and greatest cloud technology at the event. One of the things we demonstrated was a practical solution for comprehensively moving data into AWS.

A Nuix employee gestures to a computer screen while speaking with a conference attendee in a booth.

Demonstrating the AWS migration tool at the AWS:ReInvent Conference.

There are already many tools in the marketplace for this type of migration, but it can often be a lengthy process that requires a mix of the AWS Console, AWS Command Line tools, custom tools, and log review, among other steps. More importantly, these tools tend to have very limited indexing capabilities because they’re designed for source data that is pre-indexed—pulled from a Microsoft SQL Server database, for example. But if you’re moving unstructured data to AWS, it isn’t easy to search and manage.

So Nuix Technical Analyst Mike Jackson used the Nuix software development kit (SDK) available on the Nuix Developer Portal to build an AWS migration tool. This tool employs the Nuix Engine to index unstructured data on the fly as you’re migrating it. You can then use this index to search the data using Amazon CloudSearch or Nuix Web Review & Analytics, for example.
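
The real tool is built on the Nuix SDK, but here is a rough Python sketch of the general pattern it follows: extract and index content as you migrate, then push it somewhere searchable. The CloudSearch endpoint, the field names, and the naive text-extraction step (a stand-in for the Nuix Engine) are all assumptions for illustration, not the tool’s actual code.

```python
import json
import uuid
from pathlib import Path

import boto3

# Hypothetical CloudSearch document endpoint; not a real domain.
SEARCH_ENDPOINT = "https://doc-example-domain.us-east-1.cloudsearch.amazonaws.com"


def extract_text(path: Path) -> str:
    """Naive placeholder for the indexing the Nuix Engine performs on the fly."""
    return path.read_text(errors="ignore")


def migrate_folder(folder: str) -> None:
    client = boto3.client(
        "cloudsearchdomain", endpoint_url=SEARCH_ENDPOINT, region_name="us-east-1"
    )
    batch = [
        {
            "type": "add",
            "id": uuid.uuid4().hex,
            "fields": {"filename": path.name, "content": extract_text(path)},
        }
        for path in Path(folder).rglob("*")
        if path.is_file()
    ]
    # Upload the batch so the content becomes searchable in CloudSearch.
    client.upload_documents(
        documents=json.dumps(batch).encode("utf-8"),
        contentType="application/json",
    )


migrate_folder("./unstructured-share")
```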

Mike designed an intuitive front end for the migration tool, so users can simply select the source data and the destination AWS service, with options including RedShift and DynamoDB.

The end result is a tool that excels at processing data that would otherwise be too difficult or expensive to migrate into CloudSearch.

The CloudSearch migration tool is a powerful example of what is possible with the Nuix SDK, which is the primary offering of our OEM Program. Just think about what you could do with the Nuix Engine’s fast, forensic, precise processing of unstructured data—the possibilities are endless!

Posted in Developers

Be prepared: What Scouts can teach us about ESI

Last weekend I was at a Cub Scout event—I’m a Cub Leader, among other things, in my spare time—and I started thinking about the Scout motto: Be prepared. One area where many legal professionals have not earned their merit badges is preparing early for pre-discovery conferences and ensuring they understand how rapidly evolving data storage technologies affect discovery.

In a recent article published on Law360, Charles Ragan and Eric Mandel from litigation firm Zelle Hofmann comment on the case of Brown v. Tellermate Holdings Ltd., in which the court applied heavy sanctions to the defendant and its counsel for making false statements to the court because they weren’t familiar with their client’s technology. The authors warn that “counsel must stay abreast of continuing changes in information technologies, and critically assess client information about electronically stored information if they are to meet their duties to courts and clients.”

Cub Scouts prepare to parade the colors

These Cub Scouts probably know more about the latest technology applications than most lawyers.
Photo: U.S. Navy Mass Communication Specialist 2nd Class Kristopher S. Wilson

Your data in the cloud is still your data

Businesses are rushing to capitalize on the benefits of storing data in the cloud. We see this in the email migration side of our business, where organizations of all sizes are entrusting their data to public systems such as salesforce.com or Google Business Apps. While there are many advantages for internal IT teams, cloud services present a whole new challenge for discovery. Counsel must understand all the systems used within an organization, where this data resides, and how to preserve and procure it.

In the Tellermate case, defense counsel failed to critically examine the information provided by their client regarding its business processes, where the data resided, and its ability to preserve its business records. Due to a basic lack of understanding of how cloud technology worked, counsel decided that the client’s data within a salesforce.com system was not within the client’s possession. As a result, counsel failed to take even the most basic preservation steps, such as screen captures.

“Those failures [also] led to defense counsel making statements to opposing counsel and in court that were false and misleading, and which had the effect of hampering plaintiffs’ ability to pursue discovery timely and in a cost-efficient manner,” according to Ragan and Mandel.

This case teaches us that when it comes to ESI stored in the cloud, even though the data is technically in the possession of the cloud provider, courts still view it as within the control of the owner. That means it’s discoverable, even if security issues mean it can only be produced in the most basic format.

Understand what you have before agreeing to a timeline

This case also teaches us about a common discovery failure: Defense counsel used over-broad search terms that yielded a “document dump” of around 50,000 mostly irrelevant documents, then designated them all as “attorneys’ eyes only” because they didn’t have time to review them.

Ragan and Mandel quite elegantly explain that “because the volume and variety of potentially relevant ESI is increasing exponentially … even what may seem at first blush to be a reasonably tailored search query may return a surprisingly large volume of ‘hit’ documents, making for a potentially expensive postsearch, preproduction review.”

They add to this by quoting Judge Peck’s statement on the selection of keywords in the famous Da Silva Moore case.

“In too many cases, however, the way lawyers choose keywords is the equivalent of the child’s game of ‘Go Fish.’ The requesting party guesses which keywords might produce evidence to support its case without having much, if any, knowledge of the responding party’s ‘cards’ … Indeed, the responding party’s counsel often does not know what is in its own client’s ‘cards.’”

Being prepared means understanding the whole universe of data so that any testing and sampling of keywords provides an accurate understanding of the burden of discovery before committing to a production timetable.

And before you dismiss the “whole universe of data” comment as unrealistic in practice, I believe there is no reason why lawyers can’t be fully prepared. There are great eDiscovery tools available, and courts are more than willing to limit or stage discovery to ensure it remains proportional. I have used this tactic to my clients’ advantage in the past.

Don’t assume you know technology

So when you walk into a pre-discovery conference, take a leaf from the Scouts handbook: Be prepared!

Technology is changing, and it’s a big risk to assume that what you knew about it five years ago still holds today. If you are unsure about a particular technology, bring in an expert.

Chances are, your current litigation support vendor has much broader technology experience than you do. They will not only understand the technology but also be able to share their experience, and the pitfalls, of working with data from particular systems, including the latest cloud systems. They will also have the tools to let you see all of your data so you can make informed representations to the court.

How to use that early preparation as part of a cooperative and transparent approach is the topic of a webinar we’ll be holding this week, entitled “Embracing Proportionality in the Age of Big Data.” We’ve assembled a panel of legal experts including:

  • U.S. Magistrate Judge James Francis
  • Thought leader and practitioner Eric Mandel
  • Forensic technologist and special master Craig Ball
  • Former magistrate and Nuix’s director of client services, Roxanna Prelo Friedrich.

The webinar will be held on Wednesday November 12 at 9 am PST, 12 pm EST, 5 pm GMT. Register for this special discussion here.

Posted in eDiscovery

Developer diaries: Getting started with the Nuix Engine using Gradle

So you want to try out the Nuix Engine, and you’re in the mood for some prototyping.

“Give me my Maven information so I can get to work!” you think to yourself.

It’s not quite as easy as adding a more conventional Java dependency to a project, but with the right build automation tool it doesn’t have to be tediously difficult either. With that in mind, I recommend using another tool, one that makes including the Nuix Engine in your stack a simple matter: Gradle.

The Engine makes use of all kinds of mechanisms that don’t fit into the traditional Java project. It finds complex objects (from a disk, a remote account, or even hidden in slack space) and decomposes them into parts that are easier to search, understand, and maintain. With this in mind, you can see how including the Nuix Engine in your project requires some out-of-the-box thinking.

A cat sitting in a cardboard box, thinking

Including the Nuix Engine in your Java project requires some out-of-the-box thinking. Photo: Stephen Woods

For comparison, I’ve completed this project build as a standard Eclipse project and, alternatively, as a Maven project. Both make including the /engine/lib dir a multistep chore. Gradle, on the other hand, makes this inclusion relatively easy—and much less of a configuration issue. If you don’t have Gradle yet, take a moment to download and install it. If you are using a tool like Homebrew, a simple ‘brew install gradle’ will do the job!
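
As a flavour of what that looks like, here is a minimal build.gradle sketch; the environment variable and paths are assumptions for illustration, not the configuration from the full post.

```groovy
// build.gradle – a minimal sketch, not the configuration from the full post
plugins {
    id 'java'
}

repositories {
    mavenCentral()
}

dependencies {
    // Pull the whole engine lib directory in as one file-tree dependency.
    // NUIX_ENGINE_DIR is assumed to point at an unpacked Nuix Engine distribution.
    implementation fileTree(dir: "${System.getenv('NUIX_ENGINE_DIR')}/lib", include: '*.jar')
}
```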


Posted in Developers

Do you trust your archive? Part 2: The missing data

In our last installment I talked about how Nuix recovered one customer’s archive after the archive manufacturer and an API-based migration tool failed to find the data in the archive. Today I will tell you about just the opposite: The archive that reported much more data than was actually on the file system.

Last summer, during a particularly hot week in Missouri, we received a distress call from a partner who was having significant issues while attempting to migrate data out of an old archive using a traditional API-based migration tool. The partner had already been on-site with the customer for over six months. Email messages were migrating at the rate of a message per minute.

Keeping in mind there were over 200 million messages in the system, at that rate the migration would finish around the end of the year 2394. The API-based migration technology company’s support team was more optimistic, estimating it would take another two years to move a relatively small archive of just 13 TB. The partner revealed to us that if they didn’t show immediate progress, they would be fired and walked out of the account in just four days.

Nuix to the rescue.

We flew into St. Louis with a server and found we could start processing the customer’s data on day one. It took us eight days to process all of the available 13 TB of data. We then assembled all the users’ data (legal hold custodians first of course) and started exporting it out of the old archive. The project took us 19 weeks, which was 85 weeks ahead of the previous vendor’s estimate.

That was the good news. But Nuix maintains a rigorous chain of custody and performs forensic-level discovery on every migration project, and we found that over 15% of the messages reported as archived were in fact completely missing from the storage subsystem.

Library index cards

The index says the files are there, but you can’t always trust the index. Photo: Åsmund Heimark

The archive’s internal indexes reported some 30 million messages as being available for migration, but the messages weren’t there due to a high level of system corruption. Over 25 thousand archive containers were empty and their contents could never be recovered.

Obviously the customer was disappointed but glad to know the truth about its data.

So the question is: Do YOU trust your archive?

Posted in Email and Archive Migration

TechEd Sydney and the PST menace

A guest post by Shane Jansz, Nuix’s Sales Manager for the Asia-Pacific region

Last week I attended the Sydney leg of Microsoft TechEd 2014. This was my sixth TechEd (counting the Melbourne event last month) but Nuix’s first as a sponsor of the event in Australia. We partnered with Mimecast, which has built a good business in this country making the Office 365 experience better.

More than 1,000 IT professionals attended the three-day event and about half of them stopped by the booth to discuss how and why Nuix makes pigs fly!

Nuix flying pig stuffed toys

Nuix’s flying pigs were a popular attraction at our TechEd booth. Photo: Jenny Savann

I had a lot of conversations about migrating from legacy archives like Symantec Enterprise Vault into Office 365. A lot of people said, “That’s great about the archive, but what about all the PST files my users have?” I’d ask how many PST files, expecting maybe one per user, but they’d tell me each user might have five or more! If you look after a thousand users, that’s thousands and thousands of PST files.

How does that happen? It starts when IT departments place a limit on how much data users can store in their Microsoft Exchange mailboxes. This sounds like a great idea from an IT department perspective because they can predict how much storage they’ll need and not have to buy any more.

But that’s not how people work. Once they hit their limit, they don’t clean out their inboxes. Instead, they start archiving their old messages off to PST files.

If they store these PST files on their local hard drives, they’re not being backed up and they’re not searchable. That’s a big business risk. But if they store them on network file shares, they take up loads of space. They’re getting backed up, but then you’re storing copies of copies of copies of multiple emails.

The message I heard a lot is that people have loads of emails that they can’t collect and can’t manage. Often they don’t even know where they are, let alone what’s in them.

So it was a real eye-opener for many people to learn that we have a collection tool that can crawl your network, hunt down those loose PST files and bring them together. With all the data in one place, you can start to understand the content, decide what you want to keep and generally put some structured governance around PSTs. Wouldn’t that be a relief!
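
To make the idea concrete, here is a minimal Python sketch of that kind of sweep. The share paths and staging location are hypothetical, and this is not Nuix’s collection tool; it only illustrates the shape of the problem.

```python
from pathlib import Path
import shutil

# Hypothetical locations; a real collection would cover many shares and workstations.
SHARES = [r"\\fileserver01\users", r"\\fileserver02\projects"]
STAGING = Path(r"\\collection01\pst-staging")


def sweep_for_psts(shares, staging):
    staging.mkdir(parents=True, exist_ok=True)
    for share in shares:
        for pst in Path(share).rglob("*.pst"):
            size_gb = pst.stat().st_size / 1024 ** 3
            print(f"Found {pst} ({size_gb:.2f} GB)")
            # Copy (rather than move) into a central staging area for analysis.
            shutil.copy2(pst, staging / f"{pst.parent.name}_{pst.name}")


sweep_for_psts(SHARES, STAGING)
```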

Posted in Email and Archive Migration