Developer Diaries: Java 8 and the Nuix Engine

When we released Nuix 6, one significant change we made to the Nuix Engine was modifying it to support Java 8. If you are a Java enthusiast, you know that Java 8 makes it easier to spread work across multiple processor cores through Collection.parallelStream().

The Nuix Engine can use this feature, but with a little hitch. The ItemUtility.findTopLevelItems() call returns a Set that doesn’t provide an implementation of the parallelStream() method. To use parallelStream(), you need to copy the existing Set into one that does have a parallelStream() implementation, such as a HashSet.

Parallel pulleys and wires

In Java 8, developers can distribute resource utilization across multiple processor cores using parallelStream, but at a cost. Photo: Beshef

In the example below, I run a kind:email query, determine which of the resulting items are top-level, and generate a string of their GUIDs joined together by OR (presumably for adding to a guid:() query).

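// The search call was garbled in the original post; currentCase is a placeholder for your open Nuix case.
List<Item> items = currentCase.search("kind:email");
Set<Item> foundTopLevel = getUtilities().getItemUtility().findTopLevelItems(items, true);
// Copy into a HashSet, which provides parallelStream().
Set<Item> setObjectThatImplementsParallelStream = new HashSet<>(foundTopLevel);
String joinedGuidsWithOR = setObjectThatImplementsParallelStream.parallelStream()
  .map(Item::getGuid) // joining() needs strings, so map each item to its GUID first
  .collect(Collectors.joining(" OR "));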

This lets the JVM use any available threads in the common pool for the activities against that stream. Doing this for a slow activity like fetching data from a Nuix proxy for an item is a good example of how it can be helpful to distribute resource utilization across cores.
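If you want to try the pattern without a Nuix case on hand, here is a minimal sketch that uses plain strings in place of items. The class name ParallelGuidJoin, the joinWithOr helper and the sample GUID values are all made up for illustration; only HashSet, parallelStream() and Collectors.joining() come from the JDK.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class ParallelGuidJoin {

    // Copies the incoming set into a HashSet and joins its elements
    // with " OR " using a parallel stream.
    static String joinWithOr(Set<String> guids) {
        Set<String> copy = new HashSet<>(guids);
        return copy.parallelStream()
                   .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        // Hypothetical GUID values standing in for Item GUIDs.
        Set<String> guids = new LinkedHashSet<>(
                Arrays.asList("guid-a", "guid-b", "guid-c"));

        // HashSet has no defined iteration order, so the order of the
        // joined values may differ from the order they were added in.
        System.out.println("guid:(" + joinWithOr(guids) + ")");
    }
}
```

Because the values are joined with OR, the unspecified ordering of a HashSet does not change the meaning of the resulting query.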

There is a cost to using a parallel stream and I’m curious where the line is for getting positive value out of it. Perhaps my next entry!

Posted in Developers

#NuixIGChat: Having the facts is better than guessing

@juliecolgan A1: Info transparency is what allows you to make eyes-wide-open decisions and priorities. W/o it, you're just guessing.
Guessing games are best reserved for carnivals. Good information governance (IG) requires the facts to prioritize actions and investments. Relying only on hearsay, industry examples and best practice is rarely enough to compel action and investment by C-level executives. Seeing your information realities in detail can be just the thing you need to build a fact-based return on investment case and get started on governing information.

On Wednesday, October 15, Nuix hosted a Twitter chat focused on information transparency and its role in the governance of information assets. Practitioners and thought leaders joined Nuix in a lively discussion that established information transparency as a gatekeeper to good information governance.

@DoloresGMadrid A1 - Without transparency you are making decisions without all the information.
Well said, Dolores!

Customers all over the world use Nuix technology to bring clarity to information threats and opportunities, to prioritize actions and to make sound information governance decisions.

One of my favorite questions was: “If you could ask any question of your data, what would it be?” The answers were thoughtful and sometimes downright funny!

@Mechiche A2: Dear Data, Please tell me a story. Preferably one that has a beginning, middle and end.
@RyanRStockton A2: Dear data, why is there so much of you?
@btblair A2: Why do you have to be, like, so gross all the time anyway?
There was no doubt that everyone who joined the chat learned something and had a bit of fun too.

Couldn’t make it?

Don’t worry, you can check out the conversation here. You’ll see the full discussion on the value of information transparency, search and rescue versus search and destroy, and defining eTrash for content cleanup.

Thanks to everyone who participated. Stay tuned for details on our next #NuixIGChat!

Posted in Information Governance

“Show All Children Metadata” gives you quick insights into SQLite and other structured data

A large number of very popular applications write their data into SQLite databases. This includes the Firefox web browser, the Skype communication client and the majority of the applications on the iOS and/or Android mobile operating systems.

SQLite is a single lightweight database file, but each application writes data in a different way. Just looking at the way they store dates and times, some use a human-readable format, while others use epoch dates or some other format that requires decoding. Many applications store data such as images and sounds as “blobs” or binary large objects.
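To give a feel for the decoding step, here is a small sketch of turning epoch values into readable timestamps. It assumes two common conventions, seconds and microseconds since 1970-01-01 UTC; the class name EpochDecode and the sample values are illustrative only, and any given application may use a different unit or starting point, so check its schema first.

```java
import java.time.Instant;

public class EpochDecode {

    // Interprets the value as whole seconds since 1970-01-01T00:00:00Z.
    static Instant fromEpochSeconds(long seconds) {
        return Instant.ofEpochSecond(seconds);
    }

    // Interprets the value as microseconds since 1970-01-01T00:00:00Z.
    static Instant fromEpochMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    public static void main(String[] args) {
        // Sample values chosen for illustration, not taken from a real database.
        System.out.println(fromEpochSeconds(1413331200L));
        System.out.println(fromEpochMicros(1413331200000000L));
    }
}
```

Both sample values decode to the same instant, midnight UTC on October 15, 2014, which shows how easily the same moment can look completely different from one database to the next.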

Each application uses different fields and table structures, and rebuilding data often requires constructing queries across multiple tables. This requires knowledge of SQL database syntax.

As a result, it can be very complex and time-consuming to analyze data stored in SQLite databases.

Similar difficulties crop up when analyzing semi-structured data sources such as log files. Each log file stores its data elements in its own order.

Nuix has developed a handy shortcut to analyzing SQLite databases and many other structured and semi-structured formats including Microsoft SQL databases, Windows event logs, web server logs, Windows Registry keys and the Microsoft Internet Explorer index.dat file.

It couldn’t be easier to use. Just select the file in Nuix Workbench, right-click and choose “Show All Children Metadata.”

Selecting "Show All Children Metadata" by right-clicking an item in Nuix Workbench

In the results pane, you will then see each database or log file field and the data within each one in a table format.

The "Children Metadata" view, showing database fields in a table, in the Nuix Workbench results pane

Admittedly, there are some quirks and formats of SQLite databases that won’t work. But think about how much time it will save you to get a quick look across a large majority of databases without having to construct a single query.

Another advantage is you can select multiple tables or databases, then view and compare all the data together, just like running a join query in SQL. It’s dynamic to the data—whatever column headings are in the tables, they’ll show up in the results view. It’s also fast and easily repeatable—you don’t need to remember the query you wrote last time to access similar data.

Best of all, the data in each table is fully indexed and text searchable, with named entities extracted, so if a database or log contains the gold you’re looking for, you can quickly find it.

Posted in Digital Investigation, eDiscovery, Information Governance

When JavaOne 2014 took over San Francisco

A guest post by Colleen Clark, Global Director of the Nuix OEM Program

When whole streets, buildings, and public venues of a major US city get taken over by a single event, you know it’s a big deal. That’s exactly what the JavaOne Conference was: a chance to interact with thousands of talented developers and technologists, an opportunity to share ideas and collaborate, and an avenue for introducing the OEM Program to a very large audience, all in beautiful downtown San Francisco.

Yes, JavaOne was a big deal for the Nuix team.

Photo showing a downtown San Francisco street dotted with JavaOne Conference sponsor structures and kiosks, plus cafe seating for attendees, as part of Duke’s Cafe

The Nuix team’s view at Duke’s Cafe.

Our experience at JavaOne

Nuix played a small part in the San Francisco takeover. In lieu of a traditional booth, we manned QuickeeCam Photo Station kiosks in Parc 55 and Duke’s Café. We took more than a thousand funny, goofy, awesome pictures with JavaOne attendees. (Data nerds may have been able to plot a direct correlation between the time of day and the goofiness of the pictures.)

We rubbed elbows with some impressive outfits, including Target’s coding team and the guys from Aldebaran Robotics, not to mention featured speaker Arun Gupta. Nuix Developer Jim Mowbray was especially excited to troubleshoot code and talk shop, face-to-face, with the Spring development team.

Photo Station kiosk at JavaOne Conference

Did you snap a pic at one of the Nuix Photo Stations?

It wasn’t all a party in the streets, though! We also attended educational sessions, learning new skills and sharpening our existing tools. Sales Engineer Jason Wells was a huge fan of “Eclipse Luna: Java 8 and More,” a talk given by Wayne Beaton, the director of Open Source Projects at the Eclipse Foundation. Beaton discussed the scope of the Eclipse Foundation and its roles beyond IDE provider, as well as new features in the recent Eclipse Luna release.

Dan Berry, our director of Integration Services, found a ton of benefit in the session “Programming Lambda Expressions,” which provided basics on using lambdas to write declarative code instead of imperative code. Dan says the session was full of quality technical information, a perfect mixture of simple and complex examples, and included “a compelling demonstration of how streams in internal iterators can be easily turned into parallel streams to add concurrency, something that was potentially complex and confusing in previous versions of Java.”
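For readers who missed the session, here is a minimal sketch of the idea Dan describes. It is not code from the talk; the class name LambdaStyles and the sum-of-even-squares task are invented for illustration. The same calculation is written as an imperative loop, as a declarative stream, and as a parallel stream.

```java
import java.util.Arrays;
import java.util.List;

public class LambdaStyles {

    // Imperative: spell out how to loop, test and accumulate.
    static int imperative(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) {
            if (n % 2 == 0) {
                total += n * n;
            }
        }
        return total;
    }

    // Declarative: describe what you want with lambdas on a stream.
    static int declarative(List<Integer> numbers) {
        return numbers.stream()
                      .filter(n -> n % 2 == 0)
                      .mapToInt(n -> n * n)
                      .sum();
    }

    // Parallel: the same pipeline, with stream() swapped for parallelStream().
    static int parallel(List<Integer> numbers) {
        return numbers.parallelStream()
                      .filter(n -> n % 2 == 0)
                      .mapToInt(n -> n * n)
                      .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(imperative(numbers));
        System.out.println(declarative(numbers));
        System.out.println(parallel(numbers));
    }
}
```

All three methods return the same total. For a list this small the parallel version is unlikely to be faster, so treat it as a demonstration of the syntax rather than a performance tip.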

Picture Your Future in NYC contest winner

We created the Nuix “Picture Your Future in NYC” contest exclusively for JavaOne attendees. Developers at JavaOne submitted their ideas for an application to work on top of the Nuix Engine. The entries were judged by Caroline Kvitka, editor-in-chief of Java Magazine.

We are proud to announce that the winner of the contest is Dustin Henry, Senior Programmer Analyst with FedEx. Dustin’s idea is for an app that provides an up-to-date list of allergenic items on any restaurant’s menu based on the type of allergy the user specifies. Dustin and a guest will take a four-night, all-expenses-paid trip to New York City to work on the development of his app with the Nuix team!

Our sincere thanks and appreciation go to Ms. Kvitka for her time and expertise in evaluating the contest entries. I encourage you to check out her fine work and the contributions of her amazing staff in Java Magazine.

I hope you had as much fun and found as much benefit at JavaOne as we did. The Nuix team looks forward to seeing you in San Francisco next year!

Posted in Developers

Backoff: propagation and possible authorship

The Nuix Cyber Threat Analysis Team has recently discovered a piece of malware that is responsible for propagating the newly discovered Backoff point of sale (PoS) malware family. This post will describe the malware in detail, and infer potential authorship of Backoff based on details we found during analysis.

Backoff is a PoS malware family that the United States Computer Emergency Readiness Team (US-CERT) announced in an alert at the end of July 2014 (see this PDF for further details).

Here are technical analyses of the malware from SpiderLabs, Trend Micro and Fortinet.

The malware is responsible for logging keystrokes and scraping memory for card data. It also has a command-and-control component for updates and data exfiltration.

Malware propagation

We have discovered a piece of malware that is responsible for propagation of Backoff after an initial machine has been compromised. The specific sample that is being discussed has the SHA1 hash of:


We found that the malware had been uploaded to VirusTotal (and can be found here).

Static analysis of the file quickly revealed that two Microsoft Windows executables are embedded within the malware.


We extracted these files from the original malware and discovered the following files:

PSExec 1.98 (7540FED53C1FF761F926F2D4289858D1A567AF8F)
Backoff 1.55 ‘net’ (66C83ACF5B852110493706D364BEA53E48912463)

PSExec is a utility system administrators often use to execute commands or install programs remotely.

The specific flow of the malware is quite straightforward:

  • Generate a five-uppercase-letter executable name in the victim’s %TEMP% directory.
  • Drop the PSExec file to this location.
  • Generate another five-uppercase-letter executable name in the victim’s %TEMP% directory.
  • Drop the Backoff malware to this location.
  • Find any connected machines using the Microsoft Windows ‘net view’ command.
  • Push Backoff to any connected machines using the PSExec utility.

I’ve also expressed this flow of execution in a visual form below.

NetworkSpreader execution flowchart

The PSExec command used to run Backoff on connected machines is as follows:

[path_to_psexec] [hostname] -accepteula -d -c [path_to_backoff]

The ‘-accepteula’ parameter ensures that the utility’s EULA acceptance screen is not displayed. The ‘-d’ parameter runs Backoff in non-interactive mode. In other words, PSExec will not wait for Backoff to finish running. Finally, the ‘-c’ parameter copies the Backoff binary to the remote machine.

The malware has a helpful undocumented feature. By passing the argument of ‘debug’ when the executable is run, the propagation malware will create and write debug statements to a ‘dbg.txt’ text file. This file is generated in the same directory the malware is run from.

Overall, I think you’ll agree that the malware in question is quite simplistic. However, this simple technique proves quite effective; it accomplishes its goal in a very clean way. If I had to speculate, I’d guess that this file was likely written for a one-off situation that arose in the midst of a compromise.

Potential Backoff authorship

There has been a lot of speculation about who is behind the Backoff malware family. From the Trend Micro write-up, it looks like the installation routine was taken directly from the Alina malware family. Previous research performed by Xylitol provided further evidence that Alina and Dexter share a common link.

Dexter and Alina are both PoS malware families that have been responsible for a large number of breaches in recent years. You can find more information about Dexter from Arbor Networks, Seculert and SpiderLabs, and more about Alina from Xylitol, Sophos and SpiderLabs.

Additionally, the author of the Dexter family goes by the handle ‘Dice’.

The sample we identified contains an interesting debug string that is generated when the malware is compiled:

Debug string discovered in NetworkSpreader

This debug string ‘C:\Users\dice\Desktop\networkspreader\networkspreader\Release\networkspreader.pdb’ provides a wealth of information. The path that this file was compiled from indicates that this binary was named ‘networkspreader’, which would have been a clue to its functionality had we not already analyzed it. Additionally, the ‘C:\Users’ path indicates that this file was compiled on a Microsoft Windows Vista or higher operating system. Finally, and arguably the most important detail, the username of the author appears to be ‘dice’, the very same handle that was linked to the Dexter malware.
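As a rough illustration of how those clues can be pulled out of a PDB path programmatically, here is a small sketch. The class name PdbPathClues and its two helper methods are hypothetical, and the logic assumes a Windows-style path whose second component is ‘Users’; it is not part of any tool we used in the analysis.

```java
import java.util.Optional;

public class PdbPathClues {

    // Returns the profile name from a path of the form C:\Users\<name>\...
    // or an empty Optional if the path does not follow that layout.
    static Optional<String> userName(String pdbPath) {
        String[] parts = pdbPath.split("\\\\");
        if (parts.length > 2 && parts[1].equalsIgnoreCase("Users")) {
            return Optional.of(parts[2]);
        }
        return Optional.empty();
    }

    // Returns the final path component with a trailing ".pdb" removed.
    static String projectName(String pdbPath) {
        String[] parts = pdbPath.split("\\\\");
        String fileName = parts[parts.length - 1];
        return fileName.toLowerCase().endsWith(".pdb")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
    }

    public static void main(String[] args) {
        String path = "C:\\Users\\dice\\Desktop\\networkspreader"
                + "\\networkspreader\\Release\\networkspreader.pdb";
        System.out.println(userName(path).orElse("(unknown)"));
        System.out.println(projectName(path));
    }
}
```

A check like this only surfaces what the path contains. Usernames in debug strings are easy to fake, so they are best treated as one piece of supporting evidence.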

Further, compile timestamps show that this network spreading malware was compiled a mere seven minutes after the encrypted Backoff 1.55 ‘net’ sample that was embedded. Additionally, the unencrypted version of Backoff had a timestamp three minutes prior to the encrypted copy.

  • NetworkSpreader Compile Timestamp: 2014-04-29 13:23:36 -0600
  • Encrypted Backoff 1.55 ‘net’ Compile Timestamp: 2014-04-29 13:16:37 -0600
  • Unencrypted Backoff 1.55 ‘net’ Compile Timestamp: 2014-04-29 13:13:54 -0600
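The gaps between the three timestamps above can be checked with a few lines of java.time. This is a sketch for verifying the arithmetic only; the class name CompileGaps and the gap helper are made up for this post.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class CompileGaps {

    // Matches timestamps written like "2014-04-29 13:23:36 -0600".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss Z");

    // Returns the elapsed time from the earlier timestamp to the later one.
    static Duration gap(String earlier, String later) {
        return Duration.between(
                OffsetDateTime.parse(earlier, FORMAT),
                OffsetDateTime.parse(later, FORMAT));
    }

    public static void main(String[] args) {
        String unencrypted = "2014-04-29 13:13:54 -0600";
        String encrypted = "2014-04-29 13:16:37 -0600";
        String spreader = "2014-04-29 13:23:36 -0600";

        System.out.println("Unencrypted to encrypted: "
                + gap(unencrypted, encrypted).getSeconds() + " seconds");
        System.out.println("Encrypted to spreader: "
                + gap(encrypted, spreader).getSeconds() + " seconds");
    }
}
```

The two gaps come to 163 seconds and 419 seconds, or just under three minutes and just under seven minutes respectively.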

This information further adds to the evidence that ‘dice’ is the author behind Backoff, or at the very least, has access to the Backoff source code.


Overall, this malware sample was quite interesting. While the actions this malware performs are anything but sophisticated, it provides an interesting glimpse of how the Backoff malware family spreads on a compromised network after the initial compromise. Additionally, debug strings found within the ‘networkspreader’ malware strongly suggest that the author of Dexter may also be behind the Backoff malware family.

Posted in Cybersecurity

The Nuix Engine: Speed is our not-so-secret weapon

It’s 2:00 am; you can’t sleep so you flip on the TV. Every channel seems to be showing those late-night infomercials you always get sucked into watching. “Why yes, I do hate spending 10 minutes of my day chopping vegetables,” you start thinking. “For that low price of $19.99 I would love a magic bullet mixer to get those 10 minutes of my life back. I could watch more infomercials.”

The products are different but the pitch is always the same: it’s the fastest, most efficient product on the market, no exceptions. So you give in and buy it. And it breaks the first time you use it.

As seen on TV

Most time-saving claims don’t stack up, but Nuix can prove it’s the fastest. Photo: Mike Mozart

Late-night infomercial companies will tell you what you want to hear to sell their products. Many times, their word doesn’t hold true. But with the Nuix Engine, I can say it is many times faster than any other software product out there and back it up with facts. Whether the task at hand is data migration, eDiscovery, information governance or investigations, the results are the same: the Nuix Engine’s parallel processing technology processes data at record-breaking speed.

Take the example of one of the largest city governments in the United States. The city found itself only 13% through an archive migration project after more than 12 months with an API-based migration technology. Nuix migrated the city’s data eight times faster than the previous API-based technology. And that was using a quarter of the number of servers, further strengthening the city’s return on investment.

One more story of note: A large state government agency had started a legacy archive migration project with an API-based migration vendor. After 15 months the job was only 20% complete and the vendor was projecting it would take another two years. The agency had provisioned eight physical servers for the original project at a significant cost. Nuix completed the migration in 18 weeks, using only three servers. We saved the customer at least two years.

Shaving two years off a project timeline is a considerable achievement, don’t you think? You’d get to look like a hero and achieve or exceed your projected ROI. And think how much more time you’d have to watch infomercials!

Posted in Email and Archive Migration

Execute your Nuix scripts remotely with REST

Over the years Nuix has developed powerful ways to automate and access the functions of the Nuix Engine, first within the Workbench interface and then externally.

We started in 2009 with the Nuix Scripting API, which enabled customers to develop scripts using Ruby and ECMAScript (JavaScript). (You can now also use Python.) It gave regular users tremendous power to customize and automate their workflows. However, when we started talking about scripting, our litigation support customers associated it with CPL, a proprietary and difficult-to-use language that only worked with Concordance. Likewise, our investigative customers associated it with EnCase EnScript, another proprietary language.

Nuix proved that scripting didn’t need to be a dirty word by extending our API and embracing a real scripting language: anyone who can buy a Ruby book has the tools to automate Nuix. This opened a huge door for our customers and partners to automate their workflows.

Over the next few years we added Java and RESTful APIs to provide access from enterprise and web applications. Java is an awesome language, but it is not trivial to learn. This bothered me. I felt like the ability for anyone to extend Nuix was starting to erode. The scripting API was still there and going strong, but most of the Nuix OEM conversations we were having centered on the Java and REST APIs.

That all changed while we were working on a quick-turn integration project with a customer. We needed to add new capabilities in the engine that didn’t map neatly to the REST model. From necessity, we created the “user-scripts” element of the REST API, which enabled users to run any script they had created through a REST request. This completely flattened the knowledge gap and put the power back in the hands of the scripters.

Don’t get me wrong, the REST API is incredibly powerful—we’ve used it to build amazing applications like Nuix Director and Nuix Web Review & Analytics, and you can too. But for the folks who just want to execute their scripts more effectively, this is a massive win.

Last month we ran an all-day developer session at our inaugural User Exchange. The majority of the attendees had a scripting background, with a few experienced in REST and one in Java. It was awesome to watch as someone asked a question about when a feature would be available in REST, and the response from the other attendees was “You can just script it until the feature is available in REST.”

So with all that in mind, here are the examples I ran through. Read more ›

Posted in Developers
