
The eMag Link Monthly Articles


May Edition



Making a molehill out of a mountain: Pre-culling reduces data pool – and related costs – prior to e-discovery


Advances in computer technology allow organizations to store more files, more documents and more correspondence than ever before. While these electronic records may well contain valuable historical information, the size of the data pool can make eventual discovery activities extraordinarily time-consuming and prohibitively expensive.

To reduce these mountains of data into manageable molehills, many organizations and law firms are implementing pre-culling strategies that allow them to minimize the data pool and produce only the most responsive materials at a greatly reduced cost. Pre-culling allows corporate officers, litigators, in-house counsel and computer forensic specialists to review raw data. The most advanced applications provide a structured view into otherwise unstructured data, allowing users to better assess the scope of case information prior to the start of the discovery process.

Companies archive huge data pools

Corporate America has drastically changed its approach to saving documents and files over the past generation. Ten or 15 years ago, business professionals would create a document, print it out, and periodically make decisions about whether or not to file the information. Now, with the increased reliance on computers, the decision no longer revolves around manually filing a document (a relatively passive undertaking), but instead focuses on actively deleting it (a deliberate decision). Basic human nature and the availability of giga-, tera- or petabytes of computer storage conspire to discourage most corporate professionals from hitting the "delete" button and destroying a file.

As a result, most companies have amassed huge volumes of archived materials, saved on hard drives or back-up tape media. Because this information is often stored remotely – out of sight and out of mind – organizations are overwhelmed when they are required to sort through the data pool to produce responsive documents during litigation or regulatory compliance activities. This process takes a great deal of time and requires that significant amounts of archived data be restored, processed and reviewed, resulting in spiraling e-discovery costs.

Pre-culling reduces restoration, processing costs

Using conventional means, organizations would be forced to employ a multi-step approach to prepare data for electronic discovery review, including tape restoration and subsequent processing activities like de-duplication, culling, keyword searches and data filtering. Each component takes time and may add thousands of dollars in associated expenses.

However, pre-culling provides a shortcut through this maze of activities by allowing users to view data structures and files in their raw native format prior to expensive restoration and processing. This data is sortable and searchable by a variety of fields, including subject matter, keywords, content, context, custodian and metadata. As a result, users can view their data from numerous perspectives, make early strategic decisions, and save time and cost by excluding non-relevant raw data before media restoration and processing begin.

In addition, several processes can be used for culling large tape populations. A header scan, for instance, produces all available information in the back-up header, including the back-up date and back-up software type, as well as the internal volume identifiers. This process allows the user to cull the tape population by date range (the two years prior to a new product launch, for instance). If a corporation has a total archive of 3,000 tapes, this type of culling activity alone may reduce the data pool significantly.
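The date-range cull described above can be sketched in a few lines of Python. This is a minimal illustration only: the `TapeHeader` record, its field names, and the sample values are assumptions made for the example, not the output format of any particular header-scan tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TapeHeader:
    """Hypothetical record of what a header scan might report for one tape."""
    volume_id: str
    backup_date: date
    backup_software: str


def cull_by_date(headers, start, end):
    """Keep only tapes whose back-up date falls inside [start, end]."""
    return [h for h in headers if start <= h.backup_date <= end]


# Illustrative data only; not taken from any real archive.
headers = [
    TapeHeader("VOL0001", date(2003, 6, 30), "SoftwareA"),
    TapeHeader("VOL0002", date(2005, 1, 15), "SoftwareA"),
    TapeHeader("VOL0003", date(2006, 3, 1), "SoftwareB"),
]

# Example window: the two years before a hypothetical launch on 2006-06-01.
in_scope = cull_by_date(headers, date(2004, 6, 1), date(2006, 6, 1))
print([h.volume_id for h in in_scope])  # ['VOL0002', 'VOL0003']
```

Only the header fields are consulted here, which is the point of the technique: no tape contents need to be restored to make this first cut.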

From there, a server scan can be run to identify all back-up clients (a.k.a. servers) containing responsive data of particular interest. During an intellectual property (IP) case, for instance, a server scan can identify servers germane to the matter and allow counsel to exclude irrelevant file systems – like unrelated human resource (HR) network shares. This, in turn, allows for another layer of reduction to the data pool – most likely cutting the number of tapes further.

Another culling process uses file-level catalogs, which provide file names, document creation and modification dates, file or directory paths, and other information contained on the tape. Users can pinpoint only those documents created or handled by custodians or individuals affiliated with the litigation or regulatory investigation, and identify which specific back-up tapes contain these relevant files. This, of course, whittles the data pool further.
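As a rough sketch of how a file-level catalog can point to the relevant tapes, the Python below maps custodians to the tapes that hold their files. The `CatalogEntry` fields, the sample paths, and the assumption that a custodian's user name appears as a directory in the path are all hypothetical; real catalogs vary by back-up software.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical file-level catalog row; field names are assumptions."""
    tape_id: str
    path: str
    modified: date


def tapes_for_custodians(catalog, custodians):
    """Return the sorted tape IDs holding files under any custodian's directory."""
    wanted = {c.lower() for c in custodians}
    tapes = set()
    for entry in catalog:
        # Assumes the custodian's user name appears as a directory in the path.
        parts = {p.lower() for p in entry.path.replace("\\", "/").split("/")}
        if parts & wanted:
            tapes.add(entry.tape_id)
    return sorted(tapes)


# Illustrative catalog rows only.
catalog = [
    CatalogEntry("VOL0002", "/home/jsmith/design/spec.doc", date(2005, 1, 10)),
    CatalogEntry("VOL0002", "/shares/hr/benefits.xls", date(2005, 1, 11)),
    CatalogEntry("VOL0003", "/home/mjones/notes.txt", date(2006, 2, 20)),
    CatalogEntry("VOL0004", "/shares/hr/payroll.xls", date(2006, 2, 21)),
]

print(tapes_for_custodians(catalog, ["jsmith", "mjones"]))  # ['VOL0002', 'VOL0003']
```

In this example the tape holding only HR files drops out of scope, so it would never need to be restored.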

By employing pre-culling processes and tools, the user ultimately may need to restore, process and extract specific information from only a fraction of the tapes represented in the original data pool – in the example above, perhaps only 50 tapes out of the original universe of 3,000. This allows counsel to find relevant files earlier in the discovery process and significantly reduces cost.
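To put the example in proportion, the short Python calculation below works out the share of the archive that still needs restoration. The tape counts come from the example above; no per-tape cost is assumed, since actual pricing varies.

```python
total_tapes = 3000       # original universe of tapes in the example
tapes_to_restore = 50    # tapes remaining after pre-culling

remaining = tapes_to_restore / total_tapes
reduction = 1 - remaining

print(f"Tapes still to restore: {remaining:.1%}")  # Tapes still to restore: 1.7%
print(f"Reduction in data pool: {reduction:.1%}")  # Reduction in data pool: 98.3%
```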

Benefits of pre-culling are far-ranging

Pre-culling offers advantages during a wide range of activities. Certainly, litigation costs may be contained by streamlining data restoration, processing and extraction. However, the sheer volume of archived data is not the only issue attorneys face during e-discovery.

Revised Federal Rules. Updates to the Federal Rules of Civil Procedure, effective Dec. 1, 2006, have added new pressures. For instance, the modified requirements have shortened the time frame between the date a suit is filed and the deadline by which opposing attorneys must conduct the initial Meet & Confer session. Attorneys now have only 99 days to grasp key materials and develop strategies prior to sitting down for these vital meetings.

Likewise, the updates have opened a Pandora's Box with regard to the sources from which data must be produced. Before Dec. 1, 2006, the Federal Rules used only the amorphous term "documents," raising questions about how far this term stretched when it came to electronically created and stored materials that defied the conventional definition of "documents." Email was most certainly considered part of the equation, but the status of other forms of communication, like voicemail and instant messaging programs, remained ill-defined. The new Federal Rules resolve these issues more clearly by incorporating the term "electronically stored information" (ESI, for short), which encompasses everything from email to voicemail, cell phone conversations, PDAs, thumb drives, iPods, instant messages and almost any other form of reasonable business communication.

The bottom line is that counsel must gain comprehensive knowledge about a growing body of information that may impact how they proceed with a case – only they now have less time in which to gain that understanding. Pre-culling allows users to understand the breadth of data available more quickly. The earlier attorneys get an overview of the subject matter, the more readily they can assess the merits and weaknesses of the case, and prepare their strategy accordingly.

Compliance and regulatory requests

Pre-culling offers similar benefits during response to compliance and regulatory requests. Amendments made to the Hart-Scott-Rodino Antitrust Improvements Act (HSR) in 2001, for instance, have prompted increased numbers of "second requests" for information. Parties involved in mergers and acquisitions are being asked to produce documentation to demonstrate that no inappropriate activities or collusion to manipulate the market have taken place.

Adherence to data retention policies

These applications can also provide a cost effective means to manage routine data storage. Most organizations have adopted data retention policies requiring that certain records be maintained for a specified period. For instance, a company may stipulate that email files be kept for one year. However, without an efficient and convenient method to identify and delete files stored beyond the requirement, the data may be kept indefinitely. And if the records are retained – even beyond stated retention policy requirements – the organization may be required to produce any portion of the archive during discovery or in response to regulatory and compliance requests.

Pre-culling, therefore, helps organizations proactively sort and retire records in accordance with their own policies. This practice reduces the costs of data storage and frees up storage media for reuse. In addition, it considerably reduces the overall data pool, limiting the volume of information that could be deemed responsive during litigation or regulatory investigations.
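The retention check itself is simple once the records can be listed. The Python sketch below flags items stored longer than a policy allows; the file names, dates, and the one-year policy are illustrative assumptions, and any real deletion would also have to respect litigation holds.

```python
from datetime import date, timedelta


def past_retention(records, retention_days, today):
    """Return names of records stored before the retention cutoff date."""
    cutoff = today - timedelta(days=retention_days)
    return [name for name, stored_on in records if stored_on < cutoff]


# Illustrative archive listing: (name, date stored).
records = [
    ("mail-2005-03.pst", date(2005, 3, 31)),
    ("mail-2006-04.pst", date(2006, 4, 30)),
    ("mail-2006-11.pst", date(2006, 11, 30)),
]

# Hypothetical policy: keep email for one year (365 days).
expired = past_retention(records, 365, today=date(2007, 5, 1))
print(expired)  # ['mail-2005-03.pst', 'mail-2006-04.pst']
```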

(On a related note, organizations would also be well served to take advantage of litigation and electronic discovery preparedness consulting services. These firms help navigate the murky waters of retention policy guidelines, readiness for potential litigation and how to respond, including implementation of litigation holds, when lawsuits or regulatory requests do arise.)

There can be little doubt that pre-culling is an idea whose time has come. Corporations, counsel and the courts are equally concerned about the expense associated with producing responsive data. Pre-culling tools and technology that effectively reduce the data pool and contain costs will be viewed as a welcome solution to an escalating problem.


Are you sure of that date?

During a recent meeting, you wanted to know when a Word document was created, so you looked at the properties of the file by right-clicking it and selecting Properties. This provides the Last Accessed Time, Last Modified Date, and Created Date for the document. The question is: should you trust those dates for any decision-making?

Since you are the one who created the document, you feel comfortable that the dates are correct, but don't be so sure. When a document is copied from your thumb drive to your hard drive, or back from the hard drive to the thumb drive, the "Created Date" is typically reset to the time of the copy. This is because the file lands on a different partition, where a new entry has to be written to the Master File Table or File Directory Listing for that drive. The same can happen when a file is copied between two partitions or drive letters on your system's hard drive.
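File system dates depend on how a file reached its current location, not only on when it was written. The Python sketch below illustrates this with the Last Modified time, which behaves the same way across platforms: a plain copy gets a fresh timestamp, while a metadata-preserving copy keeps the old one. The file names and the back-dated timestamp are arbitrary choices for the example. Creation time is reported differently by each operating system and is not shown here.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "report.doc")
    with open(original, "w") as f:
        f.write("draft")

    # Back-date the original so any timestamp change is easy to see.
    old = 1_000_000_000  # an arbitrary moment in 2001, in seconds since the epoch
    os.utime(original, (old, old))

    # shutil.copy copies contents only; shutil.copy2 also copies timestamps.
    plain = shutil.copy(original, os.path.join(tmp, "plain_copy.doc"))
    faithful = shutil.copy2(original, os.path.join(tmp, "faithful_copy.doc"))

    original_mtime = os.stat(original).st_mtime
    plain_mtime = os.stat(plain).st_mtime
    faithful_mtime = os.stat(faithful).st_mtime

print(plain_mtime > original_mtime)               # True: plain copy is stamped "now"
print(abs(faithful_mtime - original_mtime) < 2)   # True: copy2 kept the old time
```

Two copies of the same document can therefore show different dates, which is one reason a date on the Properties tab is weak evidence on its own.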

What if, instead of creating the document in question, you were given the file? You should feel even less comfortable, or downright cynical, about the date. Remember, the Created Date can change when the file is transferred from the other person's system. How can this be checked? In Microsoft Word, open the Tools menu, select Macro, and then select the Macros option. In the Macros dialog, select Word Commands from the Macros In: drop-down list. Then go to the macro named FileSummaryInfo and click the Run button. The screen shows you the name of the document and the Author, but not the Created Date. Click the Statistics button to reveal the document's creation date, along with the last saved time and date.

What if the file did not have a date on the FileSummaryInfo tab, or you are still very suspicious of the person who gave you the document? What can you do to check this document further? The inner "techie" in you remembers the metadata reader program installed on your desktop or laptop. Several programs can read all of the metadata for a file; one of the best known is the Metadata Assistant from Payne Consulting. Payne is a Microsoft Partner and also has additional Metadata Assistant programs for other types of files, like Lotus Notes or Groupwise.

With the Metadata Assistant (MA), you can check many of the other attributes of the file to see whether the data and dates correlate to the timeline and match the story that you have been told about the file's history. The MA program will show you the Time Last Saved, Time Last Printed, Last Saved By, Revision Number, and the Last 10 Authors (if the information is available), among other technical information about the file. However, one new problem is that the MA can not only find this information, it can also erase it from the file. This cleaning process is a common occurrence now, or should be, when sending files to someone who may be your adversary in court, or even when sending information outside of your own organization. How, then, can you be sure that the information provided is accurate and unaltered?

If the metadata has been cleaned from the file, the next step might be to use the information listed on the General tab of the file's Properties dialog (remember: right-click and select the Properties option). But wait: Febooti's FileTweak program can change the File Created, File Modified, and File Accessed values to any time in the past or future. If the date does not seem right, or is in the future, what is the next step?
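No special utility is needed to see how fragile these dates are. As a minimal sketch, Python's standard `os.utime` call can set a file's accessed and modified times to any value, including one in the future. The file name and the forged date below are arbitrary, and `os.utime` does not change the creation time, which requires platform-specific tools.

```python
import os
import tempfile
from datetime import datetime, timezone

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "contract.doc")
    with open(path, "w") as f:
        f.write("signed")

    # Set both the accessed and modified times to an arbitrary future date.
    forged = datetime(2030, 1, 1, tzinfo=timezone.utc).timestamp()
    os.utime(path, (forged, forged))

    # Read back what the file system now reports as the modified time.
    reported = datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)

print(reported.year)  # 2030
```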

Now is the time to call in a Computer Forensic Expert so he or she can look at the entire computer system and report back to you about the authenticity of your document.

Now, ask yourself again: are you still sure of that date?


This article may be re-published as long as the following resource box is included at the end of the article and as long as you link to the email address and the URL mentioned in the resource box:

Article by eMag Solutions. For more articles on eDiscovery and Data Restoration, subscribe to our e-mail Newsletter by sending a blank email to newsletter@emaglink.com or by going to http://www.emaglink.com.
