Tape Indexing Breathes Life Into Tape Storage

Here’s an observation that can be tagged “mixed blessings”: foot-dragging on the part of techno-lagging attorneys has shielded (and in some cases continues to shield) their clients from the full potential weight of eDiscovery requests. For example, even after years of discussion, the legal profession did not formally recognize the obligation to produce metadata in response to discovery requests until the Federal Rules of Civil Procedure amendments adopted at the end of 2006. More outrageously, some attorneys are still gaming the system to avoid eDiscovery altogether, as Magistrate Judge John M. Facciola (U.S. District Court, Washington, D.C.) pointed out in his keynote presentation at LegalTech earlier this year.

Only a few years ago, certain courts ruled that data stored on tape could be considered “inaccessible” because it was so expensive to review, and thus that tape data did not always need to be reviewed when answering an eDiscovery request (see, for example, the Zubulake decisions). More recently, however, the legal profession has become aware of advances that make tapes faster and cheaper to review, such as technology for rapid disaster recovery.

What IT person doesn't look forward to working with historical data?

There are still a number of fine distinctions being made in this area of law, and the specific tape handling practices of different companies can render their tapes more or less “accessible.” (Ironically, companies that archive backup tapes indefinitely, which sounds like a safe practice, may be exposing themselves to a greater burden in eDiscovery, not to mention the extra cost of storing outdated tapes.) But broadly speaking, few companies storing information on tape can still rely categorically on “inaccessibility” to rule out the risk of being required to review their tapes during eDiscovery. For more about the law concerning inaccessibility, including California’s burden-shifting rules, I recommend this article by Winston & Strawn attorneys David M. Hickey and Veronica Harris.

Fortunately, two prongs of innovation are shrinking the problems surrounding eDiscovery and tape. The first prong, which is the subject of this blog post, comes in the form of new tape indexing and document retrieval technology. The second prong, which involves substituting hard drives for tape, will be the subject of a future post.

To learn more about the current state of eDiscovery technology in the realm of tape, I recently spoke with Jim McGann, Vice President of Marketing at Index Engines. Index Engines’ solution comes in the form of an appliance (a hardware box pre-loaded with their software) that scans a broad variety of tapes and catalogs the content. The appliance indexes tape data and de-duplicates documents within the index using the hash values of the documents. At this point users can cull (selectively retrieve) potentially responsive documents from a batch of ingested tapes without first performing an expensive, resource-intensive full restoration of each tape. And because Index Engines can ingest all of the common tape storage formats, users don’t need to run or even possess the original software used to write to the tapes.
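The hash-based de-duplication described above can be illustrated with a minimal sketch. This is not Index Engines’ implementation, which was not disclosed in our conversation; the function names, the sample tape labels, and the choice of SHA-256 are all assumptions made for illustration. The idea is simply that identical documents produce identical hash values, so an index keyed by hash records each distinct document once along with every tape location where a copy was found.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return a hex digest identifying a document by its content (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def build_index(documents):
    """Map each content hash to every (tape_id, path) where that content appears.

    `documents` is an iterable of (tape_id, path, data) tuples; this input
    shape is hypothetical and stands in for whatever a tape scanner emits.
    """
    index = defaultdict(list)
    for tape_id, path, data in documents:
        index[content_hash(data)].append((tape_id, path))
    return dict(index)


def unique_documents(index):
    """Pick one representative location per distinct document."""
    return {digest: locations[0] for digest, locations in index.items()}


# Hypothetical sample: the same report was backed up on two different tapes.
sample = [
    ("TAPE-001", "mail/q1_report.doc", b"Q1 revenue summary"),
    ("TAPE-002", "mail/q1_report_copy.doc", b"Q1 revenue summary"),
    ("TAPE-002", "hr/policy.txt", b"Vacation policy v2"),
]
index = build_index(sample)
representatives = unique_documents(index)
```

With an index like this, a reviewer would only need to examine one copy of each distinct document, while the index still records every tape on which the copies reside.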

From a longer-term strategic perspective, Index Engines’ users can approach their tape stores incrementally, taking a first pass through their tapes in response to a particular discovery request, then adding to their global tape index as new discovery requests are fielded. Alternatively, they can embark upon a proactive tape indexing campaign that will give them enhanced early case assessment capabilities. Users may also opt to extract important data that is not immediately needed but resides on old or degraded tape.

For companies with thousands or tens of thousands of tapes, indexing can allow significant numbers of tapes to be discarded, since many individual tapes typically contain data that is almost entirely repeated on other tapes or that has outlived its retention period. And that is before counting the corrupted or blank tapes that are being carefully stored nonetheless.
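To make the discard logic concrete, here is a rough sketch that assumes each tape has already been reduced to the set of content hashes found on it. The function name and tape labels are hypothetical, and the greedy strategy is my own simplification rather than anything Index Engines described. A tape is flagged only if everything on it also exists on a tape that is being kept, so two identical tapes are never both discarded. The sketch ignores retention periods and legal holds, which would have to be checked before anything is actually destroyed.

```python
def discardable_tapes(tape_hashes):
    """Return tape IDs whose content is fully covered by the tapes that remain.

    `tape_hashes` maps a tape ID to the set of content hashes found on it.
    Tapes are considered smallest first, and a tape is flagged only if every
    hash on it also appears on some other tape that has not been flagged.
    Blank tapes (empty sets) are always flagged.
    """
    kept = dict(tape_hashes)
    discard = []
    for tape_id in sorted(tape_hashes, key=lambda t: (len(tape_hashes[t]), t)):
        # Union of the hashes on all other tapes still being kept.
        others = set().union(*(h for t, h in kept.items() if t != tape_id))
        if tape_hashes[tape_id] <= others:
            discard.append(tape_id)
            del kept[tape_id]
    return discard


# Hypothetical inventory: T2 is a subset of T1, T3 is blank, T4 is unique.
tapes = {
    "T1": {"a", "b", "c"},
    "T2": {"a", "b"},
    "T3": set(),
    "T4": {"d"},
}
redundant = discardable_tapes(tapes)
```

In this example T2 and T3 are flagged, while T1 and T4 are kept because each holds at least one document found nowhere else.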

All of this makes Index Engines an extremely affordable (at least by enterprise standards) alternative to restoring and reviewing tapes individually.

I asked Jim McGann whether Index Engines resembled dentists who teach patients good dental hygiene and, if successful, will wind up putting themselves out of a job. If Index Engines’ appliances succeed in indexing, de-duplicating, and extracting all of the stored tape in existence, while ever more affordable hard drive storage replaces tape storage, won’t the company be out of a job?

Jim pointed out that, for certain organizations which currently rely on tape storage, substituting hard drives for tape drives is simply not a viable option. Costs associated with re-routing system data and human work flows, as well as the risk of downtime during a transition, mean that many organizations won’t switch even after disk drives become less expensive. And Index Engines takes away much of the cost incentive for switching that would otherwise be driven by eDiscovery and compliance requirements. Finally, Jim says, Index Engines can be used to index almost all of the information customers have, not just tape data, which enables users to find non-tape information that must be reviewed for eDiscovery.

The other approach to the problem of tape storage (to be explored in more detail in a future blog post) involves near-line hard drive solutions. Leading hard drive storage vendors such as Isilon Systems claim that their near-line solutions are priced nearly as low as tape while offering higher performance and reliability. But advocates of tape, including the Boston-area Clipper Group (in a whitepaper offered on tape drive vendor SpectraLogic’s web site), claim that the total cost of ownership of disk storage, taking into account factors such as floor space requirements and electricity, is still many times higher than that of tape.

So, as tapes look like they will be around for some time to come, companies with tapes will continue to need technologies like Index Engines’. And most will not be able to avoid discovery of tapes for much longer, if they are even still able to do so, thanks in part to the availability of these technologies.
