What is Discovery? – explaining eDiscovery to non-lawyers

I met with a group of software developers earlier this week to talk about configuring a visual analytics solution to provide useful insights for eDiscovery. To help them understand the overall process, I wrote out a short description of key concepts in Discovery. I omitted legal jargon and described Discovery as a simple, repeatable process that would appeal to engineers. If anyone has enhancements to offer, I’d be happy to extend this set further.


Discovery is a process of information exchange that takes place during most lawsuits. The goal of discovery is to allow the lawyers to paint a picture that sheds light on what actually happened. Ideally, court proceedings are like an academic argument over competing research papers that have been written as accurately and convincingly as possible. Each side tries to assemble well-documented citations to letters, emails, contracts, and other documents, along with information about where they were found, who created them, why and when they were created, and how they were distributed.

Discovery requests

The dead tree version of a law library.

The discovery process is governed by a published body of laws and regulations. Under these rules, once a lawsuit has begun, each side can ask the other side to search carefully for all documents, including electronic ones, that might help the court decide the case. These requests are made using written forms called Discovery Requests. BOTH sides will need documents to support their respective positions in the lawsuit.


Documents are “responsive” when they fit the description of the documents sought in a discovery request. Each side has the responsibility of being specific, and not over-inclusive, in describing the documents it requests. Each party has the opportunity to challenge the other’s discovery requests as overbroad, and any dispute that cannot be resolved by negotiation will be resolved by the court. But once the time for raising challenges is over, each side is obligated to take extensive steps to search for, make copies of, and deliver all responsive, non-privileged documents to the other side.


Certain types of documents are protected from disclosure by privilege. The most common are the attorney-client privilege and the related work product privilege, which, in essence, cover communications between lawyers and their clients and, in certain cases, work done by non-lawyers working for lawyers or in preparation for a lawsuit. When documents are responsive but also privileged, they are described on a list called a privilege log, and the log is delivered to the other side instead of the documents themselves.
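For readers who think in code, the sorting rules in the last few paragraphs can be sketched as a small function: responsive and non-privileged documents are produced, responsive but privileged documents go on the privilege log, and non-responsive documents are neither produced nor logged. This is a minimal illustration under simplifying assumptions, not how any particular review platform works. The `ReviewedDocument` record and the `route_documents` function are hypothetical names, and a real privilege log contains whatever descriptive fields the court and the parties require.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReviewedDocument:
    # Hypothetical record produced by attorney review.
    doc_id: str
    description: str
    responsive: bool
    privilege: str = ""  # e.g. "attorney-client" or "work product"; empty if none


def route_documents(docs: List[ReviewedDocument]) -> Tuple[List[str], List[dict]]:
    """Split reviewed documents into a production set and a privilege log."""
    production: List[str] = []
    privilege_log: List[dict] = []
    for doc in docs:
        if not doc.responsive:
            # Non-responsive documents are neither produced nor logged.
            continue
        if doc.privilege:
            # Responsive but privileged: describe it on the log instead of producing it.
            privilege_log.append({
                "doc_id": doc.doc_id,
                "privilege": doc.privilege,
                "description": doc.description,
            })
        else:
            # Responsive and not privileged: deliver it to the other side.
            production.append(doc.doc_id)
    return production, privilege_log


docs = [
    ReviewedDocument("D1", "Supplier contract", responsive=True),
    ReviewedDocument("D2", "Email to outside counsel", responsive=True,
                     privilege="attorney-client"),
    ReviewedDocument("D3", "Cafeteria menu", responsive=False),
]
production, priv_log = route_documents(docs)
print(production)  # ['D1']
print(priv_log)
```

Note that the privileged email still appears on the log with a description, so the other side knows it exists and can challenge the privilege claim if it disagrees.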


A document ordinarily isn’t considered self-explanatory. Before it can be used in court it must be explained or “authenticated” by a person who has first-hand knowledge of where the document came from, who created it, why it was created, how it was stored, etc. Authentication is necessary to discourage fakery and to limit speculation about the meaning of documents. Documents which simply appear with no explanation of where they came from may be criticized and ultimately rejected if they can’t be properly identified by someone qualified to identify them. Thus document metadata — information about the origins of documents — is of critical importance for discovery. (However, under the elaborate Rules of Evidence that must be followed by lawyers, a wide variety of assumptions may be made, depending on circumstances surrounding the documents, which may allow documents to be used even if their origins are disputed.)
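To make “information about the origins of documents” concrete, here is a minimal Python sketch of the kind of record a collection step might capture for each file. The field names and the `collect_metadata` function are hypothetical. Real forensic collection tools record far more and take care not to alter timestamps, and a hash value only shows that a copy matches the file it was taken from, not who wrote it or why.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def collect_metadata(path: str, custodian: str) -> dict:
    """Record basic origin information for one collected file (illustrative fields only)."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files don't need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            sha256.update(chunk)
    info = os.stat(path)
    return {
        "custodian": custodian,                # who the file was collected from
        "source_path": os.path.abspath(path),  # where it was found
        "size_bytes": info.st_size,
        # Last-modified time as reported by the file system, converted to UTC.
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "sha256": sha256.hexdigest(),          # fingerprint showing the copy is unaltered
    }


# Usage: fingerprint a small sample file, then clean it up.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"Quarterly repair log")
record = collect_metadata(tmp.name, custodian="J. Smith")
os.remove(tmp.name)
print(record["size_bytes"], record["sha256"][:12])
```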

Document custodians

The term “custodian” can be applied to anyone whose work involves storing documents. The spoken or written statement (“testimony”) of a custodian may be required to authenticate and explain information that is in their custody. And when a legal action involves the actions and responsibilities of relatively few people (as most legal actions ultimately do), those people will be considered key custodians whose documents will be examined more thoroughly. Everyone with a hard drive can be considered a document custodian with respect to that drive, although system administrators would ordinarily be considered the custodians of a company-wide document system like a file server. Documents like purchase orders, medical records, repair logs and the like, which are usually and routinely created by an organization (sometimes called “documents kept in the ordinary course of business”) may be authenticated by a person who is knowledgeable about the processes by which such documents were ordinarily created and kept, and who can identify particular documents as having been retrieved from particular places.

Types of documents which are discoverable and may be responsive

Typically, any form of information can be requested in discovery, although attorneys are only beginning to explore the boundaries of the possibilities here. In the old days, only paper documents and memories were sought through discovery. (Physical objects may also be requested; for example, in a lawsuit claiming a defect in an airplane engine, parts of the engine may be requested.) Today, requests frequently include databases, spreadsheets, word processing documents, emails, instant messages, voice mail and other recordings, web pages, images, metadata about documents, document backup tapes, erased but still recoverable documents, and everything else attorneys can think of that might help explain the circumstances on which the lawsuit is based.

Discovery workflow

Discovery can be time-consuming and expensive. Lawyers work closely with IT staff, known document custodians, and others with knowledge of the events and people involved in the lawsuit. First, they attempt to identify what responsive documents might exist, where they might be kept, and who may have created or may control them. Based on what is learned through this collaboration, assumptions are made and iteratively refined about what documents exist and where they are likely to be found. Steps must be taken to instruct anyone who may hold potentially responsive documents not to erase them before they are collected (this is called a “litigation hold”).

Next, potentially responsive documents are copied, with metadata intact, into a central repository where batch operations can take place. In recent years, online repositories that enable remote access have become very popular for this purpose. Within this repository, lawyers and properly qualified personnel can sort documents into groups using various search and de-duplication methods, set aside documents that are highly unlikely to contain useful information, and then prioritize the remaining documents and assign them to lawyers for manual review. Reviewing attorneys then sort documents into responsive and non-responsive groups, and into privileged and non-privileged groups. Eventually, responsive, non-privileged documents are listed, converted into image files (TIFFs), and delivered to the other side, sometimes alongside copies of the documents in their original formats.
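De-duplication, one of the batch operations in this workflow, is easy to illustrate. A common approach is to fingerprint each file's contents with a cryptographic hash and treat identical hashes as identical documents, so each unique document is reviewed only once. The sketch below covers exact-match (hash) de-duplication only; the `deduplicate` function and its input layout are hypothetical, and commercial review platforms also offer near-duplicate detection, which this does not attempt.

```python
import hashlib
from typing import Dict, List, Tuple


def deduplicate(docs: List[Tuple[str, str, bytes]]) -> Dict[str, dict]:
    """Group exact duplicates by content hash.

    Each input item is (custodian, path, content). Returns one entry per unique
    document, remembering every custodian and location where a copy was found.
    """
    unique: Dict[str, dict] = {}
    for custodian, path, content in docs:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in unique:
            # The first copy seen becomes the one sent on to review.
            unique[digest] = {"review_copy": (custodian, path), "found_at": []}
        # Keep every location so each custodian's possession is still documented.
        unique[digest]["found_at"].append((custodian, path))
    return unique


collected = [
    ("alice", "/mail/budget.xlsx", b"Q3 budget v2"),
    ("bob", "/share/budget-copy.xlsx", b"Q3 budget v2"),
    ("bob", "/share/memo.docx", b"Shipping delay memo"),
]
batches = deduplicate(collected)
print(len(batches))  # 2
```

Keeping the full list of locations matters because who held a document, and where it was found, may later be needed to authenticate it, even though only one copy goes to a reviewing attorney.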

Early Case Assessment (also called Early Data Assessment)

Even before receiving a discovery request, and sometimes even before a lawsuit has been filed, document review can be started in order to plan legal strategy (like settlement), prevent document erasure (“litigation holds”), etc. This preliminary review is called “Early Case Assessment” (or “Early Data Assessment”).

UPDATE: I describe the sources and development of legal procedural rules for e-discovery in a later blog post, Catch-22 for e-discovery standards?

When employees leave, company information leaves with them

A good topic for a future blog post would be a review of the technology that might prevent this from happening. A recent study revealed:

“Of about 950 people who said they had lost or left their jobs during the last 12 months, nearly 60 percent admitted to taking confidential company information with them, including customer contact lists and other data that could potentially end up in the hands of a competitor for the employee’s next job stint.


“Most of the data takers (53 percent) said they downloaded the information onto a CD or DVD, while 42 percent put it on a USB drive and 38 percent sent it as attachments via e-mail….”

Symantec, which commissioned this study (and which, through a string of acquisitions, has become a major vendor in the information management realm), just happens to be one of a number of software vendors that provide DLP (“data loss/leak prevention/protection”) solutions that can inhibit this sort of thing.

Meanwhile, over at RIM, the makers of the BlackBerry, the CEO isn’t shy about admitting that they record ALL company calls on the theory that everything employees say on the job is the company’s intellectual property.

I’m not an advocate for “big brother” work environments because I think there can be a strong relationship between genuine trust and employee productivity and creativity. Nonetheless, I have to admit that employees who are convinced that they will be held accountable for what they do with company information will be more conscientious about how they handle it.

Yet another topic for a future post will be examining how important information is misplaced when employees shift to new projects, positions, or companies.