Reusing document clustering categories to spend less on eDiscovery?

After drafting a blog post about mass data sampling and classification in the “cloud,” I became curious about the potential for reusing categories developed in eDiscovery sampling and classification projects as “seeds” for later projects. For further insight I turned to Richard Turner, Vice President of Marketing at Content Analyst Company, LLC, a document clustering and review provider for eDiscovery.

schl¸sselBruce: I wonder to what extent document categories that are created using document clustering software when reviewing documents for eDiscovery can be aggregated across multiple document requests and/or law suits within the same company. Can previously developed categories or tags be reused to seed and thus speed up document review in other cases?

Richard: Regarding the notion of aggregating document categories, etc., it’s something that’s technically very feasible. And it could greatly speed document review if categories could be used to “seed” new reviews, new cases, etc. Here’s the challenge: we have found that most of the “categories” developed by our clients start-out case specific, and are too granular to be valuable when the next case comes along. It also hasn’t seemed to matter whether categorization was being used by a corporate legal department or an outside counsel – they’re equally specific.

The idea itself had merit, so we tossed it around with our Product Solutions Architects, and they came up with several observations. First of all, the categories people develop are driven by their need to solve a specific eDiscovery challenge, i.e. documents that are responsive to the case at hand. Second, when the next issue or case comes along, they naturally start over again, first by identifying responsive documents and then by using those documents to create categories – any “overlap” is purely coincidental. Finally, to develop categories that were really useful across a variety of issues or cases, they would need to be fairly generic and probably not developed with any specific case in mind.

I think that’s very hard to do for a first or even second-level review – it’s not necessarily a natural progression, as people work backwards from the issues at hand. Privilege review, however, could be a different animal. There are some things in any case that invoke privilege because of the particulars of the case, for example, attorney-client conversations which are likely to involve different individuals in different litigation matters. There are other things that could logically be generic – company “trade secrets” for example would almost always be treated as privilege, as are certain normally-redacted items such as PII (personally-identifiable information). Privilege review is also a very expensive aspect for eDiscovery, since it involves physical “reads” using highly-paid attorneys (not something you can comfortably offshore). Could “cloud seeding” have value for this aspect of eDiscovery? It’s an interesting thought.

One Reply to “Reusing document clustering categories to spend less on eDiscovery?”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s