De-mystifying Machine Learning

I was a little surprised to see a post in a respected tech publication just the other day about how unfathomable machine learning is, and how unknown its impact is going to be. Agreed, machine learning is still unfamiliar to many people, and its potential is enormous. But maybe I can help demystify it a little by sharing some of my own experience applying machine learning in a real-life situation.

I really dug into machine learning a few years back while working on a marketing campaign about the use of analytics during the discovery phase of lawsuits. I got hands-on by downloading the somewhat-famous Enron emails, popping them into a MySQL database server, and doing a little poking around in them using Tableau. But what really helped me understand the power of machine learning was studying emerging e-discovery technology, culminating in a conversation with data scientist and entrepreneur Nicolas Croce (see the interview here).

Before I share what I learned, some background for those who aren’t already familiar with what the legal profession calls “discovery”. Discovery is the process by which lawyers are permitted to obtain evidence, including documents and electronic records, from their opponents. This is permitted under civil and criminal law so that the lawyers for both sides can assemble the evidence that courts need to make good decisions. In a major legal action, discovery can involve literally millions of documents and equivalent types of records (images, emails, database entries, etc.). Both sides must review these documents to identify which are important and why.
