How to Start Your Organization’s Data and AI Ethics Program

Introduction to a 4-Part Series

Let’s suppose your organization (or some part thereof) has decided to take a more principled approach towards the data and/or algorithms it uses by establishing ethics-based ground rules for their use. Maybe this stems from concerns expressed by leadership, legal counsel, shareholders, customers, or employees about potential harms arising from a technology you’re already using or about to use. Maybe it’s because you know companies like Google, Microsoft, and Salesforce have already taken significant steps to incorporate data and AI ethics requirements into their business processes.

Photo by Kelly Sikkema on Unsplash

Regardless of the immediate focus, keep in mind that you probably don’t need to launch the world’s best program on day one (or year one). The bad news is that there is no plug-and-play, one-size-fits-all solution awaiting you. You and your colleagues will need to begin by understanding where you are now, visualizing where you are headed, and incrementally building a roadway that takes you in the right direction. In fact, it makes sense to start small—like you would when prototyping a new product or line of business—learning and building support systems as you go. Over time, your data and AI ethics program will generate long-term benefits, as data and AI ethics become increasingly important to every organization’s good reputation, growth in value, and risk management.

In the following 4-part series about initiating a functional data and AI ethics program, we will cover the basic steps you and your team will need to undertake, including:

Part 1: Recruit a Task Force to Build a Data and AI Ethics Program

Part 2: Educate Your Organization About Data and AI Ethics

Part 3: Create a Map of Potential Data and AI Ethics Hot Spots

Part 4: Test Your Data and AI Ethics Program

 

Next – Part 1: Recruit a Task Force to Build a Data and AI Ethics Program

Machine Learning & AI for Non-Technical Businesspeople (Part I)

Part I: What is Machine Learning? Combining Measurements and Math to Make Predictions

The labels “machine learning” and “artificial intelligence” are often used interchangeably to describe computer systems that make decisions so complicated that until recently only humans could make them. With the right information, machine learning can do things like…

• look at a loan application, and recommend whether a bank should lend the money
• look at movies you’ve watched, and recommend new movies you might enjoy
• look at photos of human cells, and recommend a cancer diagnosis

Machine learning can be applied to just about anything that can be counted/measured, including numbers, words, and pixels in digital photos.

What is it we see in this photo? How would you describe the details that let us recognize this? (Photo by Mike Tinnion on Unsplash)

What makes machine learning different is that computers don’t need humans to write out incredibly detailed instructions about how to identify bad loans, good movies, cancer cells, etc. Instead, computers are given examples (or goals) and math, and… Continue reading “Machine Learning & AI for Non-Technical Businesspeople (Part I)”
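
To make the idea concrete, here is a minimal sketch, in Python with scikit-learn, of what “given examples and math” can look like for the loan scenario above. The feature names and numbers are invented purely for illustration and are not drawn from any real lender.

```python
# A minimal sketch of "learning from examples": instead of hand-writing rules
# for approving loans, we give the computer labeled examples and let a model
# find the pattern. The features and numbers below are made up for illustration.
from sklearn.linear_model import LogisticRegression

# Each row: [annual_income_thousands, debt_to_income_ratio, years_at_job]
past_applications = [
    [95, 0.10, 12],
    [40, 0.55, 1],
    [70, 0.20, 6],
    [30, 0.65, 2],
    [85, 0.15, 9],
    [38, 0.50, 1],
]
# 1 = loan was repaid, 0 = loan defaulted
outcomes = [1, 0, 1, 0, 1, 0]

model = LogisticRegression()
model.fit(past_applications, outcomes)        # the "math" finds the pattern in the examples

new_application = [[60, 0.30, 4]]
print(model.predict(new_application))         # recommended decision: approve (1) or not (0)
print(model.predict_proba(new_application))   # the model's confidence in that recommendation
```

No one wrote a rule like “approve if income is above X”; the model inferred its own rule from the labeled examples, which is the essence of the approach described above.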

Lessons in Agile Machine Learning from Walmart

Takeaways from Sam Charrington’s May 2017 interview with Jennifer Prendki, senior data science manager and principal data scientist for Walmart.com


I am very grateful to Sam Charrington for his TWiML&AI podcast series. So far I have consumed about 70 episodes (~50 hours). Every podcast is reliably fascinating: so many amazing people accomplishing incredible things. It’s energizing! The September 5, 2017 podcast, recorded in May 2017 at Sam’s Future of Data Summit event, featured his interview with Jennifer Prendki, who at the time was senior data science manager and principal data scientist for Walmart’s online business (she has since become head of data science at Atlassian). Jennifer provides an instructive window into agile methodology in machine learning, a topic that will become more and more important as machine learning becomes mainstream and production-centric (or “industrialized,” as Sam dubs it). I’ve taken the liberty of capturing key takeaways from her interview in this blog post. (To be clear, I had no part in creating the podcast itself.) If this topic matters to you, please listen to the original podcast, available via iTunes, Google Play, SoundCloud, Stitcher, and YouTube – it’s worth your time.


Overview

Jennifer Prendki was a member of an internal Walmart data science team supporting two other internal teams, the Perceive team and the Guide team, delivering essential components of Walmart.com’s search experience. The Perceive team is responsible for providing autocomplete and spell check to help improve customers’ search queries. The Guide team is responsible for ranking the search results, helping customers find what they are looking for as easily as possible. Continue reading “Lessons in Agile Machine Learning from Walmart”

EU Guidelines on Using Machine Learning to Process Customer Data

Summary: Every organization that processes data about any person in the EU must comply with the GDPR. Newly published GDPR Guidelines clarify that whenever an organization uses machine learning and personal data to make a decision that has any kind of impact on a person, a human must be able to independently review, explain, and possibly replace that decision using their own independent judgment. Organizations relying on machine learning models in the EU should immediately start planning how they are going to deliver a level of model interpretability sufficient for GDPR compliance. They should also examine how to identify whether any groups of people could be unfairly impacted by their models, and consider how to proactively avoid such impacts.


In October 2017, new Guidelines were published to clarify the EU’s GDPR (General Data Protection Regulation) with respect to “automated individual decision making.” These Guidelines apply to many machine learning models that make decisions affecting people in the EU. (A version of these Guidelines can be downloaded here—for reference, I provide page numbers from that document in this post.)

The purpose of this post is to call attention to how the GDPR, and these Guidelines in particular, may change how organizations choose to develop and deploy machine learning solutions that impact their customers.
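
As one illustration of what “interpretability” might look like in practice – my own sketch, not anything prescribed by the GDPR or the Guidelines – the snippet below shows how a human reviewer could see which inputs pushed an individual automated decision up or down when the underlying model is a simple linear one. The feature names and data are hypothetical.

```python
# A minimal sketch of one possible interpretability aid (not prescribed by the
# GDPR or the Guidelines): for a linear model, a reviewer can see how much each
# input contributed to an individual decision. Features and data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["income", "existing_debt", "years_as_customer"]
X_train = np.array([[52.0, 18.0, 4], [71.0, 5.0, 9], [33.0, 30.0, 1], [64.0, 12.0, 7]])
y_train = np.array([1, 1, 0, 1])  # 1 = application approved in historical data

model = LogisticRegression().fit(X_train, y_train)

applicant = np.array([45.0, 22.0, 2])
contributions = model.coef_[0] * applicant  # per-feature contribution to the decision score

decision = model.predict([applicant])[0]
print("Decision:", "approve" if decision == 1 else "refer to human review")
for name, value in sorted(zip(feature_names, contributions), key=lambda p: -abs(p[1])):
    print(f"  {name}: {value:+.2f}")  # which inputs mattered most, for the human reviewer
```

For more complex models, organizations would likely need more sophisticated explanation techniques, but the goal is the same: give the human reviewer something concrete to review, explain, and, if necessary, override.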

Continue reading “EU Guidelines on Using Machine Learning to Process Customer Data”

Machine Learning Enterprise Adoption Roadmap

Be it the core of their product or just a component of the apps they use, every organization is adopting machine learning and AI at some level. Most organizations are adopting it in an ad hoc fashion, but there are a number of factors—with significant potential consequences for cost, timing, risk, and reward—that they really should consider together.

That’s why I developed the following framework for organizations planning to adopt machine learning or wanting to take their existing machine learning commitment to the next level.

machine learning adoption roadmap preview

 

Define: Identify opportunities to adopt machine learning solutions in every part of your organization.

Does your organization have well-defined problems that can be solved using machine learning? Continue reading “Machine Learning Enterprise Adoption Roadmap”

Making the Scene With SharePoint 2010 Enterprise Social Media Features

Social sharing - how is it different in the workplace?

I discovered an interesting video recently while helping a client demonstrate how users of a SharePoint document management system can share information about the documents they are managing. The video is by Michael Gannotti, a technology specialist at Microsoft, and it apparently shows how Microsoft uses SharePoint 2010’s social media features in-house. The video covers other SharePoint 2010 features as well, but I found two segments particularly relevant.

Social Media features in SharePoint (from timestamp 6 minutes 49 seconds to 15 minutes 50 seconds):

  • people search — users can find people who are experts on the subjects they’re researching;
  • publishing — via wikis, FAQs, and blogs;
  • user home pages — users can fill out their own profiles, add various types of content, and see feeds from their friends and groups;
  • viewing other users’ pages — users can find out more about co-workers and their work;
  • adding meta-information — tagging, liking, and adding notes or ratings to alert others about the relevance of content to oneself, to a project, or to a topic; and,
  • publishing (blogging) options — users can post to SharePoint either via a rich web-based text authoring environment or direct from a Word document.

Using One Note For Sharing (from timestamp 17 minutes 34 seconds to 18 minutes 34 seconds):

  • can create the equivalent of wikis and FAQs;
  • is web-editable;
  • may be better for printing; and
  • can also be used offline.

Here’s his video (hosted by Vimeo):

Other resources

Another useful resource concerning social media in SharePoint 2010 is this blog post by Microsoft Senior Technical Product Manager Dave Pae, reporting from TechEd earlier this month: http://community.bamboosolutions.com/blogs/sharepoint-2010/archive/2010/06/07/live-from-teched-overview-of-social-computing-in-sharepoint-2010.aspx. It also links to a post about social search, which not only discusses the types of content (including meta-information) that can be searched but also covers phonetic search capabilities: http://community.bamboosolutions.com/blogs/sharepoint-2010/archive/2010/06/07/live-from-teched-in-new-orleans-what-s-new-in-enterprise-search-in-sharepoint-2010.aspx.

Social media tactics are transforming corporate knowledge management

New tools are putting the collaboration into “collaboration software” by creating a social media-inspired user experience for Enterprise knowledge management. But it’s taken a long time to get here.

Old school collaboration: floppy-net and shared drives

Until around ten years ago, when people talked about using software for “collaboration” in an Enterprise setting they usually meant transferring files point-to-point by email or handing off a diskette, aka “floppy-net” (or worse, passing paper that would require re-typing). Advanced collaboration involved establishing shared “network drives” where documents could be stored in folders accessible to everyone on the local network. But under this “system” for collaboration, even when people devoted a significant amount of time to maintaining document repositories, it could be difficult for others to find useful documents, or even to know whether useful documents existed in the first place. Labeling was limited; document sets might be incomplete or out of date; and authors, owners, or other contextual information might be unclear. Much like the internet before Google-quality search, folks could spend a lot of time browsing without getting any payoff.

The New York Public Library

Such collaboration systems are still quite common even though they aren’t very efficient, because of the way they rely on limited personal connections, memories, and attention spans. In such a system the best strategy when hunting for a document is to ask around to try to figure out who might know where to find useful documents. People who are asked for help – if they have time – try to remember what documents are available, then either hunt through the repository themselves or point toward likely places to look. This system obviously doesn’t scale very well because there is a linear relationship between the number of documents being managed and the time and expertise required to manage them. Emails sent by people seeking help finding information can become a significant burden, particularly in the inboxes of the most knowledgeable or best connected. And because managing documents in this system is relatively time-consuming and unrewarding, most people have little incentive to use or contribute to document management. Countless document repositories under this model have suffered from neglect or abandonment simply because they were so impractical. And unless a critical mass of use and contribution is achieved, the appearance that a repository is abandoned or neglected in turn reduces the incentive of new or returning community members to participate. Instead people rationally choose to “reinvent the wheel,” recreating documents or processes from scratch simply because the barriers to finding out whether what they need already exists are too high.

SharePoint and other web-like Information Management solutions

The rise of the internet has helped propel Enterprise collaboration forward, thanks in part to a new generation of internet-inspired collaboration software exemplified by Microsoft’s SharePoint. SharePoint offers features such as alerts, discussion boards, document libraries, categorization, shared workspaces, forms and surveys, personal pages and profiles, and the ability to pull in and display information from data sources outside of SharePoint itself, including the internet (“web parts”). Access controls have also evolved, enabling people to have access to the files and directories that pertain to them while limiting access to others. Meanwhile data storage capacity has exploded, costs have plummeted, and access speed has rocketed. Naturally, for most organizations the volume of documents being managed has ballooned. But we still need to ask: have knowledge management and collaboration scaled in proportion to the volume of information that is available and could be useful if more people could get their hands on it?

Notwithstanding features like Enterprise search, notifications, and improved metadata, many information management hubs are, in effect, still data silos where information is safe and organized but inconvenient to explore and share. In truth, despite powerful automated solutions now available, effective collaboration is still largely dependent on the quality of user participation.

Adoption and Engagement

Two key elements of effective collaboration are adoption, which corresponds to the percentage of team members who are able to use the system, and engagement, which corresponds to how many of them use the system regularly.
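
As a small illustration, the sketch below shows one way those two measures might be computed from a system’s usage logs; the 30-day window, the data layout, and the use of “has ever logged in” as a proxy for adoption are assumptions made purely for this example.

```python
# A tiny sketch of how adoption and engagement might be computed from usage logs.
# The 30-day window and the data layout are assumptions for illustration only.
from datetime import date, timedelta

team = {"ana", "ben", "chen", "dee", "eli"}          # everyone who has access to the system
logins = {                                           # user -> dates on which they used it
    "ana": [date(2024, 5, 2), date(2024, 5, 20)],
    "ben": [date(2024, 3, 1)],
    "chen": [date(2024, 5, 28)],
}

today = date(2024, 6, 1)
recent = today - timedelta(days=30)

# Adoption: share of the team that has used the system at all (a proxy for "able to use it").
adoption = len(logins) / len(team)
# Engagement: share of the team that has used it within the recent window (i.e., regularly).
engagement = sum(any(d >= recent for d in days) for days in logins.values()) / len(team)

print(f"Adoption:   {adoption:.0%}")    # 60%
print(f"Engagement: {engagement:.0%}")  # 40%
```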

There’s a chicken-and-egg relationship here: a collaboration system must achieve and maintain a critical mass of adoption and engagement to be self-sustaining, or it risks becoming ignored and thus irrelevant. Few people are going to adopt and engage if nothing of value is happening on the system because not enough other people have adopted and engaged. To attract this level of participation the experience should be easy (low frustration), useful (practical results are usually obtained), and emotionally rewarding (users experience satisfaction or even enjoy using it). Otherwise a collaboration system risks turning into a quiet information cul-de-sac no matter how impressive its technology.

Enter social media

Lessons learned from the social media phenomenon – examining the virtual footprints of the hundreds of millions of people using Facebook – are radically enhancing Enterprise knowledge management by promoting ease of use, practical results, and emotional gratification within collaboration systems. To get more information about this development I recently met with J. B. Holston, CEO of NewsGator, whose Social Sites solution adds Facebook-like features to SharePoint. Although Social Sites has been available for only three and a half years, its committed customers already include Accenture, Novartis, Biogen, Edelman, and Deloitte, among others.

The basic idea behind Social Sites (my take, not necessarily J. B.’s) is that SharePoint users experience less frustration, find better quality material, and receive more emotional gratification when their SharePoint experience is more like Facebook. And because a social media approach to collaboration is both useful and gratifying, more people use the collaboration system – adoption increases – and they use it more often for more purposes – engagement increases. Teams get more done while having more fun. An additional benefit of a social media overlay on top of a standard SharePoint install is that it draws attention to, and promotes increased use of, available resources, and encourages users to discover and experiment with collaboration options they weren’t using before, which may convert them into more valuable collaborators themselves.

Social Sites extends the functionality of SharePoint in a number of respects. The first generation of Social Sites added features including:

  • marking and tagging items;
  • providing custom streams of their friends’ activity updates (imagine keeping up with important developments with key people down the hall, or in other regions or departments, as they happen);
  • making it easier to move content in and out of SharePoint; and
  • making it easy for people to connect with the people who posted specific items with a single click.

The latest generation of Social Sites offers even more features (70 web parts in all are available), such as:

  • “liking”;
  • commenting;
  • ratings;
  • idea development (“ideation”);
  • wikis;
  • threaded conversations;
  • bookmarks;
  • feeds;
  • the ability to follow people and events;
  • automatic updates when specific things of interest happen;
  • the ability to ask questions;
  • the ability to make requests; and
  • the ability to pass word along about things that are happening.

An open API makes it possible to create custom activity streams for groups of users, accessible from mobile devices as well. Social Sites also lends itself to community management and governance.

As icing on the cake, NewsGator also offers iPhone and iPad applications for Social Sites to enable anywhere, anytime mobile interaction with SharePoint (including Social Sites’ social media features), completing the Facebook-like user experience.

Social Sites allows companies already using SharePoint to upgrade their team’s collaborative performance without fundamentally reengineering their current knowledge management systems. For example, the way information is stored and structured, as well as integrations like workflows, can be preserved. They can also avoid the costs of migration, retraining employees on new systems, or hiring specialists to manage the new systems. On the flip side, to the extent that Social Sites upgrades SharePoint to make it competitive with, or superior to, other collaboration options, the combination improves SharePoint’s attractiveness to companies considering switching over from competing knowledge management solutions. Finally, customers who seek to make this level of interaction widely available within their organizations may buy even more SharePoint licenses and invest in more customization.

Special thanks to J.B. Holston @jholston and Jim Benson @ourfounder for many of the ideas and information that found their way into this post.

Forensic archiving and search of web 2.0 sites

Bruce Wilson

As someone who has worked on a number of web application development projects over the years I understand the challenges of web content management and archiving more than most folks. Thus at LegalTech NY earlier this year I was particularly impressed by a vendor in the web archiving space called Hanzo Archives.

Many of us are familiar with a service called the Internet Archive (more commonly known as “the Wayback Machine”), which offers snapshots of previous versions of thousands of web sites, even small ones. It’s fun, and sometimes useful for information gathering, but hardly rises to the level of detail most of us would hope for in a litigation or compliance scenario.

What Hanzo does is take the idea of archiving web sites to a forensic level by comprehensively recording the content of a web site, including Flash and other non-HTML content, at frequent intervals. Once recorded, site archives are fully searchable and web content can be “replayed” exactly as it was published on a particular date, all in a manner that can be authenticated in court.

This fall I had the privilege of speaking with Mark Middleton, founder and CEO of Hanzo Archives, to satisfy my curiosity about what his product is capable of and who is using it.

Bruce: Mark, thank you for arranging to speak with me. I think I have a general understanding of what your archives do, but let me start off by asking you for some use cases that illustrate who needs your product and what they need it for.

Mark: Actually we have two products now. We are defining a new product, WebHold, which is a streamlined and simplified derivative of our existing ‘Enterprise’ service. We have advanced so far in the past couple of months that we are now able to collect the most insanely complicated web sites, where by comparison archiving something like financial services sites is simple.

To answer your question, our use cases would include litigation support and brand heritage. The common thread here is that, increasingly, companies are communicating and advertising to their audiences using web technologies. Whereas a company historically has been able to capture their communication in print or broadcast relatively easily, they are unable to do this for their web content and so for the first time in decades they have major communication channels they cannot capture for the future. One of the world’s most successful brands in the Food and Beverage industry has selected Hanzo for this purpose.

Here’s your legal use case. We have a prospect whose target audience for communication and advertising is young people. Our prospect communicates with them in sophisticated ways on the web – videos, games, surveys, and animation. They offer very sophisticated messaging about their products and put it on many websites in order to communicate their brand to their customers. How does one capture that?

At the same time, in their words, they anticipate regular litigation about their products. They’ve got every other avenue covered – print, TV ads, materials provided on premises – but they cannot do web. They cannot rely on the “Wayback Machine” – none of the rich media is recorded there. They can do backups – but how do you recreate a web site from a backup, and how do you prove it was the one that was live on a particular date? We can capture their content on a regular basis, save it into secure containers, prove the content in the containers is authentic and original, and recreate it in our archive system to look exactly as it did when it was live. So companies can enjoy the same level of confidence that they have in their other channels.

Bruce: OK, again I think I understand, but to help explain it to others can you give me some concrete examples?

Mark: Here are a couple of examples from prospects and clients.

One is an investment house with a variety of products. Their website contains a mixture of historic performance data and propositions to entice the investor to buy into their investment product. Traditionally these kinds of offers were made in printed prospectus documents, which regulators would require to be filed. Specific to the web, companies are now making this offer in a unique way. People can select an investment product and pull up a calculator showing what returns they might be able to see. They can see graphs of performance, plus recent opinions of analysts, all on the same site. But because it is dynamically generated web content, assembled as it is presented to the user, there is no capture anywhere of what someone actually sees. And so what has happened historically is that people take companies to court saying that the company is “not performing as to expectations as per your offer.” So now, because of this possibility, companies need to capture the web site experience so that they can prove it was reasonable and not misleading.

We’ve also received a lot of interest from pharmaceutical companies. Because of claims about their products and performance – drugs often don’t perform for an individual the way they perform statistically – these companies face potential class actions based on perceived underperformance. Advertisements in magazines and TV can all be captured, but web sites are much more difficult.

Bruce: I’ve worked on sites where the content keeps changing and have some idea of how messy it can be to try to figure out how it was at an earlier date. But how is Hanzo’s solution different from the current state of the art?

Mark: We have spoken with a U.S. pharmaceutical company who had to resurrect product information from their websites, even though it had been brought down years before. Maybe the judge gave them two months to resurrect it. They had to locate and re-hire former staff, building a team of 30 people to handle the project. Once they found the hard drives they needed, which were stored in a cupboard in a basement somewhere, they then had to rebuild from the code on up. That is an extreme case of course. But generally, when relying on backups or information stored in a content management system, they have to reconstruct the physical server infrastructure and the server software, including licensing, before they can even start with the content. But with Hanzo they don’t need any of that: the site is archived independently and can be reviewed immediately, on demand.

Bruce: OK, so by my way of thinking this is the same issue as disaster recovery—to be efficient you want to have a hot backup, not simply the opportunity to recreate your site from bare metal.

Also on our agenda for this conversation, you mentioned you have a new product? This is a SaaS product, did I understand that correctly?

Mark: Yes, we’ve been working on a new product – it’s called WebHold. We still only do web archiving. First, a little background. Most institutions that archive websites rely on software called web crawlers. We’ve used several web crawlers from the open source community and have also developed our own. In the last few years we have done a lot of research and had the opportunity to archive some very complex sites, and have developed technology that exceeds existing crawlers. But we have kept to standard archive file formats, to stay consistent with archiving standards, even though we now put multimedia and other rich content in them. So what we’ve managed to do is this. With our technology we can capture sites very easily. Particularly for customers who have compliance requirements but are not cash rich, we can offer something effective and great value for money. We have come up with a product that will archive websites on a daily basis at a level of quality that will meet compliance requirements from an archival standpoint. This is something we can offer to FINRA [US] or FSA [UK] regulated companies with a high degree of reliability. It’s fully SaaS. Customers submit their sites, which are crawled, then the results are made available to the customer, and we archive their sites every day.
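
(An aside from me, not Mark: the “standard archive files” of the web-archiving world are generally WARC files. Purely as an illustration of that format – and emphatically not of Hanzo’s own technology – here is a minimal sketch that captures a live page into a WARC file using the open-source warcio library.)

```python
# A minimal sketch of capturing a live page into a standard WARC archive file
# using the open-source warcio library. This is NOT Hanzo's technology, just an
# illustration of the kind of standard archive files mentioned above.
from warcio.capture_http import capture_http
import requests  # per warcio's documentation, import requests after capture_http

with capture_http("example-capture.warc.gz"):
    requests.get("https://example.com/")  # the request and response are written to the WARC

# The resulting .warc.gz can later be indexed, searched, and "replayed" by
# standard web-archive tooling (for example, pywb) to show the page as it appeared.
```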

Bruce: What about archiving inside a firewall?

Mark: The regulatory requirements are to archive public-facing websites that present advertising and offers that go to the consumer. For Enterprises it is also possible to archive intranets inside the firewall using Hanzo’s crawlers and access systems. We can do it as SaaS over a VPN, or as an appliance. Offering our products as an appliance was the result of opportunities we had to capture collaborative web platforms on corporate and government intranets.

Bruce: In an earlier blog entry about disaster recovery I learned that one underpublicized form of disaster is when a third-party SaaS business goes under, thus cutting a company off from its data. Apparently this happens in niche verticals every so often. Can Hanzo be rigged up to capture a company’s data on a SaaS, say Salesforce.com? Not that I’m predicting that they’re going away any time soon….

Mark: Hanzo can archive some simple web-based apps already. It’s a departure from the standard architecture of crawlers. But we can do that collaboratively with a client.

Bruce: How about pulling data out of a third party SaaS for the purposes of eDiscovery?

Mark: Hypothetically speaking, if it was for some reason undesirable or not possible for a third party SaaS provider to produce the data themselves Hanzo could be used to get data for eDiscovery from a SaaS system.

IT crapshoot: cost-cutting is costly in disaster recovery, archiving for e-discovery

Disaster recovery and archiving are key zones of interaction for IT and Legal Departments. When a lawsuit is filed and an e-discovery production request is received, a company must examine all of its electronically stored information to find documents that are relevant to that lawsuit. Court battles may arise regarding the comprehensiveness of the examination, the need to lock down potentially important documents and metadata, and the cost of identifying, collecting, preserving, and reviewing documents — all of which are related to the way in which data is stored.

Photo credit: Josie Hill

With this in mind, I recently sought out Jishnu Mitra, President of Stratogent, a specialized application hosting and disaster recovery services provider, to obtain his perspective on disaster recovery best practices and the relationship between disaster recovery and e-discovery. Key points he made include:

  • effective disaster recovery sites are “hot” sites that can be used for secondary purposes rather than remaining idle;
  • “cold” sites are unlikely to get the job done and are not cheap;
  • efforts to keep IT budgets down by delaying or limiting disaster recovery, or by limiting archiving, can backfire;
  • budget-conscious IT departments are more likely to use archiving features built into their software of choice;
  • many IT and Legal personnel have a habit of being disrespectful towards one another and doing a poor job of communicating with one another;
  • more crossover Legal-IT people are needed.

Bruce: Can you provide a little background about Stratogent’s domain expertise?

Jishnu: We offer end-to-end application hosting services, including establishing the hosting requirements and architecture, hardware and software implementation, and proactive day-to-day application management, including responding to any issues that arise. Most of the time we are tasked with building a full data center – not the building itself, of course, but a complete software and hardware hosting framework. We aren’t providers of any specific business application (the way salesforce.com is). We design, deploy, and operate all the layers on which modern business applications are hosted, including the application’s framework, e.g., .NET, Java, or SAP Basis.

Our customers include multi-office companies, who require applications shared between offices, and web-based application SaaS (“software as a service”) companies. The scope is typically quite complex – we don’t build or manage general web sites or blogs; that’s a commodity market and too crowded. We build and manage custom application infrastructures for enterprises or for complex applications that require a range of IT skills to manage. Our customers hire us because they don’t want to budget to hire all of the people they would need to do this internally, or because they are deploying a new application that is beyond the current reach of their IT team. For example, if a company wants to start using a new-to-them ERP [“Enterprise Resource Planning”] application like SAP, or (say) a Microsoft-based enterprise landscape that needs to scale, we can multiplex our internal pool of talent to give their application 24-7 attention far more cheaply than the company could hire and retain the specialized employees needed to do it themselves.

Bruce: So you supply the specialized competencies needed to build and operate complex application environments so that your customers can focus on their core competencies? Then their core competencies don’t need to include what you do in order for them to succeed.

Jishnu: Yes. They know what they want, they conceptualize what they want, but not the hardware they need and the infrastructure software. We can go in from the very beginning saying, “Here’s how you set up a highly available, clustered server farm for your social networking app,” and so on and so forth. We know how to customize it and set it up. They also don’t have our expertise in negotiating with hardware vendors, or in capacity planning, etc. Plus there’s the build phase, loading OSes, etc. We essentially give them over the course of our engagement the entire hosting framework on which the app runs and then take care of it for the long run.

Once we get their hosting framework to a steady state, they get to run with it for two, five, or more years with little or no failure. So their role is conceptualizing on day 1, and then we become a partner organization worrying about how to realize that dream, handling inevitable IT break-fix issues and managing changes over the entire life span of that system. Disaster recovery usually becomes part of that framework at some point.

Bruce: Can you give me some broad idea of the scope of disaster recovery work that you do?

Jishnu: Disaster recovery is not a separate arm of our business. It’s very integral to the hosting services we provide. We build disaster recovery sites at different levels of complexity. It can go from a small customer up to a really large customer. And over time Stratogent gets into innovative approaches to deal with disaster recovery. The philosophy of Stratogent is that we’re not trying to sell a boxed solution to all the customers. It’s more of a custom solution, not a mass market product. We say we will architect and host your solution – and as architect we always add very specific elements for our customer, not just one solution for everyone.

The basic approach, even for small customers, is to choose a convenient and correct location for the disaster recovery site and use a replication strategy based on whatever they can afford or have tolerance to accept. As much as possible a disaster recovery site should be up and running and ready to go at a flip of the switch. They can use the excess capacity at their disaster recovery site at quarter end to run financial reports or for other business purposes, plus it can be used for application QA and staging systems. They can be smart about it, and keep it on, so that they can have confidence in it.

Of course a disaster recovery solution like this can’t be built in just a month or two – to do it right requires creativity and diligence. In one recent instance, when asked to do it “right now,” we had to go with a large vendor’s standard disaster recovery solution for our customer. Everyone knows that this does not get us anything beyond the checkmark for DR, so the plan is to go to a Stratogent solution over time, build a hot alternate site on the East Coast, and sunset the large vendor’s standard disaster recovery arrangement.

Bruce: Given the importance of disaster recovery for a number of reasons, how seriously are companies taking it?

Jishnu: Everybody needs it, but it suffers from “high priority, low criticality,” and the problem rolls from budget year to budget year. Some unpleasant trigger, like an outage or an impending audit, instigates furious activity in this direction, but then it goes on the back burner again. In the recent instance, although disaster recovery was scheduled for a later phase for technical reasons, for SOX compliance the auditor demanded a disaster recovery solution by year end or our customer would fail their audit. So we went out and obtained a large vendor’s standard disaster recovery solution, which met the auditors’ requirements but isn’t comparable to a “hot” disaster recovery site.

The way disaster recovery solutions from some of the large vendors work is this: they have huge data centers where their customers can use equipment should a disaster happen. Customers pay a monthly fee for this privilege. When a disaster strikes, customers ship their backup tapes out there, fly their people out there, and start building a disaster recovery system from scratch. And by the way, if you have trouble, here’s the menu of emergency support services for which they will charge you more. And in 95 out of 100 cases it just doesn’t work – it’s a monumental failure when you need it most. These are “cold” sites that have to be built from the ground up. It takes maybe 72 hours to get them up, or rather, to be asserted as “up.” Then, as someone like yourself with application development experience knows, it takes weeks to debug and get everything working correctly. And when you’re not actually using them, standard disaster recovery services are charging you an incredibly high amount of money for nothing except the option of bringing your people and tapes to their center, and then good luck.

Bruce: You mentioned running quarterly financials, QA, and staging as valuable uses for the excess capacity of “hot” disaster recovery sites. Could this excess capacity also be used for running e-discovery processes when the company is responding to a document production request?

Jishnu: Possibly, but I haven’t seen it done yet in a comprehensive manner. The problem is you still need to have the storage capacity for e-discovery somewhere. The e-discovery data is a significant chunk of storage, maybe tier 2 or 3, which demands different storage anyway, so it makes sense to keep the e-discovery data in the primary data center because it’s easier and faster to copy, etc. That said, it is very useful to employ the capacity available in the secondary site for e-discovery support activities like restoring data to an alternate instance of your application and running large queries without affecting the live production systems.

Bruce: Do you deploy disaster recovery solutions that protect desktop drive, laptop drive, or shared drive data?

Jishnu: As I have said, our disaster recovery solutions are part of whatever application frameworks we are hosting. We as a company don’t get into the desktop environment, the local LANs that the companies have. We leave that to local teams or whatever partner does classic managed services. We do data centers and hosted frameworks. We don’t have the expertise or organizational structure to have people traveling to local sites, answering desktop-related user queries, etc. But any time it leaves our customer’s office and goes to the internet, from the edge of the office on out it’s ours.

Bruce: But when archiving is part of the customer’s platform hosted by you, it gets incorporated in your disaster recovery solution?

Jishnu: Yes.

Bruce: Is Stratogent involved when your customers must respond to e-discovery and regulatory compliance information retrieval requests?

Jishnu: Yes. For example, we recently went through and did what needed to be done when a particular customer asked for all the documents in response to a lawsuit. We brought in a consultant for that specific archiving system as well. Our administrators collaborated with the consultant and two people from the customer’s IT department. It took a couple of weeks to provide all the documents they asked for.

Bruce: Was the system designed from the outset with minimizing e-discovery costs in mind?

Jishnu: Unfortunately no. In this case archiving for e-discovery was an afterthought, grafted onto the application later, and a push-button experience wasn’t among the design criteria for this particular system. But it woke us up. We realized this could get worse.

Bruce: So how do you do it differently now that you’ve had this experience?

Jishnu: Here we recommended to our customer that we upgrade to the newest version of the archiving solution and begin using untapped features that allow for a more push-button approach. Keep in mind that e-discovery products weren’t as popular or sophisticated then as they are now.

Bruce: Aren’t there third-party archiving solutions also?

Jishnu: There are several third-party products, and you see the regular enterprise software vendors coming out with add-ons. We’re especially looking forward to the next version of Exchange from Microsoft, where for us the salient feature is archiving and retention – only because email is the number one retrieval request. On most existing setups, getting the information for a lawsuit or another purpose takes us through an antiquated process of restoring mailboxes from tape and loads of manual labor. It’s pretty painful: it takes an inordinate amount of time to find specific emails, it’s not online, and it takes days. For this reason we’re looking forward to Exchange 2010, which has these features built INTO the product itself. Yes, some other vendors have add-on products that do this also.

Bruce: And I assume you’re familiar with Mimosa, in the case of Exchange?

Jishnu: Like Mimosa, yes. But when it’s built in, the customer is more likely to use it. By default customers don’t buy add-ons, for budgetary reasons. It’s so much easier if the central product has what we need, and that is in fact happening a lot these days. I won’t be surprised if products in general evolve so that compliance and regulatory features are considered integral parts of the software and not someone else’s problem.

Bruce: Do you have other examples of document retrieval from backups or archives?

Jishnu: Actually there are three scenarios where we do document retrieval. Scenario one, which we discussed, is e-discovery. Scenario two is when we have seen retrieval requests in mergers and acquisitions, and we had to pretty much get information from all sorts of systems – a huge pain.

Scenario three is SaaS-driven. For many of our customers, the bulk of their systems are either on-premises or hosted by Stratogent, but some of our customers use Salesforce.com or one of many, many small or industry-specific SaaS vertical solutions. In one recent case, one of these niche vertical SaaS vendors, because of some of the issues in that industry, was about to go out of business. We had to go into emergency mode and create an on-premises mirror – actually more like a graveyard for the data – to keep it for the future and enable us to fetch the data from that service. We figured out a solution for how to get all the customers’ data, replicate it, keep it in our data center, and continuously keep it up to date. Fortunately the vendors were cooperative and allowed access through their back door so we could achieve this. I call this “the SaaS fallback” scenario. SaaS is a great way to quickly get started on a new application, but BOY, if anything happens, or if you decide you aren’t happy, it becomes a data migration nightmare – worse than an on-premises solution, because you have no idea how the data is being kept and have to figure out how to retrieve it through an API or some other means.
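
(My own aside: the “SaaS fallback” pattern Jishnu describes – continuously pulling data out of a vendor’s system and mirroring it locally – might be sketched roughly as below. The endpoint, authentication, pagination scheme, and field names are hypothetical placeholders, not any actual vendor’s API.)

```python
# A rough sketch of the "SaaS fallback" pattern: periodically pull records out
# of a vendor's API and mirror them locally so the data survives the vendor.
# The endpoint, auth, pagination scheme, and fields are hypothetical.
import json
import sqlite3
import requests

API_URL = "https://api.example-saas.com/v1/records"   # hypothetical vendor endpoint
API_KEY = "REPLACE_WITH_VENDOR_KEY"                    # provided by the vendor

db = sqlite3.connect("saas_mirror.db")
db.execute("CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, payload TEXT, updated_at TEXT)")

def sync(since: str) -> None:
    """Copy every record changed since `since` into the local mirror, page by page."""
    page = 1
    while True:
        resp = requests.get(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            params={"updated_since": since, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json().get("records", [])
        if not batch:
            break
        db.executemany(
            "INSERT OR REPLACE INTO records (id, payload, updated_at) VALUES (?, ?, ?)",
            [(r["id"], json.dumps(r), r["updated_at"]) for r in batch],
        )
        db.commit()
        page += 1

sync(since="2010-01-01T00:00:00Z")   # run on a schedule to keep the mirror current
```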

Bruce: In e-discovery and other legal-driven document recovery scenarios, how important is collaboration between IT and Legal personnel, or should I say, how significant a problem is the lack of this collaboration?

Jishnu: I’ve seen the divide between IT and legal quite often. Calling it a divide is actually being polite; at worst both parties seem to think the others are clueless or morons. It’s a huge, huge gap. And I have also seen it playing out not just in traditional IT outfits, but also in product-based companies, when I was principal architect at Borland. When attorneys came to talk to engineering about IP issues, open source contracts, or even patent issues, there was no realization among the techies that it was important. In fact legal issues were labeled “blockers” and the entire legal department was “the business prevention department.” And there is exactly the opposite feeling in the other camp: engineering leaders don’t “get it,” and talking to anybody in development or IT is like talking to a wall. The psychological and cultural issues between IT and legal have been there for a while. In some of the companies that have surmounted this issue, the key seems to be having a bridge person or team acting as an interpreter to communicate and keep both sides sane. Some technical folks I know have moved on to play a distinctly legal role in their organizations, and they play a pivotal role in closing the gap between legal and IT.