Contrary, perhaps, to expectations, I have not devoted much of this blog over the years to actual descriptions of the technology used in eDiscovery. My subject has been commentary on rules and practice around the world, and I cheerfully gave up hours in dark demo rooms in favour of talking to people who were actually doing things. There was more value, I decided, in hearing from lawyers and service providers about the issues raised by eDiscovery, and about the work being done to solve them, than in looking at user interfaces.
There are other reasons for not writing too much about the technology: I am not a user; I have always declined to get involved in system selection; and I have no comparisons to offer from my own recent experience. What has always interested me is persuading lawyers to look at the available solutions, to make themselves aware of what the market offers, and to see if their own businesses might benefit from using one or more of them. That battle is largely won now, but it took a while to get there.
I spoke recently to AJ Shankar, CEO of eDiscovery software company Everlaw. His opening shot was that doing what you need to do and enjoying what you do are not incompatible objectives. The expectation of eDiscovery, he said, is one of misery, but it does not have to be like that, and one can get pleasure from using the right software.
The context was Everlaw’s new Clustering software, which is designed to complement all of Everlaw’s search functions, with a user interface designed to be enjoyable to use while handling very large volumes of documents.
This picks up on two of the themes from my last interview with AJ Shankar, in December of last year. He talked then of “enterprise-grade functionality crossed with user-grade look and feel”, and about lowering the point at which it was obvious to any legal department or law firm that some technology must be used for discovery.
The “enterprise-grade” point refers to the ability to handle complex functions, with very large volumes of mixed-format data, and to do so securely as well as quickly. The need for speed is a part of the user experience – if your very pretty interface takes any noticeable time to redraw, then the user’s experience, as well as their efficiency, degrades. The academic world produces ever more clever search tools, but they do not stand up to the real-world demands of electronic discovery, where volumes, the pressures of time and cost, and the need to keep users alert all impose higher demands.
AJ observes that “discovery” is not limited to the formal obligations of litigation. It is needed whenever you want to get an answer or have data which you need to understand. It is more complex than that, however – we are not necessarily just seeking the answer to a question but effectively saying “I don’t know what I have here, help me find something I should know”. In a neat phrase, AJ said “You don’t know how high-value your opportunities are”.
When you put it like that, the decision about when to use technology looks rather different. Cost comes into it, of course, as does proportionality, but lawyers are rightly cautious about leaving stones unturned. The existence of technology capable of turning many stones quickly lowers the decision-making bar.
AJ said that Everlaw’s design (and its pricing structure) supports the idea that technology should be introduced at an early stage. Everlaw offers data visualisation tools from the start across documents of all kinds, including OCR’d images. From this, he said, you “get a sense of the lie of the land” and then decide if you want to go further. Clustering, he said, is designed for “true discovery, not just litigation”.
What does Everlaw’s Clustering bring to this objective of discovery? Most of the commentary about the new functionality has compared it with Google Maps. This is right, but it is worth explaining why this is so because it involves more than simply being user-friendly.
Whatever you think of Google, the Google Maps experience is generally a positive one. You can start at a very high level – the whole world perhaps – and go down to individual buildings, panning in any direction and zooming in or out depending on your needs. Look at a town. Go down to street level. Zoom out to see the surrounding streets. Add some search criteria to better inform your view – a reminder that the map itself is merely the visual, two-dimensional product of multiple sources of data.
Assuming a reasonable Internet connection, the exercise is fluid and useful. You might easily forget the mass of data which underlies this exercise and makes it possible. The labels which inform your view derive from enormous databases. At a different level, colours differentiate between one type of data and another. Some data – traffic flows, for example – are constantly updated by the input of others, adding a dynamic element to the exercise.
All this has parallels in eDiscovery. The raw data (the documents and the metadata) are constantly supplemented by input from users and, perhaps, by provisional conclusions of predictive coding and other analytic tools. All of that is fairly meaningless when presented as raw data in the form of files, folders and lists. It is potentially overwhelming when considered as thousands of dimensions projected into a two-dimensional space.
Clustering is not a new idea. Its attraction is the ability to group similar material in one place and draw attention to exceptions and outliers that are very similar to other documents – the potentially privileged document lurking unseen, or the document lying on the margin between two clusters. Integration is the key – as AJ put it “You don’t want to travel out to clustering, find stuff, and take it home”. You need a level of integration which allows the use of any tool in the box to widen or narrow a search result.
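For readers curious about what “grouping similar material” and a document “lying on the margin between two clusters” might mean mechanically, here is a minimal sketch. It is not Everlaw’s implementation; it assumes plain word-count vectors, cosine similarity and a basic k-means loop, and every name in it (`vectorize`, `kmeans`, `margins` and the sample documents) is hypothetical. The “margin” of a document is taken here as its similarity to its nearest cluster minus its similarity to the next nearest; a small margin flags a document that sits between two groups.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (a deliberately simple stand-in for real text features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Average term weights across the documents in one cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def kmeans(vectors, seeds, iterations=10):
    """Basic k-means: seed centroids from chosen documents, then alternate assign/update."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to the most similar centroid.
        labels = [max(range(len(centroids)), key=lambda k: cosine(v, centroids[k])) for v in vectors]
        # Move each centroid to the average of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels, centroids


def margins(vectors, centroids):
    """Best similarity minus second-best; a small value means 'between two clusters'."""
    out = []
    for v in vectors:
        sims = sorted((cosine(v, c) for c in centroids), reverse=True)
        out.append(sims[0] - sims[1])
    return out


# Hypothetical sample documents: two obvious themes plus one mixed document.
docs = [
    "invoice payment overdue account",
    "payment invoice account reminder",
    "invoice account payment final notice",
    "legal advice privileged counsel memo",
    "counsel memo privileged advice litigation",
    "privileged counsel advice",
    "counsel advice on overdue invoice payment",
]

vecs = [vectorize(d) for d in docs]
labels, cents = kmeans(vecs, seeds=[0, 3])
m = margins(vecs, cents)
borderline = min(range(len(docs)), key=lambda i: m[i])

print(labels)           # cluster number for each document
print(docs[borderline])  # the document closest to the boundary between clusters
```

In this toy run the last document is grouped with the invoice material, but it has by far the smallest margin because it also shares “counsel” and “advice” with the privileged group. That is the kind of document a reviewer would want surfaced rather than left unnoticed inside a cluster. Real tools use much richer features and far larger volumes, which is where the enterprise-grade engineering matters.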
Reviewers don’t have time to keep going back over old ground, re-reviewing documents they have already seen. It is sometimes helpful, nevertheless, to examine the same data through different lenses – in a different review component, or because a flag is raised as criteria change. The Google Maps parallel is again helpful – it gives you different views of the same data (Map types), there are overlays (Cycling, Street View), new labels, links and photographs, and dynamic content (changing traffic conditions). Periodic updates (Everlaw issues these regularly and seamlessly) may throw new light on previously seen documents.
However advanced the visualisation, it is useless without the enterprise-grade underlying technology to which AJ referred, supplemented in Everlaw’s case by drawing on the user’s own GPU (Graphics Processing Unit) to aid screen re-rendering as the view changes. What is critical to the user is the delivery of all the data in a form which is both meaningful and perhaps even pleasurable to use – even over hotel wifi.
AJ was in the UK when I spoke to him, which gave an excuse to talk about the UK market and the differences between it and the US. I started this blog in 2007, at a time when US discovery providers were just beginning to discover significant differences between their home market and the UK – not only was there less litigation, but document volumes were lower, and the rules were less onerous than in the US, reducing the incentive to find better ways of getting the work done.
Much has changed since then, partly because of increased data volumes, partly because of regulatory activity, and partly because of a sensible focus on finding the right documents rather than disclosing all of them. The fundamental problem is the same, AJ said. Everyone has to “look at haystacks to find needles”. There is no realistic alternative to the use of some kind of technology to comply with disclosure obligations, and (which AJ sees as equally important) to find information for all kinds of purposes.
If you have to do it, you might as well get some pleasure from it or, at least (to go back to AJ’s opening comment in our interview), not find it a misery.
The Everlaw Clustering press release.
Everlaw blog post: Everlaw Clustering: 3 Use Cases To Boost Efficiency During Review
There is a short video illustrating Everlaw’s clustering.