By william, on April 24th, 2013
Copied with permission from IREvalEtAl
There has been a flurry of interest the past couple of days over Judge Miller’s order in re Biomet. In their e-discovery process, the defendants employed a keyword filter to reduce the size of the collection, and input only the post-filtering documents to their vendor’s predictive coding system, which seems to be a frequent practice [...] → Read More: What is the maximum recall in re Biomet?
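The arithmetic behind the question is simple. A document the keyword filter removes never reaches the predictive coding system, so it can never be produced. Here is a minimal sketch. The counts are hypothetical because the excerpt gives none for Biomet, and the function names are illustrative.

```python
def max_recall_after_filter(total_relevant: int, relevant_passing_filter: int) -> float:
    """Ceiling on end-to-end recall: relevant documents dropped by the keyword
    filter are never seen by the classifier, so they can never be produced."""
    return relevant_passing_filter / total_relevant


def end_to_end_recall(filter_recall: float, classifier_recall: float) -> float:
    """Recall of the two-stage pipeline: the share of relevant documents that
    survive the filter, times the share of those the classifier then finds."""
    return filter_recall * classifier_recall


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    ceiling = max_recall_after_filter(total_relevant=10_000, relevant_passing_filter=6_000)
    print(f"maximum achievable recall: {ceiling:.2f}")  # 0.60
    print(f"recall if the classifier finds 80% of filtered relevant docs: "
          f"{end_to_end_recall(ceiling, 0.80):.2f}")    # 0.48
```

However good the classifier is, the pipeline's recall cannot exceed the filter's own recall.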
By william, on April 17th, 2013
Copied with permission from IREvalEtAl
Point- and lower-bound confidence estimates on the completeness (or recall) of an e-discovery production are calculated by sampling documents, from both the production and the remainder of the collection (the null set). The most straightforward way to draw this sample is as a simple random sample (SRS) across the whole collection, produced and unproduced. However, [...] → Read More: Stratified sampling in e-discovery evaluation
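Stratifying the sample into production and null set lets each part be sampled at its own rate. The following sketch computes the resulting point estimate of recall. It assumes independent simple random samples within each stratum. The sizes and counts are hypothetical. Confidence bounds are left out because they need more care under stratification.

```python
def stratified_recall_estimate(prod_size: int, prod_sample: list[bool],
                               null_size: int, null_sample: list[bool]) -> float:
    """Point estimate of recall from two stratum samples.

    prod_sample and null_sample hold relevance judgments (True = relevant) for
    documents drawn by simple random sampling within the production and the
    null set respectively."""
    # Scale each stratum's sample prevalence up to an estimated relevant count.
    est_rel_prod = prod_size * sum(prod_sample) / len(prod_sample)
    est_rel_null = null_size * sum(null_sample) / len(null_sample)
    return est_rel_prod / (est_rel_prod + est_rel_null)


if __name__ == "__main__":
    # Hypothetical: 500 docs sampled from each stratum.
    prod_sample = [True] * 400 + [False] * 100  # 80% of produced docs relevant
    null_sample = [True] * 10 + [False] * 490   # 2% of null-set docs relevant
    recall = stratified_recall_estimate(20_000, prod_sample, 180_000, null_sample)
    print(f"estimated recall: {recall:.3f}")  # 0.816
```

The null set here is nine times larger than the production but gets the same sample size. A single simple random sample across the whole collection would not allow that allocation.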
By william, on January 25th, 2013
Copied with permission from IREvalEtAl
Readers of Ralph Losey’s blog will know that he is an advocate of what he calls “multimodal” search in e-discovery; that is, a search methodology that involves a mix of human-directed keyword search, human-machine blended concept search, and machine-directed text classification (or predictive coding, in the e-discovery jargon). Meanwhile, he deprecates the alternative model of [...] → Read More: Does automatic text classification work for low-prevalence topics?
By william, on December 8th, 2012
Copied with permission from IREvalEtAl
A question that often comes up when discussing e-discovery validation protocols is, why should they be based on confidence intervals, rather than point estimates? That is, why do we say, for instance, “the production will be accepted if we have 95% confidence that its recall is greater than 60%”? Why not just say “the production [...] → Read More: Why confidence intervals in e-discovery validation?
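A confidence bound differs from a point estimate in a way that is easy to show in code. This sketch computes a one-sided exact (Clopper-Pearson) lower bound on recall. As a simplification, it assumes that the number of produced documents among the relevant ones found in a simple random sample is binomial. The function names and counts are hypothetical. The direct summation is only meant for modest sample sizes.

```python
from math import comb


def binom_upper_tail(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def recall_lower_bound(produced: int, relevant: int, confidence: float = 0.95) -> float:
    """One-sided Clopper-Pearson lower bound on recall.

    `relevant` is the number of relevant documents in the sample and `produced`
    is how many of them fall in the production. The bound is the p at which
    P(X >= produced) equals alpha. That tail grows with p, so bisection works.
    """
    if produced == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_upper_tail(produced, relevant, mid) < alpha:
            lo = mid  # at this p, seeing so many produced docs is too unlikely
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    bound = recall_lower_bound(produced=70, relevant=100)
    print(f"point estimate 0.70, 95% lower bound {bound:.3f}")
```

Suppose 70 of 100 sampled relevant documents were produced. The point estimate of 0.70 clears a 60% threshold comfortably. The 95% lower bound sits noticeably lower and much nearer that threshold. A protocol stated in terms of confidence has to account for that gap.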
By william, on December 3rd, 2012
Copied with permission from IREvalEtAl
As it is becoming apparent that, without drastic immediate action, we are going to significantly overshoot greenhouse gas emission targets and warm the planet by an environmentally disastrous 4 to 5 degrees centigrade by the end of the century, I thought I should fulfil my long-standing promise to myself and calculate the carbon emissions generated [...] → Read More: The environmental consequences of SIGIR
By william, on September 4th, 2012
Copied with permission from IREvalEtAl
My last post introduced the idea of the satisfiability of a post-production quality assurance protocol. We said that such a protocol is not satisfiable for a given size of the sample from the unretrieved (or null) set if it would fail the production even when the sample found no relevant documents. The reason [...] → Read More: Statistical power of E-discovery validation
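Satisfiability can be checked before any sampling is done. The question is what the protocol would conclude in the best case, where the null-set sample contains zero relevant documents. This sketch rests on three assumptions: the protocol accepts when a 95% lower bound on recall meets a target, the count of relevant documents in the production is known, and the null-set sample follows a binomial model. All names and figures are hypothetical.

```python
def zero_found_prevalence_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on null-set prevalence when 0 of n sampled
    documents are relevant: the p solving (1 - p)**n == 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def is_satisfiable(n: int, null_size: int, produced_relevant: int,
                   target_recall: float, confidence: float = 0.95) -> bool:
    """True if a null-set sample of size n could, at best, let the production pass."""
    # Upper confidence bound on relevant docs left in the null set, even with 0 found.
    max_missed = null_size * zero_found_prevalence_bound(n, confidence)
    best_case_bound = produced_relevant / (produced_relevant + max_missed)
    return best_case_bound >= target_recall


if __name__ == "__main__":
    for n in (100, 400, 500):
        ok = is_satisfiable(n, null_size=100_000, produced_relevant=2_000, target_recall=0.75)
        print(n, ok)  # 100 False, 400 False, 500 True
```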
By william, on August 13th, 2012
Copied with permission from IREvalEtAl
In my last post, I examined the live-blog e-discovery production being performed by Ralph Losey, and asked what lower limit we could place on the recall of highly relevant documents with 95% confidence based on the final, quality assurance sample. The QA sample drew 1065 documents from the null set (that is, the set of [...] → Read More: Meaningful QA sample size in e-discovery
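The simplified protocol is the same kind as before: accept when a 95% lower bound on recall reaches a target, with a known count of produced relevant documents and a binomial null-set sample. Under it, the smallest satisfiable null-set sample has a closed form. This sketch uses hypothetical names and figures. It also shows the smallest prevalence that a 1065-document sample could rule out, if that sample found no relevant documents.

```python
from math import ceil, log, log1p


def min_satisfiable_sample_size(null_size: int, produced_relevant: int,
                                target_recall: float, confidence: float = 0.95) -> int:
    """Smallest null-set sample size n for which finding 0 relevant documents
    would give a recall lower bound of at least target_recall.

    Requires null_size * (1 - alpha**(1/n)) <= max_missed, which rearranges to
    n >= log(alpha) / log(1 - max_missed / null_size)."""
    alpha = 1.0 - confidence
    # Most relevant documents that can be missed with recall still >= target.
    max_missed = produced_relevant * (1.0 - target_recall) / target_recall
    if max_missed >= null_size:
        return 1  # any sample size satisfies this target
    return ceil(log(alpha) / log1p(-max_missed / null_size))


if __name__ == "__main__":
    print(min_satisfiable_sample_size(100_000, 2_000, 0.75))  # 448
    # Upper bound on null-set prevalence if 1065 sampled docs contain none relevant.
    print(f"{1 - 0.05 ** (1 / 1065):.4f}")  # 0.0028
```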
By william, on August 8th, 2012
Copied with permission from IREvalEtAl
Those who are following Ralph Losey’s live-blogged production of material on involuntary termination from the EDRM Enron collection will know that he has reached what was to be the quality assurance step (though he has decided to do at least one more iteration of production for the sake of scientific verification). Quality assurance here involves [...] → Read More: Quality assurance samples and prior beliefs
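A Bayesian update of the null-set prevalence is one way to make prior beliefs explicit. The sketch below uses a conjugate Beta prior with a binomial sample. This is an illustrative choice, not necessarily the approach the full post takes. The priors and sample counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) belief about the proportion of relevant documents in the null set."""
    a: float
    b: float

    def update(self, relevant_found: int, sample_size: int) -> "BetaBelief":
        # Conjugate update: add relevant docs to a, non-relevant docs to b.
        return BetaBelief(self.a + relevant_found, self.b + sample_size - relevant_found)

    def mean(self) -> float:
        return self.a / (self.a + self.b)


if __name__ == "__main__":
    uniform = BetaBelief(1, 1)       # no prior view about prevalence
    optimistic = BetaBelief(1, 999)  # prior centred on 0.1% prevalence
    for name, prior in (("uniform", uniform), ("optimistic", optimistic)):
        post = prior.update(relevant_found=2, sample_size=1_000)
        print(f"{name}: posterior mean prevalence {post.mean():.4f}")
    # uniform: 0.0030, optimistic: 0.0015
```

Both readers see the same two relevant documents in a thousand. Even so, their posterior means differ by roughly a factor of two, so at QA sample sizes like these the prior still carries real weight.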
By william, on July 15th, 2012
Copied with permission from IREvalEtAl
In my last post, I discussed an experiment in which we had two assessors re-assess TREC Legal documents with less and more detailed guidelines, and found that the more detailed guidelines did not make the assessors more reliable. Another natural question to ask of these results, though not one the experiment was directly designed to [...] → Read More: Do document reviewers need legal training?
By william, on July 2nd, 2012
Copied with permission from IREvalEtAl
Social scientists are often accused of running studies that confirm the obvious, such as that people are happier on the weekends, or that having many meetings at work makes employees feel fatigued. The best response is, what seems obvious may not actually be true. That, indeed, is what we found in a recent experiment. We [...] → Read More: Detailed guidelines don’t help assessors