As it is becoming apparent that, without drastic immediate action, we are going to significantly overshoot greenhouse gas emission targets and warm the planet by an environmentally disastrous 4 to 5 degrees centigrade by the end of the century, I thought I should fulfil my long-standing promise to myself and calculate the carbon emissions generated [...] → Read More: The environmental consequences of SIGIR
Hi FXPAL blogosphere. Among the odds and ends I do at FXPAL is helping people present their work with video. It also falls to me to archive the videos themselves. As I periodically move the video to new storage servers, I tend to look over “the old family album.” Our family is in the business [...] → Read More: Mining the Video Past of Future Research: Is it worth a look?
My last post introduced the idea of the satisfiability of a post-production quality assurance protocol. We said that such a protocol is not satisfiable for a given sample size from the unretrieved (or null) set if it would fail the production even when the sample found no relevant documents. The reason [...] → Read More: Statistical power of E-discovery validation
In my last post, I examined the live-blog e-discovery production being performed by Ralph Losey, and asked what lower limit we could place on the recall of highly relevant documents with 95% confidence based on the final, quality assurance sample. The QA sample drew 1065 documents from the null set (that is, the set of [...] → Read More: Meaningful QA sample size in e-discovery
Those who are following Ralph Losey’s live-blogged production of material on involuntary termination from the EDRM Enron collection will know that he has reached what was to be the quality assurance step (though he has decided to do at least one more iteration of production for the sake of scientific verification). Quality assurance here involves [...] → Read More: Quality assurance samples and prior beliefs
In my last post, I discussed an experiment in which we had two assessors re-assess TREC Legal documents with less and more detailed guidelines, and found that the more detailed guidelines did not make the assessors more reliable. Another natural question to ask of these results, though not one the experiment was directly designed to [...] → Read More: Do document reviewers need legal training?
Social scientists are often accused of running studies that confirm the obvious, such as that people are happier on the weekends, or that having many meetings at work makes employees feel fatigued. The best response is that what seems obvious may not actually be true. That, indeed, is what we found in a recent experiment. We [...] → Read More: Detailed guidelines don’t help assessors
Much later than I intended, after painstaking editing to get the length down from 39 to 31 pages, I’ve prepared a revised version of “Approximate Recall Confidence Intervals”, which is now in submission. Aside from tightening up the text and excluding a few inessential results, the main change from the first version has been to [...] → Read More: “Approximate Recall Confidence Intervals”, updated and in submission
Frequent readers of this blog will know of my burning desire to move IR research away from dry technical topics and towards questions that directly impact and excite the retrieval user. In pursuit of this goal, I have for the past year been working on a paper on estimating two-tailed confidence intervals for recall under [...] → Read More: Recall confidence intervals
Last week I was at SWIRL, the occasional talkshop on the future of information retrieval. To me the most important of the presentations was Diane Kelly’s “Rage against the Machine Learning”, in which she observed that the way information retrieval currently works has changed the way people think. In particular, she proposed that the combination of [...] → Read More: Attention-enhancing information retrieval