The session track of TREC

This year's TREC conference had several interesting sessions, and not the least interesting of them were the planning sessions for next year's tracks. The design of a collaborative retrieval task, and of the methods and measures for evaluating such a task, can provoke a more wide-ranging, philosophical discussion than the presentation of retrieval results,

Review utility and reviewer gender

Another interesting paper at CIKM was "Inferring Gender of Movie Reviewers: Exploiting Writing Style, Content, and Metadata" (no link because the ACM digital library is currently broken), by Jahna Otterbacher. Working with a crawl of top-rated IMDB reviews for popular movies, the paper builds a logistic regression model for predicting reviewer gender. Features

Reverted indexes considered further

Gene and Jeremy have disputed two assertions about index reversion made in my previous post. The first is that (1,10) normalization strips term occurrences of their IDF factor. The second is that the cutoff depth used in creating the reverted corpus sets an upper bound on reverted term DF.

Let me say immediately that [...]

Reverted indexes and true relevance feedback

I was fortunate enough at CIKM not only to meet the two bloggers cited in my thesis, namely Gene Golovchinsky and Jeremy Pickens of FXPAL (the latter now of Catalyst), but also to hear Jeremy present one of the more interesting papers of the conference, "Reverted Indexing for Feedback and Expansion" (co-authored with Gene and

Assessor error in legal retrieval evaluation

Another year, another CIKM. This marks my first post-PhD publication (I finally submitted!), and it also marks a new sub-genre of retrieval evaluation for me: that of legal retrieval, or more specifically e-discovery. Discovery is a process in which party A produces for party B all documents in party A's possession that are

How (and why) not to rank academics

The recently-launched Microsoft Academic Search, a product of Microsoft Research Asia, has made a bit of a splash as a potential competitor to Google Scholar. Although its coverage does not seem as detailed as Google Scholar quite yet, MS Academic Search has a number of additional features, such as author and conference pages,


EVIA and NTCIR

The latest iteration of the NTCIR effort, NTCIR-8, has concluded. NTCIR is a collaborative information retrieval forum, focusing on tasks in East Asian languages (currently Chinese, Japanese, and Korean, plus English); it is otherwise known as "the Asian TREC". The proceedings are available online.

This year was my first at NTCIR. [...]

CIKM reviewing: too much and too little

I’ve just finished my CIKM 2010 reviewing assignments, and have come out of the process with a number of questions:

Is nine papers too many to review?
This year, I was assigned nine papers to review. Last year it was five. For SIGIR (if my own local cache of reviews is to be believed) it [...]

Evaluating keyword search in databases

I was recently invited to contribute a short article to a special issue of the IEEE Data Engineering Bulletin on keyword search in databases. Since database keyword search is not an area I have worked on previously, I decided the most worthwhile contribution I could make was to survey evaluation practice in the area,

Easter eggs in academic books

I'm currently reading Nonsampling Error in Surveys, by Judith Lessler and William Kalsbeek (Wiley, 1992) for a project I'm working on. It is, for the most part, an informative but necessarily rather dry treatment of statistical questions in survey design and interpretation. The segue at the end of Chapter 10 (p276), however, reads:

If [...]