

“Semantic search log analysis” by Hollink, Tsikrika, and de Vries. Categorizes the semantic classes of query reformulations. For a professional image search service, the most common non-identity reformulation is to find the spouse of the first query’s result.
A call to boycott ACM and IEEE program committees and journal reviewing until they allow free [...] → Read More: Authorities

If I had a ten-thousand-node cluster…

Another aspect of this year’s SIGIR reviewing, though hardly a new one, was the number of papers from industry. As a fellow reviewer observed, these often come across as nicely polished internal tech reports; the kind of thing you should probably read once your globally-distributed search infrastructure has reached the million-query-a-day mark; a sort [...] → Read More: If I had a ten-thousand-node cluster…


The Journal of Universal Rejection.
Don’t become a scientist.
Building self-esteem and self-discipline amongst LA kids in gang-ridden neighbourhoods — through cricket (!).
A large-scale statistical self-evaluation of Australian university research against world standards finds that Australian universities are below world standard in statistics research.
Today I learnt, R has support for inline C++.
Simple models are often as accurate [...] → Read More: Hubs

Spam filtering for web results

The popular history of web search holds that Google buried its competition in part because its PageRank algorithm solved, for a time, the problem of web spam; and some pundits now assert that not merely Google, but algorithmic search in general, is losing its battle against the spammers. Spam has attracted much less [...] → Read More: Spam filtering for web results

Flooding the internet with online media

http://www.focus.com/images/view/48564/ → Read More: Flooding the internet with online media


I enjoy other bloggers’ link aggregations, so I’ve decided to do my own.
Ben Edelman demonstrates that Google biases its search results towards its own services, even when other, lower-ranked links get more click-throughs — BenEdelman.
Strike one for pessimists! Ben Carterette and Ian Soboroff (SIGIR 2010; yes, six months old, but only just read) find [...] → Read More: Spokes

A weighted similarity measure for non-conjoint rankings

Having an article in submission sharpens one’s appreciation for situations to which the article applies; and the ACM’s leisurely eighteen-month-plus turnaround on journal articles gives plenty of time for that appreciation to develop. Since early 2009, it has seemed to me that every third retrieval paper either has, or should have, compared non-conjoint rankings [...] → Read More: A weighted similarity measure for non-conjoint rankings

Post-stratification of a binomial population

Retrieval System A returns 100 documents. We sample 20, and find 10 relevant. We therefore estimate that System A’s precision is 0.5, and that there are 50 relevant documents in the set System A returned. Let us refer to the number of relevant documents a system returns as the system’s yield.

Subsequently, System [...] → Read More: Post-stratification of a binomial population
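
The arithmetic in the excerpt above (a sample of 20 from 100 returned documents, 10 of them relevant) can be sketched in a few lines of Python. This is only a minimal illustration of the simple-random-sample estimate; the function name `estimate_yield` and its parameters are hypothetical, and it does not attempt the post-stratification that the full post goes on to discuss.

```python
def estimate_yield(returned: int, sampled: int, relevant_in_sample: int) -> tuple[float, float]:
    """Estimate precision and yield from a simple random sample.

    returned: number of documents the system returned
    sampled: number of returned documents that were assessed
    relevant_in_sample: number of assessed documents judged relevant
    """
    if not 0 < sampled <= returned:
        raise ValueError("sampled must be between 1 and returned")
    if not 0 <= relevant_in_sample <= sampled:
        raise ValueError("relevant_in_sample must be between 0 and sampled")
    # Sample proportion of relevant documents is the precision estimate.
    precision = relevant_in_sample / sampled
    # Scale the proportion up to the full returned set to estimate yield.
    return precision, precision * returned


# System A from the excerpt: 100 returned, 20 sampled, 10 relevant.
precision, yield_estimate = estimate_yield(returned=100, sampled=20, relevant_in_sample=10)
print(precision, yield_estimate)  # 0.5 50.0
```

With the excerpt's numbers, the sketch reproduces the stated precision of 0.5 and a yield of 50 relevant documents.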

The session track of TREC

This year’s TREC conference had several interesting sessions, and not the least interesting of them were the planning sessions for next year’s tracks. The design of a collaborative retrieval task, and of the methods and measures for evaluating such a task, can provoke a more wide-ranging, philosophical discussion than the presentation of retrieval results, [...] → Read More: The session track of TREC

A Comparison of Open Source Search Engines

A report by Christian Middleton & Ricardo Baeza-Yates

Excerpt from the Introduction:

As the amount of information available on the websites increases, it becomes necessary to give the user the possibility to perform searches over this information. When deciding to install a search engine in a website, there exists the possibility to use a commercial search engine or → Read More: A Comparison of Open Source Search Engines