
How often should statistical significance occur?

Via Andrew Gelman and Howard Wainer, an interesting meta-analysis from 2005 by Pan, Trikalinos, Kavvoura, Lau, and Ioannidis (the last of “Why most published research findings are false” fame), comparing reported statistical significance and effect sizes in studies of genetic propensity to disease, between studies performed in mainland China and those performed elsewhere. There [...] → Read More: How often should statistical significance occur?

Renewing ACM

My ACM membership came due just recently. In the light of objections to their copyright policy, I seriously considered not renewing in protest. I agree with Panos that the ACM should not seek copyright from authors, unless the authors are actually being paid for their work. I’m particularly struck by Bob Carpenter’s [...] → Read More: Renewing ACM


“Semantic search log analysis” by Hollink, Tsikrika, and de Vries. Categorizes the semantic classes of query reformulations. For a professional image search service, the most common non-identity reformulation is to find the spouse of the first query’s result.
A call to boycott ACM and IEEE program committees and journal reviewing until they allow free [...] → Read More: Authorities

If I had a ten-thousand-node cluster…

Another aspect of this year’s SIGIR reviewing, though hardly a new one, was the number of papers from industry. As a fellow reviewer observed, these often come across as nicely polished internal tech reports; the kind of thing you should probably read once your globally-distributed search infrastructure has reached the million-query-a-day mark; a sort [...] → Read More: If I had a ten-thousand-node cluster…

Longer or shorter conference papers?

The past few days have seen the review deadline for IJCAI papers, SIGIR papers, and SIGIR posters, of which I had 11 (mostly farmed out), 10, and 4 (all done myself), respectively. IJCAI had a six-page limit, whereas this year, SIGIR expanded their papers from the usual eight pages to ten.
I don’t know what [...] → Read More: Longer or shorter conference papers?


The Journal of Universal Rejection.
Don’t become a scientist.
Building self-esteem and self-discipline amongst LA kids in gang-ridden neighbourhoods — through cricket (!).
A large-scale statistical self-evaluation of Australian university research against world standards finds that Australian universities are below world standard in statistics research.
Today I learnt that R has support for inline C++.
Simple models are often as accurate [...] → Read More: Hubs

Spam filtering for web results

The popular history of web search holds that Google buried its competition in part because its PageRank algorithm solved, for a time, the problem of web spam; and some pundits now assert that not merely Google, but algorithmic search in general, is losing its battle against the spammers. Spam has attracted much less [...] → Read More: Spam filtering for web results


I enjoy other bloggers’ link aggregations, so I’ve decided to do my own.
Ben Edelman demonstrates that Google biases its search results towards its own services, even when other, lower-ranked links get more click-throughs — BenEdelman.
Strike one for pessimists! Ben Carterette and Ian Soboroff (SIGIR 2010; yes, six months old, but only just read) find [...] → Read More: Spokes

A weighted similarity measure for non-conjoint rankings

Having an article in submission sharpens one’s appreciation for situations to which the article applies; and the ACM’s leisurely eighteen-month-plus turnaround on journal articles gives plenty of time for that appreciation to develop. Since early 2009, it has seemed to me that every third retrieval paper either has, or should have, compared non-conjoint rankings [...] → Read More: A weighted similarity measure for non-conjoint rankings

Post-stratification of a binomial population

Retrieval System A returns 100 documents. We sample 20, and find 10 relevant. We therefore estimate that System A’s precision is 0.5, and that there are 50 relevant documents in the set System A returned. Let us refer to the number of relevant documents a system returns as the system’s yield.
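
The arithmetic above can be written out as a minimal sketch. The function name `estimate_precision_and_yield` and its parameter names are my own, chosen for illustration; the code assumes only that the sample is a simple random sample of the returned set, and it covers just the single-system estimate described in this paragraph.

```python
def estimate_precision_and_yield(returned, sampled, relevant_in_sample):
    """Estimate precision and yield from a simple random sample.

    returned           -- number of documents the system returned
    sampled            -- number of those documents that were assessed
    relevant_in_sample -- number of assessed documents found relevant

    All names here are hypothetical, used only for illustration.
    """
    if not 0 < sampled <= returned:
        raise ValueError("sample size must be between 1 and the number returned")
    if not 0 <= relevant_in_sample <= sampled:
        raise ValueError("relevant count must be between 0 and the sample size")

    # The sample proportion is the estimate of precision.
    precision = relevant_in_sample / sampled
    # Scaling precision up to the whole returned set gives the estimated yield.
    estimated_yield = precision * returned
    return precision, estimated_yield


# System A: 100 returned, 20 sampled, 10 found relevant.
precision, estimated_yield = estimate_precision_and_yield(100, 20, 10)
print(precision, estimated_yield)
```

For System A this gives a precision of 0.5 and a yield of 50 documents, matching the figures above.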

Subsequently, System [...] → Read More: Post-stratification of a binomial population