Search Engine Land has a short article on bias versus brands. The issue at hand is whether Google Instant has a brand bias. Google says it does not:

> Singhal explains that when someone types in T, mathematically “most people typing T will go to Target. That’s the probability model. If you add R to it (‘Tr’), most people are looking for a translation system. It’s actually just pure mathematical modeling.” It is just math, he says, not a bias.

Oh come on, now! What kind of explanation is that? There is no such thing as *“just math”*. There is always a conscious decision to **use** math in a particular way.

Let’s take as an example the classic information retrieval ranking function: tf * idf. Researchers have long known that it is important to rank documents by their *“it’s just math”* query term frequency within a document. However, it is just as important, if not more so, to correct that internal term frequency using global frequency statistics, such as (inverse) document frequency. The reason is that if you have a query such as [the table] and you do not correct for the collection-wide ubiquity of the word [the], your document rankings will be dominated by the probability of [the] within each document. IDF values correct this bias and bring the term [table] to much greater prominence. Experimental results almost always show that tf * idf beats ranking by tf alone. (Aside: Even language models have an idf-like, global probability smoothing factor to correct for tf alone.)
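To make the [the table] example concrete, here is a minimal Python sketch. It assumes a made-up three-document collection, relative term frequency (count divided by document length) for tf, and the plain log(N / df) form of idf; the names `DOCS`, `tf`, `idf`, and `rank` are hypothetical and not taken from any particular system.

```python
import math
from collections import Counter

# Hypothetical three-document collection, invented for illustration.
# Every document contains "the"; only d2 is actually about a table.
DOCS = {
    "d1": "the cat sat on the mat by the door in the hall",
    "d2": "a review of the oak table and its six matching chairs",
    "d3": "the sun rose over the quiet hill",
}


def tf(term, tokens):
    """Relative term frequency: the probability of `term` within one document."""
    return Counter(tokens)[term] / len(tokens)


def idf(term, collection):
    """Inverse document frequency, log(N / df); 0.0 for terms in no document."""
    df = sum(1 for tokens in collection.values() if term in tokens)
    if df == 0:
        return 0.0
    return math.log(len(collection) / df)


def rank(query, collection, use_idf):
    """Return document ids ordered by summed tf (or tf * idf) over query terms."""
    scores = {}
    for doc_id, tokens in collection.items():
        score = 0.0
        for term in query.split():
            weight = idf(term, collection) if use_idf else 1.0
            score += tf(term, tokens) * weight
        scores[doc_id] = score
    # Highest score first; ties keep the collection's original order.
    return sorted(scores, key=scores.get, reverse=True)


collection = {doc_id: text.split() for doc_id, text in DOCS.items()}
print(rank("the table", collection, use_idf=False))  # ['d1', 'd3', 'd2']
print(rank("the table", collection, use_idf=True))   # ['d2', 'd1', 'd3']
```

With tf alone, d1 wins simply because a third of its words are [the]. Because [the] appears in every document, its idf is log(3/3) = 0, so under tf * idf only [table] contributes and d2 rises to the top. Real systems typically use smoothed or saturated variants of both factors; this sketch only shows the direction of the correction.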

Thus, to make a ranking algorithm truly useful, the mathematics have to be designed to account for the *“it’s just math”* probabilities. Without such correction, the algorithms are biased away from relevance. So to claim that brands dominate because *“it’s just math”* masks the deeper issue: the existing bias within Google Instant isn’t being corrected and is propagated to the user.

At the risk of getting a little too geeky, I am reminded of Asimov’s Three Laws of Robotics. The First Law is: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.” Note that the “no harm” sword has two edges: no harm through commission, and no harm through omission. Both are required.

I think that there is an analogy to brand bias within search algorithms. In the search engine domain, the robot is the algorithm. Just as it is not enough for a robot to avoid committing acts of harm against a human, it is also not enough for the algorithm to throw up its hands, say *“it’s just math”*, and conclude that it hasn’t actively, explicitly committed any bias in its rankings. No, the algorithm has to simultaneously be aware of the biases that arise through inaction, or omission, as well.

Click-through probabilities are the tf: the component of the ranking algorithm that ensures true, unbiased, no-acts-of-commission probability. Where is the idf that balances out that math and ensures that no bias slips through by omission? Without it there is still bias. *“It’s just math”* is not a proper defense.

Agree? Disagree?