Friday, 10 July 2009

SuperFreakonomics contest

Today I wasted 45 minutes on a contest to win a £9.99 book.

Not rational at all - except for what I learned from the experience, and the satisfaction of feeling cleverer than 651 other contestants.

Background information on the contest is here. In short, you need to guess how many Google hits there will be for SuperFreakonomics on November 3rd, two weeks after the eponymous book is published. This is the sequel to Freakonomics, so it should be popular. But how popular?

Now this kind of contest has some interesting idiosyncrasies. Like guessing the price in The Price Is Right, or like guessing the weight of a nun - or whatever it is they do in travelling carnivals - your best strategy is not to try to accurately work out the weight. Instead, you should look at what other people have guessed and pick your number to maximise the chance that you'll be just a tiny bit closer than them. You might also recognise this strategy from the "beach vendor problem".

In the simplest case, if the other contestants have guessed 100, 130 and 135 lbs, you might think that the most likely weight is, perhaps, 132 lbs. That may be true, but you would be an idiot to put your bet there.

Instead, you can maximise your chances of winning by bidding 99, 101, 129 or 136 lbs. With a 132 lb bid, the only way you can win is if the weight is exactly 132, or 133, or maybe 131 if there's a tiebreaker. What are the odds of that? Not high, unless you are really good at weighing nuns. If that's your fetish, you don't need economic theory to help you.

Whereas if you pick 136 lbs, anything over 135 results in a win for you. Similarly, if you pick 99 lbs, anything under 100 is yours; and at 129, anything from 115 to 129 wins it for you.
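Those win ranges can be checked by brute force. Here's a minimal sketch in Python, assuming whole-pound weights, a hypothetical plausible range of 80 to 160 lbs, and that an exact tie with another guess counts as a loss (no tiebreaker is assumed):

```python
def winning_weights(bid, others, weights):
    """Weights at which `bid` is strictly closer than every other guess.

    Ties count as losses, i.e. no tiebreaker is assumed.
    """
    return [w for w in weights
            if all(abs(w - bid) < abs(w - g) for g in others)]


others = [100, 130, 135]   # the existing guesses from the example
weights = range(80, 161)   # hypothetical range of possible weights, in lbs

for bid in (99, 101, 129, 132, 136):
    won = winning_weights(bid, others, weights)
    print(f"bid {bid}: wins {len(won)} weights, {won[0]} to {won[-1]}")
```

The 132 lb bid wins only at 132 and 133, while each of the edge bids wins a band of 15 lbs or more; how wide the 99 and 136 bands are depends on the assumed 80 to 160 lb range.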

Of course judgment must come into this somewhere, and you may simply glance at the nun and be certain she's well over 100 lbs. Which eliminates the 99 lb guess. And if she is definitely slimmer than the 136-lb nun who taught you at boarding school, you won't be guessing that either. But the less idea you have of the real answer, the more guidance you should take from the distribution of the existing answers.

How does this apply to the SuperFreakonomics contest? Well, I honestly have little idea how many hits SuperFreakonomics will get. I can be fairly confident it's well under the 1.3 million that Freakonomics has, and well over the 11,000 that it has accumulated so far. But how far over? I don't have much of a clue - it could be anywhere from 50,000 to 500,000.

Other things being equal, it's more likely to be at the low end. I believe that Google hits - like many distributions, especially where network effects apply - are subject to a power law. That is, there are ten times as many words with 1,000 hits as with 10,000, and a hundred times fewer again (compared with 10,000) with a million hits.
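Those ratios correspond to the simplest power law, with exponent 1, where N(h) stands for the number of words that return about h hits. The exponent is read off the figures above rather than measured from any Google data:

$$
N(h) \propto h^{-1}
\quad\Longrightarrow\quad
\frac{N(10^{3})}{N(10^{4})} = \frac{10^{4}}{10^{3}} = 10,
\qquad
\frac{N(10^{4})}{N(10^{6})} = \frac{10^{6}}{10^{4}} = 100
$$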

So to reflect this combination of strategic targeting and statistics, I downloaded all the existing guesses (651 of them so far), sorted them and projected them onto a logarithmic scale. I then calculated the gaps between consecutive guesses in order to find the biggest gap in which to place my estimate.
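The gap-finding step can be sketched in a few lines of Python. This is one reading of projecting the guesses onto a logarithmic scale: take the natural log of each distinct guess, sort, and measure the differences between neighbours. The guesses below are made-up placeholders, not the real contest entries:

```python
import math


def biggest_log_gap(guesses):
    """Return (low, high, log_gap) for the widest gap between consecutive
    distinct guesses, measured as a difference of natural logarithms.
    Returns None if there are fewer than two distinct guesses."""
    values = sorted(set(guesses))  # duplicates would only produce zero gaps
    best = None
    for low, high in zip(values, values[1:]):
        gap = math.log(high) - math.log(low)
        if best is None or gap > best[2]:
            best = (low, high, gap)
    return best


# Hypothetical guesses for illustration only.
guesses = [62_000, 50_000, 95_000, 70_000, 60_000, 90_000, 71_000]
low, high, gap = biggest_log_gap(guesses)
print(f"widest gap: {low:,} to {high:,} (log gap {gap:.4f}); bid {low + 1:,}")
```

With these placeholder numbers the widest gap is 71,000 to 90,000, so the sketch suggests bidding 71,001.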

Turns out that there's a nice space from 88,281 to 97,538 - a difference in natural logarithms of 0.099717 - which is therefore where I wish to place my bet. Of course I'm better off placing it at the low end than the high, because of the same power law effect; so my estimate is 88,282.
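The quoted figure is simply ln(97,538 / 88,281). It's also worth seeing what a bid of 88,282 actually buys. Assuming the entry closest to the final count wins, any bid strictly inside the gap wins the stretch between the two midpoints, which is always half the gap wide; bidding at the low end just moves that stretch to where the power law says outcomes are more likely:

```python
import math

low, high = 88_281, 97_538   # neighbouring guesses quoted in the post
bid = low + 1                # bid just above the lower neighbour

log_gap = math.log(high / low)

# Assumption: the entry closest to the final hit count wins, so the bid wins
# every count nearer to it than to either neighbour.
win_from = (low + bid) / 2   # midpoint with the lower neighbour
win_to = (bid + high) / 2    # midpoint with the upper neighbour

print(f"log gap: {log_gap:.6f}")  # log gap: 0.099717
print(f"bid {bid:,} wins from {win_from:,} to {win_to:,}")
```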

Admittedly there are other factors to consider. The second biggest logarithmic gap is 0.080298, between 137,400 and 148,888. If I felt the number of hits were substantially more likely to be around the 140,000 mark than the 90,000 mark I should put my money up there instead. Perhaps the higher probability would outweigh the tactical disadvantage of being closer to other contestants.
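One way to make that trade-off concrete is to put a prior on the final count and compare how much probability each bid's winning stretch captures. The sketch below assumes, purely for illustration, a log-normal prior (the log of the hit count is normally distributed) with a chosen median and a spread of 0.5 in log terms; neither the prior nor its parameters come from the post, and it again assumes the closest entry wins:

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def win_chance(low, high, median, sigma=0.5):
    """Prior probability that the final count lands in the stretch won by
    bidding low + 1 between neighbouring guesses `low` and `high`.

    Assumes a hypothetical log-normal prior: ln(hits) ~ Normal(ln(median), sigma).
    """
    bid = low + 1
    a, b = (low + bid) / 2, (bid + high) / 2  # closest-guess-wins interval
    mu = math.log(median)
    return normal_cdf((math.log(b) - mu) / sigma) - normal_cdf((math.log(a) - mu) / sigma)


gaps = {"88,282": (88_281, 97_538), "137,401": (137_400, 148_888)}
for median in (90_000, 140_000):
    chances = {bid: win_chance(lo, hi, median) for bid, (lo, hi) in gaps.items()}
    best = max(chances, key=chances.get)
    print(f"prior median {median:,}: best bid {best}",
          {k: round(v, 4) for k, v in chances.items()})
```

Under these made-up parameters, a median near 90,000 favours the wider gap, while moving the median to 140,000 makes the narrower gap near it the better bet.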

In a more extreme version of this argument, I might have a powerful predictive Google model - let's say I were an SEO professional, and one with an improbable respect for statistics - which predicts that there will be between 200,000 and 230,000 hits. In this case I'd choose the biggest available gap within that range, which is 204,000 to 210,000, and guess 204,001. Bad luck to the 204,000 guy ("James") but all's fair in love and game theory.
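Restricting the search to a predicted range is a small change to the gap search. In this sketch only gaps with both neighbours inside the range are considered, which is one of several reasonable choices. The guess list is hypothetical apart from the 204,000 and 210,000 mentioned above:

```python
import math


def biggest_gap_in_range(guesses, lo, hi):
    """Widest log gap between consecutive distinct guesses, considering only
    guesses within [lo, hi]. Returns (low, high, gap) or None."""
    values = sorted(g for g in set(guesses) if lo <= g <= hi)
    best = None
    for low, high in zip(values, values[1:]):
        gap = math.log(high / low)
        if best is None or gap > best[2]:
            best = (low, high, gap)
    return best


# Hypothetical guesses; only 204,000 and 210,000 come from the post.
guesses = [150_000, 198_000, 201_500, 204_000, 210_000,
           213_000, 217_500, 222_000, 226_000, 231_000]
low, high, _ = biggest_gap_in_range(guesses, 200_000, 230_000)
print(f"bid {low + 1:,} in the gap {low:,} to {high:,}")
```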

Another factor is that the variable we're measuring is not independent of the contest itself. It is subject to influence by the participants in this contest - should any of them care enough. If the number of hits by 28th October or so is (let's say) 80,000, a determined competitor whose bid is 150,000 might find a linkfarm network which will take his money to put "SuperFreakonomics" on seventy thousand websites in a week. Will Google fall for it? That depends on many factors, but given that this is the raw total number of hits and not the duplicates-eliminated count, it might work. So by guessing honestly (for some value of "honest") I am assuming that winning a signed copy of SuperFreakonomics will not incentivise people sufficiently to impact the number of hits meaningfully.

Finally, we are disadvantaged by the New York Times's blog moderation process. Unlike the BBC, who have people on duty 24 hours a day to approve the intemperate comments on Robert Peston's blog (hello there, John_from_Hendon), the NYT seems to keep their interns in front of the PC only during working hours. So, even assuming that my guess is still eligible, there could be a queue of forty people in front of me who have applied exactly the same logic. In which case I should assume that one of them has already picked 88,282 and I should choose 88,283. Indeed they may have applied the same logic themselves, and so I'd better bump it up to around 88,350 just to be sure. Which, of course, they'll have figured out too. Maybe it's safest just to pretend I'm smarter than everyone else after all. Maybe I'll find out on Monday; more likely my illusion will be dispelled on 3rd November when eight million Google hits show up for SuperFreakonomics - I can only hope that this page, which has mentioned it at least seven times, is one of them.

p.s. my 45 minutes did not include 20 minutes of writing this post.

p.p.s. I'm trusting that hardly anyone will see this between my writing it and the closing of the contest, or that you will be too honourable to go to the NYT and pick a number one higher than me. If you do want to make an entry and not mess too much with my attempt, try 204,001, 137,401 or 80,002.

p.p.p.s. I do actually think the real result will be higher than my guess. But I'm more confident in the game strategy than in my ability to estimate Google counts. And the guesses are pretty crowded around the level I would have picked, meaning that my second-best option is probably the preferred strategy. But on reflection, maybe 178,649 would have been a better choice.
