Monday, 4 October 2010

When regressions go wrong...or too right

Via the excellent Simoleon Sense weekly roundup, I find this link which claims to discover "the single most important question in your life".

The research behind it claims to have found that the answers on a single kindergarten test can predict future income, college attendance, quality of college and college graduation (and, while they're at it, it finds a close link between college ranking and future wages).

I don't have the knowledge to challenge their results, and I would not want to suggest that there's anything untoward about this research. But one thing leaves me really, really puzzled: the results are too good.

Specifically, the regressions show an incredibly close linear fit between rank (or percentile) on the kindergarten test and absolute salary, and a similarly close fit between rank on the test and the percentage chance of going to college.

The following image not only shows that unrealistically close linear fit but also implies an almost uniform distribution of wages from $10k to $23k, which cannot possibly be correct. There should be a bulge in the middle, which would show up in this graph as a much flatter region in the middle instead of a straight linear gradient.
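To make the "bulge in the middle" argument concrete, here is a minimal sketch. It assumes, purely for illustration, that test score and wage are jointly normal with some correlation `rho`; under that assumption the expected wage at test percentile p is mean + rho × sd × (normal score of p), which rises slowly in the middle and steeply at the tails. The mean and standard deviation are hypothetical values chosen to sit inside the $10k to $23k range, not figures from the study. For comparison, an evenly spread (uniform) wage distribution gives the same slope at every percentile, which is what a straight line looks like.

```python
from statistics import NormalDist

# Hypothetical illustration values, NOT taken from the study: wages centred
# on the middle of the $10k-$23k range that the graph spans.
MEAN_WAGE = 16_500.0
SD_WAGE = 3_000.0


def expected_wage_normal(p, rho=1.0):
    """Expected wage at test percentile p (0 < p < 1), ASSUMING test score and
    wage are jointly normal with correlation rho."""
    z = NormalDist().inv_cdf(p)  # convert the percentile to a standard score
    return MEAN_WAGE + rho * SD_WAGE * z


def wage_uniform(p, low=10_000.0, high=23_000.0):
    """Wage at percentile p if wages were spread evenly between low and high."""
    return low + p * (high - low)


def slope(f, p, h=0.01):
    """Central-difference estimate of dollars gained per unit of percentile."""
    return (f(p + h) - f(p - h)) / (2 * h)


for p in (0.10, 0.50, 0.90):
    print(f"percentile {p:.0%}: normal slope {slope(expected_wage_normal, p):8.0f}, "
          f"uniform slope {slope(wage_uniform, p):8.0f}")
```

Changing `rho` only scales the normal curve up or down, so under this assumption the flatter middle survives however weak the link between test and wages is; only an evenly spread distribution produces a straight line.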

Your success on the test is a HUGE predictor of wages


And the next graph shows an even more uniform spread of college attendance from 20% to 75%. Shouldn't there be bell curves in here somewhere?

And your likelihood of attending college


Incidentally, when you look more closely at the college ranking versus wages figures, something else suspicious appears: the college rankings are described as an "earnings-based college quality index" and are measured in dollars. No surprise, then, that this measure is correlated with earnings!
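A small simulation shows how an earnings-based index can correlate with earnings even when colleges make no difference at all. It assumes, hypothetically, that each college's "quality index" is simply the average earnings of its own students (the study's actual construction may well differ), and that students are assigned to colleges at random, so the college has no effect. The earnings figures and college sizes are made-up illustration values. Because each student's own earnings feed into their college's index, the two are correlated by construction; with 5 students per college the theoretical correlation is 1/√5, roughly 0.45.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_colleges=2000, students_per_college=5, seed=1):
    """Correlation between a student's earnings and their college's
    earnings-based index, under random assignment (no college effect)."""
    rng = random.Random(seed)
    earnings, index = [], []
    for _ in range(n_colleges):
        # Every student's earnings come from the same distribution
        # (hypothetical mean and spread), so the college itself does nothing.
        cohort = [rng.gauss(30_000, 10_000) for _ in range(students_per_college)]
        college_index = statistics.fmean(cohort)  # "quality", measured in dollars
        earnings.extend(cohort)
        index.extend([college_index] * students_per_college)
    return pearson(index, earnings)


print(round(simulate(), 2))
```

Any real sorting of higher earners into particular colleges would push the correlation higher still, so a strong link between such an index and earnings says little on its own.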

Now, this may all be perfectly correct (the full presentation of the results is here). No doubt the results have been peer reviewed, and I don't have as much time or statistical expertise as the reviewers or indeed the researchers themselves. But whenever statistical analysis shows such perfect results, I'm cautious, to say the least.
