Beware of spurious correlations

Human beings are wired to look for ‘meaning’, which makes us eager to spot connections that sometimes simply aren’t there. That’s never been truer than in today’s era of ‘big data’.

Tyler Vigen is a criminology student at Harvard who set up his own website to highlight how easy it can be to draw ludicrous conclusions from data because of the way we’re taught to look for patterns.

Vigen has loaded up a number of unrelated data sets on his site and then cross-referenced them to identify apparent (but clearly nonsensical) statistical similarities. For example, as you can see above, there seems to be a clear link between the divorce rate in Maine and the per capita consumption of margarine in the US.

Similarly, you might say there was a link between the per capita consumption of cheese and the number of people who died by becoming tangled in their bedsheets:

[Chart: per capita cheese consumption versus deaths by becoming tangled in bedsheets]

Clearly, there isn’t really a link in either case. But it’s easy to imagine that people might accept there was – and unnerving to realise how readily we accept this kind of correlation when presented to justify a medical or scientific or commercial conclusion.
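If you want to see how easily such matches turn up by chance, the rough sketch below may help. It is not Vigen's method, only an illustration: it builds 50 made-up series of 10 points each (random walks that have nothing to do with one another), compares every pair, and reports the strongest Pearson correlation it finds. The function names, the number of series, their length and the seed are all assumptions chosen for the example.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def random_walk(rng, length):
    """A made-up series that drifts by independent random steps."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def strongest_pair(n_series=50, length=10, seed=0):
    """Return (|r|, i, j) for the most strongly correlated pair of series.

    The series are generated independently, so any strong correlation
    found here is a coincidence. All parameters are illustrative.
    """
    rng = random.Random(seed)
    walks = [random_walk(rng, length) for _ in range(n_series)]
    return max(
        (abs(pearson(walks[i], walks[j])), i, j)
        for i, j in combinations(range(n_series), 2)
    )


r, i, j = strongest_pair()
print(f"Strongest pairing: series {i} vs series {j}, |r| = {r:.3f}")
```

Fifty series give 1,225 possible pairings, and with only ten points per series some of those pairings will line up closely by luck alone. The more data sets you compare, the more of these chance matches you should expect to find.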

Vigen calls them ‘spurious correlations’. You can find many more examples (and even create your own) by visiting www.tylervigen.com. It’s quite amusing.

Alternatively, you could look a little harder at some of the ‘facts’ that get used in presentations around your own business and see how many of them actually stand up to robust statistical scrutiny.

Not so amusing, but potentially more revealing.

 
