Statistics and heuristics

Dr Peter Jenkins, Director of Mathematics

While walking from the Mathematics staffroom to the school cafeteria at lunchtime, I’m normally very adept at blocking out the chatter of teenaged girls. That said, I occasionally overhear something that piques my interest. The most recent occasion involved a pair of girls talking about the irritating frequency with which Spotify’s shuffle mode tends to play consecutive songs by the same artist. ‘It’s supposed to be random,’ lamented one of the girls, ‘but it just played three One Direction songs in a row.’

Despite not using Spotify myself, or knowing precisely what it is, I found this conversation interesting because it reminded me of having experienced the same frustration first-hand with my iPod playlist of non-One Direction songs. That frustration and surprise arose despite my knowing full well that my brain was being hijacked by the so-called clustering illusion: the tendency to erroneously regard the inevitable streaks or clusters that arise in small samples from random distributions as non-random. More simply, we tend to perceive patterns where none exist.
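To get a feel for just how inevitable these streaks are, here is a quick simulation, written as a rough Python sketch; the 30-song playlist containing five songs by a single artist is an illustrative assumption rather than anyone’s actual library.

```python
import random

# Illustrative assumption: a 30-song playlist containing 5 songs by one
# artist ("A") and 25 songs by other artists ("x").
playlist = ["A"] * 5 + ["x"] * 25
trials = 100_000

# Count how often a genuinely random shuffle places at least two of
# artist A's songs back to back.
clustered = 0
for _ in range(trials):
    random.shuffle(playlist)
    if any(a == b == "A" for a, b in zip(playlist, playlist[1:])):
        clustered += 1

print(f"P(at least one back-to-back repeat) is about {clustered / trials:.2f}")
# Typically prints a value near 0.54: streaks are the norm, not a glitch.
```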

The cluster illusion is just one of a number of heuristics or rules of thumb that we subconsciously apply when attempting to process statistical and probabilistic information. In his fascinating book, Thinking, Fast and Slow, psychologist Daniel Kahneman (2011) details a large number of these heuristics that appear to be hard-wired into us; they allow us to make decisions and judgements that are fast, instinctual, intuitively reasonable … but often wrong.

For a more mathematical example, consider the following classic problem:

Scientists have discovered an important gene that exists in one in every 1 000 people. They have developed a simple test that can detect the presence of this gene with 99 per cent accuracy. If a person is tested for this gene and receives a positive result, how likely is it that they actually have the gene?

If you’re like most people, you might think it pretty likely that the person does, in fact, have the gene, given the positive test result. In truth, the probability is only about 9 per cent. To see why, one must realise that because the gene is extremely rare, a positive result is far more likely to come from the test getting it wrong on a person who doesn’t have the gene than from the test getting it right on a person who does! Of every 1 000 people tested, we expect roughly one true positive and about ten false positives, so only around one positive result in eleven is genuine. Here, our statistical intuition lets us down badly: it fails to give appropriate weight to the gene’s base rate of occurrence and places too much importance on the new information provided by the positive test result. Our tendency to neglect the base rate in situations like this is a special case of what Kahneman calls the representativeness heuristic.
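For anyone who would like to see where the 9 per cent comes from, the arithmetic can be written out in a few lines of Python, assuming (as is usual for this problem) that ‘99 per cent accuracy’ means the test gives the correct answer 99 per cent of the time whether or not the person has the gene.

```python
# Base rate and test accuracy as stated in the problem.
base_rate = 1 / 1000   # one person in every 1 000 carries the gene
accuracy = 0.99        # the test is correct 99 per cent of the time

# A positive result can arise in two ways:
p_true_positive = base_rate * accuracy                # has the gene, test correct
p_false_positive = (1 - base_rate) * (1 - accuracy)   # no gene, test wrong

# Bayes' theorem: probability of having the gene given a positive result.
p_gene_given_positive = p_true_positive / (p_true_positive + p_false_positive)

print(f"{p_gene_given_positive:.3f}")   # prints 0.090, i.e. about 9 per cent
```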

What’s so concerning about these heuristics is how effective they are at lulling us into complacency. If our intuition makes something appear clear and reasonable, we don’t ask further questions, and we end up making bad decisions. An example with particularly devastating consequences is the case of Sally Clark, a British woman who in 1999 was wrongly convicted of murdering her two infant sons (the conviction was overturned in 2003). The prosecution presented misleading statistical evidence that went largely unchallenged, most famously the claim that the chance of two cot deaths in one family was 1 in 73 million, a figure obtained by wrongly treating the deaths as independent; this led jurors to the erroneous conclusion that Clark was almost certainly guilty. Similar examples of the damage done by cognitive biases can be found in many areas of public life.

It’s clear that a strong education in probability and statistics is a necessity in the information age. Studying statistics arms one with powerful tools for making sense of data and drawing logical inferences. Not only is wielding such tools essential in almost every professional discipline, but doing so also empowers students to transcend heuristics and make careful, reasoned judgements. People with a good understanding of statistics learn to ask the right questions: ‘What is the base rate? How likely is this to happen by chance? Percentage of what?’ Just as critical literacy teaches students to look beyond the surface features of a written text to uncover its underlying messages, studying statistics teaches students to approach numerical information with an equally critical eye. Given the examples that can be drawn from law, politics, science and medicine, the subject’s potential to fascinate arguably outweighs even its demonstrable utility.

It should be noted that each of our current Senior Mathematics courses contains probability and statistics units, with Mathematics A placing a particular focus on them. (Mathematics A students learn a technique that enables them to readily determine exact probabilities in Bayesian inference problems such as the gene problem described above.) Furthermore, the new Senior Mathematics courses commencing in 2019 will all contain more statistics content than their current counterparts, reflecting a growing recognition of the importance of statistics.

To return to the First World problem of appropriate music shuffling, it turns out that the student who suggested there was something strange about Spotify’s randomisation algorithm was actually onto something, though not for the right reason. In 2014 Spotify modified its algorithm (Poláček, 2014) to decrease the frequency of clusters of consecutive songs by the same artist. Yes, Spotify made shuffling appear more random by engineering it to be less random. The fact that repeated artists still make us feel that something is wrong suggests that heuristics such as the clustering illusion are, indeed, more powerful than we realise.
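For interest, the approach described in that post is, roughly, to space each artist’s songs evenly across the playlist and then add a little random jitter. The following Python code is only a simplified sketch of that idea; the spacing and jitter values are illustrative and certainly not Spotify’s actual parameters.

```python
import random
from collections import defaultdict

def spread_shuffle(songs):
    """Shuffle a list of (artist, title) pairs so that each artist's songs
    are spread roughly evenly through the playlist, with random jitter so
    the result still feels unpredictable. A simplified, illustrative sketch."""
    by_artist = defaultdict(list)
    for song in songs:
        by_artist[song[0]].append(song)

    positioned = []
    n = len(songs)
    for tracks in by_artist.values():
        random.shuffle(tracks)               # vary the order within each artist
        spacing = n / len(tracks)            # ideal gap between this artist's songs
        offset = random.uniform(0, spacing)  # random starting point for this artist
        for i, track in enumerate(tracks):
            jitter = random.uniform(-0.25, 0.25) * spacing
            positioned.append((offset + i * spacing + jitter, track))

    # Play songs in order of their assigned (jittered) positions.
    return [track for _, track in sorted(positioned)]

# Example: three One Direction songs among seven others rarely end up adjacent.
demo = [("One Direction", f"song {i}") for i in range(3)] + \
       [("Someone Else", f"track {i}") for i in range(7)]
print(spread_shuffle(demo))
```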

References

Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus and Giroux.

Poláček, L. (2014). How to shuffle songs? Retrieved from https://labs.spotify.com/2014/02/28/how-to-shuffle-songs/