Some days we complete the crossword, some days we do not. Is there a pattern to how frequently we are unable to fill the whole grid, and how many clues are left unsolved?
Based on ten patient years of crossword "failure" data collection and analysis, retired cosmic-ray physicist and astronomer S. Naranan suggests there is. His analysis reveals a surprising statistical regularity in crossword failures (i.e. unsolved clues) - the occurrence of failures follows the Negative Binomial Distribution (NBD), the same distribution used by the car insurance industry to predict the probability of accidents, or by marketing teams to forecast purchase patterns for a product.
Dr. Naranan's research paper has been published in the Journal of Quantitative Linguistics (published by Routledge) and is available online to subscribers. The abstract is available free.
The Model's Universality
When I heard about Dr. Naranan's research, a couple of questions came to mind:
- Since cryptic crosswords are of widely different complexity and styles, is it possible for a generic model to apply to all crosswords? This analysis is based on crosswords of The Hindu and The Times Of India. Will the result not change if we consider, say, The Guardian or The Times (UK)?
- The study relies on one solver's data only – Dr. Naranan's. Even if his data of unsolved clues fits the NBD curve, is it reasonable to suppose that this will be so for every solver?
Both questions are addressed in the article.
Puzzle Variations
The article first explores the Poisson Distribution (PD), which arises from Bernoulli trials: it fits a scenario with a large sample size (n), independent trials with only two possible outcomes (success or failure), and a low probability of failure (u). It then goes on to say:
It is clear that PD is inadequate for our data because […] the probability of failure u is not the same for all puzzles because of their in-built diversity, e.g. different composers, styles of cluing, deliberate introduction of variation in the complexity of a puzzle. Such variability is reflected in the real world of crossword puzzles and the observed data will include a mixture of numerous PD’s with different characteristic parameters (say λ1, λ2, λ3, λ4, . . . ).
(JQL, 2010, Vol 17, Number 3, pp 197-198)
The Poisson distribution depends on one parameter only (λ), and it is interpreted as the average number of errors. Because of the variation in λ due to diversity in crosswords, a generalization of PD is needed. This leads to a "mixture" distribution, which is the Negative Binomial Distribution. NBD depends on two parameters (p,k).
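To see why a single Poisson curve struggles with mixed data, here is a minimal sketch in Python. The λ values and their weights are made up for illustration and do not come from the paper. A single Poisson distribution always has variance equal to its mean, whereas a blend of several Poisson distributions with different λ has variance larger than its mean, which is the extra spread that the second NBD parameter absorbs.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x failures under a Poisson with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)


def mixture_pmf(x, lams, weights):
    """Probability of x failures when puzzles come from several Poisson groups."""
    return sum(w * poisson_pmf(x, lam) for lam, w in zip(lams, weights))


def mixture_mean_variance(lams, weights):
    """Mean and variance of a weighted mixture of Poisson distributions."""
    mean = sum(w * lam for lam, w in zip(lams, weights))
    # For a single Poisson, E[x^2] = lam + lam^2.
    second_moment = sum(w * (lam + lam * lam) for lam, w in zip(lams, weights))
    return mean, second_moment - mean * mean


# Hypothetical mix: easy, medium and hard puzzles (illustrative values only).
lams = [0.3, 1.0, 3.0]
weights = [0.5, 0.3, 0.2]

mean, variance = mixture_mean_variance(lams, weights)
print(f"mixture mean     = {mean:.4f}")
print(f"mixture variance = {variance:.4f}")
print("variance exceeds mean:", variance > mean)
```

The NBD corresponds to a continuous version of this blending, in which λ itself follows a gamma distribution rather than taking three fixed values.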
What are p and k? They are parameters that quantify the gap between the setter and solver. These parameters are related to the average and standard deviation of the distribution of failures.
In mathematical terms, if the average is m and the standard deviation is s, then p = m/s² (the ratio of the average to the variance) and k = mp/(1-p). Another way to look at p and k: the fraction of puzzles with no errors is p^k, and the ratio of the number of puzzles with one error (x = 1) to the number with no errors (x = 0) is k(1-p).
This means that if a solver reports how many of his puzzles have x = 0 (no errors) and how many have x = 1 (one error) from a known sample of puzzles, this model can predict the entire distribution, i.e. how many puzzles will have 2, 3, 4 or more errors.
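The prediction from just the x = 0 and x = 1 counts can be sketched in Python as follows. This is my own illustration, not code from the paper: the function names and the sample counts (504 and 239 puzzles out of 1000) are hypothetical. It uses the two relations quoted above, P(0) = p^k and P(1)/P(0) = k(1-p), solves them for p by bisection, and then builds the remaining probabilities with the standard NBD recurrence P(x+1) = P(x) · (k+x)/(x+1) · (1-p).

```python
from math import log


def fit_nbd_from_zero_one(n0, n1, total):
    """Estimate (p, k) from the counts of puzzles with 0 and 1 errors."""
    if not (0 < n0 < total and n1 > 0 and n0 + n1 <= total):
        raise ValueError("counts must satisfy 0 < n0 < total and 0 < n1")
    p0 = n0 / total          # observed fraction with no errors, equals p^k
    ratio = n1 / n0          # observed P(1)/P(0), equals k(1-p)
    # Eliminating k gives ln(p)/(1-p) = ln(p0)/ratio, which needs a value below -1.
    target = log(p0) / ratio
    if target >= -1.0:
        raise ValueError("these counts are not consistent with an NBD")
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):     # ln(p)/(1-p) increases with p, so bisection works
        mid = (lo + hi) / 2
        if log(mid) / (1 - mid) < target:
            lo = mid
        else:
            hi = mid
    p = (lo + hi) / 2
    k = ratio / (1 - p)
    return p, k


def nbd_probabilities(p, k, max_x):
    """Return [P(0), P(1), ..., P(max_x)] for the NBD with parameters (p, k)."""
    probs = [p ** k]
    for x in range(max_x):
        probs.append(probs[-1] * (k + x) / (x + 1) * (1 - p))
    return probs


# Hypothetical sample: 1000 puzzles, 504 with no errors and 239 with one error.
p_fit, k_fit = fit_nbd_from_zero_one(504, 239, 1000)
predicted = nbd_probabilities(p_fit, k_fit, 6)
for x, prob in enumerate(predicted):
    print(f"x = {x}: expected puzzles = {1000 * prob:.1f}")
```

With real data one would fit (p, k) using all the counts rather than only the first two, but the two-count version shows how little information the model needs to pin down the whole curve.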
Solver Skill Variations
NBD can model solvers of differing expertise, with each solver described by a different pair (p, k).
According to the model described, the complexity of crossword puzzles and their variability will depend both on the solver and the composer(s). There is no reason to suppose the NBD will not apply universally to all crossword puzzles and solvers. So, for each solver of crossword puzzles, one can expect NBD to apply, each with a characteristic pair of parameters (p, k) that quantifies the gap between the skills of the composer and solver. For the author (p, k) = (0.455,0.869).
(JQL, 2010, Vol 17, Number 3, p 200)
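As a rough illustration of how (p, k) ties back to the average and standard deviation, the Python sketch below converts in both directions using the relations given earlier (p = m/s² and k = mp/(1-p)). The function names are my own, and the mean and variance it prints are simply what those formulas imply for (p, k) = (0.455, 0.869); they are not figures I have taken from the paper.

```python
def nbd_mean_variance(p, k):
    """Mean and variance implied by NBD parameters (p, k)."""
    mean = k * (1 - p) / p      # rearranged from k = m*p/(1-p)
    variance = mean / p         # rearranged from p = m/s^2
    return mean, variance


def nbd_params_from_moments(mean, sd):
    """Recover (p, k) from the average and standard deviation of failures."""
    variance = sd * sd
    if variance <= mean:
        raise ValueError("NBD requires the variance to exceed the mean")
    p = mean / variance
    k = mean * p / (1 - p)
    return p, k


# Parameters quoted for the author of the paper.
author_p, author_k = 0.455, 0.869

m, var = nbd_mean_variance(author_p, author_k)
print(f"implied average failures per puzzle = {m:.3f}")
print(f"implied variance                    = {var:.3f}")

# Going back from the moments should reproduce the same (p, k).
p_back, k_back = nbd_params_from_moments(m, var ** 0.5)
print(f"recovered (p, k) = ({p_back:.3f}, {k_back:.3f})")
```

A solver who logs only the number of unsolved clues per puzzle could use the second function to obtain a personal (p, k) from the sample average and standard deviation.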
The perfect solver, who never makes an error in the puzzle, always has x = 0. So Prob (x = 0) = 1 and Prob (x) = 0 for all other x. For such a solver, p = 1 and k can take any value.
Theory Of Proportional Effect
The paper also examines the multiplicative effect of missed answers in the grid. The first failure (x = 1) happens at a random position in the grid, but it increases the chance of the second failure (x = 2) occurring at an intersecting location. This could be represented by a 3-parameter lognormal distribution LND2, says Dr. Naranan, though he calls this model "semi-quantitative at best". He observes that his data fits both the NBD and LND2 models but adds that the correspondence of NBD and LND2 may not hold for other solvers and/or puzzles.
Extending The Study
Dr. Naranan wishes to extend his work to an organized group of solvers who tackle many puzzles. This would not only generate the large sample size crucial for statistical analysis of data with long tails, but also test the robustness of the NBD model for crossword-solving errors. The study indicates that NBD can accommodate variations in solver habits and composer vagaries, and a group project recording the count of errors per puzzle could confirm the findings.
In Closing
The study is significant in understanding the nature of crossword solving. To me the most fascinating part of the research is its suggestion that crossword solving, a game of pure skill and not chance, has the same pattern of randomness as accidents.
If you are interested in reading the paper, it is available to subscribers of JQL here:
S. Naranan. (2010). A Statistical Study of Failures in Solving Crossword Puzzles. Journal of Quantitative Linguistics, 17 (3), 191-211.
The author's manuscript is also available on his website here.
Thanks a lot to Dr. Naranan for answering my back-and-forth questions with immense patience. He had said to me at first: “If you have background in undergrad maths, the maths should be quite easy.” It turned out that my rusty recollection of undergrad maths was inadequate and I needed all his detailed explanations to make some sense of his work. Thank you!