Tuesday 21 September 2010

Failures In Solving Crosswords: A Statistical Study

hindu-crossword-9564-incomplete Some days we complete the crossword, some days we do not. Is there a pattern to how frequently we are unable to fill the whole grid, and how many clues are left unsolved?

Based on ten patient years of crossword "failure" data collection and analysis, retired cosmic-ray physicist and astronomer S. Naranan suggests there is. His analysis reveals a surprising statistical regularity in crossword failures (i.e. unsolved clues) - the occurrence of failures follows the Negative Binomial Distribution (NBD), the same distribution used by the car insurance industry to predict the probability of accidents, or by marketing teams to forecast purchase patterns for a product.

Dr. Naranan's research paper has been published in the Journal of Quantitative Linguistics (published by Routledge) and is available online to subscribers. The abstract is available free.

The Model's Universality

When I heard about Dr. Naranan's research, a couple of questions came to mind:

  1. Since cryptic crosswords are of widely different complexity and styles, is it possible for a generic model to apply to all crosswords? This analysis is based on crosswords of The Hindu and The Times Of India. Will the result not change if we consider, say, The Guardian or The Times (UK)?
  2. The study relies on one solver's data only – Dr. Naranan's. Even if his data of unsolved clues fits the NBD curve, is it reasonable to suppose that this will be so for every solver?

Both questions are addressed in the article.

Puzzle Variations

The article first explores the Poisson Distribution (PD) which is based on Bernoulli trials – this fits the scenario of large sample size of data (n), independent nature of trials with only two possible outcomes (success or failure), and low probability of failure (u). It then goes on to say:

It is clear that PD is inadequate for our data because […] the probability of failure u is not the same for all puzzles because of their in-built diversity, e.g. different composers, styles of cluing, deliberate introduction of variation in the complexity of a puzzle. Such variability is reflected in the real world of crossword puzzles and the observed data will include a mixture of numerous PD’s with different characteristic parameters (say λ1, λ2, λ3, λ4, . . . ).

(JQL, 2010,  Vol 17, Number 3, p197-198)

The Poisson distribution depends on one parameter only (λ), and it is interpreted as the average number of errors. Because of the variation in λ due to diversity in crosswords, a generalization of PD is needed. This leads to a "mixture" distribution, which is the Negative Binomial Distribution. NBD depends on two parameters (p,k).

What are p and k? They are parameters that quantify the gap between the setter and solver. These parameters are related to the average and standard deviation of the distribution of failures.

In mathematical terms, if the average is m and the standard deviation is s, then p is (m/s*s), the ratio of average to variance and k = mp/(1-p). Another way to look at p and k: The number of puzzles with no errors is p^k. The ratio of puzzles with errors x = 1 and x = 0 is k (1-p).

This means if a solver tells how many of his puzzles have x = 0 (no errors) and how many have x = 1 (one error) from a known sample of puzzles, this model can predict the entire distribution, i.e. how many will have 2, 3, 4, etc. number of errors.

Solver Skill Variations

NBD can effectively model the (p,k) for solvers with different solving expertise.

According to the model described, the complexity of crossword puzzles and their variability will depend both on the solver and the composer(s). There is no reason to suppose the NBD will not apply universally to all crossword puzzles and solvers. So, for each solver of crossword puzzles, one can expect NBD to apply, each with a characteristic pair of parameters (p, k) that quantifies the gap between the skills of the composer and solver. For the author (p, k) = (0.455,0.869).

(JQL, 2010,  Vol 17, Number 3, pp 200)

The perfect solver, who never makes an error in the puzzle, has x = 0 always. So Prob (x = 0) = 1 and Prob (x) = 0 for all other x. For such a solver, p = 1 and k = anything.

Theory Of Proportional Effect

The paper also examines the multiplicative effect of missed answers in the grid. The first failure (x = 1) happens at a random position in the grid, but it increases the chance of the second failure (x = 2) occurring at an intersecting location. This could be represented by a 3-parameter lognormal distribution LND2, says Dr. Naranan, though he calls this model "semi-quantitative at best". He observes that his data fits both the NBD and LND2 models but adds that the correspondence of NBD and LND2 may not hold for other solvers and/or puzzles.

Extending The Study

Dr. Naranan wishes to extend his work to an organized group of solvers who tackle many puzzles. This will not only generate a large sample size crucial for statistical analysis of data with long tails, but also confirm the robustness of the NBD model for crossword-solving errors. The study indicates that NBD can accommodate variations in solver habits and composer vagaries, and a group project of recording count of errors per puzzle can confirm the findings.

In Closing

The study is significant in understanding the nature of crossword solving. To me the most fascinating part of the research is its suggestion that crossword solving, a game of pure skill and not chance, has the same pattern of randomness as accidents.

If you are interested in reading the paper, it is available to subscribers of JQL here:

.           Journal of Quantitative Linguistics, 17 (3), 191-211.
            S. Naranan.  (2010).  A Statistical Study of Failures in Solving Crossword Puzzles

The author's manuscript is also available on his website here.

Thanks a lot to Dr. Naranan for answering my back-and-forth questions with immense patience. He had said to me at first: “If you have background in undergrad maths, the maths should be quite easy.” It turned out that my rusty recollection of undergrad maths was inadequate and I needed all his detailed explanations to make some sense of his work. Thank you!

Related Posts:

If you wish to keep track of further articles on Crossword Unclued, you can subscribe to it in a reader via RSS Feed. You can also subscribe by email and have articles delivered to your inbox, or follow me on twitter to get notified of new links.


Amrith said...

Fascinating to say the least :)
I really wish I could post something intelligent on the comments section but these mathematics leave my head in a spin :)
Wonderful post, yet again!


anax said...

Oooh – difficult one. Mathematics/logic has many uses in many walks of life, but its attempted application to the field of language is bound to throw up problems.

Solving ability is going to be a major factor, of course, but being utterly bamboozled by a final clue can be due to a number of reasons. Even if we take away clue difficulty and solver experience, we are left having to take into account that simple consideration “What letters have you got?”, and applying mathematics to this looks pretty awkward.

The clue (as it later turns out, thanks to a blog) was actually not as hard as it looked, but the cross-checking letters were _A_E – how many words fit that pattern? Or, the clue was extremely tough but the pattern was J_Z_ - regardless of whether or not the solver has fully parsed the clue he/she is likely to home in one just one possibility (two if his/her mind works that way).

For a mathematical analysis to cover this it would have to take into account the number of possible answers for every letter pattern, as well as clue difficulty and solver experience, which seems like rather a tall order.

Shuchi said...

Hi anax

The scope of the study is (i) a very large sample size of data, i.e. attempted crosswords (ii) a solver who generally finishes most of the crossword.

It does not analyse reasons for unsolved clues in specific cases - it only says that if the graph of the solver's frequency of errors is plotted for a large sample set, it will closely fit the NBD curve. So, given that the solver makes has 0 errors in n1 puzzles, 1 error in n2 puzzles, when the size of the sample set (n) is known, the rest of the error data can be closely predicted.

The sample size has to be big (the study covered 3400+ solved crosswords). This won't work if we are only talking of a few puzzles. As you rightly say, the specific reasons for the error can be so many - unfriendly letter pattern, or simply not knowing the word.

Shuchi said...

@anax: What is the second word that fits J_Z_? If it came up in a crossword, that might be my "failure" data of the day.

anokha said...

@anax: Actually one's mind doesn't have to work that way. Look it up in the dictionary and you will find the following def:

— n
a term for the total combination of characteristics that serve to identify a particular species of bird or plant

[origin obscure]

Collins English Dictionary - Complete & Unabridged 10th Edition
2009 © William Collins Sons & Co. Ltd. 1979, 1986 © HarperCollins
Publishers 1998, 2000, 2003, 2005, 2006, 2007, 2009

Then, of course, there is the slang dictionary ....... and the word is allowed even in SCRABBLE!

Shuchi said...

@anokha: Thank you for that. You must be one of the Perfect Solvers of this study, with p = 1 and k = anything :) Never seen you leave any clue unsolved.

anokha said...

"..... Never seen you leave any clue unsolved."

Then you just need to SEE my blog, Shuchi :) it is strewn with errors :(

Unknown said...

Very interesting..will show it to my colleague who's a statistician.You might be interested to know that statisticians discover the most surprising correlations.For example,stock price movement and skirt hems are apparently correlated.Both go up at the same time:)

Shyam said...

Hi Shuchi

This may not be the most appropriate forum to post my queries but I hope someone could answer them.

1) What is the speciality of NBD in this context? Wouldn't a simple binomial distribution suffice? Let X be the binomial Random variable, X~B(n,p), n the number of sample puzzles and p the prob of success. We can always estimate the user's performance from a small 'n' and assign him a 'p' which signifies his ability.

2) From a setter's viewpoint, averaging the success over many users will give the probability of his clues being solved.

Merging these 2 ideas into a single model may be some task. I have not gone through the paper, and I would love to know if that is what the paper has accomplished.

Shuchi said...

Hi Shyam

I'll invite the author to respond to your comments or maybe carry on the discussion offline, but as I gather from my reading (disclaimer: I am no statistician)

1) The variance is expected to be more than the mean in a mix of solvers of varying skills. It is also so for the data used in the study. Therefore, the NBD and not a simple BD.

2) The paper doesn't touch upon that. It may be an interesting extension.

Shuchi said...

Hi again Shyam

To update my response to your question 2, this has been suggested in the paper as an extension - an organized group of solvers participating in a project to determine p(x) vs x, for a mix of solvers and puzzles. From the setter's viewpoint, this exercise will assess quantitatively a measure of the complexity of his/her puzzles.

If you're interested to know more about the paper, please mail me your email id [shuchi at crosswordunclued dot com], and I'll be happy to put you in touch with the author.