Tuesday, February 7, 2012

Can a computer program write cryptic clues?

I bet your answer is a vehement "No" (unless you are Spiffytrix's friend). Grid fills maybe, anagram suggestions perhaps, but entire clues? Not possible. This isn't Sudoku, which software can generate.

That's what I thought, which is why Enigma took me by surprise. Enigma, the brainchild of David Hardcastle, is a computer program that auto-generates cryptic clues for any input word. David built this program over a period of four years, as part of his PhD thesis in Computer Science at Birkbeck, University of London.

In his thesis [p245], David says:

there is a widely held (and probably well-founded) belief that computers can generate English language but not “natural” English language. A key goal for ENIGMA is to challenge that belief, and for the system to generate clues with fluent surface texts.

How Enigma Works

Given a word, the first step is to figure out all the ways the word can be clued using the puzzle rubrics configured in the system. The user can then select a particular type from the list and generate clues using that type.

Let's see this work for the input word VIEWERS. The system comes up with a number of "clue plans".

enigma-generate-rubrics

"Exp" represents the number of possible combinations through which the clue could be expressed, as a power of ten: 4 means 10^4 = 10,000, etc.
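To make the idea of a clue plan concrete, here is a minimal sketch of how one plan type, an anagram wrapped around another component, might be enumerated. It is not Enigma's code: the function name `container_anagram_plans` and the tiny lexicon are assumptions made for illustration, and the real system works from much larger lexical resources and many more plan types.

```python
from collections import Counter


def container_anagram_plans(answer, lexicon):
    """Find plans of the form anagram(OUTER) around INNER.

    Hypothetical helper: removing a contiguous INNER from the answer must
    leave letters that rearrange into a different word from the lexicon.
    """
    answer = answer.upper()
    plans = []
    n = len(answer)
    # INNER sits strictly inside the answer, so OUTER wraps it on both sides.
    for start in range(1, n - 1):
        for end in range(start + 1, n):
            inner = answer[start:end]
            remainder = answer[:start] + answer[end:]
            for word in lexicon:
                candidate = word.upper()
                # Skip the remainder itself: that would be a plain container,
                # not an anagram.
                if candidate != remainder and Counter(candidate) == Counter(remainder):
                    plans.append((candidate, inner))
    return plans


# Toy lexicon, invented for this example.
print(container_anagram_plans("VIEWERS", ["WIVES", "VIEWS", "SWEET"]))
# prints [('WIVES', 'ER')]
```

Taking ER out of VIEWERS leaves VIEWS, which is an anagram of WIVES, so the sketch recovers the plan written as (anagram(WIVES) around ER).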

We select the fifth clue plan - (anagram(WIVES) around ER) - and click the Generate button. The system then generates clues, which are shown ordered by rank.

enigma-clue-plans 

Enigma generates the clues by treating the clue plan as a set of chunks and generating text for one chunk at a time. For example, (WIVES)* translates to chunks of text like 'strange wives', 'wives about', 'reorder wives' etc. The system discards those chunks that don't work syntactically (e.g. 'wives problem') or don't work semantically (e.g. 'jumbled wives').
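The generate-then-filter step described above can be sketched roughly as follows. Everything here is an assumption for illustration: the indicator list, the part-of-speech tags and the two hand-written rule sets are toy stand-ins, not Enigma's actual grammar or semantic checks.

```python
# Hypothetical anagram indicators with a coarse part-of-speech tag.
INDICATORS = {
    "strange": "adjective",
    "jumbled": "adjective",
    "reorder": "verb",
    "about": "adverb",
    "problem": "noun",
}

# Toy syntax: the only (part of speech, position) pairs this sketch accepts.
# It has no rule for a bare noun indicator, so 'wives problem' is rejected.
ALLOWED = {("adjective", "before"), ("verb", "before"), ("adverb", "after")}

# Toy semantics: pairings treated as implausible (an assumption; a real
# system would score these rather than list them by hand).
IMPLAUSIBLE = {("jumbled", "wives")}


def anagram_chunks(noun):
    """Return surface chunks for anagram(noun) that pass both toy checks."""
    chunks = []
    for indicator, pos in INDICATORS.items():
        for position in ("before", "after"):
            if (pos, position) not in ALLOWED:
                continue  # fails the syntactic check
            if (indicator, noun) in IMPLAUSIBLE:
                continue  # fails the semantic check
            if position == "before":
                chunks.append(f"{indicator} {noun}")
            else:
                chunks.append(f"{noun} {indicator}")
    return chunks


print(anagram_chunks("wives"))
# prints ['strange wives', 'reorder wives', 'wives about']
```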

The (rather unfortunate) phrase 'battered wives' scores above other similar alternatives such as 'fancy wives' since it is matched as a "collocate" i.e. a phrase in English with these exact words. The system recognises thematic associations between words by computing word distance between pairs of words in a 100 million word corpus (the British National Corpus) and using a statistical algorithm to determine whether or not a given pair of words is unusually correlated in the text.
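The post does not say which statistical measure Enigma uses, so the sketch below swaps in pointwise mutual information (PMI) over a small co-occurrence window as a stand-in. The six-sentence corpus is invented for the example and is not taken from the British National Corpus; `build_pmi` and `window` are hypothetical names.

```python
import math
from collections import Counter

# Invented toy corpus, standing in for a 100 million word corpus.
CORPUS = [
    "the refuge helps battered wives",
    "battered wives need support",
    "a charity for battered wives",
    "the fancy dress party",
    "fancy a cup of tea",
    "their wives wore fancy hats",
]


def build_pmi(sentences, window=2):
    """Count words and nearby word pairs, then return a PMI scoring function."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        word_counts.update(tokens)
        for i, left in enumerate(tokens):
            # Pair each word with the next `window` words in the sentence.
            for right in tokens[i + 1 : i + 1 + window]:
                pair_counts[tuple(sorted((left, right)))] += 1
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    def pmi(a, b):
        joint = pair_counts[tuple(sorted((a, b)))]
        if joint == 0:
            return float("-inf")  # the pair never co-occurs within the window
        p_joint = joint / total_pairs
        p_a = word_counts[a] / total_words
        p_b = word_counts[b] / total_words
        return math.log2(p_joint / (p_a * p_b))

    return pmi


pmi = build_pmi(CORPUS)
print(pmi("battered", "wives") > pmi("fancy", "wives"))  # prints True
```

In this toy data 'battered' and 'fancy' are equally frequent, but 'battered' appears next to 'wives' three times against once for 'fancy', so it gets the higher association score.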

Next the system finds ways of representing ER, such as hospital department, pause, etc.

Then the system explores all frames representing 'A in B' with all the combinations of the 'anagram of WIVES' chunk and the 'ER' chunk to try to build a new, meaningful chunk for that whole piece of the clue.
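As a rough illustration of this combination step, the sketch below fills a few 'A in B' style frames with every pairing of inner and outer chunks. The frame strings and the `container_chunks` helper are assumptions, and the sketch stops at producing candidates: judging which of them are meaningful is the hard part and is not modelled here.

```python
from itertools import product

# Hypothetical surface frames for "INNER inside OUTER".
FRAMES = [
    "{inner} in {outer}",
    "{outer} around {inner}",
    "{outer} holding {inner}",
]


def container_chunks(inner_chunks, outer_chunks, frames=FRAMES):
    """Fill every frame with every (inner, outer) pairing of chunks."""
    return [
        frame.format(inner=inner, outer=outer)
        for frame, inner, outer in product(frames, inner_chunks, outer_chunks)
    ]


candidates = container_chunks(
    ["pause", "hospital department"],     # ways of representing ER
    ["strange wives", "battered wives"],  # chunks for anagram(WIVES)
)
print(len(candidates))  # prints 12
for candidate in candidates[:4]:
    print(candidate)
```

Three frames, two inner chunks and two outer chunks already give twelve candidates, which hints at how quickly the search space behind the "Exp" figure grows.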

The auto-generated explanation for this clue is:

enigma-clue-explanation  

How Good Are Enigma-Generated Clues?

When Enigma was built, David conducted an evaluation in two ways –

1. Turing-style test – For the same light, two clues were provided – an Enigma-generated clue and a Sun newspaper clue. 30 pairs of such clues were presented and solvers were asked to pick the Enigma-generated clue from each pair.

2. Domain expert assessment - Crossword compilers Jonathan Crowther and Don Manley, editors Kate Fassett and Mike Hutchinson, and expert solvers provided their feedback on clue quality.

60 people participated in the Turing-style test and on average they correctly guessed the clue from the Sun newspaper 70% of the time. The best score (parity with the newspaper) was 50% and the worst (obvious to tell apart) 100%. The pairs (now marked with which is Enigma-generated) are here.

The domain experts were harsher critics of the system and found most clues lacking in human wit. The surface reading was right some of the time but not all the time. Another criticism concerned originality. For example, ENIGMA gave a high score to the clue "Drain fresh ewers (5)" for SEWER, but this idea has been used often in cryptic clues and a human setter familiar with crosswords would avoid it. While this similarity with human output is something of a success, ENIGMA's inability to recognise originality, which comes naturally to human setters, is a failing.

The major drawback of Enigma is its dependency on the encoded semantics, which in their current state are rather shallow. For example, in the VIEWERS clue, a human compiler would have spotted 'without hesitation' as a construct for '… around ER' and, using the same elements, made "Witnesses battered wives without hesitation (7)", a much more fluent clue. Enigma has not managed this because 'B without A' has not been encoded.

The problem gets compounded when the clue length increases, with the surface becoming nonsensical. The system has no strategic planning component to organise the surface beyond clause and sub-clause level.

Read the detailed evaluation here [p228-262].

The domain experts were asked if any of the clues of a set of 42 were of publishable quality. Mike Hutchinson highlighted 10 clues, Jonathan Crowther highlighted 8 and Sandy Balfour 9. One can conclude that, while Enigma is no threat to human crossword setters, it does get it right at least some of the time.

A Puzzle to Try

Have a go at this crossword with clues generated entirely by Enigma:

Enigma Crossword

Answers here.


8 comments

eeshan said...

One (of the arguably many) definitions of a good cryptic crossword clue would be one that cannot be solved by simple mechanical dissection, but requires quirky human knowledge.
It would be interesting to know how well a crossword solving program works on these clues, and then comparing that with how well 'good' human crossword solvers do on the same clues.

anax said...

Fascinating stuff Shuchi, and a brilliant piece of software which successfully answers your opening title/question with a resounding “No”. That’s not a criticism – David’s efforts show how vital the human touch is in creating cryptic clues. The software does pretty much everything but what it can never do is replicate the humour which setters use to make clues stand out.
I see this as an excellent teaching tool for new setters, as it goes so far beyond the very basic breakdowns offered by Crosswordman’s Word Wizard facility. What it does is quite brilliant – it offers a wealth of fully explained clue concepts which the new setter can regard as influences behind what will be, hopefully, polished clues.
Terrific stuff.

Shuchi said...

@eeshan: Have you tried Crossword Maestro? I did a trial once on The Hindu Crossword and got interesting results. Let me dig around and check if I made notes.

@anax: I thought Enigma brought the idea of machine-generated cryptic clues into the realm of possibility, something that was inconceivable to me before.

If I were shown these clues:

Bad year in troubled decade (7) DECAYED
Bedroom breaks tedium (7) BOREDOM

I would have taken them to be human-authored. In fact many of Enigma's flaws - DDs without distinct meanings, definitions often failing the substitution test, awkward surface reading for long clues - are common failings of an amateur human setter too.

Agree about humour being hard to replicate. I can see many of Enigma's other limitations being fixed by more sophisticated configuration/coding, but I can't see how it could come up with novelty clues.

raju umamaheswar said...

Good job, Shuchi. However, all that was arcane to my simple mind. I'd still prefer the human touch in compiling. Like the flavour of a personalized letter from your loved ones as opposed to an email.

After all, it is humans who had discovered the computer's value and is there any limit to his thinking power?

I have the Crossword Maestro software CD, bought in 2002.

It can only assist in choosing the right word and there you are! You still have to use your brain. As good/bad as One Across.com. It can never assist you when you want it most, with that frustrating phrase 'I don't know how it works' or 'Sorry, I'm stuck with this one' as its punch line! I haven't tried it for compiling or constructing a full grid and solving.

David Hardcastle said...

@eeshan: one of the evaluation exercises I did was to test how well (or badly) Crossword Maestro could solve the generated clues.

It did pretty well. Unsurprising, as you say, since the clues are quite mechanical and also since it uses the same open lexical resources (such as WordNet and Roget's thesaurus) that Enigma uses to create the clues!

@anax: I have always meant to hook the clue planning into my Java grid creation tool and create something which provides hints to setters but leaves the setter to add the flair. If/when I get around to that I'll let Shuchi know ... :)

Adrian and Denise Lowe said...

Is the programme available to the 'tavern with learner in charge?'

Shuchi said...

@Adrian and Denise Lowe: Back in 2011 when I'd asked David pretty much the same question, he had said:

"Unfortunately Enigma is a rather large and unwieldy application, as it has a lot of large underlying data sets, so it is very difficult to send you a copy to have a go with."

I've pinged David in case he has a working app to share now with the funny IP Club.

David Hardcastle said...

@Adrian and Denise Lowe: yes I'm sorry but Enigma is not easy to package up and share with you, in part because it drives off datasets which I can't sub-licence or re-release.

I keep meaning to sort that out and put it onto something like GitHub for anyone who is interested, along with SphinX (an earlier project which supports manual cryptic crossword compilation and which I still use myself to compile crosswords for our local village newsletter).

Sadly, work keeps getting in the way and my good intentions never quite break through ... :)