
What Crosswords Can Teach Us About Subtext and Natural Language Generation



The crossword is a laboratory for linguistic creativity. The best cruciverbalists use everything there is to a word, and never settle for less. A simple puzzle may be home to layers of references and designed ambiguity. Take the crossword clue “Grateful (8).” The easy, boring answer is “INDEBTED.” The creative, rewarding answer is “POWDERED” (read “grateful” as “grate-ful”: fully grated down to powder).

A particular sort of crossword is of interest to me: the cryptic crossword.

The cryptic crossword clue can be read two ways. The solver must look past the surface reading to find a second, puzzle reading. The latter gives the cryptic clue meaning and guides the solver toward the solution. 

In 2007, David Hardcastle, PhD, of Birkbeck, University of London developed ENIGMA, a natural language generation (NLG) model that can generate a cryptic crossword clue for any word in the English language. This essay will interrogate ENIGMA’s method of encryption. How does the model come to engineer a clue that can be read two ways? How does the model optimize the dissonance between the surface reading and the puzzle reading? This essay further considers how everyday speech is similarly “encrypted.” Note how what we say often “dresses up” what we intend to communicate, just as a cryptic clue “dresses up” the path to the solution. NLG models excel at producing coherent sentences that describe, summarize, or explain. But people do not speak so plainly and without reservation. Perhaps to generate realistic and humanized dialogue, then, NLG models must learn to encrypt informational sentences, thereby achieving a “show, don’t tell” subtextual effect.

The Crossword

You have in front of you a grid of dark and light squares, and you must fill in the light squares with letters to form words based on the given clues. In a standard American-style crossword, the clue is usually a short definition of the answer. For example, “Verbal equivalent of tomato throwing” clues “BOOING,” and “Deadly African snake” might clue “MAMBA.”

The cryptic crossword takes the puzzle a step further. A “cryptic clue” is composed of two elements:

  1. A definitional element. This ranges from a simple synonym to an allusive paraphrasing of the solution (e.g. “…blights on the landscape” clues for EYESORES). If a cryptic clue were pared down to the definitional element, the cryptic crossword would resemble the average American-style crossword. 
  2. A cryptic element that wittily suggests the solution.

In a cryptic clue, the two elements are designed to be concatenated in such a way that the most “natural” reading (or surface reading) gives little indication of the solution. In fact, the best cryptic clues have surface readings that distract or mislead the solver. Take, for example, this clue: “After the third of August, I have flexibility (4).”

The clue is a statement of its own! The surface reading only tells us that someone is rather busy or otherwise engaged until the third of August. 

The challenge, then, is to interpret the clue in such a way that its semantic representation reveals more than the surface reading. The best way to do this is often to try and split the given clue into its two elements. 

In this example, the definitional element might be the word/phrases:

  • “After,” in which case a fitting solution might be “post” or “rear”
  • “After the third of August,” with the solution being “fall”
  • “have flexibility,” with the solution being “open”
  • “flexibility,” with the solution being “give”

We move forward assuming that “flexibility” is the definitional element. That leaves “After the third of August, I have” as the cryptic element: 

  • “The third of August” = the third letter in “August” (“g”) 
  • “I have” = the contraction “I’ve,” whose letters read “ive” 
  • Put together: “g” + “ive” = “give”
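The letter-selection device in this clue can be checked mechanically. Here is a minimal sketch (the function name is mine, not part of any solver):

```python
def nth_letter(word, n):
    """'The third of August' selects the third letter of 'August'."""
    return word.lower()[n - 1]

# "After the third of August, I have": the letters of "I've" come after "g".
contraction = "i've".replace("'", "")         # -> "ive"
print(nth_letter("August", 3) + contraction)  # give
```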

Here’s another example: “Legendary N.F.L. quarterback’s mausoleum ruined yard (3,5)”

  • This time around, the definitional element is in front: “Legendary N.F.L. quarterback”
  • What remains is the cryptic element:
    • A common synonym for mausoleum = “tomb”
    • “ruined,” or words of the like suggesting a sudden change or disruption (“crashed,” “cooked,” “drunk,” “novel,” “fresh”) commonly indicates the use of an anagram
    • “ruined” signals that “yard” may be rearranged = “rady”
    • “tomb” + “rady” = “TOM BRADY” (3,5), a legendary N.F.L. quarterback.
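The anagram device above is easy to verify programmatically. A quick, illustrative sketch:

```python
def is_anagram(a, b):
    """True if a and b use exactly the same letters, ignoring case and spaces."""
    normalize = lambda s: sorted(s.replace(" ", "").lower())
    return normalize(a) == normalize(b)

# "ruined" signals an anagram of "yard":
print(is_anagram("yard", "rady"))  # True
# Concatenate with the synonym for "mausoleum":
print("tomb" + "rady")             # tombrady
```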

ENIGMA and the Generative Process

Hardcastle’s model imitates the human process of writing a crossword clue. ENIGMA

  1. Receives the input solution word
  2. Renders all possible “clue plans”
    1. e.g. given input MEDIEVAL, clue plans might include but are not limited to:
      1. anagram(EMAILED + V)
      2. MEAL around anagram(DIVE)
  3. Generates a cryptic element that is most thematically associative and semantically appropriate
    1. Confusedly emailed with five Romans
    2. Madly dive into meal
  4. Generates a definitional element that is thematically associative and semantically appropriate
    1. in ancient times
    2. it’s primitive.
  5. Stitches the cryptic and definitional elements together, in accordance with grammatical rules (excluding all filler or connective words which may distract from the clue).
    1. Confusedly emailed with five Romans in ancient times.
    2. Madly dive into meal, it’s primitive.
  6. Ranks the cryptic clue by a rubric of Hardcastle’s design, with emphasis on the cryptic dissonance between the two readings of the clue (e.g. to solve the clue, “five Romans” must be read as the Roman numeral V, but the surface reading is an adjective–noun pair)
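Step 2’s clue-plan search can be sketched as letter arithmetic: find lexicon words that, plus one extra letter, anagram to the solution. The tiny lexicon below is a stand-in; ENIGMA worked over full dictionaries and enumerated many more plan types (containers, reversals, charades, and so on):

```python
from collections import Counter

def anagram_plus_letter_plans(solution, lexicon):
    """Find clue plans of the form anagram(WORD + LETTER) == SOLUTION.

    Returns (word, extra_letter) pairs whose combined letters are an
    anagram of the solution word.
    """
    target = Counter(solution.lower())
    plans = []
    for word in lexicon:
        letters = Counter(word.lower())
        leftover = target - letters
        # The word must leave exactly one letter over, and must not
        # use any letter the solution lacks.
        if sum(leftover.values()) == 1 and not (letters - target):
            plans.append((word, next(iter(leftover))))
    return plans

# Illustrative mini-lexicon (an assumption, not ENIGMA's actual word list):
print(anagram_plus_letter_plans("medieval", ["emailed", "meal", "dive", "medal"]))
# [('emailed', 'v')]
```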

There additionally exists a special clue plan for simple “Two Meanings clues,” where the cryptic element is a second definitional element. Given the input word “MEDIEVAL,” ENIGMA might find two synonyms and stitch the two together meaningfully (in accordance with thematic association and semantic fit) to produce a clue like “Antique from the Dark Ages (8).”
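A toy version of the Two Meanings plan might simply pair two stored synonyms, with a hypothetical thesaurus standing in for ENIGMA’s lexical resources and thematic checks:

```python
# Invented stand-in for ENIGMA's synonym resources:
THESAURUS = {"medieval": ["antique", "from the Dark Ages", "primitive"]}

def two_meanings_clue(solution):
    """Stitch two synonyms of the solution into a single clue string."""
    synonyms = THESAURUS[solution]
    return f"{synonyms[0].capitalize()} {synonyms[1]} ({len(solution)})"

print(two_meanings_clue("medieval"))  # Antique from the Dark Ages (8)
```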

Let us then direct special attention to steps 3-4, which refer to thematic association and semantic fit. Hardcastle prioritized these two measures to optimize the coherence of the surface reading, such that “one can imagine that the clue describes some actual situation… a crossword clue that lacks a realistic surface reading is universally regarded as artless.” 

To measure thematic association, Hardcastle used a custom Russian Doll Algorithm. He scored incidences where word pairs appeared in close proximity over multiple nested windows, weighting the scores to account for the different window sizes. The Russian Doll Algorithm can be especially effective with polysemous terms (words with multiple meanings, which can cause incorrect collocations to be overcounted) if all such terms are sense-tagged.
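The idea of nested, weighted windows can be sketched as follows; the window sizes and weights here are illustrative guesses, not Hardcastle’s actual parameters:

```python
def russian_doll_score(tokens, w1, w2, windows=(2, 5, 10), weights=(3.0, 2.0, 1.0)):
    """Score the association between w1 and w2 by counting co-occurrences
    within nested ("Russian doll") windows, weighting tighter windows more."""
    positions = {w1: [], w2: []}
    for i, token in enumerate(tokens):
        if token in positions:
            positions[token].append(i)
    score = 0.0
    for i in positions[w1]:
        for j in positions[w2]:
            distance = abs(i - j)
            for size, weight in zip(windows, weights):
                if 0 < distance <= size:
                    score += weight
                    break  # count each pair once, at the tightest window it fits
    return score

tokens = "the ancient tomb lay in the ancient ruined yard".split()
print(russian_doll_score(tokens, "ruined", "yard"))  # 3.0 (adjacent words)
print(russian_doll_score(tokens, "tomb", "yard"))    # 1.0 (distant words)
```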

The Russian Doll Algorithm, however, is an insufficient measure of semantic fit, as distributional analysis conveys information about broad (or thematic) association only. A clear and sensible surface reading requires precise lexical choice. Hardcastle instead extracts syntactic collocates from the British National Corpus (BNC) to identify adjective modifiers, direct objects, and nominal subjects (i.e. word pairs that involve nouns).
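A toy version of syntactic-collocate extraction might filter adjective–noun pairs out of part-of-speech-tagged text. The tags and the mini-sentence below are invented; ENIGMA drew its pairs from the BNC:

```python
def adjective_noun_pairs(tagged):
    """Keep only adjacent (adjective, noun) pairs from (word, POS) tuples."""
    return [(a, b) for (a, tag_a), (b, tag_b) in zip(tagged, tagged[1:])
            if tag_a == "ADJ" and tag_b == "NOUN"]

# Hypothetical tagged sentence:
tagged = [("ancient", "ADJ"), ("tomb", "NOUN"), ("ruined", "ADJ"), ("yard", "NOUN")]
print(adjective_noun_pairs(tagged))  # [('ancient', 'tomb'), ('ruined', 'yard')]
```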

In the evaluative stage, Hardcastle conducted a Turing-style test with 60 regular cryptic crossword solvers. Presented with ENIGMA-written and human-written clues for the same solution, they correctly identified the human-written clue 70% of the time. Crossword compilers and expert solvers generally criticized ENIGMA for lacking human wit and humor. 

How NLG Models Might Generate Subtext

Cryptic clues are not so different from how we choose to speak to one another. 

Recall that the cryptic clue contains a cryptic element. That cryptic element elliptically communicates the solution. Things we say, too, contain a “cryptic element,” some sort of encoded information. Pay attention to the encoded information, and you might find some hidden meaning to what has been said. This is commonly referred to as subtext. 

I tell you, “I’m shivering.” The encoded information is that I could use something warm, possibly the jacket you are not wearing. The subtext: “I could use your jacket.” 

Subtext makes human dialogue complex, but also poetically ambiguous. Note the famous example from The Princess Bride. Westley says to Princess Buttercup “As you wish” (it is “all he ever said to her”). The encoded information could be that he cares deeply for Buttercup and does not know what else to say. The subtext in this particular case: he means “I love you.” 

Admittedly, human dialogue is not nearly as formulaic as a cryptic crossword clue. Not everything we say can be neatly arranged into definitional and cryptic elements. Nevertheless, NLG models can perhaps improve their ability to generate realistic human dialogue by mimicking the process as demonstrated by ENIGMA.

Consider, for example, that we want an NLG model to generate a dialogue between two characters on an evening date: Alex and Charlie. Alex wants to spend more time with Charlie. The generative steps below are modeled after ENIGMA’s process to accentuate the similarities:

  1. Input (Subtext): “I want to spend more time with you.”
  2. Rendering Dialogue Plan. Here, the model identifies all possible conversational elements that can communicate the subtext. Note that dialogue plans will vary with the choice of corpus and the customs or habits expressed in the included texts.
    • comment on the pleasantness of the evening
    • mention upcoming activities that might interest Charlie.
  3. Generating “Cryptic Elements” (fitting within the context of “evening date”)
    • “This evening has been wonderful, hasn’t it?”
    • “You know, I was thinking we could watch that new movie you mentioned.”
      • “movie” is thematically associative with “date” and “evening”
  4. Stitching the Elements Together:
    • Alex: “This evening has been wonderful, hasn’t it?”
    • Charlie: “Absolutely, I had a great time.”
    • Alex: “You know, I was thinking we could watch that new movie you mentioned. I haven’t seen it yet.”
    • Charlie: “That sounds like fun. Maybe we could do that.”

Note that Alex communicates the subtext in the absence of a “definitional element.”
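The dialogue plan above can be caricatured in a few lines of code. Every plan and template here is an invented placeholder for what a real model would derive from a corpus:

```python
# Invented mapping from subtexts to dialogue plans:
SUBTEXT_PLANS = {
    "I want to spend more time with you.": [
        "comment on the pleasantness of the evening",
        "mention an upcoming shared activity",
    ],
}

# Invented mapping from plans to surface lines:
PLAN_TO_LINE = {
    "comment on the pleasantness of the evening":
        "This evening has been wonderful, hasn't it?",
    "mention an upcoming shared activity":
        "You know, I was thinking we could watch that new movie you mentioned.",
}

def encode_subtext(subtext):
    """Map a subtext to surface lines that imply it without stating it."""
    return [PLAN_TO_LINE[plan] for plan in SUBTEXT_PLANS.get(subtext, [])]

for line in encode_subtext("I want to spend more time with you."):
    print("Alex:", line)
```

Note that, as in the dialogue above, the subtext string itself never appears in the generated lines; the "cryptic elements" carry it implicitly.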

Altogether, an ENIGMA-inspired process might help an NLG model generate text that communicates a want or need in a more pleasant and sensitive manner without sacrificing thematic, semantic, and grammatical coherence. 

Written by Clark Wu