Why NYT’s Connections makes you feel bad
The new daily game at the New York Times is called Connections, and I’ve seen a few people comment that they just don’t like it as much as Wordle or Spelling Bee. That the difficulty is inconsistent and it often makes you just feel stupid.
I thought it would be interesting to contrast this to Word Dad, a puzzle game made by my friend, master game designer John Cutter.
A brief aside on puzzles
All three of these are more correctly called puzzles, of course. The main difference between a game and a puzzle is that a puzzle has one real solution, an optimal way through the challenge. In a game, finding an optimal way through the challenge is known as a degenerate strategy or even “solving the game” if you’re a mathematician. This means that really, puzzles and games are terms that are matters of degree, not kind.
In A Theory of Fun I term games “a series of puzzles,” partly because a game is presenting you with lots of decisions, each of which has a theoretical optimum choice. In practice, most good puzzles also have many decisions, because you tend to play the same ruleset repeatedly, but with different statistical variations in the content. In that sense, like a game atom, puzzles are usually one logical ruleset, and have just one way in which the puzzle operates, which is a systemic machine that can take data variations. Sudoku is a machine; a given Sudoku layout is an individual puzzle. A Rubik’s Cube always has the same verbs and the same logic, but it has a ton of possible starting states. Not that different from fighting one monster versus another using the same combat system in an RPG! You get the same verbs, the same affordances.
But a puzzle’s output is usually pretty binary. You either got the answer or you didn’t. A game will give you finer-grained feedback: you can do better or worse at it. Many puzzle games try to offer an efficiency metric to give you a gauge instead. This is going to be important for our design comparison.
Connections
Connections is a game with a grid of 16 words. They’re jumbled around, but they are actually four sets of four. The trick is, you don’t know what the grouping terms are. The rules somewhat misleadingly give examples that are essentially noun-based: group all fish together, group all compound words starting with “fire” together. But then it says “categories will always be more specific than ‘five letter words,’ ‘names,’ or ‘verbs.'”
The trick of course is that words might well belong to many categories at once.
If that were all, the game would be much like Set, a game of set-building by examining the “stats” on the individual items and grouping them. Set, however, uses essentially numbers as its stats. A fixed list of colors, a fixed list of shapes. The permutation space is sizable but still small: there are only 81 distinct cards in Set. This is larger than a standard deck of cards, but also smaller in some key ways. (Doing the exercise of tallying up the number of “fields” and how large the “scale” is on each in a standard deck versus a Set deck is a worthy system design exercise I leave to the reader).
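For readers who want to check their tally, here is one way to sketch it in Python. It assumes Set's four fields (number, shading, color, and shape, each with three values) and treats a standard deck as two fields, rank and suit, with jokers ignored. The names `set_fields`, `deck_fields`, and `distinct_items` are made up for illustration.

```python
from math import prod

# Each "field" is an attribute of an item; its "scale" is how many values
# that attribute can take.
set_fields = {"number": 3, "shading": 3, "color": 3, "shape": 3}
deck_fields = {"rank": 13, "suit": 4}  # assumption: jokers ignored


def distinct_items(fields):
    """The number of distinct items is the product of every field's scale."""
    return prod(fields.values())


set_cards = distinct_items(set_fields)    # 3 * 3 * 3 * 3
deck_cards = distinct_items(deck_fields)  # 13 * 4
print(set_cards, deck_cards)  # 81 52
```

Seen this way, Set has more fields with a short scale on each, while the standard deck has fewer fields but a much longer scale on one of them.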
Word games have really really large, unbounded scales and fields. As the game rules mentioned, number of letters in the word is one such field. Meanings is another. Spellings is another. Heck, Connections could make a set of four words out of whether or not they use the dʒ, tʃ, or ŋ phonemes.
The learning loop
If we go back to Theory of Fun again, the core premise of the theory is that iteratively tackling a problem helps you come to understand the systemic machine in an intuitive way, and fun is the reward for that. All those varying Sudoku puzzles are there so you learn the generic methods of solving all Sudoku puzzles. The variations are like touching more parts of the elephant with your eyes closed. The more parts you touch, the closer you get to understanding the shape underneath.
So you loop over the game, and do it again with a slightly different problem set, and you keep doing it until you master the underlying logic. After that, you may only enjoy the game as a way to practice, or to mindlessly meditate. A larger game might scaffold you to more sophisticated problems by making this mastered element be simply one brick in a larger edifice of understanding. (Dan Cook has a great article walking through structures like this).
Ah, but what happens when a puzzle depends on your knowing facts, as opposed to methods?
Trivia games have this problem, so do spelling games, and games like Scrabble. They call for a large quantity of crystallized intelligence — large vocabularies, historical minutiae, memorized stuff. Games can help you learn memorized stuff, for sure! Anyone who has kids who learned the name and statistics of every Pokemon knows how that goes! But we learn those things in the service of mastering the loop.
Different people come to a game not only with different amounts of experience in systemic rulesets — which is why they bounce off of some genres — but also with differing amounts of crystallized knowledge. A core issue with trivia games, though, is that the statistical space of a trivia game is so large that playing more trivia games doesn’t particularly help you get better at trivia. This is why so many trivia games (such as Jackpot Trivia, the one I worked on with NTN Buzztime) have other mechanics alongside, in order to help people who may not bring a huge memorization library to the table when they play. In Trivial Pursuit you have a degree of control over the category, and you have control over which direction to travel. These effectively add more variables in the game so that a player with less trivia knowledge can outplay a player with more, through smart movement and concentrating questions on their strengths.
Can you learn to get better?
If Connections only relied on crystallized knowledge for the words, that would be one thing. But in practice, the set-building criteria themselves are often dependent on crystallized knowledge. Let’s look at yesterday’s puzzle. The words are:
- Flute
- Coffee
- Pound
- Oboe
- Stein
- Fricassee
- Bishop
- Tumbler
- Bassoon
- Frost
- Goblet
- Clarinet
- Olds
- Snifter
- Saxophone
- Balloon
The trick, of course, is that many of the words clump very easily into groups, and can belong to many groups. There are five wind instruments: flute, oboe, bassoon, clarinet, saxophone. Ah, but four of those five are reed instruments — which is trivia knowledge, relatively esoteric to anyone not a musician with a particular sort of training. And all five are woodwinds even though saxes are made of brass! All of those are things you blow on or into, but so are coffee and balloons! There are four drinking vessels for alcohol: stein, snifter, goblet, and tumbler. Ah, but Gertrude Stein, Robert Frost, and Ezra Pound all jump out as famous poets. Those last names are like a card in a deck being both an ace — all poets — while all being of different suits, where the suits are other categories they could belong to.
You get the idea — the more you know, the more categories you see. The puzzle is trying to teach a form of orthogonal thinking, to push players to find unusual ways to group things. But it’s fundamentally elitist — it basically requires you to have a broad education to find the categories in the first place.
The success/failure metric
If you had unlimited tries, this wouldn’t matter. You could essentially brute force the puzzle. Create a set of four. See if you can find a set in the remaining. Repeat. Any time you cannot, back out a level, undo the set you already tried. Over time, this algorithm will in fact give you the result, probably along with a lot of Googling.
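The back-out-and-retry procedure above is ordinary backtracking, and a rough Python sketch makes the shape of it clearer. The `CANDIDATES` table below is an illustrative assumption: it lists groupings a player might think of for the example grid, not the puzzle's official answer key, and the function name `solve` is made up. The sketch also assumes you already know the candidate categories, which is precisely the part the Googling is for.

```python
from itertools import combinations

# Hypothetical candidate categories for the example grid (an assumption for
# illustration only). Some hold more than four words, and some words appear
# in more than one category.
CANDIDATES = {
    "wind instruments": {"flute", "oboe", "bassoon", "clarinet", "saxophone"},
    "drinking vessels": {"flute", "stein", "tumbler", "goblet", "snifter"},
    "poets": {"pound", "stein", "frost", "bishop", "olds"},
    "doubled vowels": {"fricassee", "balloon", "coffee", "bassoon"},
}


def solve(words, candidates):
    """Return a list of (category, group) pairs covering every word, or None."""
    if not words:
        return []
    first = min(words)  # always place the first unplaced word next
    for name, members in candidates.items():
        if first not in members:
            continue
        others = sorted((members & words) - {first})
        unused = {k: v for k, v in candidates.items() if k != name}
        for rest in combinations(others, 3):
            group = frozenset((first, *rest))
            tail = solve(words - group, unused)
            if tail is not None:  # the rest of the grid still works out
                return [(name, group)] + tail
            # otherwise back out a level and try a different set of four
    return None


GRID = set().union(*CANDIDATES.values())
solution = solve(GRID, CANDIDATES)
for name, group in solution:
    print(name, sorted(group))
```

With these candidate lists, bassoon ends up in the doubled-vowels group because nothing else can complete that set, which leaves flute, clarinet, oboe, and saxophone as the instruments.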
The fact that the puzzle is susceptible to a brute force search isn’t a problem — so is Sudoku! Lots of games are like that, actually. Crosswords are! In fact, A Theory of Fun would argue that learning how to do that is in fact the lesson the game teaches, since it can’t teach you modernist poets or the difference between classical wind instruments itself. No, what it can teach you is how to be methodical, and the value of research.
But in order to add a sense of challenge, the designers of this puzzle decided to not let you use that algorithm. Make four incorrect sets and you lose. Now the game is handicapped in teaching you its intrinsic lessons! (You can still do it, but the puzzle affordances push you not to. You could make index cards with the words, sit with Google, try to solve it offline, then once you do, input your solution. But this is a pain in the ass).
Fundamentally, the game invites you to make guesses, but punishes you for them — and a missed guess doesn’t help you prune the logic space, only the trivia space.
As a result, it is entirely possible to build a solve that is all wrong, based entirely on valid category groupings that aren’t the ones that the puzzle designer intended.
Inconsistent difficulty
But building a good puzzle of this sort is very very hard. Do you feel clever when you notice that fricassee, balloon, coffee, and bassoon all use doubled vowels? Probably — the thrill of seeing a pattern you knew must exist. But did the designer mean to also have four words with two O’s in them? Balloon, bassoon, oboe, and saxophone do. It’s a valid pattern that is a red herring. A well-designed puzzle of this sort should have fewer red herrings than the number of mistakes it allows. But no puzzle maker will be able to anticipate all the valid groupings a knowledgeable and clever person will be able to find.
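Both of those spelling patterns are easy to check mechanically against the sixteen words listed earlier. The following is a small Python sketch; the helper name `has_doubled_vowel` is made up, and "vowel" here is assumed to mean a, e, i, o, or u.

```python
import re

GRID = [
    "Flute", "Coffee", "Pound", "Oboe", "Stein", "Fricassee", "Bishop",
    "Tumbler", "Bassoon", "Frost", "Goblet", "Clarinet", "Olds", "Snifter",
    "Saxophone", "Balloon",
]


def has_doubled_vowel(word):
    """True if the same vowel appears twice in a row, as in 'ee' or 'oo'."""
    return re.search(r"([aeiou])\1", word.lower()) is not None


doubled = [w for w in GRID if has_doubled_vowel(w)]
two_os = [w for w in GRID if w.lower().count("o") == 2]
overlap = sorted(set(doubled) & set(two_os))

print(doubled)  # ['Coffee', 'Fricassee', 'Bassoon', 'Balloon']
print(two_os)   # ['Oboe', 'Bassoon', 'Saxophone', 'Balloon']
print(overlap)  # ['Balloon', 'Bassoon']
```

Each pattern yields exactly four words, and the two groups share balloon and bassoon, so a player cannot tell from the grid alone which pattern was intended.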
Out of those five wind instruments, by the way, the right answer was flute, clarinet, oboe, and saxophone. To the player, this can’t help but feel arbitrary — the only reason bassoon isn’t valid is because of the constraints of a different set.
When I said this was an elitist’s puzzle, I meant it. It demands a lot of knowledge, or a lot of time, it basically makes you “do the crossword in pen,” and it’s basically not something you can ever expect to have a good learning loop. It will always feel like it is inconsistent in difficulty, and that’s why we are seeing the reception we are.
I am not sure you can fix the basic premise here, to solve these issues. You’d need to have a fixed list of category types, and that would render the problem space trivial rather quickly over time. One approach might be to gather data on the wrong solves players assemble, so that over time you can get a sense of which set types are harder than others. But it’d be a lot of manual labor, probably. Maybe ChatGPT could deduce what a submission was meant to group as… Maybe.
Over time, I expect that players will start to deduce the rules the NYT authors themselves use for authoring… what they consider to be an easy versus hard category. They have such a rubric — supposedly the four sets are each of a different difficulty level. But… the easy one this time was supposed to be the woodwinds. So… I suspect there’s a lot to learn about what actually makes for hard versus easy categories. Big Data may be the only way to actually arrive at a valid answer in order to smooth out the play experience.
As an aside, the advanced style of NYT crossword has many similar elements to this. But there are several things that help: difficulty ratings on the puzzles, learning puzzle authors’ styles, and the wide availability of easier crosswords out there, which help you learn the underlying logic (which at the advanced level, includes truly esoteric trivia, often multilingualism, advanced degrees, and a keen sensitivity for wordplay). If the only crossword puzzles in the world were like advanced NYT ones, few people would do crosswords.
Where a puzzle becomes a game
Word Dad has a very different approach. Unlike Connections, it does not have a singular answer to the puzzle. You see spaces for three words of varying length. You have a set of letters, exactly enough to fill all the spaces. You just need to make three words.
Once again, this is open-ended set building, and rewards crystallized knowledge in the sense of a wide vocabulary. The contrast to a typical puzzle is that the game accepts any valid solution that makes three words! Now, instead of the puzzle designer being in charge of the “category” axis, you are. This makes the words into verbs rather than content, which is the same move that is pulled by Scrabble and Boggle and countless other classics.
You have a success metric, too. All valid words, you see, fall into one of three levels: common, uncommon, and rare. This gives an implicit “score” you can get. If we assign 1 point for common, 2 for uncommon, and 3 for rare, you can solve the puzzle with any score from 3 through 9. (The game doesn’t actually show you a numerical score this way, but players have gravitated quickly towards aiming for scores of 9, i.e. three rare words). Rare words, by the way, are not all that rare! You will almost never need to reach for a dictionary.
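That implicit score is simple enough to write down. The sketch below uses the 1/2/3 point assignment described above; the names `POINTS` and `score` are made up, and the game itself does not expose a number like this.

```python
from itertools import product

# Point values for the implicit score: one entry per rarity tier.
POINTS = {"common": 1, "uncommon": 2, "rare": 3}


def score(tiers):
    """Sum the points for the tiers of the three words in a solve."""
    return sum(POINTS[t] for t in tiers)


# Every score reachable with three words, each of any tier.
possible = sorted({score(combo) for combo in product(POINTS, repeat=3)})
print(possible)                                 # [3, 4, 5, 6, 7, 8, 9]
print(score(["uncommon", "uncommon", "rare"]))  # 7
```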
You also get only one submission — so you can experiment all you like, iteratively finding an optimal solution — and even better, there are frequently several solves worth 9 points! Invalid words are quietly rejected with no penalty.
Word Dad rewards you with a groaner dad joke or pun when you solve the puzzle, which is a nice bit of feedback that strongly themes the game. But the underlying problem space is already pleasing. You feel in control, you can iteratively learn the patterns to go for. In this case, the logic rulesets you are deducing are “which words tend to be common and rare in a corpus?” and “what is the letter frequency of words in English?”, which together form a nicely bounded statistical space to master, plus a fun resource allocation problem across that landscape with vowels. You quickly learn tactics that turn STUB (uncommon) into BUST (still uncommon) into BUTS (rare!).
A fun wrinkle: given a dictionary of valid words, you can procedurally generate Word Dad puzzles, and guarantee 9 point solves. You just pick three rare words and scramble them! So it’s easy to make more content for, rather than hard. This is a virtue.
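As a rough illustration of how little machinery that takes, here is a minimal Python sketch. It is not Word Dad's actual generator. The toy `DICTIONARY` is an assumption: only the tiers for STUB, BUST, and BUTS come from this post, and every other entry is a made-up placeholder. The function name `generate_puzzle` is also hypothetical.

```python
import random

# Toy dictionary mapping word -> rarity tier. Only the stub/bust/buts tiers
# come from the article; all other entries are made-up placeholders.
DICTIONARY = {
    "stub": "uncommon",
    "bust": "uncommon",
    "buts": "rare",
    "yurt": "rare",
    "vex": "rare",
    "quip": "rare",
    "cat": "common",
}


def generate_puzzle(dictionary, rng):
    """Pick three rare words and scramble their letters into one pool."""
    rare = sorted(w for w, tier in dictionary.items() if tier == "rare")
    answer = rng.sample(rare, 3)
    letters = list("".join(answer))
    rng.shuffle(letters)
    return {
        "slots": [len(w) for w in answer],  # word lengths shown to the player
        "letters": letters,                 # exactly enough to fill the slots
        "answer": answer,                   # the guaranteed 9 point solve
    }


puzzle = generate_puzzle(DICTIONARY, random.Random(0))
print(puzzle["slots"], "".join(puzzle["letters"]))
```

Because the letter pool is built from three rare words, at least one 9 point solve exists by construction.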
Making it even better
Word Dad uses word length as its difficulty metric. Over the course of the week, the words get longer. But I don’t think this is actually the right metric at all.
In practice, the thing that most affects difficulty is “how many valid solves there are.” The more there are, the more opportunities for players to deploy their vocabulary verbs, the more chances to feel smart by finding more solutions. If a given layout only had one solution, given that the words are generated by starting with three rares, the puzzle would be extremely hard, and back to binary: either you score 9 or you lose.
You could address this by adding a solver to your generation routine: have an unscrambler check the layout, and see how many valid solves it can find. Reject puzzles with fewer solves than a certain threshold. Ideally, reject them unless they permit the full range of scores from 3 to 9. Personally, I would also reject puzzles that only have one 9-point solve.
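A brute-force version of that check might look like the Python sketch below. Everything here is an assumption for illustration: the toy `DICTIONARY` (only the STUB, BUST, and BUTS tiers come from this post, the rest are placeholders), the function names `find_solves` and `acceptable`, and the default thresholds. A real dictionary would need a smarter search than trying every combination.

```python
from collections import Counter
from itertools import product

POINTS = {"common": 1, "uncommon": 2, "rare": 3}

# Toy dictionary: only the stub/bust/buts tiers come from the article;
# the other tiers are made-up placeholders.
DICTIONARY = {
    "stub": "uncommon", "bust": "uncommon", "buts": "rare", "tubs": "common",
    "cat": "common", "act": "rare",
    "on": "common", "no": "rare",
}


def find_solves(slots, letters, dictionary):
    """Map each valid solve (a sorted tuple of words) to its score."""
    pool = Counter(letters)
    by_length = [[w for w in dictionary if len(w) == n] for n in slots]
    solves = {}
    for words in product(*by_length):
        # A solve must use every letter in the pool exactly once.
        if Counter("".join(words)) == pool:
            solves[tuple(sorted(words))] = sum(POINTS[dictionary[w]] for w in words)
    return solves


def acceptable(solves, min_solves=3, min_top_solves=2, need_full_range=True):
    """Apply the rejection rules: enough solves, score range, 9 point count."""
    scores = set(solves.values())
    top = sum(1 for s in solves.values() if s == 9)
    if len(solves) < min_solves or top < min_top_solves:
        return False
    return scores == set(range(3, 10)) if need_full_range else True


solves = find_solves([4, 3, 2], "butsactno", DICTIONARY)
print(len(solves), sorted(set(solves.values())))
print(acceptable(solves))  # False: only one 9 point solve in this toy layout
```

In this toy layout there are 16 solves covering every score from 3 to 9, but only one of them is worth 9, so the stricter rule rejects it.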
You could also make your generator better. The main thing that constrains the possible solutions landscape is the number of vowels. One of the things that currently messes up the difficulty ramp from the early week easy puzzle to the hard end of week puzzle is that you might get an “easy” puzzle with very few vowels. This cuts the possible solutions down dramatically, and starts favoring “trivia” knowledge like “Scrabble words with no vowels” and the like.
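One cheap guard a generator could apply is a minimum vowel share for the letter pool. The Python sketch below is only an illustration: the function names, the choice to count just a, e, i, o, and u, and the 0.3 threshold are all assumptions, and a real threshold would have to be tuned against actual puzzles.

```python
VOWELS = set("aeiou")  # assumption: "y" is not counted as a vowel


def vowel_ratio(letters):
    """Fraction of the letter pool that is a vowel."""
    letters = [c for c in letters.lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in VOWELS for c in letters) / len(letters)


def enough_vowels(letters, min_ratio=0.3):
    """Reject letter pools whose vowel share falls below the threshold."""
    return vowel_ratio(letters) >= min_ratio


print(enough_vowels("butsactno"))  # True: 3 vowels out of 9 letters
print(enough_vowels("rhythms"))    # False: no vowels at all
```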
Lastly, there would be a big win in expanding the scoring scale. A decent player can start scoring 9 every single day without too much effort. In that sense, the game is too easy. This is no major sin: There is lots of room in the world for an easy daily diversion, but designers have to eat, so improving the game’s retention is still a worthy goal. Easy ways to add “skill ceiling” would include:
- Including time as a score component: a higher score goes to a 9 point solve in a shorter time — this rewards one sort of player
- Having rarity within solutions: there are often several 9 point solves. Grant more score for esoteric, unusual solutions compared to the commonest solution the playerbase arrived at. This would reward a creative impulse and add a whole new logic puzzle to solve (“how do other players of this game tend to think?”)
- Bonuses for unusual words: a solve that included a word no one else used could provide a different sort of thrill to a player, and provide an orthogonal goal to pursue.
None of these need conflict with one another, and they’d all add to the game’s depth and replayability. It’s probably not hard to come up with more.
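To make the "rarity within solutions" idea concrete, here is one possible scoring rule sketched in Python. The formula is purely an assumption: each solve earns a bonus equal to the share of players who did not submit it. The function name `rarity_bonus` and the sample submissions are made up.

```python
from collections import Counter


def rarity_bonus(submissions):
    """Bonus per distinct solve: the fraction of players who did NOT pick it."""
    counts = Counter(submissions)
    total = len(submissions)
    return {solve: round(1 - n / total, 2) for solve, n in counts.items()}


# Hypothetical day of play: three players found the same solve, one did not.
submissions = [
    ("act", "buts", "no"),
    ("act", "buts", "no"),
    ("act", "buts", "no"),
    ("act", "no", "tubs"),
]
bonuses = rarity_bonus(submissions)
print(bonuses[("act", "buts", "no")])  # 0.25
print(bonuses[("act", "no", "tubs")])  # 0.75
```

A rule like this can only be computed after the day's submissions are in, which is a design consideration of its own.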
Game grammar
Anyway… I didn’t really mean to spend my Saturday afternoon doing a close systems analysis, but it’s been a while since I did anything of the sort. Or posted on the blog at all!
If you are a game systems designer, these are ways of thinking you should learn. Learn to see how games — even word games — are built out of common elements like set formation, tokens as verbs, statistical distributions, learning loops, and all the other concepts I’ve mentioned here. This is what I call game grammar: the underlying rules all games share. This same analysis can be done on a platformer or an RPG, a tabletop game or a sport. They’re all games in the end.
3 Responses to “Why NYT’s Connections makes you feel bad”
This “Connections” puzzle is used in a TV programme we have in the UK called “Only Connect”. It has a set of other puzzles that are equally arbitrary, elitist and unfair. Fortunately, the worst ones aren’t suitable for newspapers.
For example, one of the puzzles asks you to guess the next word in the sequence, with 4 points if you get it after the first word down to 1 if you get it after the fourth, which doesn’t seem too bad except that sometimes you can only really get the fourth word if you’ve already got the first three and sometimes you can guess on the first one. Oh, and if you get it wrong you don’t get to guess again.
“Only Connect” has been going since 2008, and some people are strangely good at making connections. If only the puzzle-setters were better at setting the puzzles….
I myself do make a distinction between games and puzzles, by the way:
Play ends when you decide to stop.
Puzzles end when you win them or decide to stop.
Games end when you win them, lose them or decide to stop.
Basically, you can lose a game but you can’t lose a puzzle. That said, all single-player games are going to be puzzles.
As a woodwind player and cocktail aficionado, the Connections game you referenced would have driven me mad, because I realize a flute is also a glass for containing alcoholic beverages.
It’s funny this article mentions diphthongs because there was a grouping that relied on pronunciation of the word, specifically a silent L. This is problematic since not everyone pronounces words the same, and not everyone can hear spoken word.
I came to this article for validation after failing another puzzle.