An Exercise in Rational Cooperation and Communication: Let’s Play Hanabi

Why Play Hanabi?

Hanabi is a game requiring modeling others’ minds, communication, and strategy. Its unique challenges and cooperative (rather than adversarial) objective have piqued the interest of the AI community, which recently began using it as a testing environment for AI agents.

Hanabi is a card game in which 2-5 players must cooperate to put on a dazzling firework display. The cards in the game represent different stages of fireworks. The most basic version of the game has five different colors of fireworks (cards), each of which has five stages, numbered 1-5, that must be set up in ascending order. The players must play cards from their hands to add them to the communal display on the table. The players’ score at the end of the game is the sum of the latest stage of each color they successfully deployed; reaching stage 5 for all 5 colors receives the maximum score of 25.

However, there is a slight wrinkle: players must hold their cards so that they face away from them; no player is allowed to look at their own cards, only those of other players. To play their cards at the right time, each player must rely on clues from their teammates.

I like this game a lot. I also think it is good for people to play because it forces you to think about how others will interpret your communications when they don’t have the same information or perspective that you do. The communication skills and theory of mind required for Hanabi are also useful in real life: for example, in handling interpersonal conflicts with people you ultimately want to cooperate with, or in explaining your expertise to a layperson. (I believe this strongly enough that I will volunteer myself as a partner for any reader interested in trying the game online; just leave a comment or send a message.)

Hanabi Rules

Instead of or in addition to reading this section, you can watch a humorous explainer video.

To start, shuffle all the cards together and deal 5 cards to each player in a 2- or 3-player game, or 4 cards to each player in a 4- or 5-player game. Place the rest of the cards in the middle of the table and set up the 8 clock tokens (also called clue tokens) and 3 fuse tokens face-up.
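The setup above can be sketched in a few lines of Python. This is only an illustration, not any site's actual implementation. The hand sizes come from the rules as described; the color names and the number of copies of each card (three 1s, two each of 2 through 4, one 5 per color) are my assumptions based on the standard boxed game, and the function names are made up for this example.

```python
import random

# Assumed color names for the five basic suits.
COLORS = ["red", "yellow", "green", "blue", "white"]
# Assumed copies of each number per color in the standard deck.
COPIES = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}


def build_deck():
    """Return the full deck as a list of (color, number) cards."""
    return [(color, number)
            for color in COLORS
            for number, count in COPIES.items()
            for _ in range(count)]


def hand_size(num_players):
    """5 cards each for 2 or 3 players, 4 cards each for 4 or 5 players."""
    if not 2 <= num_players <= 5:
        raise ValueError("Hanabi is for 2-5 players")
    return 5 if num_players <= 3 else 4


def deal(num_players, rng=None):
    """Shuffle, deal the hands, and return (hands, remaining draw pile)."""
    rng = rng or random.Random()
    deck = build_deck()
    rng.shuffle(deck)
    size = hand_size(num_players)
    hands = [[deck.pop() for _ in range(size)] for _ in range(num_players)]
    return hands, deck


hands, draw_pile = deal(4, random.Random(0))
print(len(hands), len(hands[0]), len(draw_pile))  # 4 4 34
```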

Players take turns until the game is over. On their turn, a player must take one of three actions:

  1. Give a hint to another player

  2. Play a card from their hand

  3. Discard a card from their hand

Give a Hint: To give a hint, a player picks one other player, then points to cards in that player’s hand that match either a number or color (e.g., “this card is a 5” or “these three cards are red”). They must point to all the cards that match the color or number. This is the only way to communicate card values or colors in Hanabi. To give a hint, the player must flip over one of 8 clock tokens. If there is no token to flip over, the player cannot give a hint.

Example: Bob is holding a red 3, a green 2, a blue 2, and a blue 1. Alice wants to give Bob a clue. Two of the clues she is allowed to give are “This card is a 3” and “These two cards are blue,” while pointing at the corresponding card(s). She is not allowed to say “This card is a two” while pointing to only one of the two 2s; she must indicate both cards if she wants to tell Bob which cards are 2s.
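For readers who think in code, here is a rough sketch of the "must point to all matching cards" rule, using Bob's hand from the example. The representation (cards as `(color, number)` tuples, a clue as either a color string or a number) and the function names are my own choices for illustration. I also assume a hint must touch at least one card, which the rules above do not state either way.

```python
def touched_cards(hand, clue):
    """Indices of every card in `hand` that matches the clue.

    `clue` is either a color (str) or a number (int).
    """
    return [i for i, (color, number) in enumerate(hand)
            if clue in (color, number)]


def is_legal_hint(hand, clue, pointed, clock_tokens):
    """A hint is legal only if a token is available and the giver points
    at exactly the set of cards that match the clue."""
    if clock_tokens <= 0:
        return False  # no token left to flip over
    touched = touched_cards(hand, clue)
    # Assumption: a hint that matches no cards is not allowed.
    return len(touched) > 0 and sorted(pointed) == touched


bob = [("red", 3), ("green", 2), ("blue", 2), ("blue", 1)]
print(is_legal_hint(bob, 3, [0], clock_tokens=8))          # True
print(is_legal_hint(bob, "blue", [2, 3], clock_tokens=8))  # True
print(is_legal_hint(bob, 2, [1], clock_tokens=8))          # False: misses the blue 2
```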

Play a Card: If a player thinks one of their cards is ready to be added to the display, they can announce they are playing a card, and then put it on the table. If it is ready to be added to the display, great! The card is added to its color’s pile and the players’ score increases by 1. If instead it is too early to play that card, or another copy of that card has already been played, it is discarded and the players lose one of the three fuse tokens. If all three fuse tokens are lost, the display explodes and the players score 0 points.

Example: The game has just begun and there are no cards on the table. On the first turn, Alice tells Bob he is holding a 1. On Bob’s subsequent turn, he plays the card Alice pointed out. It is a blue 1, which is added to the display.

Example: In the middle of the game the green fireworks are at stage 3 and the yellow fireworks are at stage 2. Bob is holding a card that he thinks is a green 4, so he announces he is playing a card and puts it on the table. However, Bob’s card was actually a yellow 4, which cannot be played before a yellow 3. Bob discards his yellow 4 and one fuse token.

Playing a 5 of a particular color suit completes that color firework and un-flips a clue token as a bonus.
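The play rules, including the fuse penalty and the bonus for a 5, can be summarized in a small sketch. The state representation (a dict mapping each color to its highest stage played so far) and the function names are hypothetical, and the cap of 8 tokens simply reflects that there are only 8 tokens to un-flip.

```python
def play_card(card, display, fuses, clues, max_clues=8):
    """Resolve a play. Returns (display, fuses, clues, success)."""
    color, number = card
    if display[color] + 1 == number:
        # The card is the next stage for its color: add it to the display.
        display = {**display, color: number}
        if number == 5:
            # Completing a firework un-flips one token, if any are flipped.
            clues = min(clues + 1, max_clues)
        return display, fuses, clues, True
    # Too early, or already played: the card is discarded and a fuse is lost.
    return display, fuses - 1, clues, False


def final_score(display, fuses):
    """Sum of the top stage of each color, or 0 if every fuse was lost."""
    return 0 if fuses <= 0 else sum(display.values())


# Bob's misplay from the example: green is at 3, yellow is at 2.
display = {"red": 0, "yellow": 2, "green": 3, "blue": 0, "white": 0}
display, fuses, clues, ok = play_card(("yellow", 4), display, fuses=3, clues=5)
print(ok, fuses)                    # False 2
print(final_score(display, fuses))  # 5
```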

Discard a Card: Players can announce they are discarding a card and then remove one of the cards in their hand from the game. Doing this returns one clue token to its unflipped state, allowing the team to give another clue in the future. Other than playing 5s, this is the only way to un-flip a clue token. Most cards in the game have multiple copies. Since each color suit only needs one card of each value to be played, there are plenty of cards that can be “safely” discarded.
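To make “safely” a little more concrete, here is a simplified sketch. It treats a card as safe to discard if it has already been played, or if at least one other copy has not yet been discarded. The copy counts are my assumption from the standard boxed game, the names are invented for the example, and real play involves more considerations than this check captures.

```python
from collections import Counter

# Assumed copies of each number per color in the standard deck.
COPIES = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}


def discard(card, discard_pile, clues, max_clues=8):
    """Discard a card and un-flip one token (never more than 8 unflipped)."""
    return discard_pile + [card], min(clues + 1, max_clues)


def is_safe_discard(card, display, discard_pile):
    """True if discarding `card` cannot block its color from being completed."""
    color, number = card
    if number <= display.get(color, 0):
        return True  # that stage is already on the table
    # Copies not yet discarded, counting the one being considered.
    remaining = COPIES[number] - Counter(discard_pile)[card]
    return remaining > 1  # another copy is still somewhere in play


display = {"red": 2}
print(is_safe_discard(("red", 1), display, []))            # True: already played
print(is_safe_discard(("red", 3), display, [("red", 3)]))  # False: last copy
print(is_safe_discard(("red", 5), display, []))            # False: only one 5 exists
```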

Whenever a player loses a card by playing or discarding, they draw another at the end of their turn. When the deck runs out, everyone takes one last turn, and then the game ends. The game can also end early if the display is completed.

Optional Rules

There are lots of variations on the base game, ranging from fun to challenging to absolutely diabolical. I will outline the two most common below.

Rainbow Cards: The game also includes a rainbow suit, which can be included as a 6th suit alongside the other 5 solid colors. Depending on the desired difficulty, players can choose to have the rainbow cards be their own distinct color alongside the basic 5, or they can choose for the rainbow cards to match all color clues. In the latter case, a player who is told one of their cards is red, for instance, cannot be sure if the indicated card is red or rainbow until they receive more information.

Black Powder: An expansion to the base game adds a “black powder” suit of cards, which are added to the deck like the rainbow cards. Black cards differ from the other cards in two ways. First, they cannot be indicated by any color clues (e.g. “this card is black” is an illegal clue). Second, they must be played in descending order, instead of the normal ascending order that the other suits use.
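Both variants amount to small changes in two rules: which cards a color clue touches, and which number a suit needs next. A sketch under my own naming follows. Whether “rainbow” itself can be named as a clue when rainbow cards match every color is not covered above, so the sketch does not model it.

```python
def color_clue_touches(card_color, clue_color, rainbow_matches_all=True):
    """Does a color clue touch a card of the given color?"""
    if card_color == "black":
        return False  # black cards are never indicated by color clues
    if card_color == "rainbow" and rainbow_matches_all:
        return True   # harder variant: rainbow matches every color clue
    return card_color == clue_color


def next_playable_number(color, last_played):
    """Next number a suit needs, or None if the suit is complete.

    `last_played` is the most recent number played in that suit (None if
    nothing has been played). Black counts down from 5; others count up from 1.
    """
    step, first = (-1, 5) if color == "black" else (1, 1)
    nxt = first if last_played is None else last_played + step
    return nxt if 1 <= nxt <= 5 else None


print(color_clue_touches("rainbow", "red"))  # True
print(color_clue_touches("black", "red"))    # False
print(next_playable_number("black", None))   # 5
print(next_playable_number("black", 5))      # 4
print(next_playable_number("red", 2))        # 3
```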

Why is this on LessWrong?

Playing Hanabi online over the past month or two taught me lessons about communication. It’s also a case study in how an anonymous crowd finds and uses Schelling points.

People who play Hanabi on a particular site (or in person with the same group of people) will gradually evolve, by group consensus, norms and expectations about how to play. For example, in online settings:

  • Most experienced players will discard from the right side of their hand, and will expect others to do the same.

  • If a color clue indicates multiple cards, in the absence of any other information, it is assumed that the leftmost card indicated by that clue is ready to be played.

The benefit of having such norms is hopefully obvious; players can take advantage of “pre-loaded” information to correctly indicate which cards to play while using fewer clues. Most, if not all, of the norms are arguably the “best” norms that could be chosen, often because they are Schelling points in player strategy that arise naturally from the rules of the game. For example, with respect to the above norms:

  • The “oldest” cards—that a player has held for the longest time—are displayed on the right in online games. If a player’s rightmost card were important, other players would have had the most opportunity to indicate that to be the case. Since they chose not to, it is likely safe to discard.

  • Newly drawn cards are displayed on the left side of a player’s hand. Color clues usually mean “play.” So if a color clue touches multiple cards, it is most natural to assume it is intended for the newest card that it touches—if your teammate had wanted you to play the older card, they would have said so, instead of waiting until now.
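These two norms are simple enough to write down as code, which is part of why they work as defaults. In this sketch a hand is indexed from left (index 0, newest) to right (oldest), matching the online layout described above. Skipping cards that have already received a clue when choosing a discard is my reading of the reasoning, not something every group necessarily agrees on, and the function names are invented.

```python
def discard_target(clued):
    """Index of the rightmost card that has never been clued, else None.

    `clued` is a list of booleans, one per card, ordered left to right.
    """
    for i in range(len(clued) - 1, -1, -1):
        if not clued[i]:
            return i
    return None  # every card has been clued; no default discard


def play_target(touched):
    """Index of the leftmost (newest) card touched by a color clue."""
    return min(touched) if touched else None


# Four-card hand; the 2nd and 4th cards from the left were clued earlier.
print(discard_target([False, True, False, True]))  # 2
# A color clue touches the 2nd and 4th cards: play the 2nd.
print(play_target([1, 3]))                         # 1
```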

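The casual reference to Schelling points is not to be overlooked: anonymous players online converging on a common Hanabi strategy is not unlike the New York City question (Schelling’s classic puzzle of where to meet a stranger in New York City when the two of you cannot communicate beforehand).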

A highly ranked player will be expected by other highly ranked players to know these informal rules, to the point where not following them is met with confusion (if not outright hostility—it is internet gaming, after all). However, there isn’t perfect consensus about particular edge cases of these norms, and there are sometimes situations where perfectly communicating information to another player is simply not possible. With respect to this reality, there are basically two types of players, which I will nickname “Goofus” and “Gallant.”

Goofus believes that the correct way to play Hanabi is to perfectly understand and follow the informal rules. If a player he gave a clue to misunderstands it and misplays, Goofus is likely to be confused or angry. To Goofus, to play Hanabi is to execute an algorithm that he has only partial control of. He relies on other players to correctly execute their parts of the algorithm, and feels helpless when they don’t.

Gallant understands that what makes communication good or bad is whether or not it is correctly understood by its recipient. If a player he gave a clue to misunderstands it and misplays, Gallant asks himself, “why didn’t my clue mean what I thought it meant?” To Gallant, to play Hanabi is to set an objective; the algorithm of clue-giving and interpreting can and must change to reach it.

(An aside: This post was inspired by my experience after I had the poor fortune to be matched with a Goofus yesterday. During our game, he/she gave me a clue that was ambiguous: I could tell that the indicated card was important, but not if it was ready to be played immediately or if it should be saved for later. In fact, he/she had meant “play this card now,” but I instead took the risk-averse action and discarded some other card instead of playing. In response the player sent me an angry message, intentionally (I think) lost the game for us, and then wrote a negative comment about me, visible to other players, about how I don’t follow the right informal norms of play. I glanced at the player’s recent activity: it contained two full pages of the player writing negative comments about people with whom he/she had played Hanabi, stretching back weeks. It struck me as impressive that someone can consistently fail at communicating and yet insist that they are communicating correctly.)

I see this as the ultimate “moral of the story” when it comes to Hanabi: the value of good communication lies in whether or not it is properly understood, and ultimately is measured by whether or not it produces the desired behavior in its recipient.

Don’t be Goofus. Be Gallant.

Where to Play or Buy Hanabi

I play Hanabi regularly online (usually with strangers) on Board Game Arena; there are a few other websites with different features available as well. You can also order a physical copy of the game from your favorite online retailer or board game shop. The game components are simple enough that you could make your own set, with some effort.

A benefit of playing online instead of in person is that most online implementations of Hanabi track clues for you, eliminating the memory aspect of the game and allowing you to focus only on the logic and communication. (In my opinion, this is a serious plus.)

Hanabi can be played in less than half an hour, and I recommend it for adults and families with children aged 8-10 and up, or perhaps even younger if they are particularly clever.