So has AI conquered Bridge?
Bridge, like most card games, is a game of incomplete information. It is a game of many facets, most of which will have to remain unstated here. However the calculation of probabilities, and the constant revision of probabilities during play, is a key feature. Much the same can be said of poker and of certain other card games. But bridge is the game I know and am still learning from, 45 years after I first got infected with the bridge bug.
We have AlphaGo and AlphaZero, but the wizards at DeepMind have not yet seen fit to release an AlphaBridge. Poker is played to a high expert standard by specialist AIs such as Pluribus, but bridge-playing computers remain, to this day, well short of that. Those commonly available play parts of the game extremely well, usually making mechanistic decisions correctly, yet they also make quite surprising and often comical errors. It has been noted that bridge would seem to be an excellent training ground for AGI, in ways chess and go are not.
The French company NUKKAI has just created something of an earthquake by revealing a bridge-playing AI called Nook that, it is claimed, can beat the world’s best players. So has bridge finally been conquered? This post will show this is not yet the case. I hope also to give a flavour of the type of probabilistic problems that routinely need to be solved at the bridge table, by AI or human; where Nook may score over a human and why; and where its performance may be somewhat inflated due to some features specific to the training and test set-up. I’ve tried to write with the non-player in mind and I proffer apologies in advance if I have failed to do that adequately.
NUKKAI have published on YouTube a 5-hour commentary on the entire event. To my shame my poor French renders me unable to follow it closely and I’d welcome comments from French speakers who have followed it, especially as they may be able to back up or possibly even refute some of the comments I make. Commentaries in English are also starting to become available. Kit Woolsey, a well-known expert, is publishing in-depth analyses of the most interesting hands on BridgeWinners.
A word about the mechanics of the game. Please feel free to skip this section if you are familiar with it. The game is a trick-taking game played with a full deck of 52 cards, dealt equally between four players, who make up two partnerships sitting in opposition. There are two parts to each hand. The first part is the Auction, in which the pairs vie to decide the Contract: the number of tricks greater than six that one side or the other contracts to make with one suit chosen as trumps (playing with no trump suit is also an option). The side that wins the Auction is the Declaring side. The second part is the Play of the Hand. The player to the left of the Declarer (which of the pair on the Declaring side is the Declarer is decided by a slightly arcane method I won’t go into) leads a card and the partner of the Declarer lays his hand down face upward and takes no further part in the hand. His cards are played, in correct rotation, by the Declarer. The other two players at the table (the Defenders) have to cooperate (within the rules of the game!) to take enough tricks to defeat the Contract. All three players are able to have sight of exactly half the cards: their own hand, and the one on the table (the Dummy). The usual rules for trick-taking games apply. Players must follow suit if they can. Aces are high. The highest card played in the suit led wins the trick, unless there is a trump suit and a player who cannot follow suit plays a trump, in which case the highest trump wins. The hand that wins the trick leads to the next one.
At the end of play of all 52 cards (13 tricks) the contract will either have been made or been defeated. Scores are assigned and we move on to the next hand.
Firstly let us note that the only part of the game Nook has been trained to do is play Declarer. No bidding, no defence. No collaboration with a partner required. So it is, at best, very good at only one third of the game. The set-up is that the human or AI plays Declarer against two of the top current bridge-playing computers. It is an important point that there is always this constancy of silicon opposition, who might be expected to (and apparently do) always defend in a consistent and predictable manner. Eight experts each played 100 hands, so the full test was over 800 hands. However the auction and the contract were always the same (the contract was 3NT, 9 tricks to be made with no trump suit) and there was essentially no information available in the auction to influence declarer play. So it would be very wrong to say the test evaluated AI skill over the full gamut of declarer play.
After the initial lead, it is normal for a Declarer to make a strategic plan. A good plan will optimise the likelihood of enough tricks being made to achieve the contract. Sometimes a good plan will be ultra cautious, guarding against unlikely distributions to ensure the contract is made. On other occasions, if the contract is ambitious, a risky plan may be required to have any chance of achieving the contract.
This part of the operation would seem to be relatively straightforward for a good AI to master. Finding a plan that works best against the largest number of distributions does not seem too dissimilar to the planning required in chess and go. Commentators have indeed identified cases where Nook’s apparent plan was superior to that of the human opponent.
After the initial lead by an opponent or partner, 25 cards remain unseen, and their positions between the two other players often need to be inferred to achieve optimum play. There is often information available from the Auction that provides an immediate Bayesian update (although not in the test of Nook). Thereafter the play of any unseen card may allow another Bayesian update. If we don’t count the triviality of the last trick, this provides possibly 23 instances where a Bayesian update may be applied (and this doesn’t include any inferences to be drawn from the cards declarer chooses to play from dummy as well).
Each Bayesian update may require a change in the plan. This process of making inferences from opponents’ actions and gathering additional information, to adjust prior probabilities assigned to several different card distributions (we could also call them models), is called “Card Reading”, a term with somewhat mystical associations. When carried out successfully, and the actual distribution is found to match the predicted one, it does feel a bit like magic.
The most informative case is where a defender can’t follow suit. Immediately the distribution of that suit is known and the likely distributions of the other suits are altered. But more subtle inferences can also be derived from the nature of cards played to follow suit, or chosen as discards.
It is unlikely that all possible eventualities are taken into account in the original plan; the combinatorial explosion of possible distributions is too vast. So Nook most likely reassesses the plan after each card, similar to a human player.
At this stage let’s look at some simple situations where probabilistic reasoning can be applied and how an AI might need to reason. Here is quite a common type of position in a single suit.
Declarer wants to make 3 tricks. She could first play the K, and then lead the 3 to the J, winning if the Q is on the left (should the left hand player play the Q it will be topped by the A). Or she could lead the 3 to the A and, on the next trick, lead the 2 towards the 10. This wins if the Q is on the right. A priori it is a guess, a 50:50 proposition which play to choose. However an observant player will generally have additional information that adjusts the probabilities one way or the other. This information can be gained from the Auction, or during play. For instance, declarer may know from the Auction that the left hand player is highly likely to have 6 cards in another suit and, knowing the declaring partnership has 5 cards in that same suit, that the right hand player therefore has 2 of that suit. Left Hand, starting with thirteen cards, therefore has at the outset 7 “empty spaces” and Right Hand 11 “empty spaces” in which to put the Q. All other things being equal, the chance of the Q of this suit being on the Right is now favoured to the extent of 11/18.
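The “empty spaces” arithmetic above can be sketched in a few lines of Python. This is only an illustration of the counting argument: the function name and its parameters are my own invention, and it assumes that nothing else is known about the hand beyond the length of the one side suit.

```python
from fractions import Fraction

def empty_space_odds(left_known: int, right_known: int, hand_size: int = 13):
    """Chance that one specific missing card sits Left or Right.

    left_known / right_known are the cards already placed in each
    defender's hand (here, their length in the side suit). Hypothetical
    helper, not taken from any bridge program.
    """
    left_empty = hand_size - left_known
    right_empty = hand_size - right_known
    total = left_empty + right_empty
    return Fraction(left_empty, total), Fraction(right_empty, total)

# Left Hand is placed with 6 cards of the side suit, Right Hand with 2.
p_left, p_right = empty_space_odds(left_known=6, right_known=2)
print(p_left, p_right)  # 7/18 11/18
```

With no side-suit information (0 and 0) the same function returns one half each way, which is the a priori guess.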
A good player will not be content with that. She will play other suits to test their distribution and the disposition of high cards, with a view to learning more about the hand. Sometimes this will not be productive but sometimes it will harden the odds, making it more likely the Q is on the Right. More rarely the picture will change completely and odds start to favour the Q being on the Left. Sometimes certainty can be achieved and the position of the key card can be identified with probability 1.
It would seem clear that an AI should be able to recalculate these probabilities faster and more accurately than a human can. Superiority in this part of the game would not be surprising. Whether Nook can plan the play to elicit as much information from the opposition as possible is an interesting question.
As far as I can ascertain the computer defenders do play in a deterministic fashion. Their manner of play of the low cards in any given situation will be consistent. There does appear to be some indication that Nook may indeed seek out additional information to a certain extent, relying on the deterministic play of the defenders to build up a true picture (human defenders would rarely be as accommodating). More on this later.
Here is an example that is sometimes quoted as classically Bayesian. It is somewhat infamous, in that the principle, the so-called “Principle of Restricted Choice”, is, in my experience, not accepted (sometimes vehemently so) by some bridge players. Although unintuitive at first glance, the logic behind it is correct, however.
There are 4 cards missing in the suit. Suppose, to the first trick, you play the K and see Left Hand play the 8 (you play small from Dummy) and Right Hand plays the Q. What is the play that gives the best chance of not losing a trick in the suit?
There are two holdings that are consistent with Right Hand’s play of the Q. Q on its own (singleton) or QJ doubleton. If the first holding is correct you need to play a low card to the 10, finessing Left Hand’s J. If the second is correct you need to play to the A. The two holdings are roughly equal in likelihood, given no further information. So, it must be the same as a toss of a coin surely?
No, it isn’t. With no other information available, the play to the 10 figures to win about two thirds of the time. The reason is that, with QJ, Right Hand has a choice of two equivalent plays on the first trick, there being no material effect on the outcome of playing the Q rather than the J. Half the time he would have played the J instead. Consequently, once the Q has appeared, the chance that it came from the holding QJ is about one half that of the Q being singleton, where there is no choice of play.
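For anyone who wants to check the numbers, here is a minimal Bayesian sketch. It rests on assumptions of my own: both defenders start with 13 empty spaces (ignoring the opening lead), Left Hand’s 8 carries no information, and Right Hand picks at random between Q and J when holding both. The function names are purely illustrative.

```python
from fractions import Fraction
from math import comb

MISSING = 4    # cards missing in the suit (Q, J, 8 and one small card)
SPACES = 13    # assumed empty spaces in each defender's hand

def prior_specific_holding(n_held: int) -> Fraction:
    """Prior that Right Hand holds one specific set of n_held missing cards."""
    other_cards = 2 * SPACES - MISSING      # unseen cards outside this suit
    all_hands = comb(2 * SPACES, SPACES)    # ways to fill Right Hand's spaces
    return Fraction(comb(other_cards, SPACES - n_held), all_hands)

def p_queen_was_singleton() -> Fraction:
    """Posterior that the Q was singleton, given that the Q was played."""
    singleton = prior_specific_holding(1) * 1                # Q alone: forced
    doubleton = prior_specific_holding(2) * Fraction(1, 2)   # QJ: Q half the time
    return singleton / (singleton + doubleton)

print(p_queen_was_singleton())  # 11/17
```

Under these assumptions the finesse of the 10 wins 11 times in 17, a little under 65%. The doubleton is slightly more likely a priori than the singleton, which is why the answer falls just short of two thirds.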
This situation is exactly equivalent to the famous Monty Hall problem. The case where the prize is behind your door corresponds to that of when Right Hand has QJ. The Host can choose which of the other two doors to open. The case where the prize is behind one of the other two doors corresponds to the singleton Q case. The host has no choice which door to open, without revealing the prize. You should switch doors.
The principle of restricted choice is best known for examples involving just high cards, similar to that above. However it can often be applied in some form or other in many situations. Asking the question of whether a play is forced, or is one of several equal choices, as often as possible, even regarding the smaller cards, should in theory allow more accurate assessment of which are the likely distributions. This requires great attention to detail, accurate sifting of signal from noise, and a strong and quick mental calculation ability, and few players are capable of sustaining the effort to do this consistently for a great length of time. Having said that, very good players, through long experience, do, almost unconsciously, pick up on clues that would be insignificant to weaker players and draw inferences from them.
There is a wrinkle that is important to mention in regard to this example. Suppose we have some extra information, namely that the agent on the right will always play the Q from QJ. There is no possible benefit to doing so in this situation. However there is a common convention that, with two cards only, a defender will play the highest, so long as it doesn’t cost, thereby giving information to their partner as to their length in the suit. Players can be creatures of habit, and apply that convention when there is no need. This changes the odds. Now the play of the 10 is back to being close to a 50-50 choice. However, on the flip side, had Right Hand played the J rather than the Q, it would now be a near certainty that no other card accompanies it. Note that our consistent robot defenders are creatures of habit and are likely to always play the same card from QJ doubleton. They are also likely to (say) always play low from three small, play high from a doubleton, etc., and thus signal the distribution of their cards accurately.
This introduces the point that each opponent has different experience, different strengths and weaknesses, different quirks and habits. The best players (including AIs) need to take these into account, if they can, to accurately infer the probabilities of various holdings from their play.
Nook may be able to gain significant advantage in these types of situations by constantly eking out Bayesian inferences from the small card plays, something that even the best human players don’t consistently do. Also, crucially, Nook will eke out significant information from the consistent play of the defenders. Human defenders, by contrast, do not in general play consistently, thereby generating noise, and human declarers do not usually learn the playing style of individual opponents in great detail, as in most competitions they will play against many opponents.
Comparative analysis of differential human/Nook strategies is still ongoing but evidence is emerging that Nook’s mastery may to a considerable extent be due to its exploitation of the behaviour of the defenders. Not only do they signal their distribution in a rigorous fashion but they make errors of play in a consistent fashion, which may very well be exploitable. It looks like Nook has learnt very well how to exploit the foibles of its opposition.
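The effect of a defender’s habit can be made concrete with a small sketch. It uses the rough approximation from earlier that a singleton honour and QJ doubleton are equally likely a priori; the function and parameter names are hypothetical, and the habit is modelled as a single number, the probability that this defender plays the Q when holding QJ.

```python
from fractions import Fraction

def p_singleton(card_played: str, p_q_from_qj: Fraction) -> Fraction:
    """Posterior that the honour Right Hand played was a singleton.

    Assumes (as an approximation) equal priors for the singleton and for
    QJ doubleton. p_q_from_qj is the defender's habit: how often he plays
    the Q when he holds both honours.
    """
    if card_played == "Q":
        likelihood_from_qj = p_q_from_qj
    elif card_played == "J":
        likelihood_from_qj = 1 - p_q_from_qj
    else:
        raise ValueError("card_played must be 'Q' or 'J'")
    # A singleton is a forced play, so its likelihood is 1.
    return 1 / (1 + likelihood_from_qj)

print(p_singleton("Q", Fraction(1, 2)))  # 2/3  random choice from QJ
print(p_singleton("Q", Fraction(1)))     # 1/2  always plays Q from QJ
print(p_singleton("J", Fraction(1)))     # 1    the J can only be singleton
```

The more predictable the defender, the more lopsided the inference becomes in one direction or the other, which is exactly what a declarer trained against deterministic robots can exploit.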
Let’s look at another card layout.
Declarer : A42
Declarer wants two tricks in the suit. A normal play here is to lead low from declarer’s hand and play either the Q or the 10 when a low card is played on the Left. If the K and J are on the Left either will do. If the K and J are on the Right neither will work. If the K is on the Left and the J on the Right it is necessary to put in the Q. If the J is on the Left and the K on the Right it is necessary to put in the 10. A 50% proposition? No, not completely. A good player will usually put in the 10. The consideration is what Left Hand Player didn’t do on this trick, which is to put in the K. With K and only one other card, most average players would play the K to make sure they make a trick with it instead of it being lost under the A. So this holding is now unlikely. With K and 2 or more cards it is more normal to play low but some players might still find a reason to play the K. With J and one other card (or more) it would always be normal to play low in front of dummy. The consideration that the holding of Kx on the Left side (where x denotes a small card) is unlikely is enough to tweak the probabilities and make the 10 the right play. (Note: Before it is pointed out to me, yes, playing the A first before playing to the 10 would reveal immediately if Left Hand had Kx. In some cases this would be the right play. Here, let us assume declarer has other options for an extra trick and cannot afford to put all her eggs in one basket by risking the loss of two immediate tricks.)
This is, firstly, an example of drawing an inference from what hasn’t happened (cf. Sherlock Holmes’ dog that didn’t bark in the night). This is, in my view, one of the more difficult rationality techniques. The best bridge players are adept at this. The rest of us find this hard to do.
It will be interesting to find out how Nook handles this type of situation. Does such inferential thinking get encoded naturally during the training regime?
Secondly, it is an example where the Declarer has to take into account how another agent at the table is likely to play, given a certain holding. It is becoming clear that Nook has become adept at this, albeit only because it has trained against a single deterministic opponent.
A last point: this is also an example of how the probabilities can change depending on the agents at the table. This situation offers a very good player (or exceptional AI) in the Left Hand seat an opportunity. With Kx she might very well quickly play low (taking time to make that decision is not an option, as it signals where the K is). She knows that Declarer doesn’t have AJx, as otherwise he would likely be playing the Q or 10 towards the AJx to try to make three tricks if the K is on the other side. So playing low is not in fact likely to cost and will very likely induce a misguess, the 10 losing to the J. The pay-back occurs if declarer is dealt A9x, as then later Declarer is likely to play from Dummy towards the 9 and lose a second trick to the lone K.
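To put a rough number on this inference, consider only the two layouts where the guess matters: K on the Left with J on the Right, or J on the Left with K on the Right. The sketch below is a toy model with made-up figures. In particular the 4/5 used for how often a defender holding the K plays low is purely hypothetical, chosen only to show the direction of the effect.

```python
from fractions import Fraction

def p_king_on_left(p_low_with_king: Fraction,
                   p_low_with_jack: Fraction = Fraction(1),
                   prior_king_left: Fraction = Fraction(1, 2)) -> Fraction:
    """Posterior that Left holds the K (and Right the J) after Left plays low.

    p_low_with_king: how often Left plays low when holding the K
                     (hypothetical; below 1 because some K holdings go up).
    p_low_with_jack: how often Left plays low when holding the J.
    """
    king_case = prior_king_left * p_low_with_king
    jack_case = (1 - prior_king_left) * p_low_with_jack
    return king_case / (king_case + jack_case)

# Illustrative guess: a defender with the K ducks 4 times in 5.
p_k = p_king_on_left(Fraction(4, 5))
print(p_k, 1 - p_k)  # 4/9 5/9
```

In this toy model the 10 wins 5 times in 9. If the defender in the Left seat is good enough to duck smoothly every time with the K (p_low_with_king of 1), the inference vanishes and declarer is back to an even guess.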
We have discussed how a human or AI can maximise their own performance. However good players must also try to minimise the performance of the opposition, by restricting the information they themselves give out.
For example there is a bridge aphorism: “play the card you are known to hold”.
Declarer plays the 4 from dummy and, Right Hand playing the 7, puts in the Q, winning the trick. Now he cashes the A and Right Hand drops the K, the card he is known to have. Now declarer has no more information. He can play a third round of the suit to establish a winner, but he might suspect the suit is breaking badly. It is easier for declarer to place the cards if Right Hand plays the J rather than the K.
This we can call obfuscation. Taken to the extreme, obfuscation becomes deception, with the distinction, should we want one, that an obfuscation is generally a zero-cost play, whereas a deceptive play should carry some jeopardy if the deception is challenged.
Here is an example of deception by a defender.
It is normal here for Declarer to play low to the 10 or 9 and, if it loses to the J, later play towards the Q or 9. There are two bites of the cherry at getting three tricks. On a good day, four tricks are possible. On this occasion Right Hand wins first time with the K. Later Declarer plays confidently towards the 9 and is dismayed to see it lose to the J that was ‘marked’ as being in the opposite hand. Perhaps now the AQ are stranded without access and cannot make tricks. Or perhaps declarer has in the meantime burned his boats elsewhere and staked all on the expectation of three tricks in this suit.
This is a lovely coup to bring off. But it is a gamble. If declarer sees through the stratagem you are likely to be considerably worse off.
Opportunities for true deception are more commonly found in defence play but can also be found in declarer play. Poker playing robots have been demonstrated to bluff effectively so in theory an exceptional bridge playing AI might also be able to play deceptively. However the AI has to recognise the relatively rare situations where deception is the optimum tactic and it is not clear how easy that is to train.
If there are poker- and bridge-playing robots that can deceive their human opponents, the question arises as to whether they are actually unaligned, and, if so, whether they are unaligned in a trivial sense or in a fundamental sense that may be important and useful to AI alignment researchers. This may be a well-trodden topic, for all I know, but if not, it is worthy of discussion.
It is time to summarise:
A) Nook has beaten eight out of eight human experts in one relatively specialised aspect of bridge play. Despite the small domain space and some additional caveats this does look an impressive achievement. Nook’s play was by no stretch of the imagination error free but it did on multiple occasions clearly locate the superior line of play.
B) Nook needs to plan a general line of play at the beginning that is flexible enough to take account of as many opposing distributions as possible. It is not surprising, given the state of the art, that Nook should be at least as good as a human at this.
C) Nook needs to make Bayesian inferences at each play of a card, reassess the plan and change it if necessary. It seems likely Nook may have the advantage of going deeper than a human would in this regard, through always taking note of every card played and making Bayesian inferences not just from the play of the high cards but also of the low cards.
D) Consideration of the expertise and style of play of the defenders may change a chosen line of play. Here, it seems that Nook has been trained against the same bridge robots that it is tested against and that these robots play in a deterministic fashion. If so Nook will have learnt the robots’ style of card play extremely well, and the typical errors that these computers make. It will therefore be able to read the defenders’ cards very well and may well have learnt to play in a way that promotes the errors peculiar to these opponents. This might give Nook a very significant yet rather artificial advantage over a human who is trained against many different opponents. This advantage would seem to be enough to explain its success rate. It is this possibility that casts the most significant question mark over the world-beating claims made by NUKKAI. More analysis is needed here and it will be interesting to hear how NUKKAI themselves reply to this point.
E) Ideally Nook should play in a way that disguises its own cards as much as possible from the opponents. It will be interesting to find out if it does play in this way. Can it also exploit situations where it is possible to deceive the defenders, albeit at risk, and gain from it? That would be an exceptionally impressive achievement in my view.
F) Bridge has not been conquered by AI ….. yet!
A final note to you non-players. Bridge is an entrancing and life-enhancing game but is rapidly becoming an old person’s game, because the young now have many other gaming distractions to tempt them and new young blood is now a trickle where it used to be a flood. Yet for the aspiring rationalist I can think of no better training regime for Bayesian thinking than the game of bridge. The examples I have presented represent a very small glimpse of the type of thinking required at the bridge table. If they have piqued your interest, perhaps think about finding some like-minded friends and starting a game, or looking up a local bridge teacher and signing on for a course.