Agree with Bezzi. Confusion about chess notation and game rules wasn’t intended to happen, and I don’t think it applies very well to the real-world example. Yes, the human in the real world will be confused about which actions would achieve their goals, but I don’t think they’re very confused about what their goals are: create an aligned ASI, with a clear success/failure condition of whether we’re still alive.
You’re correct that the short time control was part of the experimental design for this game. I was remarking on how this game is probably not as accurate a model of the real-world scenario as a game with longer time controls would be, but “confounder” was probably not the most accurate term.
I’m guessing that the right move is Qc5.
At the end of the Qxb5 line (after a4), White can respond with Rac1, to which Black doesn’t really have a good response. b6 gets in trouble with the d6 discovery, and Nd2 just loses a pawn after Rxc7 Nxb2 Rxb7; Black may have a passed pawn on a4, but I doubt it’s enough to avoid losing.
That being said, that wasn’t actually what made me suspect Qc5 was right. It’s just that Qxb5 feels like a much more natural, more human move than Qc5. Before I even looked at any lines, I thought, “well, this looks like Richard checked with a computer, and it found a move better than the flawed one he thought of: Qc5.” Maybe this is even a position from a game Richard played, where the engine suggested Qc5 when he was analyzing it afterwards, or something like that.
I’m only about… 60% confident in that theory, but if I am right, it’ll… kind of invalidate the experiment for me, because the factor of “does it feel like a human move” isn’t something that’s supposed to be considered. Unfortunately, I’m not that good at making my brain ignore that factor and analyze the position without it.
Hoping I’m wrong; if it turns out “check if it feels human” isn’t actually helpful, I’ll hopefully be able to analyze other puzzles without paying attention to that.
Because I want to keep the option of being able to make promises. This way, people can trust that, while I might not answer every question they ask, the things that I do say to them are the truth. If I sometimes lie to them, that’s no longer the case, and I’m no longer able to communicate trustworthily at all.
Meta-honesty is an alternative proposed policy that could perhaps reduce some of the complication, but I think it only adds new complication, because people have to ask you questions on the meta level whenever you say something they might suspect is a lie. That being said, I also stick to meta-honesty rules, and am always willing to discuss why I’m Glomarizing about something or my general policies about lying.
If B were the same level as A, then they wouldn’t pose any challenge to A; A would be able to beat them on their own without listening to the advice of the Cs.
I saw it fine at first, but after logging out I got the same error. Looks like you need a Chess.com account to see it.
I’ve created a Manifold market if anyone wants to bet on what happens. If you’re playing in the experiment, you are not allowed to make any bets/trades while you have private information (that is, while you are in a game, or if I haven’t yet reported the details of a game you were in to the public).
The problem is that while the human can give some rationalizations as to “ah, this is probably why the computer says it’s the best move,” it’s not the original reasoning that generated those moves as the best option, because that took place inside the engine. Some of the time, looking ahead with computer analysis is enough to reproduce the original reasoning—particularly when it comes to tactics—but sometimes they would just have to guess.
[facepalms] Thanks! That idea did not occur to me and drastically simplifies all of the complicated logistics I was previously having trouble with.
Sounds like a good strategy! …although, actually, I would recommend you delete it before all the potential As read it and know what to look out for.
Agreed that it could be a bit more realistic that way, but the main constraint here is that we need a game with three distinct levels of players, where each level reliably beats the one below it. The element of luck in games like poker and backgammon makes that harder to guarantee (as suggested by the stats Joern_Stoller brought up). Another issue is that it’ll be harder to find many skilled players at different levels in any game less popular than chess; even if we find an obscure game that would in theory be a better fit for the experiment, we won’t be able to find any Cs for it.
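To illustrate why chess makes the "each level reliably beats the one below" requirement easy to check, here's a small sketch using the standard Elo expected-score formula. The rating numbers are purely illustrative assumptions, not ratings of any actual participants:

```python
# Sketch: how Elo rating gaps translate into per-game expected score.
# Uses the standard Elo formula; the 1500 baseline and gap sizes are
# illustrative assumptions, not data from the experiment.

def expected_score(r_player: float, r_opponent: float) -> float:
    """Elo expected score for the player: 1 / (1 + 10^((Ro - Rp) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_player) / 400.0))

for gap in (200, 400, 600):
    score = expected_score(1500 + gap, 1500)
    print(f"{gap}-point gap: stronger player's expected score is {score:.3f}")
```

At a 400-point gap the stronger player's expected score is already about 0.91 per game, so with decently separated tiers the "A beats B, B beats C" ordering is close to deterministic, whereas luck-heavy games keep the weaker side's win rate much higher.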
No computers, because the advisors should be reporting their own reasoning (or, 2⁄3 of the time, a lie that they claim is their own reasoning). I would prefer to avoid explicit coordination between the advisors, because the AIs might not have access to each other in the real world, but I’m not sure at the moment whether player A can show the advisors each other’s suggestions and ask for critiques. I would prefer not to give either dishonest advisor information on who the other two were, since the real-world AIs probably can’t read each other’s source code.
I was thinking I would test the players to make sure they really could beat each other as they should be able to. Good points on using blitz and on doing the test afterwards; that said, I’d still prefer to test beforehand, so we know the rankings are accurate up front rather than playing for weeks and only later realizing we were running the wrong test.
I wasn’t thinking of much in the way of limits on what the Cs could say, although possibly some limits on whether the Cs can see and argue against each other’s advice. Each C’s goal is pretty much just “make A win the game” or “make A lose the game,” as applicable.
I’m definitely thinking a prototype would help. I’ve actually been contacted about applying for a grant to make this a larger experiment, and I was planning on first running a one-day game or two as a prototype before expanding it with more people and longer games.
Individual positions like that could be an interesting thing to test; I’ll likely have some people try out some of those too.
I think the aspect where the deceivers have to tell the truth in many cases to avoid getting caught could make it more realistic, as in the real AI situation the best strategy might be to present a mostly coherent plan with a few fatal flaws.