I predicted your odds of winning at 50% with queen+rook odds, 1% with queen odds, 0.2% with 2 bishops odds, and 0.1% with rook odds. When you started describing strategies tailored to odds games that you were going to use, I felt cheated! I thought you were just going to play your normal 1100-rated game, but I made a big mistake. I forgot that you’re a general intelligence, not a narrow, 1100-rated chess AI. Stockfish’s NNUE was never trained on positions like the ones at the start of your odds games, since they can’t be reached from a normal 32-piece start, so its ability to generalize to these new board states is anybody’s guess. What you intended as a conflict between a less intelligent[1] agent with more resources and a more intelligent agent with fewer resources turned out to be (more intelligent + fewer resources + less general) vs. (less intelligent + more resources + more general). If the generalization gap helped you at all, then the amount of resources required to overcome a given intelligence gap is higher than your experiment suggests. Bad news for our species.
I expect the chess analogy would break down when considering agents that can obfuscate their intentions and actions better than a human can. If such an agent could not win in a direct conflict, why wouldn’t it just wait until it could?
I do love the “brains vs. brawns” framing, though, and I’m looking forward to what you write about asymmetrical conflicts. If you or anyone else is interested in repeating this experiment without the confounding variable of generality, I suggest using a handicapped version of SF, set to some arbitrary rating. A truly enterprising person could probably automate many matches of SF vs. SF at different ratings and make a series of 2D plots, one per material imbalance, with the two ratings plotted against each other and showing the odds of the higher-rated one winning for each rating pair. Unfortunately, the result might be largely determined by how the weaker SF is handicapped: if it’s done by reducing search depth, the deeper-searching one would see strictly more than the other. I suspect depth-based handicapping would much more strongly favor the more intelligent SF than handicapping by occasionally playing random legal moves.
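To make the setup half of that concrete, here is a rough Python sketch that builds the odds starting positions and the grid of rating pairs. It is only an illustration under my own assumptions: White is the side giving odds, the names `ODDS`, `odds_fen`, and `pairings` are made up for this example, and the ratings are placeholders. Actually playing the games would still need a UCI driver; as far as I know, Stockfish’s `UCI_LimitStrength` and `UCI_Elo` options are the usual way to set a target rating.

```python
from itertools import groupby

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

# Squares emptied on White's side for each kind of odds (illustrative names).
ODDS = {
    "rook": ["a1"],
    "two_bishops": ["c1", "f1"],
    "queen": ["d1"],
    "queen_and_rook": ["d1", "a1"],
}

# Castling right that disappears when a rook's home square is emptied.
ROOK_RIGHTS = {"a1": "Q", "h1": "K", "a8": "q", "h8": "k"}


def odds_fen(squares, fen=START_FEN):
    """Return `fen` with the given squares emptied and castling rights updated."""
    board, side, castling, *rest = fen.split()
    # Expand digit runs so every rank is exactly 8 characters; row 0 is rank 8.
    rows = [
        list("".join("." * int(c) if c.isdigit() else c for c in rank))
        for rank in board.split("/")
    ]
    for sq in squares:
        file_idx = ord(sq[0]) - ord("a")
        rank_idx = 8 - int(sq[1])
        rows[rank_idx][file_idx] = "."
        right = ROOK_RIGHTS.get(sq)
        if right:
            castling = castling.replace(right, "")
    # Re-compress runs of empty squares back into digits.
    ranks = [
        "".join(
            str(len(list(group))) if char == "." else "".join(group)
            for char, group in groupby(row)
        )
        for row in rows
    ]
    return " ".join(["/".join(ranks), side, castling or "-", *rest])


def pairings(ratings):
    """All (higher, lower) rating pairs; the higher-rated side gives the odds."""
    return [(hi, lo) for hi in ratings for lo in ratings if hi > lo]


if __name__ == "__main__":
    ratings = [1400, 1800, 2200, 2600]  # placeholder values, not a recommendation
    for name, squares in ODDS.items():
        print(name, odds_fen(squares))
    print(len(pairings(ratings)), "rating pairs per material imbalance")
```

Each (odds type, rating pair) cell would then get many games, and the higher-rated side’s win rate in that cell is what goes on the plot.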
[1] Measuring intelligence as Elo rating at standard chess, without odds.