On a meta level, I wonder whether I should have been less straightforward in presenting what I believed. In theory, there's a difference between optimizing for Alex to win and being completely honest with Alex, and it might have been better for me to be more strategic about my presentation. As in, not suggesting suspicious-looking moves like 30. f7, even though I thought they were right. Optimizing in someone's favor by being less than completely honest with them is a genuinely risky thing to do, and I doubt I could have pulled it off all that well, but it's something to take into consideration in the real-world AI scenario.
One option to mitigate the risk is to be open about what you’re doing. “I think the best move here is X, but I realize that X looks very suspicious, so I’m going to recommend that you do Y instead in order to hedge against me being dishonest.”