Assume you have a very simple reinforcement learning AI that does nothing but choose between two actions, A and B, and that has the goal of “maximizing reward”. “Reward”, in this case, doesn’t correspond to any qualia; rather, “reward” is just a number that results from the AI choosing a particular action. So what “maximize reward” actually means in this context is “choose the action that results in the biggest numbers”.
Say that the AI is programmed to initially just try choosing A ten times in a row and B ten times in a row.
When the AI chooses A, it is shown the following numbers: 1, 2, 2, 1, 2, 2, 1, 1, 1, 2 (total 15).
When the AI chooses B, it is shown the following numbers: 4, 3, 4, 5, 3, 4, 2, 4, 3, 2 (total 34).
After the AI has tried both actions ten times, it is programmed to choose its remaining actions according to the rule “choose the action that has historically had the bigger total”. Since action B has had the bigger total, it then proceeds to always choose B.
To achieve this, we don’t need to build the AI to have qualia; we just need to build a system that implements a rule like “when the total for action A is greater than the total for action B, choose A, and vice versa; if the totals are equal, pick one at random”.
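For concreteness, here is a minimal sketch of what such a system might look like, with the rewards hardcoded to match the numbers above. None of the names come from any particular library; this is just one way the rule could be written down:

```python
import random

# The rewards "shown" to the AI during its ten trials of each action,
# matching the example above.
REWARDS_A = [1, 2, 2, 1, 2, 2, 1, 1, 1, 2]  # total 15
REWARDS_B = [4, 3, 4, 5, 3, 4, 2, 4, 3, 2]  # total 34

total_a = sum(REWARDS_A)
total_b = sum(REWARDS_B)

def choose_action() -> str:
    """The rule: pick whichever action has the bigger historical
    total; break ties at random."""
    if total_a > total_b:
        return "A"
    if total_b > total_a:
        return "B"
    return random.choice(["A", "B"])

print(choose_action())  # prints "B", since 34 > 15
```

There is nothing in this program that feels anything; it just compares two sums and branches on the result.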
When we say that an AI “is rewarded”, we just mean “the AI is shown bigger numbers, and it has been programmed to act in ways that result in it being shown bigger numbers”.
We talk about the AI having “goals” and “wanting” things by an application of the intentional stance. That’s Daniel Dennett’s term for the idea that, even if a chess-playing AI had a completely different motivational system than humans do (and chess-playing AIs do), we could still talk about it having the “goal” of winning at chess. If we assume that the AI “wants” to win, then we can make more accurate predictions of its behavior. For instance, we can assume that it won’t make obviously losing moves if it can avoid them.
What’s actually going on is that the chess AI has been programmed with rules like “check whether a possible move would lead to losing the game, and if so, try to find another move to play instead”. There’s no “wanting” in the human sense going on, but the AI still acts the way a human would act if that human wanted to win a game of chess. So saying that the AI “wants” to win the game is a convenient shorthand for “the AI is programmed to play the kinds of moves that are more likely to lead it to win the game, within the limits of its ability to predict the likely outcomes of those moves”.
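As a hedged illustration of that rule, the sketch below shows one shape it could take. The move names and both helper functions are invented for the example; they stand in for a real chess program’s move generator and lookahead, and are not taken from any actual engine:

```python
import random

def pick_move(position, legal_moves, loses_immediately):
    """Filter out moves the program can see lead to losing,
    then choose among whatever remains."""
    moves = legal_moves(position)
    safe = [m for m in moves if not loses_immediately(position, m)]
    # If every legal move loses, the program still has to play one.
    return random.choice(safe if safe else moves)

# Toy demonstration: two of the three "moves" are known blunders.
demo_position = object()  # placeholder for a real board state
print(pick_move(
    demo_position,
    legal_moves=lambda pos: ["Qh5", "Ke2", "f3"],
    loses_immediately=lambda pos, m: m in {"Ke2", "f3"},
))  # prints "Qh5"
```

Nothing here “wants” anything; the filter-and-choose logic just produces the same moves a win-wanting human might make.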