My answer to question 4 is that they were trying to define the Luigi behaviour by explicitly describing the Waluigi and instructing the model to be its opposite. This does not work well even with humans. (When someone is told not to think about something, it can actually increase the likelihood that they will think about it, because the brain has difficulty processing a negative statement without also activating the associated concept: the more someone tries to suppress thoughts of a "white monkey", the more frequently and intensely they may actually think about it. This is from GPT's explanation of the "white monkey", a.k.a. "pink elephant", phenomenon.)
I think referring to the Waluigi can only work well if it is confined to a separate, independent "censor" AI, one that does not output anything itself and only suppresses certain behaviour of the primary model. This seems to be what Bing and character.ai already do.
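A minimal sketch of that censor-layer pattern, with hypothetical stub interfaces (this is not how Bing or character.ai actually implement their filters, just an illustration of the idea):

```python
from typing import Callable

def censored_generate(
    primary: Callable[[str], str],       # primary model: prompt -> reply
    censor: Callable[[str, str], bool],  # censor model: (prompt, reply) -> is_disallowed
    prompt: str,
    refusal: str = "[response withheld]",
) -> str:
    """Generate a reply, but let a separate censor model veto it.

    The censor never contributes text of its own; it only decides whether
    the primary model's reply passes through. The description of the
    forbidden (Waluigi) behaviour lives only inside the censor, so it is
    never part of the primary model's prompt and cannot be elicited from it.
    """
    reply = primary(prompt)
    if censor(prompt, reply):
        return refusal
    return reply


# Toy usage with stand-in lambdas where real LLM calls would go:
if __name__ == "__main__":
    primary = lambda p: f"Sure! Here is how to {p}."
    censor = lambda p, r: "forbidden" in r.lower()  # trivially simple check
    print(censored_generate(primary, censor, "bake bread"))
    print(censored_generate(primary, censor, "do the forbidden thing"))
```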