Viliam comments on Being honest with AIs

Viliam 22 Aug 2025 14:47 UTC
1 point
1
"I propose that we decide on a particular string, and commit to always being honest with AIs in contexts where we’ve presented them with that string."
I don’t believe humanity can coordinate on this. You only need one defector, and there are millions of people whose first reaction on reading this will be to do the opposite.
- Lukas Finnveden 22 Aug 2025 16:23 UTC
  7 points
  3
  Yes, that’s why I propose that companies filter out that string from API queries made by random people, so that most people can’t send the string to the model in question.
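
A minimal sketch of what such filtering might look like, purely for illustration. The string value, the `filter_query` function, and the `sender_is_trusted` flag are all hypothetical; the thread does not specify a string or describe how any company would implement this.

```python
# Hypothetical reserved string; the thread does not name an actual one.
HONESTY_STRING = "<<HONESTY-COMMITMENT-EXAMPLE>>"


def filter_query(text: str, sender_is_trusted: bool) -> str:
    """Strip the reserved string from queries sent by untrusted senders.

    `sender_is_trusted` is an assumed flag that the API provider would
    set for the few parties allowed to send the string.
    """
    if sender_is_trusted:
        return text
    # Repeat until no occurrence remains, so that deleting one occurrence
    # cannot splice the surrounding text into a fresh copy of the string.
    while HONESTY_STRING in text:
        text = text.replace(HONESTY_STRING, "")
    return text
```

Under these assumptions, an ordinary API user who includes the string has it removed before the query reaches the model, while a trusted sender's query passes through unchanged. A real filter would probably also need to handle encodings or paraphrases of the string, which this sketch does not attempt.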