You don’t seem to have mentioned the alignment target “follow the common-sense interpretation of a system prompt,” which seems like the most sensible definition of alignment to me (it’s alignment to a message, not to a person, etc.). Then you can say whatever the heck you want in that prompt, including how you would like the AI to be corrigible.
That’s basically Do What I Mean.