joshc comments on What Is The Alignment Problem?

joshc 14 Mar 2025 6:04 UTC
LW: 4 AF: 3
You don’t seem to have mentioned the alignment target “follow the common-sense interpretation of a system prompt,” which seems like the most sensible definition of alignment to me (it’s alignment to a message, not to a person, etc.). Then you can say whatever the heck you want in that prompt, including how you would like the AI to be corrigible (or incorrigible, if you are worried about misuse).