Thanks for writing these summaries!
Unfortunately, the summary of my post “Inner Misalignment in “Simulator” LLMs” is inaccurate and makes the same mistake I wrote the post to address.
I have subsections on (what I claim are) four distinct alignment problems:
1. Outer alignment for characters
2. Inner alignment for characters
3. Outer alignment for simulators
4. Inner alignment for simulators
The summary here covers the first two, but not the third or fourth—and the fourth one (“inner alignment for simulators”) is what I’m most concerned about in this post (because I think Scott ignores it, and because I think it’s hard to solve).
I can suggest an alternate summary when I find the time. If I don’t get to it soon, I’d prefer that this post just link to my post without a summary.
Thanks again for making these posts; I think they're a useful service to the community.