For example, I think a post that was a strict superset of this post, which contained the same scatological dataset alongside several similar ones, and which was called something like “Testing the limits of emergent misalignment” would do worse at the intended job of this post. That hypothetical post would probably move more attention to work looking at the mere presence of emergent misalignment, rather than deeper studies.
For example, I think a post that was a strict superset of this post, which contained the same scatological dataset alongside several similar ones, and which was called something like “Testing the limits of emergent misalignment” would do worse at the intended job of this post. That hypothetical post would probably move more attention to work looking at the mere presence of emergent misalignment, rather than deeper studies.
I like your framing. Very cool to see as a junior researcher.