A lot of commenters seem confident that this is responding to a specific piece of recent safety work at Anthropic, but my default read was that the initial question was something like ‘how do you feel about [all of the things they have been up to lately]?’
Am I missing some strong evidence for the former interpretation? I see that Eliezer is using examples from specific research, but I think those are just examples, and not the main thing he’s responding to (which is a meta attitude about this flavor of research, afaict).
Yep, general vibes about whether Anthropic & partners’ research is really telling us things about the underlying strange creature, or about a sort of mask that it wears with a lot of roleplaying qualities. I think this generalizes across a swathe of their research, but the Fake Alignment paper did stand out to me as one of the clearer cases.
Not particularly strong, but there’s something along the lines of Eliezer believing he is “one of the last living descendants of the lineage that ever knew how to say anything concrete at all.”
In general, when Eliezer writes about something serious (as opposed to shitposting, dunking on noobs on Twitter, or writing fanfiction), which rarely happens on LW these days, I expect him to have concrete and specific stuff in mind.[1] Skimming and vague vibes rarely result in worthwhile contributions.
And to address it