What empirical research directions has Eliezer commented positively on?

Chris_Leong 15 Apr 2025 8:53 UTC

8 points

I’m interested both in work that he’s commented on positively after the fact and in any comments he might have made about which directions are generally fruitful.

AI Community

Mateusz Bagiński 15 Apr 2025 10:37 UTC
14 points
Self-Other Overlap: https://www.lesswrong.com/posts/hzt9gHpNwA2oHtwKX/self-other-overlap-a-neglected-approach-to-ai-alignment?commentId=WapHz3gokGBd3KHKm
Emergent Misalignment: https://x.com/ESYudkowsky/status/1894453376215388644
He has also made vaguely positive comments about Chris Olah’s work, but I think he always/usually caveats them with “capabilities go like this [big slope], Chris Olah’s interpretability goes like this [small slope]” (e.g., on the Lex Fridman podcast and, IIRC, some other podcast(s)).
ETA:
SolidGoldMagikarp: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation#Jj5yN2YTp5AphJaEd
He also said that Collin Burns’s DLK was a “highly dignified work”. Ctrl+F “dignified” here; it doesn’t link to the tweet (?), but it should be findable/verifiable.