I think most every aspiring conceptual alignment researcher should read basically all of the work in Arbital’s AI alignment section. Not all of it is right, but you’ll avoid some obvious-in-retrospect pitfalls you’d likely have fallen into otherwise. So I’d count that corpus as a big achievement.
They have a big paper on logical induction. It doesn’t have any applications yet, but it may serve as theoretical grounding for later work. And I think the more general idea of viewing inexploitable systems as markets has a good chance of being broadly applicable.
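To make the market framing concrete, here’s a toy sketch of my own, not the paper’s construction: if a market prices a sentence and its negation, exactly one of the two eventually pays out, so any prices that don’t sum to 1 hand a trader risk-free profit, and an inexploitable market is one that closes such gaps. (The actual logical induction criterion is far more general: it quantifies over all polynomial-time traders and all patterns of logical inconsistency, not just φ versus ¬φ.)

```python
# Toy illustration of the "inexploitable market" framing -- NOT the
# logical induction algorithm from the paper. The market prices a
# sentence phi and its negation; exactly one of them pays out 1, so
# prices that don't sum to 1 create an arbitrage.

def arbitrage(price_phi: float, price_not_phi: float) -> float:
    """Risk-free profit available per unit traded.

    Buying both sides costs price_phi + price_not_phi and always pays 1,
    so a trader buys both when the sum is below 1 and shorts both when
    it is above 1; either way the profit is |1 - sum|.
    """
    return abs(1.0 - (price_phi + price_not_phi))

def market_update(price_phi: float, price_not_phi: float) -> tuple[float, float]:
    """Split the pricing inconsistency evenly so the prices sum to 1."""
    gap = 1.0 - (price_phi + price_not_phi)
    return price_phi + gap / 2, price_not_phi + gap / 2

p, q = 0.7, 0.5                  # inconsistent: prices sum to 1.2
print(f"{arbitrage(p, q):.2f}")  # 0.20 -- a trader shorts both sides
p, q = market_update(p, q)
print(f"{arbitrage(p, q):.2f}")  # 0.00 -- no arbitrage remains
```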
Scott Garrabrant has done a lot of work in the public eye (e.g. logical induction and finite factored sets), and so has Vanessa Kosoy (e.g. infra-Bayesianism).
Risks From Learned Optimization, as others have mentioned, explained the idea of “mesa-optimizers” and made it palatable to skeptics.