I think all 7 of those plans fall far short of counting as real plans. There are more serious plans out there, but I don’t know where they’re nicely summarized.
“What’s the short timeline plan?” poses this question but also focuses on control, testing, and regulation, almost skipping over alignment.
Paul Christiano’s and Rohin Shah’s work are the two most serious. Neither of them has published a concise “this is the plan” statement, and both have probably substantially updated their plans.
These are the standard-bearers for “prosaic alignment” as a real path to alignment of AGI and ASI. There is tons of work on aligning LLMs, but very little work AFAICT on how and whether that extends to AGIs based on LLMs. That’s why Paul and Rohin are the standard-bearers despite not having worked publicly and directly on this for a few years.
I work primarily on this, since I think it’s the most underserved area of AGI x-risk: aligning the type of AGI people are most likely to build on the current path.
My plan can perhaps be described as extending prosaic alignment to LLM agents with new techniques, and from there to real AGI. A key strategy is using instruction-following as the alignment target. It is currently best summarized in my response to “What’s the short timeline plan?”