Thanks for the comment! I agree that we live in a highly suboptimal world, and I do not think we are going to make it, but it’s worth taking our best shot.
I don’t think of the CoEm agenda as “doing AGI right.” (For one, it is not even an agenda for building AGI/ASI, but for bounding ourselves below that.) Doing AGI right would involve solving problems like P vs PSPACE, developing a vastly deeper understanding of Algorithmic Information Theory, and building more advanced formal verification of programs. If I had infinite budget and 200 years, the plan would look very different, and I would feel very secure in humanity’s future.
Alas, I consider CoEm an instance of a wider class of possible alignment plans that I think of as the “bare minimum for Science to work.” I generally think any plans more optimistic than this require some other external force making things go well, which might be empirical facts about reality (LLMs are just nice because of some deep pattern in physics) or metaphysics (there is an actual benevolent creator god intervening specifically to make things go well, or Anthropic Selection is afoot). Many of the “this is what we will get, so we have to do this” type arguments just feel like cope to me, rather than first-principles thinking of the form “if my goal is a safe AI system, what is the best plan I can come up with that actually outputs safe AI at the end?” — reactive vs constructive planning. Of course, in the real world, it’s tradeoffs all the way down, and I know this. You can read some of my thoughts about why I think alignment is hard and current plans are not on track here.
I don’t consider this agenda to be maximally principled or aesthetically pleasing. Quite the opposite: it feels like a grubby engineering compromise that simply meets the minimum requirements to actually do science in a non-insane way. There are of course various even more compromising positions, but I think those simply don’t work in the real world. I think the functionalist/teleological/agent-based frameworks currently being applied to alignment work on LW are just too confused to ever really work in the real world, in the same way that the models of alchemy can never actually get you to a safe nuclear reactor; you need to at least invent calculus (or, hell, at least better metallurgy!) and do actual empiricism and stuff.
As for pausing and governance: I think governance is another mandatory ingredient of a good outcome, and most of the work I am involved with there happens through ControlAI and their plan “A Narrow Path”. I am under no illusion that these political questions are easy to solve, but I do believe they are possible and necessary to solve, and I have a lot of illegible inside info and experience here that doesn’t fit into a LW comment. If there is no mechanism by which reckless actors are prevented from killing everyone else by building doomsday machines, we die. All the technical alignment research in the world is irrelevant to this point. (And “pivotal acts” are an immoral pipe dream.)