then you might not get a chance to fix “alignment bugs” before you lose control of the AI.
This is pointing at a way in which solving alignment is difficult. I think a charitable interpretation of Chris Olah’s graph takes this kind of difficulty into account. Therefore I think the title is misleading.
I think your discussion of the AI-for-alignment-research plan (here and in your blog post) points at important difficulties (which most proponents of that plan are aware of), but it also misses very important hopes (and ways in which those hopes may fail). Joe Carlsmith’s discussion of the topic is a good starting point for people who wish to learn more about these hopes and difficulties. Pointing at ways in which the situation is risky isn’t enough to argue that the probability of AI takeover is, e.g., greater than 50%. Most people working on alignment (myself included) would agree that the risk of that plan is unacceptably high, but that is a very different bar from “>50% chance of failure”, which is the interpretation of “likely” that would come closer to justifying the “this plan is bad” vibe of this post and the linked post.