I missed this when it was posted, so appreciate the mention here. Upon a quick skim, it occurs to me that in order to qualify as an AI safety plan, as opposed to approach (which seems fine, and is the word used in the paper title), it would need to explicitly list and address the major safety-relevant contingencies.
For example, the paper has a section on “Amplified Oversight” but no estimate of how likely this research direction is to fail (reach a dead end or fail to achieve positive results in time), what GDM plans to do in that case, or what GDM thinks our civilization should do, e.g., whether it would be too late to re-aim for an AI pause at that point. (I looked for other documents from GDM which may have addressed this point, but was unable to find one. Pointers welcome if it’s already addressed elsewhere.)
Also, apologies if this feels like goalpost-moving or tangential to the previous discussion. These thoughts occurred to me as I was reading the paper with “plan” in the back of my mind, and they seem worth writing down independently of your discussion with Zvi.
Yup I think you are correct in how you are interpreting the document. I ~never call it a “plan” myself, though this is mostly because the word “plan” has been given a new meaning by the AGI safety community that makes it no longer a useful word (Paul #26 expresses a similar sentiment).[1]
My original point was just using this paper as evidence against the claim that “GDM’s plan is to have AIs do our alignment homework”. If one instead said “GDM has no plan”, I would think that they were being misleading but I wouldn’t think they were misinformed / wrong.
For example, outside of this community, it is not the case that a “plan” is expected to have an explicit estimate of how likely the proposed actions are to fail. I haven’t looked into it, but I expect that e.g. the Baruch Plan did not involve an extended section talking about what happens if the United Nations became ineffective due to gridlock between member countries and so could not conduct its proposed inspections appropriately.
At one point my comment draft had a parenthetical “(unless the contingencies are obvious or common sensical)” but I removed it before submitting to be more concise. I think in the case of the Baruch Plan (which I confirmed does not have an explicit contingency covering potential failure), it’s pretty obvious what would happen in case of gridlock or some other kind of dysfunction: the world would go back to the status quo of a nuclear arms race.
But in the case of GDM’s approach, I genuinely have little idea what you (personally or GDM as an org) think should or would happen if e.g. Amplified Oversight were found to be lacking down the line, or how likely you think this is. Without this information, how does someone discuss or judge how good it is as a plan, or decide whether we (various decision makers, or society as a whole) should allow GDM to carry it out?
To me, these are some of the main purposes of “having a plan”, which makes me tempted to say “GDM doesn’t have a plan”, but I’ll note your objection to this, and think about how else to convey my point in the future.