So do you think that the only way to get to AGI is via a learned optimizer? I think that the definitions of AGI (and probably optimizer) here are maybe a bit fuzzy.
I think it’s pretty likely that it is possible to develop AI systems that are more competent than humans in a variety of important domains and that don’t perform any kind of optimization process as part of their computation.
I think the failure case identified in this post is plausible (and likely) and is very clearly explained, so props for that!
However, I agree with Jacob’s criticism here. Any AGI success story basically has to have “the safest model” also be “the most powerful” model, because of incentives and coordination problems.
Models that are themselves optimizers are going to be significantly more powerful and useful than “optimizer-free” models. So the suggestion of trying to avoid mesa-optimization altogether is a bit of a fabricated option. There is an interesting parallel here with the suggestion of just “not building agents” (https://www.gwern.net/Tool-AI).
So from where I am sitting, we have no option but to tackle aligning the mesa-optimizer cascade head-on.
AGI will require both learning and planning, and the latter is already a learned mesa-optimizer. And AGI may help create new AGI, which is also a form of mesa-optimization. So yes, it’s unavoidable.
To create friendly but powerful AGI, we need to actually align it to human values. Creating friendly but weak AI doesn’t matter.