peterbarnett comments on Alignment Problems All the Way Down

peterbarnett 22 Jan 2022 14:51 UTC
1 point
0
So do you think that the only way to get to AGI is via a learned optimizer?
I think that the definitions of AGI (and probably optimizer) here are maybe a bit fuzzy.
I think it’s pretty likely that we can develop AI systems that are more competent than humans in a variety of important domains but that don’t perform any kind of optimization process as part of their computation.
- Sam Ringer 23 Jan 2022 8:50 UTC
  2 points
  0
  I think the failure case identified in this post is plausible (and likely), and it is very clearly explained, so props for that!
  
  However, I agree with Jacob’s criticism here. Any AGI success story basically has to have “the safest model” also be “the most powerful model”, because of incentives and coordination problems.
  
  Models that are themselves optimizers are going to be significantly more powerful and useful than “optimizer free” models. So the suggestion of trying to avoid mesa-optimization altogether is a bit of a fabricated option. There is an interesting parallel here with the suggestion of just “not building agents” (https://www.gwern.net/Tool-AI).
  
  So from where I am sitting, we have no option but to tackle aligning the mesa-optimizer cascade head-on.
- jacob_cannell 22 Jan 2022 19:20 UTC
  2 points
  0
  AGI will require both learning and planning, and the latter is already a learned mesa-optimizer. AGI may also help create new AGI, which is another form of mesa-optimization. So yes, it’s unavoidable.
  
  To create friendly but powerful AGI, we need to actually align it to human values. Creating friendly but weak AI doesn’t matter.