Your human flourishing example sounds like it wouldn’t generalize well. As the AI’s capacities grow stronger it would start taking more and more work for humans to analyze its plans and determine how much flourishing is in them, and if it grows more intelligent after we deploy it we will have no way to determine if its thought assessor generalizes wrongly. This is, I would think, a rather basic and obvious flaw in relying on any part of the world model directly.
As for how to code that stuff, well, I’ll figure out how to do that after we’ve all figured out how to mathematically specify those things. :P
> it would start taking more and more work for humans to analyze its plans and determine how much flourishing is in them
I’m not sure where you’re getting that. The thing I described in my last comment did not involve humans analyzing the AI’s plans; it only involved humans labeling YouTube videos.
It would be lovely if humans could reliably analyze the AI’s plans. But I fear that our interpretability techniques will not be up to that challenge.
> we will have no way to determine if its thought assessor generalizes wrongly
I agree, see §14.4.
Ah, sorry, I misunderstood you.