Rohin Shah comments on The theory-practice gap

Rohin Shah 22 Sep 2021 13:59 UTC
LW: 4 AF: 4
It’s nothing quite so detailed as that. It’s more like “maybe in the exotic circumstances we actually encounter, the objective does generalize, but also maybe not; there isn’t a strong reason to expect one over the other”. (Which is why I only say it is plausible that the AI system works fine, rather than probable.)
You might think that the default expectation is that AI systems don’t generalize. But in the world where we’ve gotten an existential catastrophe, we know that the capabilities generalized to the exotic circumstance; it seems like whatever made the capabilities generalize could also make the objective generalize in that exotic circumstance.
- Edouard Harris 23 Sep 2021 14:12 UTC
  LW: 3 AF: 3
  I see. Okay, I definitely agree that makes sense under the “fails to generalize” risk model. Thanks Rohin!