Thane Ruthenis comments on A Case for the Least Forgiving Take On Alignment

Thane Ruthenis 6 May 2023 16:08 UTC
Interesting, thanks.
> I don’t expect a discontinuous jump at the point you hit the universality property
Agreed that this point (universality leads to discontinuity) probably needs to be hashed out more. Roughly, my view is that universality allows the system to become self-sustaining. Prior to universality, it can’t autonomously adapt to novel environments (including abstract environments, e.g., new fields of science). Its heuristics have to be refined by some external ground-truth signals, like trial-and-error experimentation or model-based policy gradients. But once the system can construct and work with self-made abstract objects, it can autonomously build chains of them, and that causes a shift in its architecture and internal dynamics: its primary method of cognition becomes iterating on self-derived abstraction chains instead of using hard-coded heuristics/modules.
- Rohin Shah 6 May 2023 16:22 UTC
  I agree that there’s a threshold for “can meaningfully build and chain novel abstractions” and this can lead to a positive feedback loop that was not previously present, but there will already be lots of positive feedback loops (such as “AI research → better AI → better assistance for human researchers → AI research”) and it’s not clear why to expect the new feedback loop to be much more powerful than the existing ones.
  (Aside: we’re now talking about a discontinuity in the gradient of capabilities rather than of capabilities themselves, but sufficiently large discontinuities in the gradient of capabilities have much of the same implications.)
  - Thane Ruthenis 6 May 2023 16:56 UTC
    > it’s not clear why to expect the new feedback loop to be much more powerful than the existing ones
    Yeah, the argument here would rely on the assumption that, e.g., the extant scientific data already uniquely constrain some novel laws of physics/engineering paradigms/psychological manipulation techniques/etc., and that we would eventually be able to figure them out even if science froze at this very moment. In that case, the new feedback loop would be faster, because superintelligent cognition would be faster than real-life experiments.
    And I think there’s a decent amount of evidence for this. Consider that there are already narrow AIs that can solve protein folding more efficiently than our best manually derived algorithms. This suggests that better algorithms are already uniquely constrained by the extant data, and we’ve just been unable to find them. The same may be true for all other domains of science, in which case a superintelligence iterating on its own cognition would be able to outpace human science.