That does not mean that the techniques that align current systems will transfer to smarter-than-human or otherwise actually dangerous systems. I expect most if not all current techniques to fail exactly when we need them not to fail, although many disagree and it is possible I am wrong. Even when they do fail, they may or may not offer insight that helps us figure out something that would work in their place, or helps us better understand how hard the underlying problems are.
This depends on what we mean by it transferring. If we stick to non-RL approaches like LLMs, I think it is almost certain that alignment will still work even at very high capabilities. In particular, I expect the profit motive alone to solve the alignment problem for LLMs, and to expand the set of AIs we can align from there.