Alignment researchers are clearly not in charge of the path we take to AGI.
If that’s the case, we’re doomed no matter what we try. So we had better back up and change it.
Don’t springboard toward AGI by applying RL to LLMs; you will get early performance gains, but alignment will fail. We need to build something big that we can understand, and we probably need to build something small that we can understand first.