I got that. I’m pointing out that you’re probably being downvoted because your title is quite inaccurate.
Thanks for pointing that out, and for engaging despite that. I have changed the title and added a short note about the edit.
What really bothers me is the trajectory we seem to have chosen: continue to scale the models and monitor them for misalignment. This plan has some obvious flaws:
1. There is no verifiable way to know whether we end up with an aligned AI or one that only seems aligned, since evaluations can't distinguish between the two.
2. White-box techniques currently seem quite limited, and it is uncertain whether we will still get distinguishable signals if the most reliable techniques are only developed after models pass a certain (unknown) capability level.
3. If we keep moving toward more automated pipelines because we feel it is safe, we won't be able to limit catastrophes.
I don't see much pushback against this trajectory from a technical perspective. For people starting out in technical AI safety work, I don't see many suggestions that challenge this trajectory and propose alternatives. There is Yoshua Bengio's Scientist AI, but it doesn't seem to be discussed as much. There is theoretical work from Simplex and on SLT (singular learning theory), but it still seems to be at an early stage.
And since most beginner work is about replicating and extending existing work, this creates a chain reaction: as a result, most of the work ends up following the same trajectory.