That… really feels to me like it’s learning the wrong lessons. This somehow assumes that this is the only way you are going to get surprising misgeneralization.
Like, I am not arguing for irreducible uncertainty, but I certainly think the update of “Oh, zero shot anyone predicted this specific weird form of misgeneralization, but actually if you make this small tweak to it it generalizes super robustly” is very misguided.
Of course we are going to see more examples of weird misgeneralization. Trying to fix EM in particular feels like such a weird thing to do. I mean, if you have to locally ship products you kind of have to, but it certainly will not be the only weird thing happening on the path to ASI, and the primary lesson to learn is “we really have very little ability to predict or control how systems misgeneralize”.
That… really feels to me like it’s learning the wrong lessons. This somehow assumes that this is the only way you are going to get surprising misgeneralization.
Like, I am not arguing for irreducible uncertainty, but I certainly think the update of “Oh, zero shot anyone predicted this specific weird form of misgeneralization, but actually if you make this small tweak to it it generalizes super robustly” is very misguided.
Of course we are going to see more examples of weird misgeneralization. Trying to fix EM in particular feels like such a weird thing to do. I mean, if you have to locally ship products you kind of have to, but it certainly will not be the only weird thing happening on the path to ASI, and the primary lesson to learn is “we really have very little ability to predict or control how systems misgeneralize”.