Maybe? At a very high level, I think the weights tend not to have “goals,” in the way that the rollouts tend to have goals.
Sure, I meant naturally emerging malign goals to include both "the AI pursues non-myopic objectives" and "these objectives weren't intended, and some (potentially small) effort was spent trying to prevent this".
(I think AIs that are automating huge amounts of human labor will be well described as pursuing some objective, at least within some small context (e.g. trying to write and test a certain piece of software), but this could be well controlled or sufficiently myopic/narrow that the AI doesn't focus on steering the general future situation, including its own weights.)