This critique is similar to what I wrote here that “ChatGPT might sometimes give the wrong answer, but it doesn’t do the equivalent of becoming an artist when its parents wanted it to go to med school.” The controls we have in training, in particular the choice of data, are far more refined than evolution’s, and indeed AIs would not have been so commercially successful if our training methods were not able to control their behavior to a large degree.
More generally, I also agree with the broader point that the nature of the Y&S argument is somewhat anti-empirical. We can try to debate whether the particular examples of failures of current LLMs are “human” or “alien,” but I’m pretty sure that even if none of these examples existed, it would not cause Y&S to change their mind. So this evidence, or any other evidence from currently existing systems that are not superintelligent, is not really load-bearing for their theory.