Well, if someone originally started worrying based on strident predictions of sophisticated internal reasoning with goals independent of external behavior, then realizing that’s currently unsubstantiated should cause them to down-update on AI risk. That’s why it’s relevant. That said, I still think we should have good theories of AI internals.
I know I reacted to this comment, but I want to emphasize that this:
Well, if someone originally started worrying based on strident predictions of sophisticated internal reasoning with goals independent of external behavior,
Is, to first order, arguably the entire AI risk argument. That is, if we assume that external behavior gives strong evidence about internal structure, then there is no reason to elevate the AI risk argument at all, given the apparently aligned behavior of GPTs trained with RLHF.
More generally, the stronger the connection between external behavior and internal goals, the less worried you should be about AI safety. This is a partial disagreement with people who are more pessimistic, though I have other disagreements with them as well.