Agreed, I think these examples don’t provide nearly enough evidence to conclude that the AIs want reward (or anything else, really; it’s too early to say!). I’d want to see a lot more CoT examples from multiple different AIs in multiple different settings, and hopefully also run various ablation tests on them. But I think that this sort of more thorough investigation would provide inconclusive-but-still-substantial evidence to help us narrow down the possibility space!