Daniel Kokotajlo comments on Daniel Kokotajlo’s Shortform

Daniel Kokotajlo 8 Jul 2025 17:05 UTC
4 points
I think it gives some evidence, but not a lot. There have been other similar findings that add more evidence, though, such as the Anthropic talk. To be clear, I’m not at all confident; in fact, I’m probably still below 50% that reinforcement will be the (main) optimization target. But my credence in this hypothesis has risen over the last six months.