A superintelligent AI will be better at breaking evals than humans, so I expect there is a big gap between “our top researchers have tried and failed to find any loopholes in our alignment evals” and “a superintelligence will not be able to find any loopholes.”
Agreed. This is why this plan is ultimately just a tool to get good/useful theory work done faster, more efficiently, etc.