Constant instead of temporal allocation. I do agree that as capabilities grow, we should be shifting resources to safety. But rather than temporal allocation (i.e., using AI for safety before using it for productivity), I believe we need constant compute allocation: ensuring a fixed and sufficiently high fraction of compute is always spent on safety research, monitoring, and mitigations.
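To make "constant allocation" concrete, here is a minimal sketch of what enforcing a fixed safety share of a compute budget could look like. The fraction, names, and numbers are all hypothetical illustrations, not any lab's actual policy or scheduler:

```python
# Hypothetical sketch: enforce a fixed fraction of total compute for safety work.
# The 25% figure and all names are illustrative assumptions, not a real policy.

SAFETY_FRACTION = 0.25  # fixed share reserved for safety, regardless of scale


def allocate(total_gpu_hours: float) -> dict[str, float]:
    """Split a compute budget so the safety share stays constant as capacity grows."""
    safety = total_gpu_hours * SAFETY_FRACTION
    return {"safety": safety, "capabilities": total_gpu_hours - safety}


# The point of constant allocation: the safety share scales with total compute,
# so it is 25% of 1e6 GPU-hours today and 25% of 1e8 GPU-hours later, rather
# than a one-time, front-loaded spend that capabilities work then outgrows.
for budget in (1e6, 1e8):
    print(budget, allocate(budget))
```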
I think we should be cranking up the compute allocation now. We should also be writing safety case sketches and publishing them for critique by the scientific community. And if those safety cases get torn to shreds, such that a reasonable disinterested expert would conclude 'holy shit, this thing is not safe; it's plausibly faking alignment already and/or inclined to do so in the future,' then we halt internal deployment and beef up our control measures, rebuild with a different, safer design, etc. That does not feel like too much to ask, given that everyone's lives are on the line.