It seems like this doesn’t even take a lot of strategic competence (although that would be a nice addition).
It seems like all this takes is better general reasoning abilities. And that might not be too much to hope for.
Imagine someone asks GPT6, "Should we slow down progress toward AGI?" It might very well answer something like:
It looks like the people with the highest level of relevant expertise disagree dramatically on whether we're on track to solve alignment. So yes, you should probably slow down, unless you want to take a big risk in order to reach breakthroughs faster.
Assumptions and logic:
I'm assuming that the people who have thought about this the most, and who have the most relevant expertise, are a lot more likely to be right. This is how things work in pretty much every other domain where we can get answers. Most of the people who think alignment is very likely to succeed seem to have little relevant expertise, or to be highly motivated to believe it will succeed, or both. So if you don't want to take at least a 20% chance of everyone dying, or of an otherwise really bad future getting locked in, then yes, you should probably slow down.
I can do a bunch more web searches and write up a very lengthy report if you like, but based on my knowledge base and a few seconds of thinking, the conclusion is very likely to be along these lines. Different assumptions would probably just shift the odds of a disastrous outcome within the 20-80% range (there are so many unknowns that I doubt I'd go farther toward the extremes under any realistic assumptions).
Would you like help composing a letter to your representative or your loved ones?
I'm writing a post now on how emulating human-like metacognition might be a route to better and more reliable reasoning. Abram Demski has also done some work on reducing slop from LLMs with similar goals in mind (although both of us are thinking more of slop reduction for work directly on alignment). The strategic-advice angle is one I've only considered a little, but it seems like an important one.
There may be better cooperative strategies that a reasonably intelligent and impartial system could recommend to everyone. I’ve worried in the past that such strategies don’t exist, but I’m far from sure and hopeful that they do. A little logically sound strategic advice might make the difference if it’s given impartially to all involved.