I’ll just toss in my answer (I think it’s a fairly common perspective on this):
An agent capable of running a one-year research project would probably (not certainly) be a general reasoner. One could construct an agent that could do a year’s worth of valuable research without giving it the ability to do general reasoning. But when you think about how a human does a year of valuable research, it is absolutely crucial that they employ general problem-solving skills often to debug their progress and make it actually worthwhile (let alone to produce any breakthroughs).
If it can do general reasoning, then it can formulate the questions “why am I doing this?” and “what if I did become god-emperor?” (or other context-expanding thoughts). That creates the new problems I discuss in “LLM AGI may reason about its goals”, which I think are background assumptions of the “alignment is hard” worldview.
I think that’s why the common intuition is that you’d need to solve the alignment problem to have an aligned one-year human-level agent.
Perhaps your “year of research” isn’t meant to be particularly groundbreaking, just doing a bunch of studies on how LLMs produce outputs. Figuring out how those results contribute to actually solving the alignment problem might be left to humans. That research could, in theory, be done by a subhuman or non-general-reasoning AI. It doesn’t look to me like progress is going in this direction, but this is a matter of opinion and interpretation; my argument here isn’t sufficient to establish general reasoners as the easier and therefore likelier path to year-long research agents, but I do think it’s quite likely to happen that way on the current trajectory.
If we do make year-long research agents that can’t really reason but can still do useful research, this might be helpful, but I don’t think it’s reasonable to assume this will get alignment solved. Having giant stacks of poorly-thought-out research could be really useful, or hardly useful at all. And note the compute bottleneck on running research, and the disincentives against spending a year running tons of agents doing research.