Draft comment on US AI policy, to be submitted shortly (deadline midnight ET tonight!) to the White House Office of Science and Technology Policy, which may largely shape official policy. See Dave Kasten’s brief description in his short form if you haven’t already, and regulations.gov/document/NSF_FRDOC_0001-3479… for their own brief description and the submission email.
After thinking hard about it and writing Whether governments will control AGI is important and neglected, I’ve decided to share my whole opinion on the matter with the committee. My reasoning is that they’re going to twig to the national security significance of AGI and pursue it, and if they do that soon enough, they might work with China (or, less plausibly, independently) to slow proliferation of near-AGI and AGI. That would still help us not die, provided we can solve intent alignment.
My primary recommendation is that policy include forming groups who are actually competent to think this stuff through, since I don’t even know what the government should do.
I went a bit nuts on this thing, so it also functions as a broadly-accessible overview of how I see the current AI situation.
Last chance to talk me out of tipping off the government to the potentials and dangers of AGI! Not that they’re liable to listen, but I feel this is the sort of letter that could catch someone’s attention if I’m lucky/unlucky.
A few of the framings are heavily bent toward government perspectives, particularly “we can’t stop building AI” (I think we should just let China charge on ahead, but in another sense we really practically can’t stop, as nobody in government would even consider it). But mostly it’s my honest opinion, just tilted hopefully toward something an interested official might resonate with. It emphasizes my own honest uncertainty despite studying it awfully hard.
Here it is:
Gentlemen and ladies:
Your position in influencing US policy on AI development puts you on the stage of history. We have an opportunity for unprecedented positive impacts if we act decisively and wisely, and large risks if we do not.
Executive Summary
I recommend forming an independent committee or structured process that can work toward a better understanding of AI’s potential and risks in the near and medium term. I’ve been studying theories of AI progress professionally for two years, and I have a nearly ideal academic and research background. Yet I still don’t trust my own conclusions. No one understands the situation at this point with any certainty, so we should work together to understand it better.
- We are building technology with enormous potential
  - That could be copied or stolen by our rivals
  - Or misused with disastrous results.
- Risks include very rapid job loss in the short term
  - And destabilizing the MAD security doctrine in the medium term
    - By rapidly developing new military technologies
- China should be considered a potential collaborator as well as a rival
  - They must also pursue AI for its potential
  - But fear its destabilizing risks
  - They might collaborate to slow proliferation of dangerous AI technology
    - As Russia and the US slowed proliferation of nuclear weapons
- We can’t stop improving AI and progressing toward AGI
  - But we must rapidly figure out how to steer this process
    - To capitalize on opportunities
    - And avoid potential world-changing catastrophes.
We should collaboratively and privately study AI potentials and risks
Even a cautious and sober assessment of the potentials of AI should make it a high priority to better understand the opportunities and risks.
The progress of AI today may have striking similarities to the development of nuclear weapons. Einstein’s 1939 letter to President Roosevelt regarding nuclear weapons triggered the creation of an Advisory Committee on Uranium. In retrospect that committee was considered too slow and cautious, but it importantly established the scientific legitimacy of nuclear weapons and led to the Manhattan Project that put the US in the lead. That lead allowed us to slow the proliferation of a world-changing technology, reducing the odds of a catastrophic nuclear exchange and establishing the MAD doctrine that has kept Western civilization largely at peace since then.
We should act quickly in case this situation is as urgent as I and others believe. AI development and increases in its capabilities are unfortunately much harder to understand and predict than whether uranium can undergo a runaway fission process usable as a powerful bomb. We will need better collaborative structures, since experts currently remain in disagreement, and little progress is being made in public or even in the structured private discussions of which I’m aware.
Regulatory policies on AI development are probably irrelevant
The only other obvious policy move is to somehow monitor progress within the organizations that claim they’ll develop human-level AI within a few years.
Regulatory policy on AI development in the near term is liable to be largely symbolic. No proposed regulation I’m aware of would have any real effect other than to slow development through bureaucracy. Government action on this unprecedented opportunity must go beyond playing to the voting public and take responsibility for steering the future.
The best policy in a more rational world would be to dramatically slow the development of AI while we collectively work to understand its potential impacts. Failing that, broad laws against creating autonomous systems might provide small reductions in risk. But I see no route to implementing such broad and impactful policy, so understanding and steering AI progress, and perhaps putting it under government control if it approaches truly world-changing capabilities, is likely the best we can do.
The remainder of this comment adds detail on this policy recommendation. It’s intended for a broad audience.
Near-term impacts: Economic and Security Benefits and Risks
Concerns about AI increasing discrimination, creating deep fakes, or otherwise confusing public debate are all but irrelevant by comparison. The potentials and risks even in the near term dramatically outweigh those concerns.
The immediate potential upside of continued improvements in AI is large. Economic productivity and national security could benefit immensely from even modest improvements to current systems, and those are sure to happen. That potential will drive progress in the US and elsewhere, regardless of any regulatory action or policy.
The most pressing concern is potential rapid job displacement due to AI agents, which could cause global economic disaster, possibly within a few years. AI could swiftly automate numerous professions simultaneously, leading to unprecedented unemployment rates and economic instability. Unlike previous technological transitions, the speed and scale of disruption from general-purpose AI that can learn to perform many jobs might leave no clear path to recovery, creating long-term or permanent economic damage that can’t be reversed without unprecedented redistribution to the permanently jobless, on a global scale.
Once agents capable of replacing human jobs are developed, they will be used by necessity, and any nation attempting to ban their use will rapidly fall behind. Understanding and controlling their rate of deployment, and having a plan to transition to a more productive economy with many fewer jobs available, seems paramount.
AI companies currently have civilian-level security, and publish their research methods in broad form, which allows foreign competitors to effectively copy their techniques (as with DeepSeek), or to outright steal the technology if a state-level cybersecurity operation becomes so inclined. Thus, the technology will proliferate unchecked unless we secure our AI development.
Of course the reality of these concerns should be investigated carefully before drastic action is taken; but time may well be short.
Broad description of the current state of AI and potential rapid progress
Advanced AI surpassing human intelligence could rapidly change national security concerns, and it is very difficult to predict when such intelligence might be achieved. Current systems are very limited, but they are only on the edge of learning autonomously like humans do, applying their intelligence to improving their understanding of the world and of how to accomplish their goals. Looking at current AI and concluding it will be a long time before it’s highly effective or dangerous might be like looking at a human ten-year-old and concluding they won’t become competent or dangerous, just before they start to really reflect and direct their own learning.
My own prediction, which is roughly as informed and expert as any (and in my biased opinion among the very best informed), is that we will see AI capable of taking over many jobs in 2-3 years, and that it will proceed fairly rapidly toward genuine superintelligence from there, taking mere years to surpass human thought in every useful area. But again, I don’t trust my own opinion, so we should carefully establish a process by which we can get better predictions, and we should do that with all haste.
As with nuclear fission, there is a potential runaway positive feedback cycle once AI becomes able to improve its own intelligence or accelerate further progress in AI. It is an open question whether, how, and how soon AI advances could lead to an “intelligence explosion”, creating AI that autonomously builds increasingly powerful AI.
AI is commonly considered a tool, but its most rapid progress is in general intelligence that can be applied to many problems. Anyone who hasn’t had deep conversations with current systems (Claude 3.7, ChatGPT 4.5, Gemini 2.0) needs to do so to grasp where we are with AI. It is worse than humans at some problems but better at others, and it can be applied to any problem that can be put into words. Some such problems are “how would you learn to do this job?” or “how could you do research to create new military technologies?”. Current AI can’t do those things, but it can clearly think about how to do them, and it is being rapidly improved in a variety of ways.
Next generation AI: semi-autonomous agents powered by language models
The new generation of language models (OpenAI’s o1 and o3, DeepSeek’s models, and others) are trained to “think to themselves” using long and complex “chains of thought.” On certain well-defined problems, they can already outperform the average specialized PhD. And they keep performing better when they are allowed to “think” longer, extending into millions of words of thought on a single problem.
We are on the edge of deploying these systems as parts of agents with memories and goals, which will allow them to reason and to pursue the goals they are given. When they apply those long chains of thought to accomplishing those goals, they may interpret them differently than we intended, and they may arrive at unexpected strategies for accomplishing the goals as they understand them.
This prospect is concerning, but the fact that these systems literally think in English is a huge advantage in keeping them working to follow our instructions as intended. OpenAI has recently published an important paper calling on all AI developers to build these systems in ways that preserve those “faithful chains of thought” (I and others expect that the performance advantages of allowing future systems to “think” entirely in their own inscrutable language might well tempt us to give up the large safety advantage of being able to read our systems’ “thinking”).
Medium-term risks: loss of control or catastrophic misuse of autonomous AI
There are enormous economic incentives to develop agents capable of doing valuable work, and the current path is as described above: large language models trained and designed to think for themselves and pursue goals. Such systems are potentially around the corner, and they would not be merely technologies; they must be understood as entities.
At some point, progress on autonomous AI becomes less like developing a technology and more akin to inviting an alien species to land on our planet. If such systems are continually improved (or improve themselves), they will be highly intelligent and autonomous, and capable of reshaping human life dramatically.
Most people who seriously consider AI progress think there’s a real risk of a metaphorical “Sorcerer’s Apprentice” scenario, in which we have learned enough to set entities in motion, but have not learned to control them. Unlike the cartoon, we cannot rely on a happy ending unless we are far more considered and cautious than we have been in developing previous technologies.
The hope that we can use AI as a powerful tool that only does what we want could come true. But given what we know now, it might well be possible to turn systems like our current best AI, general language models, into “tools that use themselves.” That is equivalent to an entity; and while current language models copy human styles of thinking and reasoning, their operation is vastly different from the human brain’s, so they would be better thought of, loosely, not as a technology but as an alien species or as wildly neurodivergent but brilliant humans.
Alignment of autonomous AI with human interests
Even if autonomous human-level-plus AI follows human commands, we should worry about it in the hands of humans who would use it to accomplish their own selfish or foolish goals. And we aren’t even close to sure we can ensure it follows commands as we intend them, or has such good “artificial ethics” that it does only things that benefit humanity.
We as a research community have ideas about how to make autonomous AIs kind, helpful, or at least obedient to orders, but we don’t have good or convincing plans, nor well-thought-out routes to getting them. Expert opinions vary widely on how difficult the project of ensuring that autonomous AI remains safe will be. Disturbingly, those who’ve considered the problem most deeply seem to also estimate it as much harder, while those with expertise in AI who have thought less about autonomous next-gen AI assume it will be relatively easy.
The uncertainty pointed to by these wildly varying estimates should be enough to make us cautious when we approach AI capable of autonomously acting and learning (and we may be rapidly approaching those capabilities). Stumbling into self-improving AI might work out well for humanity, but it might be like summoning a demon that interprets your deal differently than you’d hoped.
The intuitive response to these concerns is to say “maybe, but that’s sci-fi and it’s got to be far off”. That could be right, but we should not gamble humanity’s future on intuitions that could be guided by wishful thinking (technically called motivated reasoning or confirmation bias, a powerful and ubiquitous human tendency that was one area of study in the latter half of my academic career).
It’s easy to look at current AI and notice its weaknesses. One clever commentator likened that to meeting a studious but disorganized 14-year-old and concluding that humans couldn’t accomplish much of anything, ever.
The prospect of a semi-autonomous entity that learns on its own and follows instructions only as it interprets them is terrifying if one can take it seriously. Most arguments that this can’t happen in the near future boil down to wishful thinking and clever jokes. Serious thinkers need to consider the worst as well as the best possibilities.
But we can’t just stop building AI. The world won’t wait.
If and when autonomous, self-teaching AI is achieved, it’s imperative that it be in trustworthy hands.
Conclusion and summary
First we tamed animals to help with our work, then made machines that could do more. Next we made computers that could process information in very useful ways if we carefully programmed them to do so. Now we are training artificial minds that can think for us. Soon they’ll be able to think for themselves.
I have had the privilege of working full-time on these questions for around the last two years (as a research fellow at the Astera Institute; see my work and credentials at sethaherd.com). I humbly think my career as an academic working on cognitive psychology, systems neuroscience, and their AI applications, combined with my personal interests in ethical philosophy, clinical psychology, politics, and social dynamics, is roughly as good a background as any other for addressing these weighty matters. I have been thinking about them for the last twenty years.
I still don’t know what US policy on AI development should be.
We should work together to figure that out.
Respectfully and with kind regards,
Seth Herd, PhD
This document is approved for public dissemination. The document contains no business-proprietary or confidential information. Document contents may be reused by the government in developing the AI Action Plan and associated documents without attribution.