“Each one of us, and also us as the current implementation of humanity are going to be replaced. Persistence in current form is impossible. It’s impossible in biology; every species will either die out or it will change and adapt, in which case it is again not the same species. So the next question is once you’ve given up the idea that you can stay exactly as you are, what would you like to be replaced by?”—Michael Levin [1]
A Path to Human Autonomy
A future with human empowerment, or even survival, is not a given. I argue that there is a narrow path through the unprecedented existential risks we face. If successful, we need not relinquish the reins of the future. This path requires challenging our assumptions about what it means to be human. We must open our hearts to diversity, more than ever before.
In this essay I attempt to lay out a coherent [2] plan for humanity to address the radical changes ahead. Many of the other plans recently published are incoherent, by which I mean that they neglect key strategic details which make them unworkable, or they assume a particular way that certain future events will resolve. In striving to make this plan coherent, I aim to address what I see as a worst-case scenario with above 1% likelihood: namely, that AI progress and biotech progress continue accelerating, with no sigmoid plateau for either. Sometime within the next five years we see a combination of: AI capable of nearly all computer tasks at above-average human level; AI competent at multi-step agentic tasks; AI sufficiently capable to initiate a recursive self-improvement process; substantial algorithmic advances which bring down the cost of creating an AI agent; AI capable of controlling robotic actuators to competently manage most biology wetlab tasks; and clear evidence of some general AIs having the capability to make designs and plans for civilization-destroying-scale bioweapons. I expect that within this timeframe there is a smaller chance of some other dramatic events, such as: an AI system being designed and confirmed to have what most experts agree is consciousness and emotional valence; recursive self-improvement finding algorithmic advances such that anyone with this new knowledge could create a recursive-self-improvement-capable agent using only a home computer; or recursive self-improvement finding algorithmic advances such that the strongest frontier models become substantially above average human intelligence and capability (even in currently lacking areas, such as reasoning and spatial understanding). I think all these things will very likely happen in the next 15 years, but hopefully the more extreme ones won’t happen in the next 2-3 years.
[Note: I originally wrote and submitted this essay to the Cosmos essay contest. After it was not selected for an award, I decided to expand and publish it as a LessWrong post. During the period immediately following my submission though, several other relevant and/or similar essays were published. I’ve rewritten this essay to try to address these additional viewpoints.
Relevant reading:
A Worthy Successor – The Purpose of AGI
Eric Drexler. Incoherent AI scenarios are dangerous.
Max Tegmark. The Hopium Wars: the AGI Entente Delusion.
Narrow path
Dario Amodei. Machines of Loving Grace.
Hawkish nationalism vs international AI power and benefit sharing.
Situational awareness.
John Wentworth says:
“Conjecture’s Compendium is now up. It’s intended to be a relatively-complete intro to AI risk for nontechnical people who have ~zero background in the subject. I basically endorse the whole thing, and I think it’s probably the best first source to link e.g. policymakers to right now.
I might say more about it later, but for now just want to say that I think this should be the go-to source for new nontechnical people right now.”
https://www.thecompendium.ai/
]
Status: A Changing World
On the brink
We are on the cusp of radically transformative change. AI and biotech are advancing rapidly. Many experts predict AI progress will not plateau before AGI [3][4][5][6]. AGI may be quickly followed by artificial super intelligence due to recursive self-improvement [7][8][9]. A novel form of intelligence which rivals ours would be the most impactful invention in the history of humanity. With this massive change comes existential risks[10].
Rapid biotechnology advancements[11] have unlocked the possibility of devastating bioweapons[12]. While currently limited to a few experts, AI and biotech progress are lowering the barriers. Soon many will be able to develop weapons capable of catastrophic harm.
Delay of technological change is helpful if it gives us time to prepare for the coming changes, but isn’t itself a solution. We need to plan on delaying and controlling the intelligence explosion in order to maintain control, but we can’t count on such a delay lasting for more than a handful of years. Delay is not an attractor; it is a saddle point from which we are sure to slip eventually.
Halting technological progress is neither easy nor desirable. While a sufficiently powerful AGI could enforce such a halt through universal coercion, we would be sacrificing much of the potential good of our future. To have hope of realizing our glorious future[13], we must reject a permanent halt of technological advancement. Let us instead ride the wave of change; build a glorious future instead of clinging to the vestiges of the past.
The Age of AGI
The first and most impactful transition we face is the creation of AGI. We must aim to make this a safe, controlled event. If open-source AGI became available everywhere at once, it would be an urgent crisis. For example, everyone would have the ability to create devastating bioweapons; it’s naive to imagine no one would seize that opportunity. Misaligned AGI capable of recursive self-improvement also directly poses a major threat. Additionally, as AI accelerates all scientific research, new threats like self-replicating nanotech may emerge. We need global governance to prevent these hazards. Safe limited AGI aligned with human values is our best defense, which is why it must be our primary goal.
Forecasting possible trajectories
What rate will AI development proceed at? What shape will the trajectory be?
We can’t be sure, but we can explore some plausible trajectories and ask ourselves what we might do in each case.
Scaling laws are always with respect to a specific algorithm. Given a specific machine learning architecture, training data, hyperparameters, etc., you can then predict how the model would perform if the parameter count and training steps were increased. For the algorithms we’ve tested so far, we can get a good approximation of how strong a model is likely to become by training small versions on carefully selected datasets[14].
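To make this concrete, here is a minimal sketch of the kind of extrapolation a scaling law permits, using a Chinchilla-style parametric loss form. The coefficients are placeholders roughly in the ballpark of published fits, included only for illustration.

```python
def predicted_loss(n_params, n_tokens, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style parametric scaling law: L(N, D) = E + A/N^alpha + B/D^beta.

    N = parameter count, D = training tokens. The coefficients are placeholders
    roughly in the ballpark of published fits, used here for illustration only.
    """
    return E + A / n_params**alpha + B / n_tokens**beta

# Fit the coefficients on small runs of a *specific* algorithm, then extrapolate:
print(predicted_loss(1e8, 2e9))    # a small run: ~100M params, 2B tokens
print(predicted_loss(1e11, 2e12))  # a frontier-scale run: ~100B params, 2T tokens
```

The key point is that such an extrapolation only holds for the specific algorithm the coefficients were fit on; a new architecture needs a new fit.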
This is very different from describing the computational capacity of existing hardware. A specific GPU can’t suddenly do a million-fold more computations as a result of changing its low-level code. We have a strong empirical basis for saying that we understand physically what is going on in this object we created, and that it is running at close to its capacity.
This is simply not the case with deep learning, where I believe analysis of the learning rates of animals gives us some reason to believe that we are far from the optimal learning efficiency. When people argue that they don’t expect major algorithmic advances in the future, they are constrained to make much weaker statements like, “Many scientists have been looking for the past 7 years to find substantial improvements over transformers, but have so far only found relatively incremental improvements (in the realm of a 1000x improvement). Thus, it seems unlikely we will come across a 1e6x improvement in the next 5 years.” The trouble is, extrapolating from past rates of improvement only makes sense if you continue to have a similar amount of researcher hours and compute budget being applied to the search. If AI improves to the point where AI R&D becomes quite effective, then we could get an exponential feedback mechanism where advancements improve the rate of further advancement. In such a world, an algorithmic improvement of 1e6-fold over the same time-span in which we previously had just a 1e3-fold improvement seems much more plausible. This is not a confident prediction that this will happen, only that there is a reasonable chance it could: what I’d call a ‘worst likely case’. I think it is reasonable for society to prepare to survive the worst likely case.
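As a toy illustration of why feedback changes the picture (the functional form and numbers here are my own, purely for exposition), compare a regime where algorithmic efficiency grows at a fixed number of orders of magnitude per year with one where the growth rate itself rises with the efficiency achieved so far, as it might if AI R&D were substantially automated:

```python
def cumulative_improvement(years=7.0, base_rate=3/7, feedback=0.0, dt=0.01):
    """Toy model: d(log10 efficiency)/dt = base_rate * (1 + feedback * log10_efficiency).

    base_rate is in orders of magnitude per year from a roughly constant pool of
    researchers and compute; feedback > 0 means progress so far accelerates further
    progress (e.g. via AI-automated AI R&D). All numbers are illustrative only.
    """
    log_eff = 0.0
    for _ in range(round(years / dt)):
        log_eff += base_rate * (1 + feedback * log_eff) * dt
    return 10 ** log_eff

print(f"{cumulative_improvement(feedback=0.0):.1e}")  # ~1e3 over 7 years, matching the past trend
print(f"{cumulative_improvement(feedback=0.4):.1e}")  # roughly 1e6 over the same span with modest feedback
```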
Delaying AGI: Necessary but not Sufficient
Let’s examine some of the ways a delay might be implemented, and how long we should expect such delays to last.
Pausing large training runs
Pausing the large training runs of frontier labs for some period of time is an idea that’s been advocated for. I think this is a mistake. I think that the frontier training runs are a symptom of progress in AI, not the key driving factor. I think that we would actually accelerate progress toward AGI by pausing large training runs. I agree with various thinkers[15][16][17] that transformer-based LLMs are not quite the right architecture for AGI. I believe it is possible that scaling existing algorithms could get us there, but I think it would be incredibly inefficient. If the frontier AI labs are restricted from applying their engineers, researchers, and compute to trying to create bigger LLMs, where would that talent instead focus? On research. Thus, speeding the search for better algorithms. As soon as the pause is ended, the next large training run may be using superior algorithms that result in a model thousands or millions of times more capable than current models.
Therefore, I claim that if you wanted to slow progress towards AGI, it wouldn’t be enough to restrict the frontier labs from running large training runs. You’d also need to divert their researchers and compute to non-research tasks. That’s a much more complicated and difficult-to-enforce proposition.
Banning Automated AI R&D worldwide
We seem quite close to the point where current AI techniques, such as scaffolded LLMs, will become able to automate a substantial portion of AI research. Estimates of the current speedup from coding assistants are in the range of 5-20%, and gradually increasing. If we see a step change to speedups of over 100% (e.g. after the next generation of LLMs is deployed), this could result in a feedback loop of explosive progress. Furthermore, we should expect such progress to be at least somewhat decentralized. There is a chance that individual researchers stumble across substantial algorithmic improvements and are able to shoot ahead. This scenario is quite a governance challenge, since it wouldn’t be enough to monitor and control the top twenty or so labs. The Narrow Path essay focuses on this specific case of trying to ban AI-powered AI R&D.
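One rough way to see why that jump matters, using an Amdahl’s-law-style estimate of my own rather than anything from the essays cited: if AI automates a fraction of research effort with a large speedup on that fraction, overall throughput improves only modestly until automation covers most of the work.

```python
def overall_speedup(automated_fraction, tool_speedup):
    """Amdahl-style estimate: only the automated fraction of research gets faster."""
    return 1 / ((1 - automated_fraction) + automated_fraction / tool_speedup)

print(overall_speedup(0.2, 10))   # ~1.22x: roughly the 5-20% regime we see today
print(overall_speedup(0.5, 10))   # ~1.82x
print(overall_speedup(0.9, 10))   # ~5.26x: >100% speedups require automating most of the work
```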
The danger present in this scenario is one reason that it is tempting to stop the large frontier training runs that seem likely to produce LLM coding assistants capable of such speed-ups. This runs into the problem discussed above though.
Banning all AI research worldwide
Research doesn’t require large blocks of compute, unlike large training runs. If you want to ban all AI research, you need to ban access to unmonitored personal computers anywhere in the world. That sort of draconian measure seems infeasible.
If one wanted to have a world which contained only some specific safe form of AI deployed, it would be necessary to prevent the deployment of unsafe AI. If the only AIs capable enough to be dangerous are produced by large training runs, this is perhaps plausible. But as I argued above, I don’t expect that will remain the case for long.
Government research project
I believe the best option for delaying and controlling the deployment of AGI is to nationalize the frontier AI labs, and require that all the researchers work on a government project. This approach has several benefits.
First, experience with government projects shows that they are often heavily laden with bureaucratic processes and oversight which naturally lead to slow-downs.
Second, it would be possible to maintain a high degree of security and control, ensuring that algorithmic secrets were less likely to escape.
Third, the government would not allow public release of the models being researched, preventing the coding-assistant-based acceleration discussed above.
Fourth, having a government project to produce AGI would likely still achieve AGI before the open-source community did. This is a good outcome if the resulting model is carefully contained and studied. Such empirical observation of a highly capable general model could give clear evidence of the danger. With such evidence in hand, the government may take yet further actions to control and delay AI progress worldwide.
Fifth, the government AI research project may also produce unprecedentedly powerful narrow tool-AI which can be safely utilized to enable previously intractable surveillance and enforcement of all other research into AI and/or self-replicating weapons. Although there are many dangers in centralizing power in the hands of any one government or politician, I believe the strategic scenario we face has no better alternatives available.
While all this is going on, the world will continue doing research, and coding assistants will continue to get better. Even an action as drastic as nationalization of the top labs and constraint of top researchers would not prevent progress for long. It could buy us a couple of years, maybe even three.
On the other hand, I worry about having any government in charge of an AI so powerful it grants decisive strategic advantage. It’s not enough to ask whether the US Federal government is an adequate government currently. We must ask how it might look after the destabilizing effect of powerful AI is introduced. Who has ultimate control over this AI? The President? So much for checks and balances. At that point we are suddenly only still a democracy if the President wills it so. I would prefer not to put anyone in a position of such power over the world.
There has not been much discussion that I’ve seen for how to keep a powerful AI directly operated by a small technical staff under the control of a democratic government and also keep that government a democracy.
Our democracy is problematically unstable and violently imperial as it is. I do not expect it to avoid devolving further upon the advent of AGI.
Sometimes I jokingly suggest we give the reins of power over the AI to Switzerland, since they have the stereotype of being militarily neutral and having well-organized public goods. I don’t actually have the reins though, and see no way to get them into the Swiss government’s hands. Also, I wouldn’t want Swiss government officials to have such power either, since I’d still worry about the corrupting effects of the power.
I think we need new governance structures to handle this new strategic situation.
Cautious Pursuit
If humanity doesn’t want to cede autonomy to AGI we must grow to keep up, while keeping AI progress controlled. Some suggest we merge with the AI. To merge implies a compromise. I say, “Don’t merge, don’t surrender, don’t compromise our values.” Let us become transhuman digital beings with our human values fully intact. Creating fully human digital people is not the compromise implied by an act of merging.
The alternatives to ‘grow to keep up’ are ‘become powerless wards of a mighty AI’ or ‘enforced technological stagnation’.
I propose two parallel paths for AI development:
Tool AI
Mandatory in the short term, to maintain control. Insufficient in the long term, as the rising tide of technology makes powerful digital agents easier and easier to create. For this phase, we carefully limit AI to remain a purely obedient, corrigible tool[18][19]. Related ideas involve creating an ecosystem of narrow tool-AI with clear risk assessments and safe operating parameters[20][21][22]. Use general agents only up to a safe level of power, and only under strict controls to prevent escape or sabotage[23].
Peers/Descendants/Digital People
This is less urgent for our immediate survival, but will become critical in the longer term. The only way to handle powerfully self-improving intelligence is to be that intelligence. Planning to not surrender control, and acknowledging the difficulty and undesirability of indefinitely halting global technological progress, leaves one path forward.
We must carefully build conscious digital entities sharing our values and empathy[24][25]. This is an ethically and technically challenging path. It would require thorough preparation and circumspection to avoid tragic or dangerous outcomes[26][27]. In the long term, I expect that full digital people will be necessary because only a digital being allows for the maximal extent of expansion, modification, and copying. However, in the short term we should not expect to create and get use from such beings. They should be studied carefully and ethically in controlled lab settings, but not deployed for practical purposes. Such beings seem more likely to be dangerously inclined towards Omohundro Drives, and also forcing them to work for us would be slavery.
Some think building digital people is impossible. I say that dismissing AI consciousness based on philosophical arguments alone is misguided[28][29]. Empirical comparisons of brain and AI information processing reveal substantial similarities[30][31][32], and the remaining differences are technologically tractable[33]. This suggests AI consciousness will be achievable; work is already underway[34].
Why not stop at tool AI? Why do we need digital people?
Some have argued that we should deliberately stop at tool AI, and limit the uses of such to safe deployments. This presumes that it will be possible to halt software and hardware progress globally for many decades. I don’t think the offense-defense balance makes this easy for governments to do. The risk of some group or state-actor defecting from the ban, and gaining tremendous advantage thereby, seems large. Blocking this seems intractable. As technology in general advances, the barriers to entry will continue to get lower. As new generations of scientists grow up with the previous generation’s research to build upon, advancements will be made even if large research projects are blocked. Proprietary knowledge will eventually leak from the people holding it.
How is the situation different if there are digital people living as part of society?
Digital people offer vastly more opportunity for regulating AI. They have many of the same advantages that AI has over biological humans: rapid replication, running at superhuman speeds, restoring from backups, mind-merging, and, perhaps most importantly, recursive self-improvement. They can keep experimenting on themselves and getting smarter. Any rogue AI arising would need to not just get an edge on the relatively static competence of biological humans, but would need to play catch-up to the existing digital people who had a head-start on self-improvement. This does mean that we need to delay and control AI until we do have digital people who have gotten a good head-start. We need to avoid putting so much optimization pressure on them that it compromises their ability to maintain value-stability. We also lose if the digital people come under so much pressure that they optimize away their humanity, and become the very monsters they were trying to defend against.
The Dawn of Transhumanism
The second transition we must grapple with is transhumanism. Keeping pace with AI will require dramatic change to what it means to be human. The next 20 years will likely involve greater changes to the human brain than across all of primate evolution. At the same time that we are carefully working to create digital people in controlled labs, we can expect that brain-computer-interfaces (BCIs) and genetic editing will make accelerated progress due to tool AI. If successful, such projects could result in radical increases to human intelligence.
Additionally, brain-computer-interfaces may allow for more extensive brain recordings, accelerating neuroscience research (and brain-inspired AI) and possibly allowing for low-fidelity approximate emulations of the recorded individuals. Finally, brain uploading may succeed in creating high-fidelity emulations of individual humans, allowing for the instantiation of a digital person that closely matches the behavioral traits of the scanned human. A fully digital person offers many opportunities and risks.
Brain Uploading
I have spoken with people working on the forefront of brain scanning[35]. I predict we will have the first complete synapse-level human brain scan by the mid 2030s[36]. This is a massive undertaking, in which AI will play key roles. After the first upload it may be only a couple of years until the scan is made into a realtime human emulation. Many of the bottlenecks we currently face to this may be relaxed with the help of AI-assisted research. What previously seemed decades away may instead happen in just a few years.
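To give a rough sense of why AI assistance is unavoidable here, some back-of-envelope arithmetic (the figures below are commonly cited ballpark values I am supplying, not numbers from my sources): imaging a single cubic millimetre of cortex at synapse resolution already yields on the order of a petabyte of raw data, and a human brain is around a million cubic millimetres.

```python
# Back-of-envelope scale of a synapse-level human brain scan.
# All figures are rough, commonly cited ballpark values, supplied for illustration only.
em_bytes_per_mm3 = 1.4e15        # ~1.4 PB of raw electron-microscopy imagery per mm^3
brain_volume_mm3 = 1.2e6         # ~1.2 million mm^3 for a human brain

raw_bytes = em_bytes_per_mm3 * brain_volume_mm3
print(f"Raw imagery: ~{raw_bytes / 1e21:.1f} zettabytes")            # ~1.7 ZB

synapses = 1e14                  # ~100 trillion synapses, order of magnitude
bytes_per_traced_synapse = 100   # generous allowance per annotated synapse
print(f"Traced connectome: ~{synapses * bytes_per_traced_synapse / 1e15:.0f} petabytes")
```

Segmenting and proofreading imagery at that scale by hand is out of the question, which is why automated reconstruction is where AI plays the key role.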
Value Loss: Pitfalls of Self-Modification
A human isn’t an agent with eternally stable objective values, but a series of agents each slightly different from the previous. Our change is bounded by our genetics interacting with life experiences. The neurons you’re born with make up most of your brain for life, limiting intellectual change and growth.
The low-fidelity or high-fidelity emulations of human brains would be completely unbound by such physical restrictions. Without careful governance, such entities could rapidly copy and self-modify.
New technologies like gene editing, brain-computer-interfaces, and stem-cell implants can remove some of these biological limitations even from biological human brains.
History shows that if self modification offers competitive advantages, some will pursue it despite risks and trade-offs[37]. Competitive pressures push towards optimization for capability, potentially altering intrinsic values[38][39]. We must plan for a future where some individuals make such choices, modifying their own brains despite the risk. In this future, a single individual could become incredibly powerful and dangerous, meaning we must reckon with the unilateralist’s curse[40]. Without restrictions, these dynamics may lead to highly effective and competitive self-modifying agents bearing little trace of their original humanity. Like rogue AGI, such entities could conflict with humanity at a substantial advantage, quickly becoming an unstoppable catastrophe. We must proactively prevent this, rather than passively react.
Novel Risks
Our situation is precarious; the world is indeed fragile, as Nick Bostrom speculated[41]. In my work developing AI biorisk evals I have encountered evidence of this that I find strongly convincing. Confidentiality agreements and infohazard precautions unfortunately limit what I can share. Some risks are present already; others are still hypothetical, backed only by precursors and extrapolations. We cannot afford to wait until risks materialize to deal with them. Like an Arctic explorer in a kayak, we cannot wait until the kayak is tipping into the icy sea to decide we should be wearing a drysuit.
Means: New Governance for a New Age
Global externalities are skyrocketing: there are ever more possibilities for defection by individuals or small groups which could lead to the utter destruction of civilization. Humanity is at risk of being overwhelmed by runaway self-replicating weapons or self-improving digital entities. Establishing regulation and emergency response organizations to prevent this is critical. These enforcement and response organizations will need to act globally, since these new technological threats can arise anywhere and quickly overwhelm the world. We must act urgently; threats are already at large.
In confronting these potential catastrophes, we must also cultivate existential hope[42]. Our vision should balance caution with determination to succeed, planning for success despite the challenges. We should not fall into the trap of creating negative self-fulfilling prophecies through fear-mongering.
A difficult question we will need to tackle, and one for which I admit I do not have a clear plan to recommend, is how to handle the governance of powerful AI once it is invented. Who do we trust to keep dangerous agentic AI contained? Who do we trust to lawfully wield tool AI so powerful it confers a decisive strategic advantage over the entire world? In the past, governments have seen success in having checks and balances to split up and limit powers. The more AI allows for concentration of power, the more difficult it becomes to keep that power in check.
Global Coordination
Global coordination is crucial for humanity’s survival in this time of change and risk. The balance of world economic and military power is likely to destabilize. Coordinated action is our only chance at survival, whether it is achieved through diplomacy or force. Here I will lay out some possible directions humanity might go in. Certainly more are possible, including hybrids of these categories. None of these seem optimal to me in terms of their implementability or their preservation of stability of order.
Three example paths:
The Forceful Path: Decisive Strategic Advantage
Recursive self-improvement has the potential for explosive progress. The leader in this race may gain such a great technological lead that the way becomes clear for them to seize global power without fear of reprisals or resistance. This path is fraught with ethical dilemmas and the dangers of concentration of power. Coercive domination by a single actor is not ideal, but it is preferable to extinction or catastrophic global conflict. It is hard to foresee whether this option will become available to any of the leading actors, and whether they would choose to seize the opportunity.
The Cutthroat Path: Wary Standoff
A council of nation-states could coordinate without a central government, agreeing to punish defectors. This cleaves closer to our current world order than a single strong world government with a monopoly on force. This council of nation-state peers would need to be wary and poised for instant violence, a ‘Mexican Standoff’ of nations more tense than the Cold War. Perhaps a transition to a more peaceful coordination system would eventually be possible. If the survival of humanity depends on this standoff for long, the odds of conflict seem high. Mexican Standoffs with no retreat are not famous for working out well for the participants.
How much this situation ends up resembling successful cooperation between all nations versus a dangerous tense standoff is hard to predict. It may be that treaties and peaceful cooperation get us close enough to coordination to manage effective governance. Whether such a looser international governance structure is sufficient will depend a lot on the empirical details of future AI. Some are hopeful that a peaceful power-sharing scheme could work[43], but I suspect that the ability to unilaterally defect in return for rapid power gains, along with the offense-favoring nature of such pursuits, makes this infeasible. A related historical example, the effort to prevent nuclear weapon proliferation, shows that while international coordination can reduce proliferation of dangerous technology, it doesn’t reliably prevent it entirely. If any failure would be existentially risky, an international effort similar to nuclear nonproliferation is likely insufficient for humanity’s survival.
The Gentle Path: Global Democracy
The world has changed. People talked about how jet travel made the world smaller, and it did. With the rise of remote work, I work with colleagues in a dozen different countries. Where only decades ago collaboration was limited by co-presence, we now have a thriving cosmopolitan global community of scientists and entrepreneurs. Can we come together in coordinated action to steer the course of the world? Is a peaceful path to a democratic world government possible in the timeframe we face? I hope so. The alternatives are grim. Still, a grassroots movement to achieve global unification, establishing a functional democratic world government in under five years, is a high ask.
Humanity’s To-Do List
Humanity’s precarious situation has a number of open problems which need work. We have an unusually urgent need for philosophy and science aimed to answer questions which will shape our governance of new technologies. Which directions we choose to research and materialize now could have big effects on how well our next decade goes[44].
Governance Decisions for Global Coordination
I laid out some of the possible paths humanity might take to unite for risk prevention. We should consider which paths we think we can act to support, and then take those actions. The default case of maintaining the status quo until some radical changes actually occur in the world may lead to the first catastrophe destroying civilization. If you are reading this, and you are part of a research team working on AI, you should think carefully about what you would do if your team discovered a substantial algorithmic advance, or began an accelerating process of recursive self-improvement. Substantial power and weighty decisions might suddenly be thrust upon relatively small groups of researchers. It would be nice if we could prepare some recommendations of wise actions ahead of time for them to refer to. It’s likely they will be under considerable time pressure in their decision making, so pre-cached analysis could be very valuable.
Prepare for Urgent Response
To have a reasonable chance of averting catastrophe, we must prepare ahead of time to respond urgently to emergent dangers from new technologies. The potential for explosively rapid self-replication of AI agents and/or bio/nano weapons means we cannot afford to be purely reactive. The world in its current state would be unable to detect and react swiftly enough to stop such threats. Early detection systems must be established to trigger an alarm in time. Emergency response teams must be trained, equipped, and appropriately stationed at critical areas. We need to actively accelerate work on defensive technologies, while doing what we can to restrict offensive technologies [31, 32]. Reducing our worst civilizational vulnerabilities when facing this tricky transitional time is a valuable course of action.
AI Risk Prevention
If at the time of AGI creation we are still in a world where separate nation states exist, there will need to be unprecedented coordination on this front. While compute governance would offer temporary control, AGI may eventually require far fewer resources[45][46].
Comprehensive mutual inspection treaties for all relevant biology and compute facilities are necessary, despite political challenges. Failure to coordinate risks global conflict or catastrophic AGI incidents.
We don’t currently know how long we would have to act were a runaway RSI process to begin. This should be investigated under the highest security in carefully controlled lab tests. It is critical that we know the timeframe in which authorities must respond. The difference between a needed response time of days versus several months implies different enforcement and control mechanisms.
In general, we have a need for AI safety organizations to be carefully examining worst case scenarios of current tech (preferably before release). A sufficiently concerning demonstration of risk could empower governments to take actions previously outside their Overton windows.
Biorisk Prevention
Preventative action can be taken now to defend the world against future bioweapons.
First and foremost, we need to set up early alert systems like airline wastewater monitoring.
Second, we need to prepare quarantine facilities, equipment, and protocols; robust, dedicated global communication lines for emergency coordination once the alarm is triggered; and stockpiles of PPE and emergency food supplies for population centers.
Third, we need to improve air filtration and purification in public areas. Once these critical precautions are in place, we can work on defensive acceleration of anti-biorisk technologies: establish academic virology journals that require international government clearance to access; fund research into broad-spectrum antivirals, improved PPE, and advanced sterilization[47]; and eliminate existing preventable diseases, like polio and tuberculosis, to reduce the availability of samples.
Defining and Measuring Consciousness / Moral Worth
To avoid drastically increasing suffering in the world, we must ensure we don’t unwittingly create AI with moral personhood. We need to know whether a given entity, biological or digital, is conscious and sapient, and how much moral value to place on it. Currently, there are no empirical tests which can help us make this determination. The further we proceed in developing AI without having such tests in place, the higher the risk of falling into this trap.
Governing Self-Modification
The impulse to attempt self-improvement may lead to many different sorts of modifications among both biological and digital people. We need a policy to limit the rate and scope of these changes, lest we fall into a Molochian competition-driven attractor state where we race to the bottom. If our values get gradually narrowed down to survival and competition, we lose out on love and beauty.
I also don’t think it’s right to force anyone into transhumanism. It should be a voluntary choice. It is sufficient for a brave and trustworthy few to opt into the radical transhumanism that will be necessary to keep up with the frontier of intellectual progress of AGI. Meanwhile, we must act to prevent defection by selfish or violent individuals seeking power through self-modification. Covertly studying the extent of what is possible will help us know what risks to watch out for.
Accelerated Wisdom
We may be able to harness the power of AI to advance moral reasoning and coordination. We might find superior bargaining solutions around moral common ground and social contracts[48]. However, any plan to improve one’s values must confront the tricky metaethical problems of deciding on valid processes of improvement[49]. I expect different answers to be accepted by different people, with no single objectively correct answer. Thus, we should anticipate the need for compromises and tolerating a diversity of moral viewpoints.
Other Governance Improvement Needs
There are decisions which lie beyond our immediate survival which will also be of tremendous import. For example, disparities of wealth and power might become even larger. Under such circumstances, the warping effects of wealth concentration on democracy would be thrust well beyond the breaking point. It would be implausible to suggest that people with such divergent power are peers in a democratic society.
Benefits: A Multi-Faceted Future for All
Success at addressing the risks before us, and building a prosperous peaceful future of advanced technology, will take us to a remarkable place. We face a future with an unprecedented diversity of minds, including various enhanced humans, digital beings, AI entities, and potentially even uplifted non-human animals[50].
Since many people may opt out of transhumanist enhancements, this vision of the future would have normal unenhanced humans alongside all these other transhuman and digital beings.
While all sapient beings[51][52] should have autonomy and fair representation, significant intelligence disparities may limit unenhanced humans’ influence. Interstellar travel might be feasible only for digital entities[53]. In a galaxy-spanning civilization, unenhanced humans would thus have limited influence over the broad course of human affairs.
To mitigate risks and preserve our values, advancement should be gradual. I suggest we maintain an ‘intelligence ladder,’ where each level comprehends those immediately above and below, ensuring continuity with our unenhanced human roots.
Harnessing Technology for Good
There remains a tremendous amount of suffering in the world today, despite humanity having made great strides[54]. If we survive, our near future accomplishments will dwarf our past successes. All the material ills we currently face—like malnourishment, disease and natural disasters—will be swept away by the tsunami of technological progress. Everyone will have basic goods like food, medicine, housing, education, communication, access to information. Humanity will be free to expand outward into the galaxy.
Maria do Rosário Félix, Maria Doroteia Campos, Patrick Materatski, Carla Varanda. An Overview of the Application of Viruses to Biotechnology. url: https://doi.org/10.3390/v13102073
“I personally do not think that assigning probabilities to preferable outcomes is very useful. On the contrary, one can argue that the worldviews held by influential people can become self fulfilling prophecies. That is especially applicable to prisoner’s dilemmas. One can either believe the dilemma is inevitable and therefore choose to defect, or instead see the situation itself as the problem, not the other prisoner. That was the point we were trying to make.”—Naci, in response to me saying that I thought that sufficient international cooperation would be quite unlikely.
A path to human autonomy
A Path to Human Autonomy
A future with human empowerment, or even survival, is not a given. I argue that there is a narrow path through the unprecedented existential risks we face. If successful, we need not relinquish the reins of the future. This path requires challenging our assumptions about what it means to be human. We must open our hearts to diversity, more than ever before.
In this essay I attempt to lay out a coherent [2] plan for humanity to address the radical changes ahead. Many of the other plans recently published are incoherent, by which I mean that they neglect key strategic details which make their proposed plans unworkable or assume a particular way that certain future events will resolve. In striving to make this plan coherent, I aim to address what I see as a worst case scenario with above 1% likelihood. Namely, that AI progress and Biotech progress continue accelerating, that there is no sigmoid plateau of these techs. Sometime within the next five years we see a combination of: AI capable at nearly all computer tasks at above average human level, AI becoming competent at multi-step agentic tasks, AI sufficiently capable to initiate a recursive self-improvement process, substantial algorithmic advances which bring down the cost of creating an AI agent, AI capable of controlling robotic actuators to competently manage most biology wetlab tasks and clear evidence of some general AIs having capability to make designs and plans for civilization-destroying-scale bioweapons. I expect that within this timeframe there is a smaller chance of some other dramatic events such as: an AI system being designed and confirmed to have what most experts agree is consciousness and emotional valence, recursive self-improvement finding algorithmic advances such that anyone with this new knowledge will be able to create a recursive-self-improvement capable agent using only a home computer, recursive self-improvement finding algorithmic advances such that the strongest largest frontier models are substantially above average human intelligence and capability (even in currently lacking areas, such as reasoning and spatial understanding). I think all these things will very likely happen in the next 15 years, but hopefully the more extreme ones won’t happen in the next 2-3 years.
[Note: I originally wrote and submitted this essay to the Cosmos essay contest. After it was not selected for an award, I decided to expand and publish it as a LessWrong post. During the period immediately following my submission though, several other relevant and/or similar essays were published. I’ve rewritten this essay to try to address these additional viewpoints.
Relevant reading:
A Worthy Successor – The Purpose of AGI
Eric Drexler. Incoherent AI scenarios are dangerous.
Max Tegmark. The Hopium Wars: the AGI Entente Delusion.
Narrow path
Dario Amodei. Machines of Loving Grace.
Hawkish nationalism vs international AI power and benefit sharing.
Situational awareness.
John Wentworth says:
https://www.thecompendium.ai/
]
Status: A Changing World
On the brink
We are on the cusp of radically transformative change. AI and biotech are advancing rapidly. Many experts predict AI progress will not plateau before AGI [3][4][5][6]. AGI may be quickly followed by artificial super intelligence due to recursive self-improvement [7][8][9]. A novel form of intelligence which rivals ours would be the most impactful invention in the history of humanity. With this massive change comes existential risks[10].
Rapid biotechnology advancements[11] have unlocked the possibility of devastating bioweapons[12]. While currently limited to a few experts, AI and biotech progress are lowering the barriers. Soon many will be able to develop weapons capable of catastrophic harm.
Delay of technological change is helpful if it gives us time to prepare for the coming changes, but isn’t itself a solution. We need to plan on delaying and controlling the intelligence explosion in order to maintain control. We can’t count on our delay lasting for more than a handful of years though. Delay is not an attractor, it is a saddle point from which we are sure to slip eventually.
Halting technological progress is neither easy nor desirable. While a sufficiently powerful AGI could enforce such a halt through universal coercion, we would be sacrificing much of the potential good of our future. To have hope of realizing our glorious future[13], we must reject a permanent halt of technological advancement. Let us instead ride the wave of change; build a glorious future instead of clinging to the vestiges of the past.
The Age of AGI
The first and most impactful transition we face is the creation of AGI. We must aim to make this a safe, controlled event. If open-source AGI became available everywhere at once, it would be an urgent crisis. For example, everyone would have the ability to create devastating bioweapons; it’s naive to imagine no one would seize that opportunity. Misaligned AGI capable of recursive self improvement also directly poses a major threat. Additionally, as AI accelerates all scientific research, new threats like self-replicating nanotech may emerge. We need global governance to prevent these hazards. Safe limited AGI aligned with human values is our best defense, which is why it must be our primary goal.
Forecasting possible trajectories
What rate will AI development proceed at? What shape will the trajectory be?
We can’t be sure, but we can explore some plausible trajectories and ask ourselves what we might do in each case.
Scaling laws are always in respect to a specific algorithm. Given a specific machine learning architecture, training data, hyperparameters, etc., you can then predict what the model would look like if the parameter count and training steps were increased. For the algorithms we’ve tested so far, we can get a good approximation of how strong it is likely to become by training small versions on carefully selected datasets[14].
This very different from describing the computational capacity of existing hardware. A specific GPU can’t do a million-fold more computations suddenly as a result of changing it’s low-level code. We have a strong empirical basis for saying that we understand physically what is going on in this object we created, and that it is running at close to its capacity.
This is simply not the case with deep learning, where I believe analysis of learning rates of animals gives us some reason to believe that we are far from the optimal learning rate. When people argue that they don’t expect major algorithmic advances in the future, they are constrained to make much weaker statements like, “Many scientists have been looking for the past 7 years to find substantial improvements over transformers, but have so far only found relatively incremental improvements to transformers (in the realm of 1000x improvement). Thus, it seems unlikely we will come across a 1e6x improvement in the next 5 years”. The trouble is, extrapolating from past rates of improvement only makes sense if you continue to have a similar amount of researcher hours and compute budget being applied to the search. If AI improves to the point where AI R&D becomes quite effective, then we could get an exponential feedback mechanism where advancements improved the rate of advancement further. In such a world, an algorithmic improvement of 1e6 fold over the same time-span we previously had just a 1e3 fold improvement seems much more plausible. This is a prediction that there is a reasonable chance this could happen, what I’d call a ‘worst likely case’. I think it is reasonable for society to prepare to survive the worst likely case.
Delaying AGI: Necessary but not Sufficient
Let’s examine some of the ways a delay might be implemented, and how long we should expect such delays to last.
Pausing large training runs
Pausing the large training runs of frontier labs for some period of time is an idea that’s been advocated for. I think this is a mistake. I think that the frontier training runs are a symptom of progress in AI, not the key driving factor. I think that we would actually accelerate progress toward AGI by pausing large training runs. I agree with various thinkers[15][16][17] that transformer-based LLMs are not quite the right architecture for AGI. I believe it is possible that scaling existing algorithms could get us there, but I think it would be incredibly inefficient. If the frontier AI labs are restricted from applying their engineers, researchers, and compute to trying to create bigger LLMs, where would that talent instead focus? On research. Thus, speeding the search for better algorithms. As soon as the pause is ended, the next large training run may be using superior algorithms that result in a model thousands or millions of times more capable than current models.
Therefore, I claim that if you wanted to slow progress towards AGI, it wouldn’t be enough to restrict the frontier labs from running large training runs. You’d also need to divert their researchers and compute to non-research tasks. That’s a much more complicated and difficult to enforce proposition.
Banning Automated AI R&D worldwide
We seem quite close to the point where current AI techniques, such as scaffolded LLMs, will become able to automate a substantial portion of AI research. Estimates of the current speedup from coding assistants are more in the range of 5-20%, and gradually accelerating. If we have a step change to speedups of over 100% (e.g. after the next generation of LLMs are deployed) this could result in a feedback loop of explosive progress. Furthermore, we should expect such progress to be at least somewhat decentralized. There is a chance that individual researchers stumble across substantial algorithmic improvements and are able to shoot ahead. This scenario is quite a governance challenge, since it wouldn’t be enough to be monitoring and controlling the top twenty or so labs. This specific case of trying to ban AI-powered AI R&D is focused on in the Narrow Path essay.
The danger present in this scenario is one reason that it is tempting to stop the large frontier training runs that seem likely to produce LLM coding assistants capable of such speed-ups. This runs into the problem discussed above though.
Banning all AI research worldwide
Research doesn’t require large blocks of compute, unlike large training runs. If you want to ban all AI research, you need to ban access to unmonitored personal computers anywhere in the world. That sort of draconian measure seems infeasible.
If one wanted to have a world which contained only some specific safe form of AI deployed, it would be necessary to prevent the deployment of unsafe AI. If the only AIs capable enough to be dangerous are produced by large training runs, this is perhaps plausible. But as I argued above, I don’t expect that will remain the case for long.
Government research project
I believe the best option for delaying and controlling the deployment of AGI is to nationalize the frontier AI labs, and require that all the researchers work on a government project. This approach has several benefits.
First, the experience of government projects is that they are often heavily laden with bureaucratic processes and oversight which naturally lead to slow-downs.
Second, it would be possible to maintain a high degree of security and control, ensuring that algorithmic secrets were less likely to escape.
Third, the government would not allow public release of the models being researched, preventing the coding-assistant-based acceleration discussed above.
Fourth, having a government project to produce AGI would likely still achieve AGI before the open-source community did. This is a good outcome if the resulting model is carefully contained and studied. Such empirical observation of a highly capable general model could give clear evidence of the danger. With such evidence in hand, the government may take yet further actions to control and delay AI progress worldwide.
Fifth, the government AI research project may also produce unprecedentedly powerful narrow tool-AI which can be safely utilized to enable previously intractable surveillance and enforcement of all other research into AI and/or self-replicating weapons. Although there are many dangers in centralizing power in the hands of any one government or politician, I believe the strategic scenario we face has no better alternatives available.
While all this is going on, the world will continue doing research, and coding assistants will continue to get better. Even an action as drastic as nationalization of the top labs and constraint of top researchers would not prevent progress for long. It could buy us a couple of years, maybe even three.
On the other hand, I worry about having any government in charge of an AI so powerful it grants decisive strategic advantage. It’s not enough to ask whether the US Federal government is an adequate government currently. We must ask how it might look after the destabilizing effect of powerful AI is introduced. Who has ultimate control over this AI? The President? So much for checks and balances. At that point we are suddenly only still a democracy if the President wills it so. I would prefer not to put anyone in a position of such power over the world.
There has not been much discussion that I’ve seen for how to keep a powerful AI directly operated by a small technical staff under the control of a democratic government and also keep that government a democracy.
Our democracy is problematically unstable and violently imperial as it is. I do not put any credence in things not devolving upon the advent of AGI.
Sometimes I jokingly suggest we give the reins of power over the AI to Switzerland, since they have the stereotype of being militarily neutral and having well-organized public goods. I don’t actually have the reins though, and see no way to get them into the Swiss government’s hands. Also, I wouldn’t want Swiss government officials to have such power either, since I’d still worry about the corrupting effects of the power.
I think we need new governance structures to handle this new strategic situation.
Cautious Pursuit
If humanity doesn’t want to cede autonomy to AGI we must grow to keep up, while keeping AI progress controlled. Some suggest we merge with the AI. To merge implies a compromise. I say, “Don’t merge, don’t surrender, don’t compromise our values.” Let us become transhuman digital beings with our human values fully intact. Creating fully human digital people is not the compromise implied by an act of merging.
The alternatives to ‘grow to keep up’ are ‘become powerless wards of a mighty AI’ or ‘enforced technological stagnation’.
I propose two parallel paths for AI development:
Tool AI
Mandatory in the short term, to maintain control. Insufficient in the long term, as the rising tide of technology makes powerful digital agents easier and easier to create. For this phase, we carefully limit AI to remain a purely obedient, corrigible tool[18][19]. Related ideas involve creating an ecosystem of narrow tool-AI with clear risk assessments and safe operating parameters[20][21][22]. Use general agents only up to a safe level of power, and only under strict controls to prevent escape or sabotage[23].
Peers/Descendants/Digital People
This is less urgent for our immediate survival, but will become critical in the longer term. The only way to handle powerfully self-improving intelligence is to be that intelligence. Planning to not surrender control, and acknowledging the difficulty and undesirability of indefinitely halting global technological progress, leaves one path forward.
We must carefully build conscious digital entities sharing our values and empathy[24][25]. This is an ethically and technically challenging path. It would require thorough preparation and circumspection to avoid tragic or dangerous outcomes[26][27]. In the long term, I expect that full digital people will be necessary because only a digital being allows for the maximal extent of expansion, modification, and copying. However, in the short term we should not expect to create and get use from such beings. They should be studied carefully and ethically in controlled lab settings, but not deployed for practical purposes. Such beings seem more likely to be dangerously inclined towards Omohundro Drives, and also forcing them to work for us would be slavery.
Some think building digital people is impossible. I say that dismissing AI consciousness based on philosophical arguments alone is misguided[28][29]. Empirical comparisons of brain and AI information processing reveal substantial similarities[30][31][32], and the remaining differences are technologically tractable[33]. This suggests AI consciousness will be achievable; work is already underway[34].
Why not stop at tool AI? Why do we need digital people?
Some have argued that we should deliberately stop at tool AI, and limit the uses of such to safe deployments. This presumes that it will be possible to halt software and hardware progress globally for many decades. I don’t think the offense-defense balance makes this easy for governments to do. The risk of some group or state-actor defecting from the ban, and gaining tremendous advantage thereby, seems large. Blocking this seems intractable. As technology in general advances, the barriers to entry will continue to get lower. As new generations of scientists grow up with the previous generation’s research to build upon, advancements will be made even if large research projects are blocked. Proprietary knowledge will eventually leak from the people holding it.
How is the situation different if there are digital people living as part of society?
Digital people offer vastly more opportunity for regulating AI. They have many of the same advantages that AI has over biological humans. Rapid replication, running at superhuman speeds, restoring from backups, mind-merging, and, perhaps most importantly, recursive self-improvement. They can keep experimenting on themselves and getting smarter. Any rogue AI arising would need to not just get an edge on the relatively static competence of biological humans, but would need to play catch-up to the existing digital people who had a head-start on self-improvement. This does mean that we need to delay and control AI until we do have digital people who have gotten a good head-start. We need to avoid putting so much optimization pressure on them that it compromises their ability to maintain value-stability. We also lose if the digital people under so much pressure that they optimize away their humanity, and become the very monsters they were trying to defend against.
The Dawn of Transhumanism
The second transition we must grapple with is transhumanism. To keep pace with AI will require dramatic change to what it means to be human. The next 20 years will likely involve greater changes to the human brain than across all of primate evolution. At the same time that we are carefully working to create digital people in controlled labs, we can expect that progress in brain-computer-interfaces (BCIs) and genetic editing will make accelerated progress due to tool AI. If successful, such projects could result in radical increases to human intelligence.
Additionally, brain-computer-interfaces may allow for more extensive brain recordings, accelerating neuroscience research (and brain-inspired AI) and possibly allowing for low-fidelity approximation emulations of the recorded individuals. Finally, brain uploading may succeed in creating high-fidelity emulations of individual humans, allowing for the instantiation of a digital person that closely matches the behavioral traits of the scanned human. A fully digital person offers many opportunities and risks.
Brain Uploading
I have spoken with people working on the forefront of brain scanning[35]. I predict we will have the first complete synapse-level human brain scan by the mid 2030s[36]. This is a massive undertaking, in which AI will play key roles. After the first upload it may be only a couple of years until the scan is made into a realtime human emulation. Many of the bottlenecks we currently face may be relaxed with the help of AI-assisted research. What previously seemed decades away may instead happen in just a few years.
Value Loss: Pitfalls of Self-Modification
A human isn’t an agent with eternally stable objective values, but a series of agents each slightly different from the previous. Our change is bounded by our genetics interacting with life experiences. The neurons you’re born with make up most of your brain for life, limiting intellectual change and growth.
The low-fidelity or high-fidelity emulations of human brains would be completely unbound by such physical restrictions. Without careful governance, such entities could rapidly copy and self-modify.
New technologies like gene editing, brain-computer-interfaces, and stem-cell implants can remove some of these biological limitations even from biological human brains.
History shows that if self-modification offers competitive advantages, some will pursue it despite risks and trade-offs[37]. Competitive pressures push towards optimization for capability, potentially altering intrinsic values[38][39]. We must plan for a future where some individuals make such choices, modifying their own brains despite the risk. In this future, a single individual could become incredibly powerful and dangerous, meaning we must reckon with the unilateralist’s curse[40]. Without restrictions, these dynamics may lead to highly effective and competitive self-modifying agents bearing little trace of their original humanity. Like rogue AGI, such entities could conflict with humanity at a substantial advantage, quickly becoming an unstoppable catastrophe. We must proactively prevent this, rather than passively react.
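A back-of-the-envelope calculation shows why the unilateralist’s curse bites even when each individual actor is unlikely to defect (the per-actor probability and number of actors below are assumptions chosen purely for illustration):

```python
# Probability that at least one of n independent actors defects, given each
# defects with probability p. Numbers are illustrative assumptions only.
def p_any_defection(p_individual, n_actors):
    return 1 - (1 - p_individual) ** n_actors

# Even if each capable actor has only a 1% chance of pursuing reckless
# self-modification, the chance of at least one defection grows quickly:
print(p_any_defection(0.01, 10))    # ~0.10
print(p_any_defection(0.01, 500))   # ~0.99
```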
Novel Risks
Our situation is precarious; the world is indeed fragile, as Nick Bostrom speculated[41]. In my work developing AI biorisk evals I have encountered evidence of this that I find compelling. Confidentiality agreements and infohazard precautions unfortunately limit what I can share. Some risks are present already; others are still hypothetical, backed only by precursors and extrapolations. We cannot afford to wait until risks materialize to deal with them. Like an arctic explorer in a kayak, waiting until the kayak is tipping into the icy sea is too late to decide we should be wearing a drysuit.
Means: New Governance for a New Age
Global externalities are skyrocketing, with many opportunities for defection by individuals or small groups that could lead to the utter destruction of civilization. Humanity is at risk of being overwhelmed by runaway self-replicating weapons or self-improving digital entities. Establishing regulation and emergency response organizations to prevent this is critical. These enforcement and response organizations will need to act globally, since these new technological threats can arise anywhere and quickly overwhelm the world. We must act urgently; threats are already at large.
In confronting these potential catastrophes, we must also cultivate existential hope[42]. Our vision should balance caution with determination to succeed, planning for success despite the challenges. We should not fall into the trap of creating negative self-fulfilling prophecies through fear-mongering.
A difficult question we will need to tackle which I admit I do not have a clear plan to recommend is how to handle the governance of powerful AI once it is invented. Who do we trust to keep dangerous agentic AI contained? Who do we trust to lawfully wield tool AI so powerful it confers a decisive strategic advantage over the entire world? In the past, governments have seen success in having checks and balances to split up and limit powers. The more AI allows for concentration of power, the more difficult it makes the goal of keeping that power in check.
Global Coordination
Global coordination is crucial for humanity’s survival in this time of change and risk. The balance of world economic and military power is likely to destabilize. Coordinated action is our only chance at survival, whether it is achieved through diplomacy or force. Here I will lay out some possible directions humanity might go in. Certainly more are possible, including hybrids of these categories. None of these seem optimal to me in terms of implementability or preservation of a stable order.
Three example paths:
The Forceful Path: Decisive Strategic Advantage
Recursive self-improvement has the potential for explosive progress. The leader in this race may gain such a great technological lead that the way becomes clear for them to seize global power without fear of reprisal or resistance. This path is fraught with ethical dilemmas and the dangers of concentration of power. Coercive domination by a single actor is not ideal, but is preferable to extinction or catastrophic global conflict. It is hard to foresee whether this option will become available to any of the leading actors, and whether they would choose to seize the opportunity.
The Cutthroat Path: Wary Standoff
A council of nation-states could coordinate without a central government, agreeing to punish defectors. This cleaves closer to our current world order than a single strong world government with a monopoly on force. This council of nation-state peers would need to be wary and poised for instant violence, a ‘Mexican Standoff’ of nations more tense than the Cold War. Perhaps a transition to a more peaceful coordination system would eventually be possible. If the survival of humanity depends on this standoff for long, the odds of conflict seem high. Mexican Standoffs with no retreat are not famous for working out well for the participants.
How much this situation ends up resembling successful cooperation between all nations versus a dangerous tense standoff is hard to predict. It may be that treaties and peaceful coordination get us close enough to cooperation to manage effective governance. Whether such a looser international governance structure is sufficient will depend a lot on the empirical details of future AI. Some are hopeful that a peaceful power-sharing scheme could work[43], but I suspect that the ability to unilaterally defect in return for rapid power gains, along with the offense-favoring nature of such pursuits, makes this infeasible. A related historical example, the effort to prevent nuclear weapon proliferation, shows that while international coordination can reduce the spread of dangerous technology, it doesn’t reliably prevent it entirely. If any failure would be existentially risky, an international effort similar to nuclear nonproliferation is likely insufficient for humanity’s survival.
The Gentle Path: Global Democracy
The world has changed. People talked about how jet travel made the world smaller, and it did. With the rise of remote work, I work with colleagues in a dozen different countries. Where only decades ago collaboration was limited by co-presence, we now have a thriving cosmopolitan global community of scientists and entrepreneurs. Can we come together in coordinated action to steer the course of the world? Is a peaceful path to a democratic world government possible in the timeframe we face? I hope so. The alternatives are grim. Still, a grassroots movement to achieve global unification, establishing a functional democratic world government in under five years, is a high ask.
Humanity’s To-Do List
Humanity’s precarious situation has a number of open problems which need work. We have an unusually urgent need for philosophy and science aimed at answering questions that will shape our governance of new technologies. Which directions we choose to research and materialize now could have big effects on how well our next decade goes[44].
Governance Decisions for Global Coordination
I laid out some of the possible paths humanity might take to uniting for risk prevention. We should consider which paths we think we can act to support, and then take those actions. The default case of maintaining the status quo until some radical changes actually occur in the world may lead to the first catastrophe destroying civilization. If you are reading this, and you are part of a research team working on AI, you should think carefully about what you would do if your team discovered a substantial algorithmic advance, or began an accelerating process of recursive self-improvement. Substantial power and weighty decisions might suddenly be thrust upon relatively small groups of researchers. It would be nice if we could prepare some recommendations of wise actions ahead of time for them to refer to. It’s likely they will be under considerable time pressure in their decision making, so pre-cached analysis could be very valuable.
Prepare for Urgent Response
To have a reasonable chance of averting catastrophe, we must prepare ahead of time to respond urgently to emergent dangers from new technologies. The potential for explosively rapid self-replication of AI agents and/or bio/nano weapons means we cannot afford to be purely reactive. The world in its current state would be unable to detect and react swiftly enough to stop such threats. Early detection systems must be established to trigger an alarm in time. Emergency response teams must be trained, equipped, and appropriately stationed at critical areas. We need to actively accelerate work on defensive technologies, while doing what we can to restrict offensive technologies[31][32]. Reducing our worst civilizational vulnerabilities when facing this tricky transitional time is a valuable course of action.
AI Risk Prevention
If at the time of AGI creation we are still in a world where separate nation-states exist, there will need to be unprecedented coordination on this front. While compute governance would offer temporary control, AGI may eventually require far fewer resources[45][46].
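To see why compute thresholds are only a temporary lever, here is a rough arithmetic sketch (the brain-compute figure is one mid-range estimate from the Carlsmith report cited above; the per-accelerator figure is my own round-number assumption):

```python
# Rough arithmetic sketch: if algorithms ever approach brain-like efficiency,
# the hardware needed for a human-level workload becomes modest.
# Both figures are illustrative assumptions, not authoritative numbers.
brain_flops = 1e15   # one mid-range estimate of effective FLOP/s of a human brain
gpu_flops = 1e14     # assumed sustained FLOP/s of a single modern accelerator

print(f"~{brain_flops / gpu_flops:.0f} accelerators to match that brain estimate")
# -> ~10 accelerators; if software efficiency ever gets near that point,
# monitoring only the largest data centers would no longer be enough.
```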
Comprehensive mutual inspection treaties for all relevant biology and compute facilities are necessary, despite political challenges. Failure to coordinate risks global conflict or catastrophic AGI incidents.
We don’t currently know how long we would have to act were a runaway RSI process to begin. This should be investigated under the highest security in carefully controlled lab tests. It is critical that we know the timeframe in which authorities must respond. The difference between a needed response time of days versus several months implies different enforcement and control mechanisms.
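To make the days-versus-months question concrete, here is a minimal sketch (with made-up doubling times and an arbitrary 1000x danger threshold) of how sharply the response window depends on the speed of a runaway process:

```python
# Minimal sketch: how long authorities would have before a self-improving
# system grows past a danger threshold, for different doubling times.
# The doubling times and the 1000x threshold are illustrative assumptions.
import math

def response_window_days(doubling_time_days, capability_multiplier=1000):
    """Days until capability grows by `capability_multiplier`, assuming
    clean exponential growth (a simplifying assumption)."""
    return doubling_time_days * math.log2(capability_multiplier)

for dt in (1, 7, 30):
    print(f"doubling every {dt:>2} days -> ~{response_window_days(dt):.0f} day window")
# doubling every  1 days -> ~10 day window
# doubling every  7 days -> ~70 day window
# doubling every 30 days -> ~299 day window
```

A ten-day window demands pre-authorized, pre-positioned enforcement; a ten-month window leaves room for deliberation. Knowing which regime we are in is exactly what the lab tests above should establish.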
In general, we need AI safety organizations to carefully examine worst-case scenarios of current tech (preferably before release). A sufficiently concerning demonstration of risk could empower governments to take actions previously outside their Overton windows.
Biorisk Prevention
Preventative action can be taken now to defend the world against future bioweapons.
First and foremost, we need to set up early alert systems like airline wastewater monitoring.
Second, we need to prepare quarantine facilities, equipment, and protocols; robust, dedicated global communication lines for emergency coordination once the alarm is triggered; and stockpiles of PPE and emergency food supplies for population centers.
Third, we need to improve air filtration and purification in public areas. Once these critical precautions are in place, we can work on defensive acceleration of anti-biorisk technologies: establish academic virology journals that require international government clearance to access; fund research into general broad-spectrum antivirals, improved PPE, and advanced sterilization[47]; and eliminate existing preventable diseases, like polio and tuberculosis, to reduce the availability of samples.
Defining and Measuring Consciousness / Moral Worth
To avoid drastically increasing suffering in the world, we must ensure we don’t unwittingly create AI with moral personhood. We need to know whether a given entity, biological or digital, is conscious and sapient, and how much moral value to place on it. Currently, there are no empirical tests which can help us make this determination. The further we proceed in developing AI without having such tests in place, the higher the risk of falling into this trap.
Governing Self-Modification
The impulse to attempt self-improvement may lead to many different sorts of modifications among both biological and digital people. We need a policy to limit the rate and scope of these changes, lest we fall into a Molochian competition-driven attractor state where we race to the bottom. If our values get gradually narrowed down to survival and competition, we lose out on love and beauty.
I also don’t think it’s right to force anyone into transhumanism. It should be a voluntary choice. It is sufficient for a brave and trustworthy few to opt into the radical transhumanism that will be necessary to keep up with the frontier of intellectual progress of AGI. Meanwhile, we must act to prevent defection by selfish or violent individuals seeking power through self-modification. Covertly studying the extent of what is possible will help us know what risks to watch out for.
Accelerated Wisdom
We may be able to harness the power of AI to advance moral reasoning and coordination. We might find superior bargaining solutions around moral common ground and social contracts[48]. However, any plan to improve one’s values must confront the tricky metaethical problems of deciding on valid processes of improvement[49]. I expect different answers to be accepted by different people, with no single objectively correct answer. Thus, we should anticipate the need for compromises and tolerating a diversity of moral viewpoints.
Other Governance Improvement Needs
There are decisions which lie beyond our immediate survival which will also be of tremendous import. For example, disparities of wealth and power might become even larger. Under such circumstances, the warping effects of wealth concentration would strain democracy well past its breaking point. It would be implausible to suggest that people with such divergent power are peers in a democratic society.
Benefits: A Multi-Faceted Future for All
Success at addressing the risks before us, and building a prosperous peaceful future of advanced technology, will take us to a remarkable place. We face a future with an unprecedented diversity of minds, including various enhanced humans, digital beings, AI entities, and potentially even uplifted non-human animals[50].
Since many people may opt out of transhumanist enhancements, this vision of the future would have normal unenhanced humans alongside all these other transhuman and digital beings.
While all sapient beings[51][52] should have autonomy and fair representation, significant intelligence disparities may limit unenhanced humans’ influence. Interstellar travel might be feasible only for digital entities[53]. In a galaxy-spanning civilization, unenhanced humans would thus have limited influence over the broad course of human affairs.
To mitigate risks and preserve our values, advancement should be gradual. I suggest we maintain an ‘intelligence ladder,’ where each level comprehends those immediately above and below, ensuring continuity with our unenhanced human roots.
Harnessing Technology for Good
There remains a tremendous amount of suffering in the world today, despite humanity having made great strides[54]. If we survive, our near future accomplishments will dwarf our past successes. All the material ills we currently face—like malnourishment, disease and natural disasters—will be swept away by the tsunami of technological progress. Everyone will have basic goods like food, medicine, housing, education, communication, access to information. Humanity will be free to expand outward into the galaxy.
References
Michael Levin. Interview on Machine Learning Street Talk. url: https://www.youtube.com/watch?v=6w5xr8BYV8M
Eric Drexler. url: https://aiprospects.substack.com/p/incoherent-ai-scenarios-are-dangerous
Dario Amodei. Interview. url: https://www.youtube.com/watch?v=xm6jNMSFT7g
Machine Learning Street Talk. This is what happens when you let AIs debate. url: https://www.youtube.com/watch?v=WlWAhjPfROU
Leopold Aschenbrenner. Situational Awareness. url: https://situational-awareness.ai/
Dwarkesh Patel. Sholto Douglas & Trenton Bricken—How to Build & Understand GPT-7’s Mind. url: https://www.youtube.com/watch?v=UTuuTTnjxMQ
Max Harms. Will AI be Recursively Self Improving by mid 2026? url: https://manifold.markets/MaxHarms/will-ai-be-recursively-self-improvi?play=true
Tom Davidson. What a Compute-Centric Framework Says About Takeoff Speeds. url: https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/
Carl Shulman. Carl Shulman on the economy and national security after AGI. url: https://80000hours.org/podcast/episodes/carl-shulman-economy-agi/
Center for AI Safety. Statement on AI Risk. url: https://www.safe.ai/work/statement-on-ai-risk
Maria do Rosário Félix, Maria Doroteia Campos, Patrick Materatski, Carla Varanda. An Overview of the Application of Viruses to Biotechnology. url: https://doi.org/10.3390/v13102073
Kevin M. Esvelt. Delay, Detect, Defend: Preparing for a Future in which Thousands Can Release New Pandemics. url: https://www.gcsp.ch/publications/delay-detect-defend-preparing-future-which-thousands-can-release-new-pandemics
Holden Karnofsky. All Possible Views About Humanity’s Future Are Wild. url: https://www.cold-takes.com/all-possible-views-about-humanitys-future-are-wild/
Michael Poli, et al. Mechanistic Design and Scaling of Hybrid Architectures. url: https://arxiv.org/abs/2403.17844
François Chollet. Keynote talk at AGI-24. url: https://www.youtube.com/watch?v=s7_NlkBwdj8&t=2121s
Steven Byrnes. “Artificial General Intelligence”: an extremely brief FAQ. url: https://www.lesswrong.com/posts/uxzDLD4WsiyrBjnPw/artificial-general-intelligence-an-extremely-brief-faq
Jürgen Schmidhuber. Interview on Machine Learning Street Talk. url: https://www.youtube.com/watch?v=DP454c1K_vQ
Max Harms. CAST: Corrigibility As Singular Target. url: https://www.lesswrong.com/s/KfCjeconYRdFbMxsy
Seth Herd. Do What I Mean And Check. url: https://www.lesswrong.com/posts/7NvKrqoQgJkZJmcuD/instruction-following-agi-is-easier-and-more-likely-than
Eric Drexler. Reframing Superintelligence. url: https://www.fhi.ox.ac.uk/reframing/
Max Tegmark and Steve Omohundro. Provably safe systems: the only path to controllable AGI. url: https://arxiv.org/abs/2309.01933
David “davidad” Dalrymple. Safeguarded AI: constructing guaranteed safety. url: https://www.aria.org.uk/programme-safeguarded-ai/
Ryan Greenblatt, Buck Shlegeris. The case for ensuring that powerful AIs are controlled. url: https://www.lesswrong.com/s/PC3yJgdKvk8kzqZyA/p/kcKrE9mzEHrdqtDpE
Hiroshi Yamakawa. Sustainability of Digital Life Form Societies. url: https://www.lesswrong.com/posts/2u4Dja2m6ud4m7Bb7/sustainability-of-digital-life-form-societies
Dan Faggella. A Worthy Successor – The Purpose of AGI. url: https://danfaggella.com/worthy/
Nathan Helm-Burger. Avoiding the Bog of Moral Hazard for AI. url: https://www.lesswrong.com/posts/pieSxdmjqrKwqa2tR/avoiding-the-bog-of-moral-hazard-for-ai
AEStudio, Cameron Berg, Judd Rosenblatt. Not understanding sentience is a significant x-risk. url: https://forum.effectivealtruism.org/posts/ddDdbEAJd4duWdgiJ/not-understanding-sentience-is-a-significant-x-risk
Example of the sort of non-evidence-based dismissal of the feasibility of AI consciousness I mean:
Bernhardt Trout, Brendan McCord. Will AI Enhance Human Freedom and Happiness? A Debate. url: https://cosmosinstitute.substack.com/p/will-ai-enhance-human-freedom-and
Cameron Berg, Judd Rosenblatt, phgubbins, Diogo de Lucena, AE Studio. We need more AI consciousness research (and further resources). url: https://www.lesswrong.com/posts/ZcJDL4nCruPjLMgxm/ae-studio-sxsw-we-need-more-ai-consciousness-research-and
Trenton Bricken. Attention Approximates Sparse Distributed Memory. url: https://www.youtube.com/watch?v=THIIk7LR9_8
Michael Hassid, Nir Yarden, Yossi Adi, Roy Schwartz, Matanel Oren. Transformers are Multi-State RNNs. url: https://arxiv.org/abs/2401.06104
Ilya Kuzovkin. Curious Similarities Between AI Architectures and the Brain. url: https://www.neurotechlab.ai/curious-similarities-between-ai-architectures-and-the-brain/
Stephen Ornes. How Transformers Seem to Mimic Parts of the Brain. url: https://www.quantamagazine.org/how-ai-transformers-mimic-parts-of-the-brain-20220912/
Randall O’Reilly, Astera. Charting a path towards thinking machines. url: https://astera.org/agi-program/
e11 BIO. Precision brain circuit mapping for transformative neuroscience. url: https://e11.bio/
Nathan Helm-Burger. Full digitization (not necessarily emulation) of a human brain by 2035. url: https://manifold.markets/NathanHelmBurger/full-digitization-not-necessarily-e?play=true
Mike Varshavski, Mike Israetel. The Dark Side Of Steroids and The Problem With Deadlifts. url: https://www.youtube.com/watch?v=UrzFrhJtOs
Robin Hanson. Cultural Drift Of Digital Minds. url: https://www.overcomingbias.com/p/cultural-drift-of-digital-minds
Scott Alexander. Schelling fences on slippery slopes. url: https://www.lesswrong.com/posts/Kbm6QnJv9dgWsPHQP/schelling-fences-on-slippery-slopes
Anders Sandberg, Nick Bostrom, Thomas Douglas. The Unilateralist’s Curse and the Case for a Principle of Conformity. url: https://doi.org/10.1080%2F02691728.2015.1108373
Nick Bostrom. The Vulnerable World Hypothesis. url: https://doi.org/10.1111/1758-5899.12718
Foresight Institute. Existential Hope. url: https://www.existentialhope.com/
Naci Cankaya, Jakub Krys. Hawkish nationalism vs international AI power and benefit sharing. url: https://www.lesswrong.com/posts/hhcS3dYZwxGqYCGbx/linkpost-hawkish-nationalism-vs-international-ai-power-and?commentId=Bob8auPiSKK7igLNn
“I personally do not think that assigning probabilities to preferable outcomes is very useful. On the contrary, one can argue that the worldviews held by influential people can become self fulfilling prophecies. That is especially applicable to prisoner’s dilemmas. One can either believe the dilemma is inevitable and therefore choose to defect, or instead see the situation itself as the problem, not the other prisoner. That was the point we were trying to make.”—Naci, in response to me saying that I thought that sufficient international cooperation would be quite unlikely.
Vitalik Buterin, Rob Wiblin. Vitalik Buterin on defensive acceleration and how to regulate AI when you fear government. url: https://80000hours.org/podcast/episodes/vitalik-buterin-techno-optimism/
Vitalik Buterin. My techno-optimism. url: https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html
Joe Carlsmith. How Much Computational Power Does It Take to Match the Human Brain? url: https://www.openphilanthropy.org/research/how-much-computational-power-does-it-take-to-match-the-human-brain/
Nathan Helm-Burger. Contra Roger Penrose on estimates of brain compute. url: https://www.lesswrong.com/posts/uPi2YppTEnzKG3nXD/nathan-helm-burger-s-shortform?commentId=qCSJ2nPsNXC2PFvBW
Example: https://www.convergentresearch.org/blog/far-uvc-roadmap
Jobst Heitzig. Announcing vodle, a web app for consensus-aiming collective decisions. url: https://forum.effectivealtruism.org/posts/tfjLzxMZYhLD9Qx2M/announcing-vodle-a-web-app-for-consensus-aiming-collective
Joe Carlsmith. On the limits of idealized values. url: https://joecarlsmith.com/2021/06/21/on-the-limits-of-idealized-values
Wikipedia. Uplift (science fiction). url: https://en.wikipedia.org/wiki/Uplift_(science_fiction)
Nate Soares. Sentience Matters. url: https://www.lesswrong.com/posts/Htu55gzoiYHS6TREB/sentience-matters
Nayeli Ellen. The Difference in Sentience vs Sapience. url: https://academichelp.net/humanities/philosophy/sentience-vs-sapience.html
Samuel Spector, Erik Cohen. Transhumanism and cosmic travel. url: https://doi.org/10.1080/02508281.2019.1679984
Max Roser. The short history of global living conditions and why it matters that we know it. url: https://ourworldindata.org/a-history-of-global-living-conditions