I disagree, from my experience of engaging with the public debate, doubt is mostly about AI capability, not about misalignment. Most people easily believe AI to be misaligned to them, but they have trouble believing it will be powerful enough to take over the world any time soon. I don’t think alignment research will do that much here.
I would say that the power of AI will continue to visibly massively expand (although underestimation of further developments will continue to be a big problem), but that will increase both ‘fear AI disaster’ and ‘get AI first’ elements. My read is that that the former is in a very difficult position now when its policy recommendations conflict with the latter. I see this in the Congressional hearings and rejection of the pause letter.
Even if experts would agree that increasing the power of the aligned AI is good and necessary, and that expansion in space would be required for that, I think it will take a long time to convince the general public and/or decision makers, if it’s at all possible. And in any remotely democratic alignment plan, that’s a necessary step.
When that kind of AI is available, it would mean by the same token that such expansion could break down MAD in short order as such explosive growth could give the power to safely disarm international rivals if not matched or stopped. And AI systems and developers will be able to demonstrate this. So the options would be verifying/trusting deals with geopolitical and ideological rivals to hold back or doing fast AI/industrial expansion. If dealmaking fails, then all options would look scary and abrupt.
I think the assumption that safe, aligned AI can’t defend against a later introduction of misaligned AI is false, or rather depends on the assumption of profound alignment failures so that the ‘aligned AI’ really isn’t. AI that is aligned enough to do AI research and operate industry and security forces can expand its capabilities to the technological frontier and grow an industrial base claiming unclaimed resources in space. Then any later AI introduced faces an insurmountable balance of capabilities just from the gap in resources, even if it catches up technologically. That would not violate the sovereignty of any state, although it could be seen as a violation of the Outer Space Treaty if not backed by the international community with treaty revision.Advanced AI-enabled tech and industry can block bioweapons completely through physical barriers, detection, and sterilization. Vast wealth can find with high probability any zero-days that could be discovered with tiny wealth, and produce ultra-secure systems, so cyberattacks do not produce a vulnerable world. Even nuclear weapons lose their MAD element in the face of millions of drones/interceptors/defenses for each attempted attack (and humans can move to a distance in space, back up their minds, etc).If it turns out there is something like the ability to create a vacuum collapse that enables one small actor to destroy a much larger AI-empowered civilization, then the vast civilization will find out first, and could safely enforce a ban if a negotiated AI-enforced treaty could not be struck.If I understand correctly memes about pivotal acts to stop anyone from making misaligned AI stem from the view that we won’t be able to make AI that could be trusted to undergo intelligence explosion and industrial expansion for a long time after AI could enable some other ‘pivotal act.’ I.e. the necessity for enforcing a ban even after AGI development is essentially entirely about failures of technical alignment.Furthermore, the biggest barrier to extreme regulatory measures like a ban is doubt (both reasonable and unreasonable) about the magnitude of misalignment risk, so research that studies and demonstrates high risk (if it is present) is perhaps the most leveraged possible tool to change the regulatory/governmental situation.
No. Short version is that the prior for the combination of technologies and motives for aliens (and worse for magic, etc) is very low, and the evidence distribution is familiar from deep dives in multiple bogus fields (including parapsychology, imaginary social science phenomena, and others), with understandable data-generating processes so not much likelihood ratio.
We’ve agreed to make a 25:1 bet on this. John will put the hash of the bet amount/terms below.
As we’ve discussed and in short, I think aligned AI permits dialing up many of the processes that make science or prediction markets imperfectly self-correcting: tremendously cheaper, in parallel, on the full panoply of questions (including philosophy and the social sciences), with robust consistency, cross-examination, test sets, and forecasting. These sorts of things are an important part of scalable supervision for alignment, but if they can be made to work I expect them to drive strong epistemic convergence.
The thing was already an obscene 7 hours with a focus on intelligence explosion and mechanics of AI takeover (which are under-discussed in the discourse and easy to improve on, so I wanted to get concrete details out). More detail on alignment plans and human-AI joint societies are planned focus areas for the next times I do podcasts.
I’m interested in my $250k against your $10k.
I assign that outcome low probability (and consider that disagreement to be off-topic here).
Thank you for the clarification. In that case my objections are on the object-level.
This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.
This does exclude random small terminal valuations of things involving humans, but leaves out the instrumental value for trade and science, uncertainty about how other powerful beings might respond. I know you did an earlier post with your claims about trade for some human survival, but as Paul says above it’s a huge point for such small shares of resources. Given that kind of claim much of Paul’s comment still seems very on-topic (e.g. hsi bullet point .
Insofar as you’re arguing that I shouldn’t say “and then humanity will die” when I mean something more like “and then humanity will be confined to the solar system, and shackled forever to a low tech level”, I agree, and
Yes, close to this (although more like ‘gets a small resource share’ than necessarily confinement to the solar system or low tech level, both of which can also be avoided at low cost). I think it’s not off-topic given all the claims made in the post and the questions it purports to respond to. E.g. sections of the post purport to respond to someone arguing from how cheap it would be to leave us alive (implicitly allowing very weak instrumental reasons to come into play, such as trade), or making general appeals to ‘there could be a reason.’
Separate small point:
And disassembling us for spare parts sounds much easier than building pervasive monitoring that can successfully detect and shut down human attempts to build a competing superintelligence, even as the humans attempt to subvert those monitoring mechanisms. Why leave clever antagonists at your rear?
The costs to sustain multiple superintelligent AI police per human (which can double in supporting roles for a human habitat/retirement home and controlling the local technical infrastructure) is not large relative to the metabolic costs of the humans, let alone a trillionth of the resources. It just means some replications of the same impregnable AI+robotic capabilities ubiquitous elsewhere in the AI society.
Most people care a lot more about whether they and their loved ones (and their society/humanity) will in fact be killed than whether they will control the cosmic endowment. Eliezer has been going on podcasts saying that with near-certainty we will not see really superintelligent AGI because we will all be killed, and many people interpret your statements as saying that. And Paul’s arguments do cut to the core of a lot of the appeals to humans keeping around other animals.If it is false that we will almost certainly be killed (which I think is right, I agree with Paul’s comment approximately in full), and one believes that, then saying we will almost certainly be killed would be deceptive rhetoric that could scare people who care less about the cosmic endowment into worrying more about AI risk. Since you’re saying you care much more about the cosmic endowment, and in practice this talk is shaped to have the effect of persuading people to do the thing you would prefer it’s quite important whether you believe the claim for good epistemic reasons. That is important to disclaiming the hypothesis that this is something being misleadingly presented or drifted into because of its rhetorical convenience without vetting it (where you would vet it if it were rhetorically inconvenient).I think being right on this is important for the same sorts of reasons climate activists should not falsely say that failing to meet the latest emissions target on time will soon thereafter kill 100% of humans.
A world of pure Newtonian mechanics wouldn’t actually support apples and grass as we know them existing, I think. They depend on matter capable of supporting organic chemistry, nuclear reactions, the speed of light, ordered causality, etc. Working out that sort of thing in simulation to get an Occam prior over coherent laws of physics producing life does seem to be plenty to favor QM+GR over Newtonian mechanics as physical laws.I agree the possibility or probability of an AI finding itself in simulations without such direct access to ‘basement level’ physical reality limits the conclusions that could be drawn, although conclusions ‘conditional on this being direct access’ may be what’s in mind in the original post.
In general human cognitive enhancement could help AGI alignment if it were at scale before AGI, but the cognitive enhancements on offer seem like we probably won’t get very much out of them before AGI, and they absolutely don’t suffice to ‘keep up’ with AGI for more than a few weeks or months (as AI R&D efforts rapidly improve AI while human brains remain similar, rendering human-AI cyborg basically AI systems). So benefit from those channels, especially for something like BCI, has to add value mainly by making better initial decisions, like successfully aligning early AGI, rather than staying competitive. On the other hand, advanced AGI can quickly develop technologies like whole brain emulation (likely more potent than BCI by far).
BCI as a direct tool for alignment I don’t think makes much sense. Giving advanced AGI read-write access to human brains doesn’t seem like the thing to do with an AI that you don’t trust. On the other hand, an AGI that is trying to help you will have a great understanding of what you’re trying to communicate through speech. Bottlenecks look to me more like they lie in human thinking speeds, not communication bandwidth.BCI might provide important mind-reading or motivational changes (e.g. US and PRC leaders being able to verify they were respectively telling the truth about an AGI treaty), but big cognitive enhancement through that route seems tricky in developed adult brains: much of the variation in human cognitive abilities goes through early brain development (e.g. genes expressed then).Genetic engineering sorts of things would take decades to have an effect, so are only relevant for bets on long timelines for AI.Human brain emulation is an alternative path to AGI, but suffers from the problem that understanding pieces of the brain (e.g. the algorithms of cortical columns) could enable neuroscience-inspired AGI before emulation of specific human minds. That one seems relatively promising as a thing to try to do with early AGI, and can go further than the others (as emulations could be gradually enhanced further into enormous superintelligent human-derived minds, and at least sped up and copied with more computing hardware).
What level of taxation do you think would delay timelines by even one year?
With effective compute for AI doubling more than once per year, a global 100% surtax on GPUs and AI ASICs seems like it would be a difference of only months to AGI timelines.
This is the terrifying tradeoff, that delaying for months after reaching near-human-level AI (if there is safety research that requires studying AI around there or beyond) is plausibly enough time for a capabilities explosion (yielding arbitrary economic and military advantage, or AI takeover) by a more reckless actor willing to accept a larger level of risk, or making an erroneous/biased risk estimate. AI models selected to yield results while under control that catastrophically take over when they are collectively capable would look like automating everything was largely going fine (absent vigorous probes) until it doesn’t, and mistrust could seem like paranoia.
I’d very much like to see this done with standard high-quality polling techniques, e.g. while airing counterarguments (like support for expensive programs that looks like majority but collapses if higher taxes to pay for them is mentioned). In particular, how the public would react given different views coming from computer scientists/government commissions/panels.
This is like saying there’s no value to learning about and stopping a nuclear attack from killing you because you might get absolutely no benefit from not being killed then, and being tipped off about a threat trying to kill you, because later the opponent might kill you with nanotechnology before you can prevent it.Removing intentional deception or harm greatly increases the capability of AIs that can be worked with without getting killed, to further improve safety measures. And as I said actually being able to show a threat to skeptics is immensely better for all solutions, including relinquishment, than controversial speculation.
I agree that some specific leaders you cite have expressed distaste for model scaling, but it seems not to be a core concern. In a choice between more politically feasible measures that target concerns they believe are real vs concerns they believe are imaginary and bad, I don’t think you get the latter. And I think arguments based on those concerns get traction on measures addressing the concerns, but less so on secondary wishlist items of leaders .I think that’s the reason privacy advocacy in legislation and the like hasn’t focused on banning computers in the past (and would have failed if they tried). For example:
If privacy and data ownership movements take their own claims seriously (and some do), they would push for banning the training of ML models on human-generated data or any sensor-based surveillance that can be used to track humans’ activities.
AGI working with AI generated data or data shared under the terms and conditions of web services can power the development of highly intelligent catastrophically dangerous systems, and preventing AI from reading published content doesn’t seem close to the core motives there, especially for public support on privacy. So taking the biggest asks they can get based on privacy arguments I don’t think blocks AGI.
People like Divya Siddarth, Glen Weyl, Audrey Tang, Jaron Lanier and Daron Acemoglu have repeatedly expressed their concerns about how current automation of work through AI models threatens the empowerment of humans in their work, creativity, and collective choice-making.
It looks this kind of concern at scale naturally goes towards things like compensation for creators (one of Lanier’s recs), UBI, voting systems, open-source AI, and such.Jaron Lanier has written a lot dismissing the idea of AGI or work to address it. I’ve seen a lot of such dismissal from Glen Weyl. Acemoglu I don’t think wants to restrict AI development? I don’t know Siddarth or Tang’s work well.
Note that I have not read any writings from Gebru that “AGI risk” is not a thing. More the question of why people are then diverting resources to AGI-related research while assuming that the development general AI is inevitable and beyond our control.
They’re definitely living in a science fiction world where everyone who wants to save humanity has to work on preventing the artificial general intelligence (AGI) apocalypse...Agreed but if that urgency is in direction of “we need to stop evil AGI & LLMs are AGI” then it does the opposite by distracting from types of harms perpetuated & shielding those who profit from these models from accountability. I’m seeing a lot of that atm (not saying from you)...What’s the open ai rationale here? Clearly it’s not the same as mine, creating a race for larger & larger models to output hateful stuff? Is it cause y’all think they have “AGI”?...Is artificial general intelligence (AGI) apocalypse in that list? Cause that’s what him and his cult preach is the most important thing to focus on...The thing is though our AGI superlord is going to make all of these things happen once its built (any day now) & large language models are a way to get to it...Again, this movement has so much of the $$ going into “AI safety.” You shouldn’t worry about climate change as much as “AGI” so its most important to work on that. Also what Elon Musk was saying around 2015 when he was backing of Open AI & was yapping about “AI” all the time.
That reads to me as saying concerns about ‘AGI apocalypse’ are delusional nonsense but pursuit of a false dream of AGI incidentally cause harms like hateful AI speech through advancing weaker AI technology, while the delusions should not be an important priority.
What do you mean here with a “huge lift”?
I gave the example of barring model scaling above a certain budget.
I touched on reasons here why interpretability research does not and cannot contribute top long-term AGI safety.
I disagree extremely strongly with that claim. It’s prima facie absurd to think that, e.g. that using interpretability tools to discover that AI models were plotting to overthrow humanity would not help to avert that risk. For instance, that’s exactly the kind of thing that would enable a moratorium on scaling and empowering those models to improve the situationn.
As another example, your idea of Von Neuman Probes with error correcting codes, referred to by Christiano here, cannot soundly work for AGI code (as self-learning new code for processing inputs into outputs, and as introducing errors through interactions with the environment that cannot be detected and corrected). This is overdetermined. An ex-Pentagon engineer has spelled out the reasons to me. See a one-page summary by me here.
This is overstating what role error-correcting codes play in that argument. They mean the same programs can be available and evaluate things for eons (and can evaluate later changes with various degrees of learning themselves), but don’t cover all changes that could derive from learning (although there are other reasons why those could be stable in preserving good or terrible properties).
I agree there is some weak public sentiment in this direction (with the fear of AI takeover being weaker). Privacy protections and redistribution don’t particularly favor measures to avoid AI apocalypse. I’d also mention this YouGov survey:
But the sentiment looks weak compared to e.g. climate change and nuclear war, where fossil fuel production and nuclear arsenals continue, although there are significant policy actions taken in hopes of avoiding those problems. The sticking point is policymakers and the scientific community. At the end of the Obama administration the President asked scientific advisors what to make of Bostrom’s Superintelligence, and concluded not to pay attention to it because it was not an immediate threat. If policymakers and their advisors and academia and the media think such public concerns are confused, wrongheaded, and not politically powerful they won’t work to satisfy them against more pressing concerns like economic growth and national security. This is a lot worse than the situation for climate change, which is why it seems better regulation requires that the expert and elite debate play out differently, or the hope that later circumstances such as dramatic AI progress drastically change views (in favor of AI safety, not the central importance of racing to AI).
Do you think there is a large risk of AI systems killing or subjugating humanity autonomously related to scale-up of AI models?A movement pursuing antidiscrimination or privacy protections for applications of AI that thinks the risk of AI autonomously destroying humanity is nonsense seems like it will mainly demand things like the EU privacy regulations, not bans on using $10B of GPUs instead of $10M in a model. It also seems like it wouldn’t pursue measures targeted at the kind of disaster it denies, and might actively discourage them (this sometimes happens already). With a threat model of privacy violations restrictions on model size would be a huge lift and the remedy wouldn’t fit the diagnosis in a way that made sense to policymakers. So I wouldn’t expect privacy advocates to bring them about based on their past track record, particularly in China where privacy and digital democracy have not had great success.If it in fact is true that there is a large risk of almost everyone alive today being killed or subjugated by AI, then establishing that as scientific consensus seems like it would supercharge a response dwarfing current efforts for things like privacy rules, which would aim to avert that problem rather than deny it and might manage such huge asks, including in places like China. On the other hand, if the risk is actually small, then it won’t be possible to scientifically demonstrate high risk, and it would play a lesser role in AI policy. I don’t see a world where it’s both true the risk is large and knowledge of that is not central to prospects for success with such huge political lifts.