You may be interested in some recent empirical experiments, demonstrating objective robustness failures/inner misalignment, including ones predicted in the risks from learned optimization paper.
There is at least one firm doing drone delivery in China and they just approved a standard for it.
Mainly such complete (and irreversible!) delegation to such incompetent systems being necessary or executed. If AI is so powerful that the nuclear weapons are launched on hair-trigger without direction from human leadership I expect it to not be awful at forecasting that risk.You could tell a story where bargaining problems lead to mutual destruction, but the outcome shouldn’t be very surprising on average, i.e. the AI should be telling you about it happening with calibrated forecasts.
The US and China might well wreck the world by knowingly taking gargantuan risks even if both had aligned AI advisors, although I think they likely wouldn’t.But what I’m saying is really hard to do is to make the scenarios in the OP (with competition among individual corporate boards and the like) occur without extreme failure of 1-to-1 alignment (for both companies and governments). Competitive pressures are the main reason why AI systems with inadequate 1-to-1 alignment would be given long enough leashes to bring catastrophe. I would cosign Vanessa and Paul’s comments about these scenarios being hard to fit with the idea that technical 1-to-1 alignment work is much less impactful than cooperative RL or the like.
In more detail, I assign a ≥10% chance to a scenario where two or more cultures each progressively diminish the degree of control they exercise over their tech, and the safety of the economic activities of that tech to human existence, until an involuntary human extinction event. (By comparison, I assign at most around a ~3% chance of a unipolar “world takeover” event, i.e., I’d sell at 3%.)
If this means that a ‘robot rebellion’ would include software produced by more than one company or country, I think that that is a substantial possibility, as well as the alternative, since competitive dynamics in a world with a few giant countries and a few giant AI companies (and only a couple leading chip firms) can mean that the way safety tradeoffs work is by one party introducing rogue AI systems that outcompete by not paying an alignment tax (and intrinsically embodying in themselves astronomically valuable and expensive IP), or cascading alignment failure in software traceable to a leading company/consortium or country/alliance. But either way reasonably effective 1-to-1 alignment methods (of the ‘trying to help you and not lie to you and murder you with human-level abilities’ variety) seem to eliminate a supermajority of the risk.[I am separately skeptical that technical work on multi-agent RL is particularly helpful, since it can be done by 1-to-1 aligned systems when they are smart, and the more important coordination problems seem to be earlier between humans in the development phase.]
I think I disagree with you on the tininess of the advantage conferred by ignoring human values early on during a multi-polar take-off. I agree the long-run cost of supporting humans is tiny, but I’m trying to highlight a dynamic where fairly myopic/nihilistic power-maximizing entities end up quickly out-competing entities with other values, due to, as you say, bargaining failure on the part of the creators of the power-maximizing entities.
Right now the United States has a GDP of >$20T, US plus its NATO allies and Japan >$40T, the PRC >$14T, with a world economy of >$130T. For AI and computing industries the concentration is even greater.These leading powers are willing to regulate companies and invade small countries based on reasons much less serious than imminent human extinction. They have also avoided destroying one another with nuclear weapons.If one-to-one intent alignment works well enough that one’s own AI will not blatantly lie about upcoming AI extermination of humanity, then superintelligent locally-aligned AI advisors will tell the governments of these major powers (and many corporate and other actors with the capacity to activate governmental action) about the likely downside of conflict or unregulated AI havens (meaning specifically the deaths of the top leadership and everyone else in all countries).
All Boards wish other Boards would stop doing this, but neither they nor their CEOs manage to strike up a bargain with the rest of the world stop it.
Within a country, one-to-one intent alignment for government officials or actors who support the government means superintelligent advisors identify and assist in suppressing attempts by an individual AI company or its products to overthrow the government.Internationally, with the current balance of power (and with fairly substantial deviations from it) a handful of actors have the capacity to force a slowdown or other measures to stop an outcome that will otherwise destroy them. They (and the corporations that they have legal authority over, as well as physical power to coerce) are few enough to make bargaining feasible, and powerful enough to pay a large ‘tax’ while still being ahead of smaller actors. And I think they are well enough motivated to stop their imminent annihilation, in a way that is more like avoiding mutual nuclear destruction than cosmopolitan altruistic optimal climate mitigation timing.That situation could change if AI enables tiny firms and countries to match the superpowers in AI capabilities or WMD before leading powers can block it.So I agree with others in this thread that good one-to-one alignment basically blocks the scenarios above.
I think they are fighting each other all the time, though mostly in very prosaic ways (e.g. McDonald’s and Burger King’s marketing AIs are directly competing for customers). Are there some particular conflicts you imagine that are suppressed in the story?
I think the one that stands out the most is ‘why isn’t it possible for some security/inspector AIs to get a ton of marginal reward by whistleblowing against the efforts required for a flawless global camera grab?’ I understand the scenario says it isn’t because the demonstrations are incomprehensible, but why/how?
That is the opposite error, where one cuts off the close election cases. The joint probability density function over vote totals is smooth because of uncertainty (which you can see from polling errors), so your chance of being decisive scales proportionally with the size of the electorate and the margin of error in polling estimation.
The error is a result of assuming the coin is exactly 50%, in fact polling uncertainties mean your probability distribution over its ‘weighting’ is smeared over at least several percentage points. E.g. if your credence from polls/538/prediction markets is smeared uniformly from 49% to 54%, then the chance of the election being decided by a single vote is one divided by 5% of the # of voters.You can see your assumption is wrong because it predicts that tied elections should be many orders of magnitude more common than they are. There is a symmetric error where people assume that the coin has a weighting away from 50%, so the chances of your vote mattering approach zero. Once you have a reasonable empirical distribution over voting propensities fit to reproduce actual election margins both these errors go away.See Andrew Gelman’s papers on this.
As I said, the story was in combination with one-boxing decision theories and our duplicate counterparts.
I suppose by ‘the universe’ I meant what you would call the inflationary multiverse, that is including distant regions we are now out of contact with. I personally tend not to call regions separated by mere distance separate universes.”and the only impact of our actions with infinite values is the number of black holes we create.”Yes, that would be the infinite impact I had in mind, doubling the number would double the number of infinite branching trees of descendant universes.Re simulations, yes, there is indeed a possibility of influencing other levels, although we would be more clueless, and it is a way for us to be in a causally connected patch with infinite future.
Thanks David, this looks like a handy paper!
Given all of this, we’d love feedback and discussion, either as comments here, or as emails, etc.
I don’t agree with the argument that infinite impacts of our choices are of Pascalian improbability, in fact I think we probably face them as a consequence of one-boxing decision theory, and some of the more plausible routes to local infinite impact are missing from the paper:
The decision theory section misses the simplest argument for infinite value: in an infinite inflationary universe with infinite copies of me, then my choices are multiplied infinitely. If I would one-box on Newcomb’s Problem, then I would take the difference between eating the sandwich and not to be scaled out infinitely. I think this argument is in fact correct and follows from our current cosmological models combine with one-boxing decision theories.
Under ‘rejecting physics’ I didn’t see any mention of baby universes, e.g. Lee Smolin’s cosmological natural selection. If that picture were right, or anything else in which we can affect the occurrence of new universes/inflationary bubbles forming, then that would permit infinite impacts.
The simulation hypothesis is a plausible way for our physics models to be quite wrong about the world in which the simulation is conducted, and further there would be reason to think simulations would be disproportionately conducted under physical laws that are especially conducive to abundant computation
Little reaction to the new strain news, or little reaction to new strains outpacing vaccines and getting a large chunk of the population over the next several months?
These projections in figure 4 seem to falsely assume training optimal compute scales linearly with model size. It doesn’t, you also need to show more data points to the larger models so training compute grows superlinearly, as discussed in OAI scaling papers. That changes the results by orders of magnitude (there is uncertainty about which of two inconsistent scaling trends to extrapolate further out, as discussed in the papers).
Maybe the real problem is just that it would add too much to the price of the car?
Yes. GPU/ASICs in a car will have to sit idle almost all the time, so the costs of running a big model on it will be much higher than in the cloud.
I’m not a utilitarian, although I am closer to that than most people (scope sensitivity goes a long way in that direction), and find it a useful framework for highlighting policy considerations (but not the only kind of relevant normative consideration).
And no, Nick did not assert an estimate of x-risk as simultaneously P and <P.
This can prevent you from being able to deduct the interest as investment interest expense on your taxes due to interest tracing rules (you have to show the loan was not commingled with non-investment funds in an audit), and create a recordkeeping nightmare at tax time.