One major update from the Chinchilla paper against the NN timelines that this post doesn’t capture (inspired by this comment by Rohin):
Based on Kaplan scaling laws, we might’ve expected that raw parameter count was the best predictor of capabilities. Chinchilla scaling laws introduced a new component, data quantity, that was not incorporated in the original report.
Chinchilla scaling laws provide the compute-optimal trade-off between datapoints and parameters, but not the cost-optimal trade-off (assuming that costs come from both using more compute and observing more datapoints). In biological systems, the marginal cost of doubling the amount of data is very high, since that requires doubling the organism’s lifespan or doubling its neuron throughput, which are basically hard constraints. This means that human brains may be very far from “compute optimal” in the zero-datapoint-cost limit suggested by Chinchilla, implying that ANN models much smaller than brain-size (estimated at ~10T parameters) may achieve human-level performance given compute-optimal quantities of data.
In other words, the big takeaway is that we should update away from human-level FLOPS as a good bio-anchor independent of the number of training datapoints, since we have reason to believe that human brains face other constraints which suboptimally inflate the amount of compute a brain needs to attain a given level of performance.
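To make the compute-optimal vs. cost-optimal distinction concrete, here is a rough numerical sketch (my own toy model, not something from the Chinchilla paper or the bio-anchors report). It uses the Chinchilla parametric loss fit L(N, D) = E + A/N^α + B/D^β with the approximate published constants and the usual C ≈ 6ND FLOP rule, and adds a made-up per-token “data acquisition cost” knob. The point is just to show that when data is expensive (the brain-like regime), the loss-minimizing allocation shifts toward more parameters and fewer datapoints than the compute-optimal one.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Approximate parametric loss fit from the Chinchilla paper ("Approach 3" constants):
#   L(N, D) = E + A / N^alpha + B / D^beta
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def predicted_loss(n_params, n_tokens):
    """Predicted training loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def best_allocation(budget, token_cost=0.0):
    """Minimize predicted loss under a toy budget model (my assumption, not the paper's):

        budget = 6 * N * D          (training compute, in FLOPs)
               + token_cost * D     (extra cost of *obtaining* each datapoint)

    token_cost = 0 recovers the standard compute-optimal Chinchilla allocation.
    """
    def objective(log_n):
        n_params = np.exp(log_n)
        # Spend whatever budget remains (per the constraint) on data.
        n_tokens = budget / (6 * n_params + token_cost)
        return predicted_loss(n_params, n_tokens)

    res = minimize_scalar(objective, bounds=(np.log(1e6), np.log(1e16)), method="bounded")
    n_params = np.exp(res.x)
    n_tokens = budget / (6 * n_params + token_cost)
    return n_params, n_tokens, predicted_loss(n_params, n_tokens)


if __name__ == "__main__":
    budget = 1e24  # roughly Chinchilla-scale training compute, in FLOPs
    for token_cost in (0.0, 1e13):  # free data vs. very expensive data (illustrative numbers)
        n, d, pl = best_allocation(budget, token_cost)
        print(f"token_cost={token_cost:.0e}: N={n:.2e} params, D={d:.2e} tokens, loss={pl:.3f}")
```

Directionally, raising the per-token cost pushes the optimum toward many more parameters and far fewer tokens for the same total budget, which is the sense in which a data-constrained brain could be parameter-inflated relative to a data-rich ANN hitting the same performance.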
Hey Rob, thanks for your reply. If it makes you guys feel better, you can rationalize the following as my expression of the Bargaining stage of grief.
Consider me completely convinced that alignment is hard, and that a lot of people aren’t taking it seriously enough, or are working on the wrong parts of the problem. That is fundamentally different from saying that it’s unlikely to be solved even if we get 100× as many people working on it (albeit for a shorter time), especially if you believe that geniuses are outliers and thus that the returns on sampling for more geniuses remain large even after drawing many samples (especially if we’ve currently sampled <500 over the lifetime of the field). To get down to <1% probability of success, you need a fundamentally different argument structure. Here are some examples.
“We have evidence that alignment will absolutely necessitate a lot of serial research. This means that even if lots more people join, the problem by its nature cannot be substantially accelerated by dramatically increasing the number of researchers (and consequently with high probability increasing the average quality of the top 20 researchers).”
I would love to see the structure of such an argument.
“We have a scheme for comprehensively dividing up all plausible alignment approaches. For each class of approach, we have hardness proofs, or things that practically serve as hardness proofs such that we do not believe 100 smart people thinking about it for a decade are at all likely to make more progress than we have in the previous decade.”
Needless to say, if you had such a taxonomy (even heuristically) it would be hugely valuable to the field—if for no other reason than that it would serve as an excellent communication mechanism to skeptics about the flaws in their approaches.
This would also be massively important from a social-coordination perspective. Consider how much social progress ELK made in building consensus around the hardness of the ontology-mismatch problem. What if we did that, but for every one of your hardness pseudo-results, made the prize $10M for each hardness result instead of $50k, and broadcast it to the top 200 CS departments worldwide? It’d dramatically increase the salience of alignment as a real problem that no one seems able to solve, since if someone already could, they’d have made $10M.
“We are the smartest the world has to offer; even if >50% of theoretical computer scientists and >30% of physicists and >30% of pure mathematicians at the top 100 American universities were to start working on these problems 5 years from now, they would be unlikely to find much we haven’t found.”
I’m not going to tell you this is impossible, but I haven’t seen the argument made yet. From an outside view, the things that got Eliezer Yudkowsky to where he is are (1) being dramatic-outlier-good at generalist reasoning, and (2) being an exceptional communicator to a certain social category (nerds). Founding the field is not, by itself, a good indicator of being dramatic-outlier-exceptional at inventing weird higher-level math. Obviously MIRI are still pretty good at it! But the best? Of all those people out there?
It would be really, really helpful to have a breakdown of why MIRI is so pessimistic, beyond just “we don’t have any good ideas about how to build an off-switch; we don’t know how to solve ontology-mismatch; we don’t know how to prevent inner misalignment; also, even if you solve them, you’re probably wrong in some other way, based on our priors about how often rockets explode”. I agree those are big, real, unsolved problems. But, like, I myself have thought of previously-unmentioned ideas near the research frontier on inner misalignment, and it wasn’t that hard! That did not leave me confident that further thinking by newbs is unlikely to make any headway on these problems. Also, “alignment is like building a rocket except we only get one shot” was just as true a decade ago; why were you more optimistic before? Is it all just the hardness of the off-switch problem specifically?
I agree that proliferation would spell doom, but the supposition that the only possible way to prevent proliferation is via building an ASI and YOLOing to take over the world is, to my mind, a pretty major reduction of the options available. Arguably the best option is compute governance; if you don’t have extremely short (<10-year) timelines, it seems plausible (>20%) that it will take many, many chips to train an AGI, let alone an ASI. In any conceivable world, these chips are coming from a country under either the American or Chinese nuclear umbrella. (This is because fabs are comically expensive and complex, and EUV lithography machines are expensive and currently a one-firm monopoly, though a massively-funded Chinese competitor could conceivably arise someday. To appreciate just how strong this chokepoint is: the US military itself is completely incapable of building its own fabs, if the Trusted Foundry Program is any indication.) If China and NATO were worried that randos training large models had a 10% chance of ending the world, they would tell them to quit it. The fears that “Facebook AI Research will just build an AGI” sound much less plausible if you have 15-to-20-year timelines, because if the US government tells Facebook they can’t do that, Facebook stops. Any nuclear-armed country outside China/NATO can’t be controlled this way, but then they just won’t get any chips. “Promise you won’t build AGI, get chips, betray the US/China by building AGI anyway, and hope to get to ASI fast enough to take over the world” is hypothetically possible, but the Americans and Chinese would know that and could condition the sale of chips on as many safeguards as we could think of. (Or just not sell the chips, and make India SSH into US-based datacenters.)
Addressing possible responses:
It’s impossible to know where compute goes once it leaves the fabs.
Impossible? Or just, like, complicated and would require work? I will grant that it’s impossible to know where consumer compute (like iPhones) ends up, but datacenter-scale compute seems much more likely to be trackable. Remember that in this world, the Chinese government is selling you chips and actually doesn’t want you building AGI with them. If you immediately throw your hands up and say you are confident there is no logistical way to do that, I think you are miscalibrated.
Botnets (a la Gwern):
You will note that in the Gwern story, the AGI had to build its own botnet; the initial compute needed to “ascend and break loose” was explicitly sanctioned by the government, despite a history of accidents. What if those two governments could be convinced about the difficulty of AI alignment, and actually didn’t let anyone run random code connected to the internet?
What if the AGI is trained on an existing botnet, a la Folding@Home, or some darknet equivalent run by a terrorist group or nation state? It’s possible; we should be thinking of monitoring techniques. But botnets’ capacity to undetectably marshal huge amounts of compute is not infinite, and with real political will, I don’t see why making this hard would be intractable.
We don’t trust the US/Chinese governments to be able to correctly assess alignment approaches, when the time comes. The incentives are too extreme in favor of deployment.
This is a reasonable concern. But the worst version of this, where the governments just okay something dumb with clear counterarguments, is only possible if you believe there remains a lack of consensus around even a minor possibility of a catastrophic alignment problem. No American or Chinese leader has, in their lifetimes, needed to make a direct decision that had even a 10% chance of killing ten million Americans. (COVID vaccine buildout is a decent counterexample, but sins of omission and commission are different to most people.)
Influencing the government is impossible.
We’re really only talking about convincing two bureaucracies; we might fail, but “it’s impossible” is an unfounded assumption. The climate people did it, and that problem has far more powerful entrenched opponents. (They haven’t gotten everything they want yet, but they’re asking for a lot more than we would be, and it’s hard to argue the people in power don’t think climate science is real.)
As of today in the US, “don’t build AGI until you’re sure it won’t turn on you and kill everyone” has literally no political opponents other than optimistic techno-futurists, and lots of supporters for obvious and less-obvious (labor displacement) reasons. I struggle to see where the opposition would come from in 10 years either, especially considering that this would be regulation of a product that doesn’t exist yet and thus has no direct beneficiaries.
While Chinese domestic sentiments may be less anti-AI, the CCP doesn’t actually make decisions based on what its people think. It is an engineering-heavy elite dictatorship; if you convince enough within-China AI experts, there is plenty of reason to believe you could convince the CCP.
This isn’t a stable equilibrium; something would go wrong and someone would push the button eventually.
That’s probably true! If I had to guess, I think it could probably last for a decade, and probably not for two. That’s why it matters a lot whether the alignment problem is “too hard to make progress on in 2 decades with massive investment” or just “really hard and we’re not on track to solve it.”
You may also note that the only data point we have about “Will a politician push a button that with some probability ends the world, and the rest of the time makes their country a hegemon?” is the Cuban Missile Crisis. There was no mature second-strike capability; if Kennedy had pushed the button, he wasn’t sure the other side could have retaliated. Do I want to replay the 1950s–60s nuclear standoff? No thank you. Would I take that over racing to build an unaligned superintelligence first and then YOLOing? Yes please.
You will note that every point I’ve made here has a major preceding causal variable: enough people taking the hardness of the alignment problem seriously that we can do big-kid interventions. I really empathize with the people who feel burnt out about this. You have literally been doing your best to save the world, and nobody cares, so it feels intuitively likely that nobody will care. But there are several reasons I think this is pessimism, rather than calibrated realism:
The actual number of people you need to convince is fairly small. Why? Because this is a technology-specific question and the only people who will make the relevant decisions are technical experts or the politicians/leaders/bureaucrats they work with, who in practice will defer to them when it comes to something like “the hardness of alignment”.
The fear of “politicians will select those experts which recite convenient facts” is legitimate. However, this isn’t at all inevitable; arguably the reason this both-sidesing happened so much within climate science is that the opponents’ visibility was heavily funded by entrenched interests—which, again, don’t really exist for AGI.
Given that an overwhelming majority of people dismiss the alignment problem primarily on the basis that their timelines are really long, every capability breakthrough makes shorter timelines seem more likely (and also shrinks the social cost of endorsing shorter timelines, as everyone else updates on the same information). You can already see this to some extent with GPT-3 converting people; I for one had very long timelines before then. So strategies that didn’t work 10 years ago are meaningfully more likely to work now, and that will become even more true.
Social influence is not a hard technical problem! It is hard, but there are entire industries of professionals who are actually paid to convince people of stuff. AI alignment is not funding constrained; all we’d need is money!
On the topic of turning money into social influence, people really fail to appreciate how much money there is out there for AI alignment, especially if you could convince technical AI researchers. Guess who really doesn’t like an AI apocalypse? Every billionaire with a family office who doesn’t like giving to philanthropy! Misaligned AI is one of the only things that could meaningfully hurt the expected value of billionaires’ children; if scientists start telling billionaires this is real, it is very likely you can unlock orders of magnitude more money than the ~$5B that FTX + OpenPhil seem on track to spend. On that note, money can be turned into social influence in lots of ways. Give the world’s thousand most respected AI researchers $1M each to spend 3 months working on AI alignment, with an extra $100M if by the end they can propose a solution alignment researchers can’t shoot down. I promise you that other than like 20 industry researchers who are paid silly amounts, every one of them would take the million. They probably won’t make any progress, but from then on, when others ask them whether they think alignment is a real unsolved problem, they will be way more likely to say yes. That only costs you a billion dollars! I literally think I could get someone reading this the money to do this (at least at an initially moderate scale); all it needs is a competent person to step up.
The other point that all of my arguments depend on is that we have, say, at least until 2035. If not, a lot of these ideas become much less likely to work, and I start thinking much more that “maybe it really will just be DeepMind YOLOing ASI” and dealing with the attendant strategies. So again, if Eliezer has private information that makes him really confident, relative to everyone else, that >50% of the probability mass is on sooner than 2030, it sure would be great to know how seriously to take that, and whether he thinks a calibrated actor would abandon the other strategies and focus on Hail Marys.