It looks to me like we’re on track for some people to be saying “look how rarely my AI says bad words”, while someone else is saying “our evals are saying that it can’t deceive humans yet”, while someone else is saying “our AI is acting very submissive, and there’s no reason to expect AIs to become non-submissive, that’s just anthropomorphizing”, and someone else is saying “we’ll just direct a bunch of our AIs to help us solve alignment, while arranging them in a big bureaucracy”, and someone else is saying “we’ve set up the game-theoretic incentives such that if any AI starts betraying us, some other AI will alert us first”, and this is a different sort of situation.
And not one that looks particularly survivable, to me.
And if you ask bureaucrats to distinguish which teams should be allowed to move forward (and how far) in that kind of circus, full of claims, promises, and hunches and poor in theory, then I expect that they basically just can’t.
I’m reminded of a draft post that I started but never finished or published, about the Manhattan Project and its relevance for AI alignment and AI coordination, based on my reading of The Making of the Atomic Bomb.
The historical context: There was a two-year period between when the famous Einstein-Szilard letter was delivered to Roosevelt and when the Manhattan Project got started in earnest, during which not much happened. In that period, Szilard and some of the other physicists kept insisting that proceeding with research was both urgent and of utmost importance, and that the Nazis might beat the Allies to the bomb. But the government officials in charge of giving them the resources they needed kept dragging their feet and punting on making any serious commitments. Even small and relatively trivial expenditures that would have allowed some of the physicists to start work on the project were delayed or reduced. They commissioned review committee after review committee to advise on the issue.
There’s an almost-comical series of different physicists trying to get the government to recognize the urgency of the situation, and the government repeatedly dismissing them.
An excerpt from that draft:
Looking at the comedy of errors that took place in this two year period, it’s natural to bemoan the incompetence and lack of foresight of the government officials.
But I think (again, as a non-historian) that having Vannevar Bush as the head of the umbrella organization over the Uranium Committee was actually pretty lucky, all things considered. I suspect that he was much more capable of making decisions about these matters than most of the people who might have been in his place.
He was pretty technical, having been both an engineer and an inventor before becoming vice president of MIT.
But more than that, he possessed substantial imagination and vision. For instance, after the war he wrote the essay “As We May Think”, in which he imagined how a machine called a “memex”, something like what we now know as a personal computer (in contrast to the giant calculating machines of his day), could transform how humans think and work. In particular, the essay suggested the importance of what we now call hypertext, and presaged the World Wide Web.
Bush could clearly understand the import of weird new ideas. He wasn’t a hard-headed bureaucrat dismissive of anything radical, and he certainly wasn’t stupid.
Putting ourselves in his shoes, he had a legitimately hard assessment problem. A number of physicists were telling him, repeatedly, that atomic weapons were a huge deal and that the government should be putting its weight behind them.
But from his perspective, subtracting the benefit of hindsight, it isn’t obvious that these scientists weren’t getting overexcited and being swept away by the speculative possibilities of their field.
Remember that it isn’t just a question of whether atom bombs are possible in principle, but whether they are feasible in practice, and how quickly they could be developed and manufactured. From Bush’s perspective, it might be that the physicists are on to something, but are inflating the risk in their own heads because they are misestimating how practical their clever idea actually is.
Further, there’s a bit of a principal-agent problem: all of the people telling him that atom bombs matter also want him to fund their projects.
Usually in situations like this, we would urge the administrator to “consult experts”. But in this case, the idea of fission was so new that there weren’t clear third-party experts. We see in the successive review committees that Bush ordered that he kept adding new people with different expertise. In particular, he kept adding engineers, presumably in the hope that they could give him feasibility and cost estimates.
Importantly, this didn’t actually work.
Compton was distressed to discover he could not move the engineers on the review committee—the practical souls Bush had insisted be added to bring the NAS reviews down to earth—to estimate either how much time it would take to build a bomb or how much the enterprise would cost:
With one accord they refused. . . . There weren’t enough data. The fact was that they had before them all the relevant information that existed, and some kind of answer was needed, however rough it might be, for otherwise our recommendation could not be acted upon. After some discussion, I suggested a total time of between three and five years, and a total cost . . . of some hundreds of millions of dollars. None of the committee members objected.
So the American numbers came out of a scientist’s hat, as the British numbers had. Atomic energy was still too new for engineering. (Rhodes, chapter 12)
We can see with the benefit of hindsight that Bush should have prioritized atomic bomb work sooner, but that was hardly obvious at the time.
And given this uncertainty, his response, “let’s wait to see the results of some inexpensive experiments, to see whether we can get a supercritical chain reaction going at all”, seems like a pretty good choice, as far as I can see.
Relevance to AI policy work
I think that this is likewise applicable to AI policy work. I’m imagining some person in an equivalent role to Bush’s some number of decades from now.
Suppose that it is well understood that AI risk is at least plausible. Now posit that there is a smallish cadre of computer scientists who are frantically insisting that AGI could (probably) be developed within a few years, and that it is crucial that it be designed safely. Luckily, they have the outline of a program of development that seems like it will produce aligned AI, unlike the default trajectory. They are desperately petitioning for the government to put massive resources behind that project, because we must, must, get to aligned AI before unaligned AI.
Unfortunately, their arguments are kind of speculative, and they can’t give any exact numbers for how long until AGI, or how much it will cost.
And it is entirely possible that they are actually mistaken: perhaps there is some fundamental blocker that they haven’t realized yet, and the world is not on the verge of AGI. Or maybe there’s something wrong with their safety scheme, and their proposed plan ultimately won’t be able to deliver the goods.
And furthermore, there are other smart, technical, x-risk-concerned AI safety researchers who think that these claims are overblown, or that the safety approach described is doomed to fail.
From the details given, it is far from clear whom the administrator should trust. What could he possibly do in this situation, except identify some smaller experiments that would be suggestive one way or the other, and make decisions on that basis?