Eliezer seems to be relatively confident that AI systems will be very alien and will understand many things about the world that humans don’t, rather than understanding a similar profile of things (but slightly better), or having weaker understanding but enjoying other advantages like much higher serial speed. I think this is very unclear and Eliezer is wildly overconfident. It seems plausible that AI systems will learn much of how to think by predicting humans even if human language is a uselessly shallow shadow of human thought, because of the extremely short feedback loops. It also seems quite possible that most of their knowledge about science will be built by an explicit process of scientific reasoning and inquiry that will proceed in a way recognizable to human science even if their minds are quite different. Most importantly, it seems like AI systems have huge structural advantages (like their high speed and low cost) that suggest they will have a transformative impact on the world (and obsolete human contributions to alignment) well before they need to develop superhuman understanding of much of the world or tricks about how to think, and so even if they have a very different profile of abilities to humans they may still be subhuman in many important ways.
It seems to me that this claim is approximately equivalent to “takeoff will be soft, not hard”. In a hard takeoff world, it seems straightforward that AI systems will understand huge, important parts/dynamics of the world in ways that humans don’t, even a little?
Early transformative AI systems will probably do impressive technological projects by being trained on smaller tasks with shorter feedback loops and then composing these abilities in the context of large collaborative projects (initially involving a lot of humans but over time increasingly automated). When Eliezer dismisses the possibility of AI systems performing safer tasks millions of times in training and then safely transferring to “build nanotechnology” (point 11 of list of lethalities) he is not engaging with the kind of system that is likely to be built or the kind of hope people have in mind.
It seems like Paul is imagining something CAIS-like, where you compose a bunch of AI abilities that are fairly robust in their behavior, and then conglomerate them into large projects that do big things, much like human organizations. (Unless I’m misunderstanding, in which case the rest of this comment is obviated.)
It seems like whether this works depends on two factors:
First of all, it needs to be the case that conglomerations like this are competitive with giant models that are a single unified brain.
On first pass, this assumption seems pretty untrue? The communication bandwidth, and the ability to operate as a unit, of people in an organization is much, much lower than that of the sub-modules of a person’s brain.
Second, it supposes that when you compose a bunch of AI systems to do something big and novel, like designing APM systems, each individual component will still be operating within its training distribution, as opposed to the project requiring that some of the AIs be fed inputs that are really weird and might produce unanticipated behavior.
This seems like a much weaker concern, though. For one thing, it seems like you ought to be able to put checks on whether a given AI component is being fed out-of-distribution inputs, and raise a flag for oversight whenever that happens.
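To make that concrete, here’s a minimal, purely illustrative sketch of what such a check might look like: treat an input as out-of-distribution when it is far (in some embedding space) from anything the component saw in training, and escalate to a human instead of acting. The embedding, the distance threshold, and the escalation step are all invented placeholders, not a claim about how a real system would implement this.

```python
import numpy as np

# Illustrative only: flag inputs that fall far outside the training
# distribution, measured by distance to the nearest training example in
# some embedding space. The embedding, threshold, and escalation step
# are placeholders, not a real system's API.

def is_out_of_distribution(x_embedding: np.ndarray,
                           training_embeddings: np.ndarray,
                           threshold: float = 5.0) -> bool:
    """True if this input is far from everything the component was trained on."""
    distances = np.linalg.norm(training_embeddings - x_embedding, axis=1)
    return bool(distances.min() > threshold)

def run_component(component, x, x_embedding, training_embeddings):
    """Run one AI component, escalating to human oversight instead of
    acting when the input looks out-of-distribution."""
    if is_out_of_distribution(x_embedding, training_embeddings):
        print("Out-of-distribution input; flagging for human review:", x)
        return None
    return component(x)
```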
When I get stuck on a problem (e.g. what is the type signature of human values?), I do not stay stuck. I notice I am stuck, I run down a list of tactics, I explicitly note what works, I upweight that for next time.
What tactics in particular?
Realistically I think the core issue is that Eliezer is very skeptical about the possibility of competitive AI alignment. That said, I think that even on Eliezer’s pessimistic view he should probably just be complaining about competitiveness problems rather than saying pretty speculative stuff about what is needed for a pivotal act.
Isn’t the core thing here that Eliezer expects that a local, hard takeoff is possible? He thinks that a single AI system can rapidly gain enormous power relative to the rest of the world (either by recursive self-improvement, or by seizing compute, or by just deploying on more computers).
If this is a possible thing for an AGI system to do, it seems like ensuring a human future requires that you’re able to prevent an unaligned AGI from undergoing a hard takeoff.
If you have aligned systems that are competitive in a number of different domains, that doesn’t matter if 1) local hard takeoff is on the table and 2) you aren’t able to produce systems whose alignment is robust to a hard takeoff.
It seems like the pivotal act ideology is a natural consequence of 1) expecting hard takeoff and 2) thinking that alignment is hard, full stop. Whether or not aligned systems will be competitive doesn’t come into it. Or by “competitive” do you mean specifically “competitive, even across the huge relative capability gain of a hard takeoff”? It seems like Eliezer’s chain of argument is:
[Hard takeoff is likely]
[You need a pivotal act to preempt unaligned superintelligence]
[Your safe AI design needs to be able to do something concrete that can enable a pivotal act in order to be of strategic relevance.]
[When doing AI safety work, you need to be thinking about the concrete actions that your system will do]
The notion of an AI-enabled “pivotal act” seems misguided. Aligned AI systems can reduce the period of risk of an unaligned AI by advancing alignment research, convincingly demonstrating the risk posed by unaligned AI, and consuming the “[free energy](https://www.lesswrong.com/posts/yPLr2tnXbiFXkMWvk/an-equilibrium-of-no-free-energy)” that an unaligned AI might have used to grow explosively. No particular act needs to be pivotal in order to greatly reduce the risk from unaligned AI, and the search for single pivotal acts leads to unrealistic stories of the future and unrealistic pictures of what AI labs should do.
On the face of it, this seems true, and it seems like a pretty big clarification to my thinking. You can buy more time or more safety a little bit at a time, instead of all at once, in sort of the way that you want to achieve life extension escape velocity.
But it seems like this largely depends on whether you expect takeoff to be hard or soft. If AI takeoff is hard, you need pretty severe interventions, because they either need to prevent the deployment of AGI or be sufficient to counter the actions of a superintelligence. Generally, it seems like the sharper takeoff is, the more good outcomes flow through pivotal acts, and the smoother takeoff is, the more we should expect good outcomes to flow through incremental improvements.

Are there any incremental actions that add up to a “pivotal shift” in a hard takeoff world?
[Ngo][17:31]
And some deep principles governing engines, but not really very crucial ones to actually building (early versions of) those engines

[Yudkowsky][17:31]
that’s… not historically true at all?
getting a grip on quantities of heat and their flow was critical to getting steam engines to work
it didn’t happen until the math was there
Checking very quickly, this article, at least, disagrees: it says that thermodynamics was developed a century after the invention of the steam engine. Maybe Eliezer is referring to something more basic than thermodynamics? Or is this just an error?
This comment seems to me to be pointing at something very important which I had not hitherto grasped.
My (shitty) summary: There’s a big difference between gains from improving the architecture/abilities of a system (the genome, for human agents) and gains from increasing knowledge developed over the course of an episode (or lifetime). In particular, they might differ in how easy it is to “get the alignment in”. If the AGI is doing consequentialist reasoning while it is still mostly getting gains from gradient descent, as opposed to from knowledge collected over an episode, then we have more ability to steer its trajectory.
I like this format!
Today, I was reading Mistakes with Conservation of Expected Evidence. For some reason, I was under the impression that the post was written by Rohin Shah; but it turns out it was written by Abram Demski.
In retrospect, I should have been surprised that “Rohin” kept talking about what Eliezer says in the Sequences. I wouldn’t have guessed that Rohin was that “culturally rationalist”, or that he would be that interested in what Eliezer wrote in the Sequences. And indeed, I was updating that Rohin was more of a rationalist, with more rationalist interests, than I had thought. If I had been more surprised, I could have noticed my surprise / confusion, and made a better prediction.
But on the other hand, was my surprise so extreme that it should have triggered an error message (confusion), instead of merely an update? Maybe this was just fine reasoning after all?
From a Bayesian perspective, upon observing this evidence I should have increased my credence both in Rohin being more rationalist-y than I thought, and in the hypothesis that the post wasn’t written by Rohin at all. But practically, I would have needed to generate the second hypothesis, and I don’t think that I had strong enough reason to.
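A toy calculation, just to make the shape of that update concrete (the hypotheses and all the numbers are made up for illustration):

```python
# Toy Bayesian update with invented numbers.
# H1: Rohin wrote it and is more rationalist-y than I modeled.
# H2: someone else (e.g. Abram) wrote it.
# H3: Rohin wrote it and my model of him was about right.
priors      = {"rohin_more_rationalist": 0.20, "different_author": 0.05,
               "rohin_as_modeled": 0.75}
# P(this much Sequences-quoting | hypothesis) -- also made up.
likelihoods = {"rohin_more_rationalist": 0.50, "different_author": 0.80,
               "rohin_as_modeled": 0.05}

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posteriors = {h: p / total for h, p in unnormalized.items()}
print(posteriors)
# Both "Rohin is more rationalist-y" and "different author" gain probability
# mass -- but "different author" only moves if it was generated as a
# hypothesis in the first place.
```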
I feel like there’s a semi-interesting epistemic puzzle here. What’s the threshold for a surprising enough observation that you should be confused (much less notice your confusion)?
Noting for myself: I didn’t make an explicit prediction, but I emotionally expected John to be vindicated by this experiment. My emotional prediction was wrong, and that seems good to notice, even if I don’t do much further reflection.
This is a great comment. The graphs helped a lot.
I just want to say that I found this comment personally helpful.
This is the problem with how the rationalist community approaches the concept of what it means to “make a rational decision” perfectly demonstrated in a single debate. You do not make a “rational decision” in the real world by reasoning in a vacuum.
Something about this seems on point to me. Rationalists, in general, are much more likely to be mathematicians than (for instance) mechanical engineers. It does seem right to me that when I look around, I see people drawn to abstract analyses, very plausibly at the expense of neglecting contextualized details that are crucial for making good calls. This seems like it could very well be a bias of my culture.
For instance, it’s fun and popular to talk about civilizational inadequacy, or how the world is mad. I think that is pointing at something true and important, but I wonder how much of that is basically overlooking the fact that it is hard to do things in the real world with a bunch of different stakeholders and a confusing mess of constraints. In a lot of cases, civilizational inadequacy can be the result of engineers (broadly construed) who understand that “the perfect is the enemy of the good”, pushing projects through to completion anyway. The outcome is sometimes so muddled as to be worse than having done nothing, but also, shipping things under constraints, even though they could be much better on some axes, is how civilization runs.
Anyway, this makes me think that I should attempt to do more engineering projects, or otherwise find ways to operate in domains where the goal is to get “good enough”, within a bunch of not-always crisply-defined constraints.
Actually, my more specific question is “is verification still easier than generation, if the generation is adversarial?” That seems like a much more specific problem space than just “generation and verification in general.”
I think you can lump them together for this conversation
Why do you think this?
It seems to me that reading books about deep learning is a just fine thing to do, but that publishing papers that push forward the frontier of deep learning is plausibly quite bad. These seem like such different activities that I’m not at all inclined to lump them together for the purposes of this question.
Eliezer seems to argue that humans couldn’t verify pivotal acts proposed by AI systems (e.g. contributions to alignment research), and that this further makes it difficult to safely perform pivotal acts. In addition to disliking his concept of pivotal acts, I think that this claim is probably wrong and clearly overconfident. I think it doesn’t match well with pragmatic experience in R&D in almost any domain, where verification is much, much easier than generation in virtually every domain.
I, personally, would like 5 or 10 examples, from disparate fields, of verification being easier than generation.
And also counterexamples, if anyone has any.
I personally found this to be a very helpful comment for visualizing how things could go.
Is this “sharp left turn” a crux for your overall view, or your high probability of failure?

Naively, it seems to me that if capability gains are systematically gradual, and improvements are iterative and occur a little at a time, we’re in a much better situation with regard to alignment.
If capability gains are gradual, we can continuously feed training data to our system and keep its alignment in step with its capabilities. As soon as it starts to enter a distributional shift, and some of its outputs are (or would be) unaligned, those alignment failures are immediately corrected. You can keep reinforcing corrigibility as capabilities generalize, so that it correctly generalizes the corrigibility concept. Similarly, the more gradually capabilities grow, the more reliable oversight schemes will be.
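Here’s a caricature of that loop, just to pin down what “keep alignment in step with capabilities” means procedurally. Every function in it (the capability step, the alignment evaluation, the corrective fine-tune) is a stub I invented for illustration, not a proposal about how real training works:

```python
# Caricature only: "scale a little, check alignment, correct, repeat."
# All of these functions are invented stand-ins for real procedures.

def capability_step(model):
    # Pretend each step adds a small amount of capability.
    return {**model, "capability": model["capability"] + 1}

def evaluate_alignment(model):
    # Stand-in for oversight: pretend we can directly score how aligned
    # the model's behavior is, degrading as alignment data lags capability.
    gap = model["capability"] - model["alignment_data"]
    return max(0.0, 1.0 - 0.02 * gap)

def corrective_finetune(model):
    # Pretend corrective training data brings alignment back in step.
    return {**model, "alignment_data": model["capability"]}

model = {"capability": 0, "alignment_data": 0}
for step in range(100):
    model = capability_step(model)          # small, gradual capability gain
    if evaluate_alignment(model) < 0.99:    # misgeneralization caught while
        model = corrective_finetune(model)  # it is still small and legible
```

The caveats below are exactly about where this picture breaks: when the evaluation step can no longer tell aligned from unaligned outputs, or when someone else skips the gradual loop entirely.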
(On the other hand, this doesn’t solve the problem that there’s some capability threshold beyond which the outputs of an AI system are illegible to humans, and we can’t tell whether or not the outputs are aligned, in order to give it corrective training data.
Also, if one could, in principle, increase capabilities gradually, but someone else can throw caution to the wind and turn up the capability dial to 11, the unilateralist’s curse kills us.)
How much would finding out that there’s not going to be a sharp left turn impact the rest of your model?
Or, suppose we could magically scale up our systems as gradually as you, Nate, would like, slowing down as we start to see super-linear improvement: how much safer is humanity?
A great Rob Miles introduction to this concept:
I’m not compelled by that analogy. There are lots of things that money can’t buy, but that (sufficient) intelligence can.
There are theoretical limits to what cognition is able to do, but those are so far from the human range that they’re not really worth mentioning. The question is: “are there practical limits to what an intelligence can do, that leave even a super-intelligence uncompetitive with human civilization?”

It seems to me that, as an example, you could just take a particularly impressive person (Elon Musk or John von Neumann are popular exemplars) and ask “What if there was a nation of only people who were that capable?” It seems that if a nation of, say, 300,000,000 Elon Musks went to war with the United States, the United States would lose handily. Musktopia would just have a huge military-technological advantage: they would do fundamental science faster, and develop engineering innovations faster, and have better operational competence than the US, on ~all levels. (I think this is true for a much smaller number than 300,000,000, but having a number that high makes the point straightforward.)
Does that seem right to you? If not, why not?
Or alternatively, what do you make of vignettes like That Alien Message?
I shudder to imagine the future we might have had if 10 full-Eliezers and 50 semi-Eliezers had been working on that problem full time for the last fifteen years.
That sounds obviously amazing. Are you under the impression that recruitment succeeded so enormously that there are 10 people that can produce intellectual content as relevant and compelling as the original sequences, but that they’ve been working at MIRI (or something) instead? Who are you thinking of?
I don’t think we got even a single Eliezer-substitute, even though that was one of the key goals of writing the Sequences.