Research Associate at the Transformative Futures Institute, formerly of the MTAIR project and the Center on Long-term Risk. Graduate researcher at King's and AI MSc at Edinburgh. Interested in philosophy, longtermism and AI Alignment.
Sammy Martin
The Value Definition Problem
If you stumbled upon this and didn’t realize morality wasn’t essential, well, um, I’m not going to try to convince you of that.
Perhaps this makes little difference to the rest of your post, but it’s worth noting that the mind-dependence of morality isn’t all-or-nothing. A common view is that there are facts about the right and wrong ways to aggregate preferences, or to turn non-preferences into preferences, without there being unconditional facts about what we should do.
And if I want to find out if morality really does not exist as an essential property of the universe, it’s worthwhile to try to take it out of my language and see if it comes up missing.
Whether moral language is reducible to non-moral language is a separate (though entangled) question from whether there are such things as moral facts. A lot of moral antirealists would say there is something special and indispensable about moral language, and that it means something more than liking or disliking (prescriptivism, for example). Or take this from Three Worlds Collide (written by an antirealist):
The Babyeaters strive to do the baby-eating thing to do, the Superhappies output the Super Happy thing to do. None of that tells us anything about the right thing to do. They are not asking the same question we are—no matter what word of their language the translator links to our ‘should’. If you’re confused at all about that, my lord, I might be able to clear it up.”
Even if moral realism is true and even if moral claims are special in some way, I still think this part is true at least of aesthetic claims (which all your examples were)
there’s no sense in which something can “look good” if there is no observer to assess the quality, so it seems through language we casually mistake preferences for essences.
Thanks for pointing that out to me; I had not come across your work before! I’ve had a look through your post and I agree that we’re saying similar things. I would say that my ‘Value Definition Problem’ is an (intentionally) vaguer and broader question about what our research program should be—as I argued in the article, this is mostly an axiological question. Your final statement of the Alignment Problem (informally) is:
A must learn the values of H and H must know enough about A to believe A shares H’s values
while my Value Definition Problem is
“Given that we are trying to solve the Intent Alignment problem for our AI, what should we aim to get our AI to want/target/decide/do, to have the best chance of a positive outcome?”
I would say the VDP is about what our ‘guiding principle’ or ‘target’ should be in order to have the best chance of solving the alignment problem. I used Christiano’s ‘intent alignment’ formulation but yours actually fits better with the VDP, I think.
I appreciate the summary, though the way you state the VDP isn’t quite the way I meant it.
what should our AI system <@try to do@>(@Clarifying “AI Alignment”@), to have the best chance of a positive outcome?
To me, this reads like ‘we have a particular AI, what should we try to get it to do’, whereas I meant it as ‘what Value Definition should we be building our AI to pursue’. So, that’s why I stated it as ‘what should we aim to get our AI to want/target/decide/do’ or, to be consistent with your way of writing it, ‘what should we try to get our AI system to do to have the best chance of a positive outcome’, not ‘what should our AI system try to do to have the best chance of a positive outcome’. Aside from that minor terminological difference, that’s a good summary of what I was trying to say.
I fall more on the side of preferring indirect approaches, though by that I mean that we should delegate to future humans, as opposed to defining some particular value-finding mechanism into an AI system that eventually produces a definition of values.
I think your opinion is probably the majority opinion—my major point with the ‘scale of directness’ was to emphasize that our ‘particular value-finding mechanisms’ can have more or fewer degrees of freedom, since from a certain perspective ‘delegate everything to a simulation of future humans’ is also a ‘particular mechanism’, just with a lot more degrees of freedom. So even if you strongly favour indirect approaches, you will still have to make some decisions about the nature of the delegation.
The original reason that I wrote this post was to get people to explicitly notice the point that we will probably have to do some philosophical labour ourselves at some point, and then I discovered Stuart Armstrong had already made a similar argument. I’m currently working on another post (also based on the same work at EA Hotel) with some more specific arguments about why we should construct a particular value-finding mechanism that doesn’t fix us to any particular normative ethical theory, but does fix us to an understanding of what values are—something I call a Coherent Extrapolated Framework (CEF). But again, Stuart Armstrong anticipated a lot (but not all!) of what I was going to say.
The scenario where every human gets an intent-aligned AGI, and each AGI learns their own particular values would be a case where each individual AGI is following something like ‘Distilled Human Preferences’, or possibly just ‘Ambitious Learned Value Function’ as its Value Definition, so a fairly Direct scenario. However, the overall outcome would be more towards the indirect end—because a multipolar world with lots of powerful Humans using AGIs and trying to compromise would (you anticipate) end up converging on our CEV, or Moral Truth, or something similar. I didn’t consider direct vs indirect in the context of multipolar scenarios like this (nor did Bostrom, I think) but it seems sufficient to just say that the individual AGIs use a fairly direct Value Definition while the outcome is indirect.
From one of the linked articles, Christiano talking about takeoff speeds:
I believe that before we have incredibly powerful AI, we will have AI which is merely very powerful. This won’t be enough to create 100% GDP growth, but it will be enough to lead to (say) 50% GDP growth. I think the likely gap between these events is years rather than months or decades.
In particular, this means that incredibly powerful AI will emerge in a world where crazy stuff is already happening (and probably everyone is already freaking out). If true, I think it’s an important fact about the strategic situation.
and in your post:
Still, the general strategy of “dealing with things as they come up” is much more viable under continuous takeoff. Therefore, if a continuous takeoff is more likely, we should focus our attention on questions which fundamentally can’t be solved as they come up.
I agree that the continuous/slow takeoff is more likely than fast takeoff, though I have low confidence in that belief (and in most of my beliefs about AGI timelines). But a badly managed continuous/slow takeoff still seems like an extreme danger, and a case where it would be too late to deal with many problems in e.g. the same year that they arise. Are you imagining something like this?
Summary of my response: chimps are nearly useless because they aren’t optimized to be useful, not because evolution was trying to make something useful and wasn’t able to succeed until it got to humans.
So far as I can tell, the best one-line summary for why we should expect a continuous and not a fast takeoff comes from the interview Paul Christiano gave on the 80k podcast: ‘I think if you optimize AI systems for reasoning, it appears much, much earlier.’ Which is to say, the equivalent of the ‘chimp’ milestone on the road to human-level AI does not have approximately the economic utility of a chimp, but a decent fraction of the utility of something that is ‘human-level’. This strikes me as an important argument that he’s repeated here, and that was discussed here last April, but other than that it seems to have gone largely unnoticed, and I’m wondering why.
I have a theory about why this didn’t get discussed earlier—there is a much more famous bad argument against AGI being an existential risk, the ‘intelligence isn’t a superpower’ argument that sounds similar. From Chollet vs Yudkowsky:
Intelligence is not a superpower; exceptional intelligence does not, on its own, confer you with proportionally exceptional power over your circumstances.
…said the Homo sapiens, surrounded by countless powerful artifacts whose abilities, let alone mechanisms, would be utterly incomprehensible to the organisms of any less intelligent Earthly species.
I worry that in arguing against the claim that general intelligence isn’t a meaningful concept, or can’t be used to compare different animals, some people have been implicitly assuming that evolution has been putting a decent amount of effort into optimizing for general intelligence. Perhaps arguing for one of these claims sounds like arguing for the other, or a lot of people have been arguing for both together without distinguishing between them.
Claiming that you can meaningfully compare evolved minds on the generality of their intelligence needs to be distinguished from claiming that evolution has been optimizing for general intelligence reasonably hard before humans came about.
MIRI thinks that the fact that evolution hasn’t been putting much effort into optimizing for general intelligence is a reason to expect discontinuous progress? Apparently, Paul’s point is that once we realize evolution has been putting little effort into optimizing for general intelligence, we realize we can’t tell much about the likely course of AGI development from evolutionary history, which leaves us in the default position of ignorance. He then further argues that the default case is that progress is continuous.
So far as I can tell, Paul’s point is that absent specific reasons to think otherwise, the prima facie case is that any time we are trying hard to optimize for some criterion, we should expect the ‘many small changes that add up to one big effect’ situation.
Then he goes on to argue that the specific arguments that AGI is a rare case where this isn’t true (like nuclear weapons) are either wrong or aren’t strong enough to make discontinuous progress plausible.
From what you just wrote, it seems like the folks at MIRI agree that we should have the prima facie expectation of continuous progress, and I’ve read elsewhere that Eliezer thinks the case for recursive self-improvement leading to a discontinuity is weaker or less central than it first seemed. So, are MIRI’s main reasons for disagreeing with Paul down to other arguments (hence the switch from the intelligence explosion hypothesis to the general idea of rapid capability gain)?
I would think the most likely place to disagree with Paul (if not on the intelligence explosion hypothesis) would be if you expected the right combination of breakthroughs to take you over a ‘generality threshold’ (or produce the ‘secret sauce’, as Paul calls it), leading to a big jump in capability, while inadequate achievement on any one of the breakthroughs won’t do.
Stuart Russell gives a list of the elements he thinks will be necessary for the ‘secret sauce’ of general intelligence in Human Compatible: human-like language comprehension, cumulative learning, discovering new action sets and managing its own mental activity. (I would add that somebody making that list 30 years ago would have added perception and object recognition, and somebody making it 60 years ago would have also added efficient logical reasoning from known facts). Let’s go with Russell’s list, so we can be a bit more concrete. Perhaps this is your disagreement:
An AI with (e.g.) good perception and object recognition, language comprehension, cumulative learning capability and ability to discover new action sets but a merely adequate or bad ability to manage its mental activity would be (Paul thinks) reasonably capable compared to an AI that is good at all of these things, but (MIRI thinks) it would be much less capable. MIRI has conceptual arguments (to do with the nature of general intelligence) and empirical arguments (comparing human/chimp brains and pragmatic capabilities) in favour of this hypothesis, and Paul thinks the conceptual arguments are too murky and unclear to be persuasive and that the empirical arguments don’t show what MIRI thinks they show. Am I on the right track here?
The biggest disagreement between me and more pessimistic researchers is that I think gradual takeoff is much more likely than discontinuous takeoff (and in fact, the first, third and fourth paragraphs above are quite weak if there’s a discontinuous takeoff).
It’s been argued before that Continuous is not the same as Slow by any normal standard, so the strategy of ‘dealing with things as they come up’, while more viable under a continuous scenario, will probably not be sufficient.
It seems to me like you’re assuming longtermists are very likely not required at all in a case where progress is continuous. I take continuous to just mean that we’re in a world where there won’t be sudden jumps in capability, or apparently useless systems suddenly crossing some threshold and becoming superintelligent, not where progress is slow or easy to reverse. We could still pick a completely wrong approach that makes alignment much more difficult and set ourselves on a likely path towards disaster, even if the following is true:
So far as I can tell, the best one-line summary for why we should expect a continuous and not a fast takeoff comes from the interview Paul Christiano gave on the 80k podcast: ‘I think if you optimize AI systems for reasoning, it appears much, much earlier.’
So far as I can tell, Paul’s point is that absent specific reasons to think otherwise, the prima facie case is that any time we are trying hard to optimize for some criterion, we should expect the ‘many small changes that add up to one big effect’ situation.
Then he goes on to argue that the specific arguments that AGI is a rare case where this isn’t true (like nuclear weapons) are either wrong or aren’t strong enough to make discontinuous progress plausible.
In a world where a continuous but moderately fast takeoff is likely, I can easily imagine doom scenarios that would require long-term strategy or conceptual research early on to avoid, even if none of them involve FOOM. Imagine that the accepted standard for aligned AI is to follow some particular research agenda, like Cooperative Inverse Reinforcement Learning, but it turns out that CIRL starts to behave pathologically and tries to wirehead itself as it gets more and more capable, and that it’s a fairly deep flaw that we can only patch, not avoid.
Let’s say that over the course of a couple of years, failures of CIRL systems start to appear and compound very rapidly until they constitute an existential disaster. Maybe people realize what’s going on, but by then it is too late: the right approach would have been to try some other approach to AI alignment, but the research to do that doesn’t exist and can’t be done anywhere near fast enough. Compare Paul Christiano’s ‘What failure looks like’.
First off, if we have specific evidence (an answer to Objection 2) then the historical analogy in Objection 1 looks a lot weaker, as any real evidence of WFLL1 arising now would suggest that the historical cases of other algorithms that gave pathological results just aren’t representative. I think they aren’t representative.
(I think the discontinuity-based arguments largely do make the “this time is different” case, roughly because general intelligence seems clearly game-changing. WFLL2 seems somewhere in between these, and I’m unsure where my beliefs fall on that.)
The key difference ‘this time’ (before we get anywhere near WFLL2 or AGI), as I see it, is that those early algorithms give recommendations to people that they could implement or avoid, so the ‘exploratory phase’ where we poked around to find out what they were capable of was pretty much risk-free, while WFLL1 implies that the systems have some degree of autonomy and actually have a chance to do unexpected things without humans realizing straight away. Dantzig’s linear optimization leading to a catastrophe would have required more carelessness and stupidity than (current or very near-future) deep RL, because deep RL’s mistakes are subtler and because it has to be loosed on some environment to achieve results and give us useful information on its behaviour. As for evidence, Stuart Russell thinks that we are already seeing WFLL1 in social media ad algorithms:
“Consider the so-called ‘filter bubble’ of social media. The reinforcement learning algorithm is trying to maximize click throughs. From the view of the human, the purpose of the machine is to maximize clickthroughs. But from the view of the machine, it is changing the state of the world to maximize clicks. It is changing you to make you more predictable. A raving fascist or communist is more predictable and will lap up raving content. The machines can change our mind about our objective function so we are easier to satisfy. Advertisers have done this for decades.” [I argued with him about this feedback loop, and Yann Le Cun says this changed at Facebook a while ago]
“The reinforcement learning algorithm in social media has destroyed the EU, NATO and democracy. And that’s just 50 lines of code.”

I wonder if this hypothesis was in Paul’s mind when he wrote the essay. If Russell is right about any of this, it suggests that one of the first times we gave deep RL any ability to influence the world, it succumbed to a failure scenario almost immediately. That’s not a good track record.
Eyeballing the graphs you produced, it looks like the singularities you keep getting are hyperbolic growth, which we already have in real life (compare log(world GDP) to your graph of log(projects completed) - their shapes are almost identical).
So far as I can tell, what you’ve shown is that you almost always get a big speedup of hyperbolic growth as AI advances but without discontinuities, which is what the ‘continuous takeoff’ people like Christiano already say they are expecting.
AI is just another, faster step in the hyperbolic growth we are currently experiencing, which corresponds to a further increase in rate but not a discontinuity (or even a discontinuity in rate).
So perhaps this is evidence of continuous takeoff still being quite fast.
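The distinction can be made concrete with a toy model (my own illustration, not anything from the post under discussion). Exponential growth, dx/dt = x, stays finite at every time, while hyperbolic growth, dx/dt = x², hits a singularity in finite time; a basic Euler integration shows the difference:

```python
# Toy comparison of exponential vs hyperbolic growth (illustrative only).
# Exponential: dx/dt = x    -> x(t) = e^t, finite for all t.
# Hyperbolic:  dx/dt = x**2 -> x(t) = 1/(1 - t), blows up at t = 1.

def simulate(rate_fn, x0=1.0, dt=1e-4, t_max=0.99):
    """Euler-integrate dx/dt = rate_fn(x) from t = 0 to t_max."""
    x, t = x0, 0.0
    while t < t_max:
        x += rate_fn(x) * dt
        t += dt
    return x

exp_x = simulate(lambda x: x)      # stays modest: ~e^0.99 ≈ 2.7
hyp_x = simulate(lambda x: x * x)  # approaches 1/(1 - 0.99) = 100
```

On this picture, ‘takeoff is continuous’ and ‘growth goes hyperbolic’ are compatible claims: the hyperbolic curve has no jumps anywhere, yet it still reaches arbitrarily high output in bounded time.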
Will AI undergo discontinuous progress?
I’m glad this changed someone’s mind about the connection between old/new views! The links in the text are references, and links before quotes go to the location of that quote—though there should be more and I’ll add more.
To clarify that section in particular: evolution is always optimizing for fitness (tautologically), but the specific traits evolution promotes change all the time as selection pressures shift. What Paul Christiano argued is that evolution basically was not trying to make general intelligence until very recently, and that as soon as it did try, it made continuous progress.
If we compare humans and chimps at the tasks chimps are optimized for, humans are clearly much better but the difference is not nearly as stark. Compare to the difference between chimps and gibbons, gibbons and lemurs, or lemurs and squirrels.
Relatedly, evolution changes what it is optimizing for over evolutionary time: as a creature and its environment change, the returns to different skills can change, and they can potentially change very quickly. So it seems easy for evolution to shift from “not caring about X” to “caring about X,” but nothing analogous will happen for AI projects. (In fact a similar thing often does happen while optimizing something with SGD, but it doesn’t happen at the level of the ML community as a whole.)
That argument was the one thing I researched that was most surprising to me, and I’m not sure why it hasn’t been more commonly discussed.
This is something I mentioned in the last section—if there is a significant lead time (on the order of years), then it is still totally possible for a superintelligence to appear out of nowhere and surprise everyone on the continuous progress model. The difference is that on discontinuous progress that outcome is essentially guaranteed.
This is an issue I referenced in the intro, though I did kind of skip past it. What I would say is that continuous/discontinuous is a high-level and vague description of the territory—is what is happening a continuation of already existing trends? Since that’s how we define it, it makes much more sense as a way to think about predictions, than a way to understand the past.
One way of knowing if progress was discontinuous is to actually look at the inner workings of the AGI during the takeoff. If this
some systems “fizzle out” when they try to design a better AI, generating a few improvements before running out of steam, while others are able to autonomously generate more and more improvements
is what, in fact, happens as you try to build better and better AI, then we have a discontinuity, and hence discontinuous progress.
In your scenario, the fact that we went from a world like now to a godlike superintelligence swallowing up the whole Earth with tiny self-replicating bots feeding on sunlight or something means the progress was discontinuous, because it means the quote I gave above was probably a correct description of reality.
If there was some acceleration of progress that then blew up—like, near-human systems that could automate most tasks suddenly started showing up over a year or two and getting scarily smart over a couple of weeks before the end, and then all of a sudden a godlike superintelligence annihilates the Earth and starts flinging von Neumann probes to other stars—then… maybe progress was continuous? It would depend on more detailed facts (not facts about whether the AGI halted to do garbage collection, but facts about the dynamics of its capability gain). Continuous/discontinuous and fast/slow are two (not entirely independent) axes you could use to describe various AI takeoff trajectories—a qualitative description.
There is an additional wrinkle in that what you call continuous might depend on your own reading of historical trends—are we on hyperbolic growth or not? Here’s Scott Alexander:
In other words, the singularity got cancelled because we no longer have a surefire way to convert money into researchers. The old way was more money = more food = more population = more researchers. The new way is just more money = send more people to college, and screw all that.
But AI potentially offers a way to convert money into researchers. Money = build more AIs = more research.
If this were true, then once AI comes around – even if it isn’t much smarter than humans – then as long as the computational power you can invest into researching a given field increases with the amount of money you have, hyperbolic growth is back on. Faster growth rates means more money means more AIs researching new technology means even faster growth rates, and so on to infinity.
Presumably you would eventually hit some other bottleneck, but things could get very strange before that happens.
If he is right about that, then ‘return to hyperbolic growth’ looks like part of an already existing trend, otherwise not so much.
Not that it’s an essential part of any particular argument, but my understanding was that literal grey goo (independently operating nanomachines breaking down inert matter and converting the whole Earth’s mass in a matter of hours) is probably ruled out by the laws of thermodynamics, because there is no nanoscale way to dissipate heat or generate enough energy to power transformations millions of times faster than biological processes. It also seems like nanomachines would be very vulnerable to heat or radiation because of the square-cube law.
However, less extreme replicators are clearly physically possible, because cell division and ribosomes exist. The fact that a literal grey goo scenario is probably ruled out by basic physics does not imply that the ultimate limits for non-biological replicators are close to those for biological replication (which are themselves pretty impressive). Assuming without a specific reason that all small-scale replicators can’t go much faster than bamboo would be the harmless supernova fallacy. For a scenario that isn’t close to grey goo, but is still much scarier than anything biology can do, see e.g. this.
This is something I mentioned in the last section—if there is a significant lead time (on the order of years), then it is still totally possible for a superintelligence to appear out of nowhere and surprise everyone, even given the continuous progress model. The difference is that with discontinuous progress that outcome is essentially guaranteed, so discontinuities are informative because they give us good evidence about what takeoff speeds are possible.
Like you say, if there are no strong discontinuities we might expect lots of companies to start working hard on AIs with capability enhancement/recursive improvement. But the first AI with anything like those abilities will be the one made the quickest, so it likely isn’t very good at self-improvement and gets poor returns on optimization, and the next one that comes out is a little better (I didn’t discuss the notion of Recalcitrance in Bostrom’s work, but we could model this setup as each new self-improving AI design having a shallower and shallower Recalcitrance curve), making progress continuous even with rapid capability gain. Again, if that’s not going to happen, it will be either because one project goes quiet while it gets a few steps ahead of the competition, or because there is a threshold below which improvements ‘fizzle out’ and don’t generate returns, but adding one extra component takes you over that threshold and returns on investment explode—which takes you to the conceptual question of whether intelligence has such a threshold built in.
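Bostrom’s model can be put in toy numerical form: the rate of intelligence growth equals optimization power divided by recalcitrance. The functional forms below are my own illustrative assumptions, not Bostrom’s; the point is just that the shape of the recalcitrance curve is what separates a fast-but-continuous trajectory from an explosive one:

```python
# Toy version of Bostrom's takeoff model: dI/dt = O(I) / R(I),
# with optimization power O(I) = I (the system helps improve itself).
# The recalcitrance curves R passed in below are assumptions for illustration.

def takeoff(recalcitrance, steps=100, dt=0.01, i0=1.0):
    """Euler-integrate dI/dt = I / R(I) and return the full trajectory."""
    intelligence = i0
    trajectory = [intelligence]
    for _ in range(steps):
        rate = intelligence / recalcitrance(intelligence)
        intelligence += rate * dt
        trajectory.append(intelligence)
    return trajectory

# Flat recalcitrance: growth is exponential -- rapid, but continuous.
continuous = takeoff(lambda i: 1.0)  # ends near e^1 ~ 2.7
# Recalcitrance that falls as capability rises: super-exponential blowup,
# approximating dI/dt = I**2, which has a true singularity in finite time.
explosive = takeoff(lambda i: 1.0 / i)
```

A sequence of projects, each with a slightly shallower recalcitrance curve than the last, would interpolate between these two trajectories, which is one way of restating the point about the quickest-built self-improver getting poor returns.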
The sad/good thing is that this article represents progress. I recall that in Human Compatible Stuart Russell said there was a joint declaration from some ML researchers that AGI is completely impossible, and it’s clear from this article that Oren is at least thinking about it as a real possibility that isn’t hundreds of years away. Automatically forming learning problems sounds a lot like automatically discovering actions, which Stuart Russell also mentioned in a list of necessary breakthroughs on the way to AGI, so maybe there’s some widespread agreement about what is still missing.
That aside, even by some of Oren’s own metrics we’ve made quite substantial progress. He mentions the Winograd schemas as a good test of when we’re approaching human-like language understanding and common sense, but what he may not know is that GPT-2 actually bridged a significant fraction of the gap in Winograd schema performance between the best previous language models and humans—from 63% to 71%, with humans at about 92% accuracy according to DeepMind—which is a good object lesson in how the speed of progress can surprise you.
It’s always nice to take in a bit of Joy in the Merely Real. I have sometimes found it a useful exercise to consider just which things would or would not be shocking to people in the past. For example, anyone from before 1500 would be utterly shocked and mystified by any page of printed text or any piece of clothing from 1800 or after.