My main objection is that securing positive outcomes doesn’t seem to inherently require solving hard philosophical problems (in your sense). It might in principle, but I don’t see how we can come to be confident about it or even why it should be much more likely than not. I also remain unconvinced about the conceptual difficulty and fundamental nature of the problems, and don’t understand the cause for confidence on those counts either.
To make things more concrete: could you provide a hard philosophical problem (of the kind for which feedback is impossible) together with an argument that this problem must be resolved before human-level AGI arrives? What do you think is the strongest example?
To try to make my point clearer (though I think I’m repeating myself): we can aim to build machine intelligences which pursue the outcomes we would have pursued if we had thought longer (including machine intelligences that allow human owners to remain in control of the situation and make further choices going forward, or bootstrap to more robust solutions). There are questions about what formalization of “thought longer” we endorse, but of course we must face these with or without machine intelligence. For the most part, the questions involved in building such an AI are empirical though hard-to-test ones—would we agree that the AI basically followed our wishes, if we in fact thought longer?—and these don’t seem to be the kinds of questions that have proved challenging, and probably don’t even count as “philosophical” problems in the sense you are using the term.
I don’t think it’s clear or even likely that we necessarily have to resolve issues like metaethics, anthropics, the right formalization of logical uncertainty, decision theory, etc. prior to building human-level AI. No doubt having a better grasp of these issues is helpful for understanding our goals, and so it seems worth doing, but we can already see plausible ways to get around them.
In general, one reason that doing X probably doesn’t require impossible step Y is that there are typically many ways to accomplish X, and without a strong reason it is unlikely that they will all require solving Y. This view seems to be supported by a reasonable empirical record. A lot of things have turned out to be possible.
(Note: in case it’s not obvious, I disagree with Eliezer on many of these points.)
I suspect I also object to your degree of pessimism regarding philosophical claims, but I’m not sure and that is probably secondary at any rate.
It’s hard for me to argue with multiple people simultaneously. When I argue with someone I tend to adopt most of their assumptions in order to focus on what I think is the core disagreement, so to argue with someone else I have to “swap in” a different set of assumptions and related arguments. The OP was aimed mostly at Eliezer, so it assumed that intelligence explosion is relatively easy. (Would you agree that if intelligence explosion was easy, then it would be hard to achieve a good outcome in the way that you imagine, by incrementally solving “the AI control problem”?)
If we instead assume that intelligence explosion isn’t so easy, then I think the main problem we face is value drift and Malthusian outcomes caused by competitive evolution (made worse by brain emulations and AGIs that can be easily copied), which can only be prevented by building a singleton. (A secondary consideration involves other existential risks related to technological progress, such as physics/nanotech/biotech disasters.) I don’t think humanity as a whole is sufficiently strategic to solve this problem before it’s too late (meaning a lot of value drift has already occurred or building a singleton becomes impossible due to space colonization). I think the fact that you are much more optimistic about this accounts for much of our disagreement on overall strategy, and I wonder if you can explain why. I don’t mean to put the burden of proof on you, but perhaps you have some ready explanation at hand?
I don’t think that a fast intelligence explosion implies that you have to solve the kind of hard philosophical problems you are alluding to. You seem to grant that there are no particular hard philosophical problems we’ll have to solve, but you think that nevertheless every approach to the problem will require solving such problems. Is it easy to state why you expect this? Is it because the approaches we can imagine in detail today involve solving hard problems?
Regarding the hardness of defining “remain in control,” it is not the case that you need to be able to define X formally in order to accomplish X. Again, perhaps such approaches require solving hard philosophical problems, but I don’t see why you would be confident (either about this particular approach or more broadly). Regarding my claim that we need to figure this out anyway, I mean that we need to implicitly accept some process of reflection and self-modification as we go on reflecting and self-modifying.
I don’t see why a singleton is necessary to avert value drift in any case; they seem mostly orthogonal. Is there a simple argument here? See e.g. Carl’s post on this and mine. I agree there is a problem to be solved, but it seems to involve faithfully transmitting hard-to-codify values (again, perhaps implicitly).
I’ll just respond to part of your comment since I’m busy today. I’ll respond to the rest later or when we meet.
I don’t see why a singleton is necessary to avert value drift in any case; they seem mostly orthogonal. Is there a simple argument here?
I’m not sure if this argument is original to me; I may have read it from Nick Bostrom or someone else. When I said “value drift” I meant value drift of humanity in aggregate, not necessarily value drift of any individual. Different people’s values differ in how difficult they are to transmit. Or some just think that their values are easy to transmit, for example those who think they should turn the universe into hedonium, or should maximize “complexity”. Competitive evolution will favor (in the sense of maximizing the descendants/creations of) such people, since they can take advantage of new AGI or other progress more quickly than those who think their values are harder to transmit.
I think there’s an additional argument: people with shorter planning horizons will take advantage of new AGI progress more quickly, because they don’t particularly mind failing to transmit their values into the far future and just care about short-term benefits like gaining academic fame.
Yes, if it is impossible to remain in control of AIs then you will have value drift, and yes a singleton can help with this in the same way they can help with any technological risk, namely by blocking adoption of the offending technology. So I concede they aren’t completely orthogonal, in the sense that any risk of progress can be better addressed by a singleton + slow progress. (This argument is structurally identical to the argument for danger from biology progress, physics progress, or even early developments in conventional explosives.) But this is a very far cry from “can only be prevented by building a singleton.”
To restate how the situation seems to me: you say “the problems are so hard that any attempt to solve them is obviously doomed,” and I am asking for some indication that this is the case besides intuition and a small number of not-very-representative examples, which seem unlikely to justify a very confident conclusion. Eliezer makes a similar claim, with you two disagreeing about how likely Eliezer is to solve the problems but not about how likely the problems are to get solved by people who aren’t Eliezer. I don’t understand either of your arguments too well; it seems like both of you are correct to disagree with the mainstream by identifying a problem and noticing that it may be an unusually challenging one, but I don’t see why either of you is so confident.
To isolate a concrete disagreement: if there were an intervention that sped up the onset of serious AI safety work twice as much as it sped up the arrival of AI, I would tentatively consider that a positive (and if it sped up the onset of serious AI safety work ten times as much as it sped up the arrival of AI, it would seem like a clear win; I previously argued that 1.1x as much would also be a big win, but Carl convinced me to increase the cutoff with a very short discussion). You seem to be saying that you would consider it a loss at any ratio, because speeding up the arrival of AI is so much worse than speeding up the onset of serious thought about AI safety, because it is so confidently doomed.
Yes, if it is impossible to remain in control of AIs then you will have value drift
Wait, that’s not my argument. I was saying that while people like you are trying to develop technologies that let you “remain in control”, others who have shorter planning horizons or think they have simple, easy-to-transmit values will already be deploying new AGI capabilities, so you’ll fall behind with every new development. This is what I’m suggesting only a singleton can prevent.
You could try to minimize this kind of value drift by speeding up “AI control” progress, but it’s really hard for me to see how you could speed it up enough not to lose a competitive race with those who do not see a need to solve this problem, or think they can solve a much easier problem. The way I model AGI development in a slow-FOOM scenario is that AGI capability will come in spurts along with changing architectures, and it’s hard to do AI safety work “ahead of time” because of dependencies on AI architecture. So each time there is a big AGI capability development, you’ll be forced to spend time developing new AI safety tech for that capability/architecture, while others will not wait to deploy it. Even a small delay can lead to a large loss, since AIs can be easily copied, and more capable but uncontrolled AIs would quickly take over economic niches occupied by existing humans and controlled AIs. Even assuming secure rights for what you already own on Earth, your share of the future universe will become smaller and smaller as most of the world’s new wealth goes to uncontrolled AIs or AIs with simple values.
Where do you see me going wrong here? If you think I’m just too confident in this model, what alternative scenario can you suggest, where people like you and I (or our values) get to keep a large share of the future universe just by speeding up the onset of serious AI safety work?
could you provide a hard philosophical problem (of the kind for which feedback is impossible) together with an argument that this problem must be resolved before human-level AGI arrives?
I can’t provide a single example because it depends on the FAI design. I think multiple design approaches are possible but each involves its own hard philosophical problems.
To try to make my point clearer (though I think I’m repeating myself): we can aim to build machine intelligences which pursue the outcomes we would have pursued if we had thought longer (including machine intelligences that allow human owners to remain in control of the situation and make further choices going forward, or bootstrap to more robust solutions). There are questions about what formalization of “thought longer” we endorse, but of course we must face these with or without machine intelligence.
At least one hard problem here is, as you point out, how to formalize “thought longer”, or perhaps “remain in control”. Obviously an AGI will inevitably influence the options we have and the choices we end up making, so what does “remain in control” mean? I don’t understand your last point here, that “we must face these with or without machine intelligence”. If people weren’t trying to build AGI and thereby forcing us to solve these kinds of problems before they succeed, we’d have much more time to work on them and hence a much better chance of getting the answers right.
For the most part, the questions involved in building such an AI are empirical though hard-to-test ones—would we agree that the AI basically followed our wishes, if we in fact thought longer?—and these don’t seem to be the kinds of questions that have proved challenging, and probably don’t even count as “philosophical” problems in the sense you are using the term.
If we look at other empirical though hard-to-test questions (e.g., what security holes exist in this program) I don’t see much reason to be optimistic either. What examples are you thinking of, that makes you say “these don’t seem to be the kinds of questions that have proved challenging”?
I suspect I also object to your degree of pessimism regarding philosophical claims, but I’m not sure and that is probably secondary at any rate.
I suspect that even the disagreement we’re currently discussing isn’t the most important one between us, and I’m still trying to figure out how to express what I think may be the most important disagreement. Since we’ll be meeting soon for the decision theory workshop, maybe we’ll get a chance to talk about it in person.
Did you talk about this at the recent workshop? If you’re willing to share publicly, I’d be curious about the outcome of this discussion.
A singleton (even if it is a world government) is argued by Bostrom to be a good thing for humanity here and here.
If you get anywhere, please share your conclusions here.