This question is not directed at anyone in particular, but I’d like to hear some alignment researchers answer it. As a rough guess, how much would it affect your research (in the sense of changing your priorities, your strategy for impact, or your method of attack on the problem) if you made any of the following epistemic updates?
(Feel free to disambiguate anything here that’s ambiguous or poorly worded.)
1. You update to think that AI takeoff will happen twice as slowly as your current best estimate, e.g. instead of the peak rate of yearly GWP growth during the singularity being 50%, you learn it’s only going to be 25%. Alternatively, you update to think that AI takeoff will happen twice as quickly, e.g. the peak rate of GWP growth will be 100% rather than 50%.
2. You learn that transformative AI will arrive in half the time you currently think it will take, measured from your current median, e.g. in 15 years rather than 30. Alternatively, you learn that transformative AI will arrive in twice the time you currently think it will take.
3. You learn that power differentials will be twice as imbalanced during AI takeoff compared to your current median. That is, you learn that if we could measure the relative levels of “power” of agents in the world, the Gini coefficient of this power distribution would be twice as high as in your current median scenario, in the sense that world dynamics will look more unipolar than multipolar; local, rather than global. Alternatively, you learn the opposite.
4. You learn that the cost of misalignment is half as much as you thought, in the sense that slightly misaligned AI software imposes costs that are half as large (ethically or economically), compared to what you used to think. Alternatively, you learn the opposite.
5. You learn that the economic cost of aligning an AI to your satisfaction is half as much as you thought, for a given AI, e.g. it will take a team of 20 full-time workers writing test cases, as opposed to 40 full-time workers of equivalent pay. Alternatively, you learn the opposite.
6. You learn that the amount of “intelligence” required to discover a dangerous x-risk-inducing technology is half as much as you thought, e.g. someone with an IQ of 300, as opposed to 600 (please interpret charitably), could by themselves figure out how to deploy full-scale molecular nanotechnology, of the type required to surreptitiously inject botulinum toxin into the bloodstreams of everyone, making us all fall over and die. Alternatively, you learn the opposite.
7. You learn that either government or society in general will be twice as risk-averse, in the sense of reacting twice as strongly to potential AI dangers, compared to what you currently think. Alternatively, you learn the opposite.
8. You learn that the AI paradigm used during the initial stages of the singularity (when the first AGIs are being created) will be twice as dissimilar from the current AI paradigm, compared to what you currently think. Alternatively, you learn the opposite.
In all cases, the real answer is “the actual impact will depend a ton on the underlying argument that led to the update; that argument will lead to tons of other updates across the board”.
I imagine that the spirit of the questions is that I don’t perform a Bayesian update and instead do more of a “causal intervention” on the relevant node and propagate downstream. In that case:
1. I’m confused by the question. If the peak rate of GWP growth ever is 25%, it seems like the singularity didn’t happen? Nonetheless, to the extent this question is about updates on the quality or duration of the singularity (as opposed to the leadup to it), I don’t think this affects my actions at all.
2. I’m often acting based on my 10%-timelines, so if you tell me that TAI comes at exactly the midway point between now and my current median, that can counterintuitively have the same effect as lengthening my timelines. (Also I start sketching out far more concrete plans given this implausibly precise knowledge of when TAI comes.) So I’m instead going to answer the question where we imagine that my entire probability distribution is compressed / stretched along the time axis by a factor of 2. If compressed (shorter timelines), probably not much changes; I care more about having influence on AGI actors but I’m already in a great place for that. If stretched (longer timelines), I maybe focus more on weirder alignment theory, e.g. perhaps I work at ARC.
3. Not much effect. In more unipolar worlds, I spend more time predicting which labs will develop AGI, so that I can be there at crunch time; in more multipolar worlds I spend less time doing that.
4. No effect. Averting 50% of an existential catastrophe is still really good.
5. Similar effects as (2), but with much smaller magnitude. (Lower cost ⇒ more focus on weird alignment theory, since influence becomes less useful.)
6. This one particularly feels like I would be making some big Bayesian update (e.g. I think Eliezer’s view predicts this much more strongly?) but if I ignore that then no effect.
7. Similar effects as (2), similar magnitudes as well. (More risk-averse ⇒ getting the right solution becomes more important than having influence.)
8. More dissimilar ⇒ more focus on weirder but more general work (e.g. less language models, more ELK). (This isn’t the same as 2, 5, and 7 because this doesn’t change how much I care about influence.)
Thanks for your response. :)

I’m confused by the question. If the peak rate of GWP growth ever is 25%, it seems like the singularity didn’t happen?
I’m a little confused by your confusion. Let’s say you currently think the singularity will proceed at a rate of R. The spirit of what I’m asking is: what would you change if you learned that it will proceed at a rate of one-half R? (Maybe plucking specific numbers for the peak rate of growth just made things more confusing.) For me at least, I’d probably expect a lot more oversight, as people would have more time to adjust to what’s happening in the world around them.
No effect. Averting 50% of an existential catastrophe is still really good.
I’m also a little confused about this. My exact phrasing was, “You learn that the cost of misalignment is half as much as you thought, in the sense that slightly misaligned AI software imposes costs that are half as large (ethically or economically), compared to what you used to think.” I assume you don’t think that slightly misaligned software will, by default, cause extinction, especially if it’s acting alone and is economically or geographically isolated.
We could perhaps view this through an analogy. War is really bad: so bad that maybe it will even cause our extinction (if, say, we have some really terrible nuclear winter). But by default, I don’t expect war to cause humanity to go extinct. And so, if someone asked me about a scenario in which the costs of war are only half as much as I thought, it would probably update me away from thinking we need to take actions to prevent war. The magnitude of this update might not be large, but understanding exactly how much we’d update and change our strategy in light of this information is the type of thing I’m asking for.
Let’s say you currently think the singularity will proceed at a rate of R.
What does this mean? On my understanding, singularities don’t proceed at fixed rates?
I agree that in practice there will be some maximum rate of GDP growth, because there are fundamental physical limits (and tighter in-practice limits that we don’t know), but it seems like they’ll be way higher than 25% per year. Or to put it a different way, at a 25% max rate I think it stops deserving the term “singularity”; it seems like it would take decades and maybe centuries to reach technological maturity at that rate. (Which could totally happen! Maybe we will move very slowly and cautiously! I don’t particularly expect it though.)
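As a rough, purely illustrative sanity check on the timescales here, taking “technological maturity” to mean something like ten further orders of magnitude of growth (an arbitrary placeholder, not a figure from this discussion):

```python
import math

def years_to_grow(orders_of_magnitude, annual_growth_rate):
    # Years of compounding at a fixed annual rate needed to grow by 10**orders_of_magnitude.
    return orders_of_magnitude * math.log(10) / math.log(1 + annual_growth_rate)

# Hypothetical target: ten further orders of magnitude of GWP growth.
print(years_to_grow(10, 0.25))  # ~103 years at a 25% cap
print(years_to_grow(10, 1.00))  # ~33 years at a 100% cap
print(years_to_grow(10, 10.0))  # ~10 years at a 1000% (i.e. 10x per year) cap
```

On that arbitrary assumption, a 25% cap stretches the process out toward a century, which is the sense in which it stops looking singularity-like.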
If you actually mean halving the peak rate of GDP growth during the singularity, and a singularity actually happens, then I think it doesn’t affect my actions at all; all of the relevant stuff happened well before we get to the peak rate.
If you ask me to imagine “max rates at orders of magnitude where Rohin would say there was no singularity”, then I think I pivot my plan for impact into figuring out how exactly we are going to manage to coordinate to move slowly even when there’s tons of obvious value lying around, and then trying to use the same techniques to get tons and tons of oversight on the systems we build.
You learn that the cost of misalignment is half as much as you thought, in the sense that slightly misaligned AI software imposes costs that are half as large (ethically or economically), compared to what you used to think.
Hmm, I interpreted “cost of misalignment” as “expected cost of misalignment”, since that’s the standard way to deal with probabilistic things, but it sounds like you want something else.
Let’s say for purposes of argument I think 10% chance of extinction, and 90% chance of “moderate costs but nothing terrible”. Which of the following am I supposed to have updated to?
1. 5% extinction, 95% moderate costs
2. 5% extinction, 45% moderate costs, 50% perfect world
3. 10% extinction, 90% mild costs
4. 10% outcome-half-as-bad-as-extinction, 90% mild costs
5. 0% extinction, 100% mild costs

I was imagining (4), but any of (1)-(3) would also not change my actions. It sounds like you want me to imagine (5), but in that case I just completely switch out of AI alignment and work on something else; that’s because you moved p(extinction) from 10% to 0%, which is a wild update to have made, and not what I would call “the cost of misalignment is half as much as I thought”.
I think you sufficiently addressed my confusion, so you don’t need to reply to this comment, but I still had a few responses to what you said.
What does this mean? On my understanding, singularities don’t proceed at fixed rates?
No, I agree. But growth is generally measured over an interval. In the original comment I proposed the interval of one year during the peak rate of economic growth. To allay your concern that a 25% growth rate would indicate we didn’t experience a singularity: I meant halving the growth rate during the peak economic-growth year in our future, regardless of how fast that rate turns out to be.
I agree that in practice there will be some maximum rate of GDP growth, because there are fundamental physical limits (and tighter in-practice limits that we don’t know), but it seems like they’ll be way higher than 25% per year.
The 25% figure was totally arbitrary. I didn’t mean it as any sort of prediction. I agree that an extrapolation from biological growth implies that we can and should expect to see >1000% growth rates eventually, though it seems plausible that we would coordinate to avoid that.
If you actually mean halving the peak rate of GDP growth during the singularity, and a singularity actually happens, then I think it doesn’t affect my actions at all; all of the relevant stuff happened well before we get to the peak rate.
That’s reasonable. A separate question might be about whether the rate of growth during the entire duration from now until the peak rate would be cut in half.
Let’s say for purposes of argument I think 10% chance of extinction, and 90% chance of “moderate costs but nothing terrible”. Which of the following am I supposed to have updated to?
I think the way you’re bucketing this into “costs if we go extinct” and “costs if we don’t go extinct” is reasonable. But one could also think that the disvalue of extinction is more continuous with disvalue in non-extinction scenarios, which makes things a bit more tricky. I hope that makes sense.
Cool, that all makes sense.

But one could also think that the disvalue of extinction is more continuous with disvalue in non-extinction scenarios, which makes things a bit more tricky.
I’m happy to use continuous notions (and that’s what I was doing in my original comment) as long as “half the cost” means “you update such that the expected costs of misalignment according to your probability distribution over the future are halved”. One simple way to imagine this update is to take all the worlds where there was any misalignment, halve their probability, and distribute the extra probability mass to worlds with zero costs of misalignment. At which point I reason “well, 10% extinction changes to 5% extinction, I don’t need to know anything else to know that I’m still going to work on alignment, and given that, none of my actions are going to change (since the relative probabilities of different misalignment failure scenarios remain the same, which is what determines my actions within alignment)”.
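A minimal sketch of that bookkeeping in Python, with entirely made-up scenarios and numbers (the 100-vs-1 costs and the 10%/90% split are arbitrary placeholders, not anyone’s actual estimates):

```python
# Illustrative only: hypothetical scenarios with made-up probabilities and costs.
scenarios = {
    "extinction":                  {"p": 0.10, "cost": 100.0},
    "moderate misalignment costs": {"p": 0.90, "cost": 1.0},
    "no misalignment costs":       {"p": 0.00, "cost": 0.0},
}

# The update: halve the probability of every world with nonzero misalignment
# costs, and move the freed-up mass to the zero-cost world.
updated = {name: dict(s) for name, s in scenarios.items()}
freed = 0.0
for s in updated.values():
    if s["cost"] > 0:
        freed += s["p"] / 2
        s["p"] /= 2
updated["no misalignment costs"]["p"] += freed

def expected_cost(dist):
    return sum(s["p"] * s["cost"] for s in dist.values())

print(expected_cost(scenarios))  # 10.9
print(expected_cost(updated))    # 5.45  (halved, as intended)
# p(extinction) goes from 10% to 5%, and the relative probabilities of the
# misalignment scenarios are unchanged (extinction : moderate = 1 : 9).
```

The point is just that this particular update rescales everything uniformly, so it changes how much total risk there is without changing which failure modes dominate.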
I got the sense from your previous comment that you wanted me to imagine some different form of update and I was trying to figure out what.
Good to hear! What are your 10% timelines?
Idk, maybe 2030 for x-risky systems?