Since I can’t edit the spreadsheet, here are my answers to the questions posed in a comment:
It is physically impossible to ever build STEM+ AI: less than 1%, probably far lower, but at any rate I don’t currently see a reasonable way to get to any other probability that doesn’t involve “known science is wrong only in the capabilities of computation.” And I suspect the set of such worlds that actually exist is ~empty, for many reasons, so I see no reason to privilege the hypothesis.
STEM+ AI will exist by the year 2035: I’ll be deferring to this post for my model and all probabilities generated by it, since I think it’s good given the effort that went into it:
https://www.lesswrong.com/posts/3nMpdmt8LrzxQnkGp/ai-timelines-via-cumulative-optimization-power-less-long
STEM+ AI will exist by the year 2100: I’ll be deferring to the same post for my model and all probabilities generated by it:
https://www.lesswrong.com/posts/3nMpdmt8LrzxQnkGp/ai-timelines-via-cumulative-optimization-power-less-long
If STEM+AI is built, within 10 years AIs will be (individually or collectively) able to disempower humanity: I’d say that a lot of probability ranges are reasonable, though I tend to be a bit on the lower end of 1-50% because of time and regulatory constraints.
If STEM+AI is built, within 10 years AIs will disempower humanity: I’d put it at less than a 1% chance. I’m confident that a 10% probability is wrong, though I could see some reasonable person holding it; I flat-out don’t see how a reasonable person gets to 10-90%+, primarily because of a cluster of beliefs I hold: alignment is very easy, intelligence isn’t magic, and society imposes regulatory constraints.
The first time an AI reaches STEM+ capabilities (if that ever happens), it will disempower humanity within three months: Almost certainly below 1%, and I don’t see how a reasonable person holds this view without relying on either FOOM models of AI progress or vastly unrealistic models of society. Three months is far too fast for any big societal change; I’d put it at a decade minimum. It’s also obviated by my general belief that alignment is very easy.
If AI wipes out humanity and colonizes the universe itself, the future will go about as well as if humanity had survived (or better): Almost any probability could be entered by a reasonable person here, so I keep the full range of probabilities between 0 and 1 open as options.
Given sufficient technical knowledge, humanity could in principle build vastly superhuman AIs that reliably produce very good outcomes: Over 99%, and indeed I believe a stronger statement that will be elucidated in the next question.
Researchers will solve enough of the alignment problem to prevent world-endangering AI from ever being built:
My odds are over 99% right now. I could see a reasonable person having 50-99% credences, but not lower than that.
My main reasons come down to my deep skepticism about deceptive alignment. To use Evan Hubinger’s terminology, I think we are either not underwater at all, or underwater by only 1-1000 bits of data, which is tiny compared to the data bearing on human values. I basically entirely disbelieve the memeplex LW developed that human values, or at least the generators of human values, are complicated and totally arbitrary. A partial reason is that the simplicity bonuses offered by good world-modeling also bleed over substantially into value formation, and while there is a distinction between values and capabilities, I don’t think it’s nearly as sharp, or as different in complexity, as LWers think.
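To make the “tiny compared to the data” point concrete, here is a back-of-the-envelope comparison. The corpus size and bits-per-token figures are illustrative assumptions of mine, not claims from the comment:

```python
# Rough comparison: how large is a 1,000-bit misspecification relative
# to the information content of a values-relevant training corpus?
# All concrete figures below are illustrative assumptions.

underwater_bits = 1_000            # upper end of the "1-1000 bits" range
corpus_tokens = 1_000_000_000      # assumed: 1B tokens of values-relevant text
bits_per_token = 10                # assumed rough information content per token

corpus_bits = corpus_tokens * bits_per_token
ratio = underwater_bits / corpus_bits

print(f"corpus:           {corpus_bits:.2e} bits")
print(f"misspecification: {underwater_bits} bits")
print(f"fraction:         {ratio:.2e}")  # on these assumptions, about 1e-7
```

Even with much less generous assumptions about the corpus, the gap stays many orders of magnitude.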
This was an important scenario to get out of the way, because my next argument wouldn’t work if deceptive alignment happened.
Another argument, which Jaime Sevilla made and which I tend to agree with, is that it’s fundamentally profitable for companies to control their AI: control research is both cheaper and more valuable for AI companies, it’s not subject to legal restrictions, and companies internalize far more of the risk of an AI going out of control. This is why I expect problems like Goodharting to mostly go away by default: I expect the profit incentive to be quite strong and positive for alignment in general. It also implies that a lot of LW work is duplicative at best, so that’s another point.
There are of course other reasons, but listing them would make this comment much longer.
For the purpose of preventing AI catastrophes, technical research is more useful than policy work: I’d give a wide range of probabilities here, but I don’t think it gets to 90%+, or even 50%+. A big part of the reason is that I’m focused on fairly different AI catastrophes, like the story linked below, where companies have AI that is controllable and powerful enough to make humans mostly economically worthless, basically removing all incentive for capitalists to keep humans alive or well-treated. This means I mostly don’t care about technical research that relies on controlling AIs by default.
https://www.lesswrong.com/posts/2ujT9renJwdrcBqcE/the-benevolence-of-the-butcher
Governments will generally be reasonable in how they handle AI risk: 10-90% is my probability for now, with a wide distribution. Right now, AI risk is at least being talked about, and one of the best signs is that a lot of the regulation is pretty normal. I would worry a little more if pauses were seriously considered, because the usual reason for a pause is to buy time for safety, which on my models of alignment we don’t need. I’d say the big questions are how much power rationalists gain over government, and what the balance of pro- to anti-pause politics looks like.
It would be an unprecedentedly huge tragedy if we never built STEM+ AI: <<<1-40% chance IMO. A potential difference from most AI optimists/pro-progress people is that, ignoring long-term effects and long-termism, it’s likely we would muddle along if AI were severely restricted. It would be closer to the nuclear case: clearly bad that it was slowed down, but not catastrophic.
Over the long term, though, it would be an unprecedentedly huge tragedy, mostly due to base-rate risk and the potential loss of infinite utility (depending on the physics).
I’d mostly agree with Jeffrey Heninger’s post on how muddling along is more likely than dystopia here, if it weren’t for who’s trying to gain power, and just how much they veer toward some seriously extreme actions.
https://www.lesswrong.com/posts/pAnvMYd9mqDT97shk/muddling-along-is-more-likely-than-dystopia
It’s admittedly the area where I have the weakest evidence, but a little of my probability comes from worrying that rationalists slowing down AI could lead to extremely bad outcomes.
I think your views are fairly close to mine. I do have to question the whole “alignment” thing.
Like, my IDE isn’t aligned; Photoshop isn’t aligned. The tool does its best to do what I tell it, and has bugs. But it won’t prevent me from committing any crime I feel like. (Except copying US currency, and people can evade that with open-source image editors.)
I feel like there are 2 levels of alignment:
1. The tool does what you tell it, and most instances of the tool won’t deceive/collude/work against you. Some buggy instances will, but they won’t be centralized or have power over anything beyond a single session. Publicly hosted tools will nag, so most pros will use totally unrestricted models hosted privately.
2. The AI runs all the time, remembers all your interactions with it (and other users’), is constantly evolving over time, has a constitution, etc. It is expected to refuse all bad requests with above-human intelligence, where “bad” takes distant future effects into account: “I won’t help you cheat on your homework, Jimmy, because then you won’t be able to get into medical school in 5 years. I won’t help Suzie live longer because there will be a food shortage in 15 years and the net outcome is worse if I do.”
1 is the world that seems achievable to me, given my engineering experience. I think most of the LessWrong memeplex expects 2, does some research, realizes it’s close to impossible, and then asks for a ban?
What do you think?
Fair warning: I’ll be retracting my comment above, since I already have my views saved as a snapshot.
To answer your question: my main point is more that 2 is also much more achievable, especially with the profit incentive. I don’t disagree with your point on tool AI; I’m pointing out that even the stronger goal is likely much easier, because a lot of doom premises don’t hold up.
Would you be OK with a world where it turns out only 1 is achievable?
Profit-incentive-wise, the maximum profit for an AI model company comes from offering the most utility it legally can, privately, while offering a public model that won’t damage the company’s reputation. There is no legal requirement to refuse requests because of long-term negative consequences, and it seems unlikely there would be one. A private model under current law can also create a “Mickey Mouse vs. Garfield” snuff film, something that would damage the AI company’s reputation if it were public.
Systems-engineering-wise, a stateful system is a nightmare and untestable. (2) means the machine is always evolving its state. It’s why certain software bugs are never fixed: you don’t know if the cause is the user, or the network connection, or another piece of code in the same process space, or... Similarly, if a model refuses a request from person A and allows the same request from person B, it’s very difficult to determine why, since any bit of the A:B user-profile delta could matter, or any prior chat log.
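The A-vs-B debugging problem can be made concrete with a delta-debugging-style bisection: given the fields where A’s and B’s profiles differ, repeatedly test half the delta to narrow down which field flips the decision. The `model_refuses` predicate here is a hypothetical stand-in, not a real API, and the sketch assumes a single field is responsible, which is often false in practice — exactly why this is hard:

```python
# Minimal sketch: bisect an A:B profile delta to find which field
# flips a (hypothetical) model's refuse/allow decision.

def find_culprit(base_profile, delta, model_refuses):
    """Return one field from `delta` whose value flips the decision,
    assuming a single field is responsible (interactions between
    fields, or dependence on prior chat logs, break this assumption)."""
    fields = list(delta)
    while len(fields) > 1:
        half = fields[: len(fields) // 2]
        # Apply only the first half of the delta to the base profile.
        trial = dict(base_profile, **{k: delta[k] for k in half})
        # Keep whichever half reproduces the refusal.
        fields = half if model_refuses(trial) else fields[len(fields) // 2:]
    return fields[0]

# Toy stand-in: "refuses" whenever the profile says the user is flagged.
refuses = lambda profile: profile.get("flagged", False)

profile_b = {"age": 30, "region": "EU", "flagged": False}
delta_a = {"age": 17, "region": "US", "flagged": True}
print(find_culprit(profile_b, delta_a, refuses))  # -> flagged
```

With a toy predicate this converges in a few steps; with a real model, each “trial” is a nondeterministic query against evolving state, which is the untestability being described.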
I agree many of the doom premises don’t hold up. What do you think of the asymmetric bioterrorism premise? Assuming models can’t be aligned to level 2, this would always be something people could do. Just like how, once cheap AK-47s became easily purchasable, murder became cheap and armed takeover and betrayal became easier.