You’re describing Type 3 people; “Wanting to feel traction / in-control” is absolutely insincere as “belief in Type 2 plan”. I don’t claim to understand the Type 3 mindset, I call for people to think about it. “They just don’t share some core intuitions re: Alignment Is Hard” is not a respectable position. A real scientist would be open to debate.
I think “want to feel traction/in-control” is more obviously a bias (and people vary in whether they read to me as having this bias).
I think the attitude of “‘don’t share core intuitions’ isn’t a respectable position” is, well, idk, you can have that attitude if you want, but I don’t think it’s going to help you understand or persuade people.
There is no clear line between Type 2 and Type 3 people. It can be true both that people have earnest intellectual positions you find frustrating (where it’s fundamentally an intellectual disagreement), and also that they have biases that you-and-they would both agree would be bad, and the relative causal impact of the intellectual positions and the biases can range from like 99% to 1% in either direction.
Even among people who do seem to have followed the entire set of Alignment-Is-Hard arguments and understand them all, a bunch just say “yeah, I don’t buy that as obvious” about stuff that seems obvious to me (or, in some cases, stuff that seems obviously like ‘>50% likely’ to me). And they seem sincere to me.
This results in sad conversations where you’re like ‘but, clearly, I am picking up that you’ve got biased cope in you!’ and they’re like ‘but, clearly, I can tell that my thinking here is not just cope, I know what my copey-thinking feels like, and the particular argument we’re talking about doesn’t feel like that.’ (And both are correct – there was cope, but it lay elsewhere.)
I think the attitude of “‘don’t share core intuitions’ isn’t a respectable position” is, well, idk, you can have that attitude if you want, but I don’t think it’s going to help you understand or persuade people.
Of course it won’t help me persuade people. I absolutely think it will help me understand people, relative to your apparent position. Yes, you have to understand that they are not doing the “have a respectable position” thing. This is important.
There is no clear line between Type 2 and Type 3 people. It can be true both that people have earnest intellectual positions you find frustrating (where it’s fundamentally an intellectual disagreement), and also that they have biases that you-and-they would both agree would be bad, and the relative causal impact of the intellectual positions and the biases can range from like 99% to 1% in either direction.
(Almost always true and usually not worth saying.)
Maybe I should just check, are you consciously trying to deny a conflict-type stance, and consciously trying to (conflictually) assert the mistake-type stance, as a strategy?
Yes, you have to understand that they are not doing the “have a respectable position” thing.
I think this is false for the particular people I have in mind (who to be clear are filtered for “are willing to talk to me”, but, they seem like relatively central members of a significant class of people).
Maybe I’m not sure what you mean by “have a respectable position.”
(I think a large chunk of the problem is “the full argument is complicated, people aren’t tracking all the pieces.” Which is maybe not “intellectually respectable”, though IMO understandable, and importantly different from ‘biased.’ But, when I sit someone down and make sure to lay out all the pieces and make sure they understand each piece and understand how the pieces fit together, we still hit a few pieces where they are like ‘yeah I just don’t buy that.’)
Maybe I should just check, are you consciously trying to deny a conflict-type stance, and consciously trying to (conflictually) assert the mistake-type stance, as a strategy?
I’m saying you seem to be conflict-stanced in a way that is inaccurate to me (i.e. you are making a mistake in your conflict).
I think it’s correct to be conflict-stanced, but, you need like a good model of who/what the enemy is (“sniper mindset”), and the words you’re saying sound to me like you don’t (in a way that seems more tribally biased than you usually seem to me).
who to be clear are filtered for “are willing to talk to me”
In particular, they’re filtered for not being cruxy for whether AGI capabilities research continues. (ETA: … which is partly but very much not entirely due to anti-correlation with claiming to be “AI safety”.)
Maybe I’m not sure what you mean by “have a respectable position.”
I’m not sure either, but for example if a scientist publishes an experiment, and then another scientist with a known track record of understanding things publishes a critique, the first scientist can’t respectably dismiss the critique without substantively engaging with it.
you need like a good model of who/what the enemy is
Ok good, yes, we agree on what the question/problem is.
(in a way that seems more tribally biased than you usually seem to me)
Not more tribally biased, if anything less. I’m more upset though, because why can’t we discuss the conflict landscape? I mean why can’t the people in these LW conversations say something like “yeah the lion’s share of the relevant power is held by people who don’t sincerely hold A or B / Type 1/2”?
Maybe I’m not sure what you mean by “have a respectable position.”
I’m not sure either, but for example if a scientist publishes an experiment, and then another scientist with a known track record of understanding things publishes a critique, the first scientist can’t respectably dismiss the critique without substantively engaging with it.
I think:
there isn’t consensus on what counts as a good track record of understanding things, or a good critique
(relatedly, there’s disagreement about which epistemic norms are important)
For a few points, there hasn’t really been a critique that interlocutors consider very substantive or clear, just a sort of frustrated rehashing of the same arguments they found unpersuasive the first time.
And for at least some of those points, I’m personally like “my intuitions lean in the other direction from y’all Camp B people, but, I don’t feel like I can really confidently stand by it, I don’t think the argument has been made very clearly.”
Things I have in mind:
On “How Hard is Success?”
“How anti-natural is corrigibility?” (i.e. “sure, I see some arguments for thinking corrigibility might get hard as you dial up capabilities. But, can’t we just… not dial up capabilities past that point? It seems like humans understand corrigibility pretty easily when they try, it seems like Claude-et-al currently actually understand corrigibility reasonably well, and if we focused on training that, I don’t see why it wouldn’t basically work?”)
“How likely is FOOM?” (i.e. “if I believed FOOM was very likely, I’d agree we had to be a lot more careful about ramping up capabilities and being scared the next training run would be our last. But, I don’t see reason to think FOOM is particularly likely, and I see reasons to think it’s not.”)
“What capabilities are needed to make a pivotally-scary demo or game-changing coordination tech?” (i.e. you maybe don’t need to actually do anything that complicated to radically change how much coordination is possible for a proper controlled takeoff)
On “How Bad is Failure?”
“How nice is AI likely to be?” (i.e. it really only needs to be very slightly nice to give us the solar system, and it seems weird for the niceness to be “zero”)
“How likely is whatever ends up being created to have moral value?”. (i.e. consciousness is pretty confusing, seems pretty plausible that whatever ends up getting created would at least be a pretty interesting successor species)
For all of those, like, I know the arguments against, but my own current take is not like >75% on any of these given model uncertainty, and meanwhile, if your probabilities are below 50% on the relevant MIRI-ish argument, you also have to worry about...
...
Other geopolitical concerns and considerations
The longer a pause goes on, the more likely it is that things get unstable and something goes wrong.
If you think alignment isn’t that hard, or that sticking to a safe-but-high power level isn’t that hard, you do have to take more seriously the risk of serious misuse.
You might think buy-in for a serious pause or controlled takeoff is basically impossible until we have seriously scary demos, and the “race to build them, then use them to rally world leaders and then burn the lead” plan might seem better than “try to pause now.”
The sorts of things necessary for a pause seem way more likely to go badly than well (i.e. it’s basically guaranteed to create a molochian bureaucratic hellscape that stifles wide ranging innovation and makes it harder to do anything sensible with AI development)
Part of what I’m saying is that it’s not respectable for someone to both
claim to have substantial reason to think that making non-lethal AGI is tractable, and also
not defend this position in public from strong technical critiques.
It sounds like you’re talking about non-experts. Fine, of course a non-expert will be generally less confident about conclusions in the field. I’m saying that there is a camp which is treated as expert in terms of funding, social cachet, regulatory influence, etc., but which is not expert in my sense of having a respectable position.
I mean why can’t the people in these LW conversations say something like “yeah the lion’s share of the relevant power is held by people who don’t sincerely hold A or B / Type 1/2”?
Here are some groups who I think are currently relevant (not in any particular order, without quite knowing where I’m going with this yet):
nVidia (I hadn’t realized how huge they were till recently)
Sam Altman in particular
Demis Hassabis in particular
Dario Amodei in particular
OpenAI, Google, Google DeepMind, Anthropic, and maybe by now other leading labs (in slightly less particular than Sam/Demis/Dario)
The cluster of AI industry leaders of whom Sam/Demis/Dario are representative.
People at labs who are basically AI researchers (who might have at some point said the words “I do alignment research” because those are the mouthwords the culture at their company said, but weren’t meaningfully involved with safety efforts)
Anthropic safety engineers and similar
Eliezer in particular
The cluster including MIRI / Lightcone / relatively highly bought-in friends
Oliver Habryka in particular
OpenPhil
Dustin Moskowitz in particular
Jaan Tallin
“Constellation”
“AI Safety” egregore
“The EAgregore”
Future of Life Institute
Maybe Max Tegmark in particular, I’m not sure
Trump
MAGA
Elon Musk
Okay, writing that out turned out to take most of the time I felt like spending right now, but the next questions I have in mind are “who has power here, over what?” or “what is the ‘relevant’ power?”
But, roughly:
a) a ton of the above are “fake” in some sense
b) On the world scale, the OpenPhil/Constellation/Anthropic cluster is relatively weak.
c) within OpenPhil/Constellation/Anthropic, there are people more like Dario, Holden, Jack Clark, and Dustin, and people who are more rank-and-file-EA/AI-ish. I think the latter are fake the way I think you think things are fake. I think the former are differently fake from the way I think you think things are fake.
d) there are a ton of vague EA/AI-safety people that I think are fake in the way you think they are fake, but they don’t really matter except for The Median Researcher Problem
Nice, thank you! In a much more gray area of [has some non-trivial identity as “AI safety person”], I assume there’d be lots of people, some with relevant power. This would include some heads of research at big companies. Maybe you meant that by “People at labs who are basically AI researchers”, but I mean people who would be less coded as “AI safety” but still would, e.g.:
Pay lip service to safety;
Maybe even bring it up sometimes internally;
Happily manufacture what boils down to lies for funders regarding technical safety;
Internally think of themselves as doing something good and safe for humanity;
Internally think of themselves as being reasonable and responsible regarding AI.
Further, this would include many AI researchers in academia. People around Silicon Valley / LW / etc. tend to discount academia, but I don’t.