ETA: I think this comment is missing some important things and I endorse Habryka’s reply more than I endorse this comment
Like, the most important thing to estimate when evaluating a political candidate is their trustworthiness and integrity! It’s the thing that would flip the sign on whether supporting someone is good or bad for the world.
I agree that this is an important thing that deserved more consideration in Eric’s analysis (I wrote a note about it on Oct 22 but then I forgot to include it in my post yesterday). But I don’t think it’s too hard to put into a model (although it’s hard to find the right numbers to use). The model I wrote down in my note is:
30% chance Bores would oppose an AI pause / strong AI regulations (b/c it’s too “anti-innovation” or something)
40% chance Bores would support strong regulations
30% chance he would vote for strong regulations but not advocate for them
90% chance Bores would support weak/moderate AI regulations
My guess is that 2⁄3 of the EV comes from strong regulations and 1⁄3 from weak regulations (I just came up with a justification for this split earlier today, but it’s too complicated to fit in this comment), so these considerations reduce the EV to 37% of the original estimate (i.e., they roughly divide the EV by 3).
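A minimal sketch of how that arithmetic could shake out (this is my reconstruction, not necessarily the exact calculation in the note; it assumes opposition to strong regulations cancels support one-for-one and leaves the vote-but-not-advocate case out of the EV):

```python
# Rough reconstruction of the ~37% figure above; the note's actual
# arithmetic may differ. Assumption: opposition to strong regulations
# cancels support one-for-one, and the 30% "vote but not advocate"
# outcome is not counted either way.

p_oppose_strong = 0.30    # opposes an AI pause / strong AI regulations
p_support_strong = 0.40   # actively supports strong regulations
p_support_weak = 0.90     # supports weak/moderate AI regulations

share_strong = 2 / 3      # fraction of baseline EV from strong regulations
share_weak = 1 / 3        # fraction of baseline EV from weak regulations

ev_multiplier = (share_strong * (p_support_strong - p_oppose_strong)
                 + share_weak * p_support_weak)
print(f"{ev_multiplier:.0%}")  # -> 37%, i.e. roughly EV divided by 3
```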
FWIW I wouldn’t say “trustworthiness” is the most important thing, more like “can be trusted to take AI risk seriously”, and my model is more about the latter. (A trustworthy politician who is honest about the fact that they don’t care about AI safety will not be getting any donations from me.)
FWIW I wouldn’t say “trustworthiness” is the most important thing, more like “can be trusted to take AI risk seriously”, and my model is more about the latter.
No. Bad. Really not what I support. Strong disagree. Bad naive consequentialism.
Yes, of course I care about whether someone takes AI risk seriously, but if someone is also untrustworthy, that in my opinion serves as a multiplier on their negative impact on the world. I do not want to create scheming and untrustworthy stakeholders who start doing sketchy stuff around AI risk. That’s how a lot of bad stuff has already happened in the past.
I think political donations to trustworthy and reasonable politicians who are open to AI X-risk but don’t have an opinion on it are much better for the world (indeed, infinitely better due to the inverted sign) than donations to untrustworthy ones who do seem interested.
That said, I agree that you could put this in the model! I am not against quantitatively estimating integrity and trustworthiness, and think the model would be a bunch better for considering it.
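For instance, here is one crude way trustworthiness could be folded into the model above (a sketch under my own assumptions, with illustrative numbers only: an untrustworthy candidate’s impact keeps its magnitude but flips sign, per the inverted-sign point, optionally amplified by a harm multiplier):

```python
# Crude sketch: folding a trustworthiness estimate into the EV model above.
# Assumptions are mine and purely illustrative: an untrustworthy candidate's
# impact keeps its magnitude but flips sign ("inverted sign"), optionally
# amplified by a harm multiplier.

def adjusted_ev(baseline_ev: float, p_trustworthy: float,
                harm_multiplier: float = 1.0) -> float:
    """EV after weighing trustworthiness.

    p_trustworthy:   probability the candidate acts with integrity.
    harm_multiplier: how much an untrustworthy actor's harm is amplified
                     relative to a trustworthy actor's benefit (>= 1).
    """
    p_untrustworthy = 1.0 - p_trustworthy
    return baseline_ev * (p_trustworthy - harm_multiplier * p_untrustworthy)

# Example with made-up numbers: the ~37% multiplier from the earlier comment
# and an 80% chance the candidate is trustworthy.
print(adjusted_ev(0.37, 0.80))  # 0.37 * (0.8 - 0.2) = 0.222
```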
Yes, of course I care about whether someone takes AI risk seriously, but if someone is also untrustworthy, that in my opinion serves as a multiplier on their negative impact on the world. I do not want to create scheming and untrustworthy stakeholders who start doing sketchy stuff around AI risk. That’s how a lot of bad stuff has already happened in the past.
No-true-Scotsman-ish counterargument: no-one who actually gets AI risk would engage in this kind of tomfoolery. This is the behavior of someone who almost got it, but then missed the last turn and stumbled into the den of the legendary Black Beast of Aaargh. In the abstract, I think “we should be willing to consider supporting literal Voldemort if we’re sure he has the correct model of AI X-risk” goes through.
The problem is that it just totally doesn’t work in practice, not even on pure consequentialist grounds:
You can never tell whether Voldemorts actually understand and believe your cause, or whether they’re just really good at picking the right things to say to get you to support them. No, not even if you’ve considered the possibility that they’re lying and you still feel sure they’re not. Your object-level evaluations just can’t be trusted. (At least, if they’re competent at their thing. And if they’re not just evil, but also bad at it, so bad you can tell when they’re being honest, why would you support them?)
Voldemorts and their plans are often more incompetent than they seem,[1] and when their evil-but-“effective” plan predictably blows up, you and your cause are going to suffer reputational damage and end up in a worse position than your starting one. (You’re not gonna find an Altman, you’ll find an SBF.)
Voldemorts are naturally predisposed to misunderstanding the AI risk in precisely the ways that later make them engage in sketchy stuff around it. They’re very tempted to view ASI as a giant pile of power they can grab. (They hallucinate the Ring when they look into the Black Beast’s den, if I’m to mix my analogies.)
In general, if you’re considering giving power to a really effective but untrustworthy person because they seem credibly aligned with your cause, despite their general untrustworthiness (they also don’t want to die to ASI!), you are almost certainly just getting exploited. These sorts of people should be avoided like wildfire. (Even in cases where you think you can keep them in check, you’re going to have to spend so much effort paranoidally looking over everything they do in search of gotchas that it almost certainly wouldn’t be worth it.)
[1] Probably because of that thing where if a good person dramatically abandons their morals for the greater good, they feel that it’s a monumental enough sacrifice for the universe to take notice and make it worth it.
A lot of Paranoia: A Beginner’s Guide is actually trying to set up a bunch of the prerequisites for making this kind of argument more strongly. In particular, a feature of people who act in untrustworthy ways, and surround themselves with unprincipled people, is that they end up sacrificing most of their sanity on the altar of paranoia.
Like, the fictional HPMoR Voldemort happened not to have any adversaries who could disrupt his OODA loop, but that was purely a fiction. A world with two Voldemort-level competent players results in two people nuking their sanity as they try to get one over on each other, and at that point you can’t really rely on them having good takes, or sane stances on much of anything (or, if they are genuinely smart enough, on them making an actually binding alliance, which, via things like unbreakable vows, is surprisingly doable in the HPMoR universe, but which in reality runs into many more issues).
Tone note: I really don’t like people responding to other people’s claims with content like “No. Bad… Bad naive consequentialism” (I’m totally fine with “Really not what I support. Strong disagree.”). It reads quite strongly to me as trying to scold someone or socially punish them using social status for a claim that you disagree with; it feels continuous with some kind of frame that’s like “habryka is the arbiter of the Good”.
It sounds like scolding someone because it is! Like, IDK, sometimes that’s the thing you want to do?
I mean, I am not the “arbiter of the good”, but like, many things are distasteful and should be reacted to as such. I react similarly to people posting LLM slop on LW (usually more in the form of “wtf, come on man, please at least write a response yourself, don’t copy paste from an LLM”) and many other things I see as norm violations.
I definitely consider the thing I interpreted Michael to be saying a norm violation of LessWrong, and endorse lending my weight to norm enforcement of that (he then clarified in a way that I think largely defused the situation, but I think I was pretty justified in my initial reaction). Not all spaces I participate in are places where I feel fine participating in norm enforcement, but of course LessWrong is one such place!
Now, I think there are fine arguments to be made that norm enforcement should also happen at the explicit intellectual level and shouldn’t involve more expressive forms of speech. IDK, I am a bit sympathetic to that, but feel reasonably good about my choices here, especially given that Michael’s comment started with “I agree”, therefore implying that the things he was saying were somehow reflective of my personal opinion. It seems eminently natural that when you approach someone and say “hey, I totally agree with you that <X>”, where X is something they vehemently disagree with (like, IDK, imagine someone coming to you and saying “hey, I totally agree with you that child pornography should be legal” when you absolutely do not believe this), they respond in the kind of way I did.
Overall, feedback is still appreciated, but I think I would still write roughly the same comment in a similar situation!
Michael’s comment started with “I agree”, therefore implying that the things he was saying were somehow reflective of my personal opinion
Michael’s comment started with a specific point he agreed with you on.
I agree that this is an important thing that deserved more consideration in Eric’s analysis
He specifically phrased the part you were objecting to as his opinion, not as a shared point of view.
FWIW I wouldn’t say “trustworthiness” is the most important thing, more like “can be trusted to take AI risk seriously”, and my model is more about the latter.
I am pretty sure Michael thought he was largely agreeing with me. He wasn’t saying “I agree this thing is important, but here is this totally other thing that I actually think is more important”. He said (and meant to say) “I agree this thing is important, and here is a slightly different spin on it”. Feel free to ask him!
I claim you misread his original comment, as stated. Then you scolded him based on that misreading. I made the case you misread him via quotes, which you ignored, instead inviting me to ask him about his intentions. That’s your responsibility, not mine! I’d invite you to check in with him about his meaning yourself, and to consider doing that in the future before you scold.
I mean, I think his intention in communicating is the ground truth! I was suggesting his intentions as a way to operationalize the disagreement. Like, I am trying to check that you agree that if that was his intention, and I read it correctly, then you were wrong to say that I misread him. If that isn’t the case, then we have a disagreement about the nature of communication on our hands, which, I mean, we can go into, but that doesn’t sound super exciting.
I do happen to be chatting with Michael sometime in the next few days, so I can ask. Happy to bet about what he says about what he intended to communicate! Like, I am not overwhelmingly confident, but you seem to present overwhelming confidence, so presumably you would be up for offering me a bet at good odds.
I would generally agree, but a mitigating factor here is that MichaelDickens is presenting himself as agreeing with habryka. It seems more reasonable for habryka to strongly push back against statements that make claims about his own beliefs.
Yeah I pretty much agree with what you’re saying. But I think I misunderstood your comment before mine, and the thing you’re talking about was not captured by the model I wrote in my last comment; so I have some more thinking to do.
I didn’t mean “can be trusted to take AI risk seriously” as “indeterminate trustworthiness but cares about x-risk”, more like “the conjunction of trustworthy + cares about x-risk”.
FWIW I think Habryka was right to call out that some parts of my comment were bad, and the scolding got me to think more carefully about it.