Maybe I’m misunderstanding you, but I’m not getting why having the ability to discuss involves actually discussing. Compare two ways to build a triskaidekaphobic calculator.
1. You build a normal calculator correctly, and at the end you add a line of code IF ANSWER == 13, PRINT: “ERROR: IT WOULD BE IMPOLITE OF ME TO DISCUSS THIS PARTICULAR QUESTION”.
2. You somehow invent a new form of mathematics that “naturally” never comes up with the number 13, and implement it so perfectly that a naive observer examining the calculator code would never be able to tell which number you were trying to avoid.
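For concreteness, here is a minimal sketch of method (1) in Python (an illustration added here, not part of the original comment; the function names are made up): the arithmetic is implemented correctly, and a single check at the output suppresses the one forbidden answer.

```python
# Method (1): a correct calculator with a post-hoc filter on its output.
# Illustrative sketch only; names are not from the original comment.

def honest_add(a, b):
    # The underlying arithmetic is implemented correctly.
    return a + b

def triskaidekaphobic_add(a, b):
    answer = honest_add(a, b)
    if answer == 13:
        return "ERROR: IT WOULD BE IMPOLITE OF ME TO DISCUSS THIS PARTICULAR QUESTION"
    return answer

print(triskaidekaphobic_add(6, 6))  # 12
print(triskaidekaphobic_add(6, 7))  # the error message
```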
Imagine some people who were trying to take the cosines of various angles. If they used method (1), they would have no problem, since cosines are never 13. If they used method (2), it’s hard for me to imagine exactly how this would work but probably they would have a lot of problems.
It sounds like the proposal you’re arguing against (and which I want to argue for), not talking about taboo political issues on LW, is basically (1). We discuss whatever we want, we use logic which (we hope) would output the correct (taboo) answer on controversial questions, but if for some reason those questions come up (which they shouldn’t, because they’re pretty different from AI-related questions), we instead don’t talk about them. If for some reason they’re really relevant to some really important issue at some point, then we take the hit for that issue only, with lots of consultation first to make sure we’re not stuck in the Unilateralist’s Curse.
This seems like the right answer even in the metaphor—if people burned down calculator factories whenever any of their calculators displayed “13”, and the sorts of problems people used calculators for almost never involved 13, just have the calculator display an error message at that number.
(...plus doing other activism and waterline-raising work to deal with the fact that your society is insane, but that work isn’t going to look like having your calculators display 13 and dying when your factory burns down)
Sure, I’m happy to have separate discussion forums for different topics. For example, I wouldn’t want people talking about football on /r/mylittlepony—that would be crazy![1]
“Take it to /r/TheMotte, you guys” is not that onerous of a demand, and it’s a demand I’m happy to support: I really like the Less Wrong æsthetic of doing everything at the meta level.[2]
But Hubinger seems to argue that the demand should be, “Take it offline,” and that seems extremely onerous to me.
The operative principle here is “Permalink or It Didn’t Happen”: if it’s not online, does it really exist? I mean, okay, there’s a boring literal sense in which it “exists”, but does it exist in a way that matters?
If they used method (2), it’s hard for me to imagine exactly how this would work but probably they would have a lot of problems.
The problem is that between the massive evidential entanglement between facts, the temptation to invent fake epistemology lessons to justify conclusions that you couldn’t otherwise get on the merits, and the set of topics that someone has an interest in distorting being sufficiently large, I think we do end up with the analogue of nonsense-math in large areas of psychology, sociology, political science, history, &c. Which is to say, life.
In terms of the calculator metaphor, imagine having to use a triskaidekaphobic calculator multiple times as part of solving a complicated problem with many intermediate results. Triskaidekaphobia doesn’t just break your ability to compute 6 + 7. It breaks your ability to compute the infinite family of expressions that include 13 as an intermediate result, like (6 + 7) + 1. It breaks the associativity of addition, because now you can’t count on (6 + 7) + 1 being the same as 6 + (7 + 1).[3] And so on.
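To make the intermediate-result point concrete, here is a sketch (added here, not from the thread) assuming the taboo is enforced on every intermediate sum: 6 + (7 + 1) evaluates fine, but (6 + 7) + 1 fails even though the final answer, 14, is harmless.

```python
# Tabooing 13 as an intermediate result breaks associativity.
# Illustrative sketch only; names are not from the thread.

TABOO = 13

def phobic_add(a, b):
    total = a + b
    if total == TABOO:
        raise ValueError("IT WOULD BE IMPOLITE OF ME TO DISCUSS THIS PARTICULAR QUESTION")
    return total

print(phobic_add(6, phobic_add(7, 1)))  # 6 + (7 + 1) = 14: fine
print(phobic_add(phobic_add(6, 7), 1))  # (6 + 7) + 1: raises, because 13 appears along the way
```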
Also, what Interstice said.
with lots of consultation first to make sure we’re not stuck in the Unilateralist’s Curse.
So, I agree that regression to the mean exists, which implies that the one who unilaterally does a thing is likely to be the one who most overestimated the value of the thing. I’m very suspicious of this “Unilateralist’s Curse” meme being motivatedly and selectively wielded as an excuse for groupthink and conformity, to police and intimidate anyone in your cult[4] who tries to do anything interesting that might damage the “rationalist” or “effective altruism” brand names.
In the context of existential risk, regression to the mean recommends a principle of conformity because you don’t want everyone deciding independently that their own AI is probably safe. But if we’re talking about censorship of boring ordinary things like political expression or psychology research (not nukes or bioweapons), I suspect considering regression to the mean makes policy recommendations in the other direction.[5] I’ve been meaning to write a post to be titled “First Offender Models and the Unilateralist’s Blessing”: if shattering a preference-falsification equilibrium would be good for Society, but requires some brave critical group to eat the upfront cost, then that group is likely to be the one that most underestimated the costs to themselves. Maybe we should let them!
if people burned down calculator factories whenever any of their calculators displayed “13”
If.
On the other hand, if people threatened to burn down factories that produced correct calculators but were obviously bluffing, or if they TPed the factories, then calculator manufacturers who care about correct arithmetic might find it better policy to say, “I don’t negotiate with terrorists![6] Do your worst, you superstitious motherfuckers!”
It would be nice if calculator manufacturers with different risk-tolerances or decision theories could manage to cooperate with each other. For a cute story about a structurally similar scenario in which that kind of cooperation doesn’t emerge, see my latest Less Wrong post, “Funk-tunul’s Legacy; Or, The Legend of the Extortion War.”[7]
[1] At this point the hypothetical adversary in my head is saying, “Zack, you’re being motivatedly dense—you know damned well why that example of separate forums isn’t analogous!” I reply, “Yeah, sorry, sometimes the text-generating process in my head is motivatedly dense to make a rhetorical point when I understand the consideration my interlocutor is trying to bring up, but I consider it non-normative, the sort of thing an innocent being wouldn’t understand. Call it angelic irony, after the angels in Unsong who can’t understand deception. It’s not intellectually dishonest if I admit I’m doing it in a footnote.”
[2] Although as Wei Dai points out, preceded by an earlier complaint by Vanessa Kosoy, this does carry the cost of encouraging hidden agendas.
[3] This is the part where a pedant points out that real-world floating-point numbers (which your standard desk calculator uses) aren’t associative anyway. I hope there aren’t any pedants on this website!
[4] In an earlier draft of this comment, this phrase was written in the first person: “our cult.” (Yes, this is a noncentral usage of the word cult, but I think the hyperlink to “Every Cause Wants To Be”, and this footnote, is adequate to clarify what I mean.) On consideration, the second person seems more appropriate, because by now I think I’ve actually reached the point of pseudo-ragequitting the so-called “rationalist” community. “Pseudo” because once you’ve spent your entire adult life in a cult, you can’t realistically leave, because your vocabulary has been trained so hard on the cult’s foundational texts that you can’t really talk to anyone else. Instead, what happens is you actually become more active in intra-cult discourse, except being visibly contemptuous about it (putting the cult’s name in scare quotes, using gratuitous cuss words, being inappropriately socially-aggressive to the cult leaders, &c.).
[5] But I have pretty intense psychological reasons to want to believe this, so maybe you shouldn’t believe me until I actually come up with the math.
[6] I tend to use this slogan and appeals to timeless decision theory a lot in the context of defying censorship (example), but I very recently realized that this was kind of stupid and/or intellectually dishonest of me. The application of decision theory to the real world can get very complicated very quickly: if the math doesn’t turn out the way I hope, am I actually going to change my behavior? Probably not. Therefore I shouldn’t pretend that my behavior is the result of sophisticated decision-theoretic computations on my part, when the real explanation is a raw emotional disposition that might be usefully summarized in English as, “Do your worst, you motherfuckers!” That disposition probably is the result of a sophisticated decision-theoretic computation—it’s just that it was a distributed computation that took place over thousands of years in humanity’s environment of evolutionary adaptedness.
[7] But you should be suspicious of the real-world relevance of my choice of modeling assumptions in accordance with the psychological considerations in the previous two footnotes, especially since I kind of forced it because it’s half past one in the morning and I really really wanted to shove this post out the door.
I agree that much of psychology etc. is bad for the reasons you state, but this doesn’t seem to be because everyone else has fried their brains by trying to simulate how to appease triskaidekaphobics too much. It’s because the actual triskaidekaphobics are the ones inventing the psychology theories. I know a bunch of people in academia who do various verbal gymnastics to appease the triskaidekaphobics, and when you talk to them in private they get everything 100% right.
I agree that most people will not literally have their buildings burned down if they speak out against orthodoxies (though there’s a folk etymology for getting fired which is relevant here). But I appreciate Zvi’s sequence on super-perfect competition as a signpost of where things can end up. I don’t think academics, organization leaders, etc. are in super-perfect competition the same way middle managers are, but I also don’t think we live in the world where everyone has infinite amounts of slack to burn endorsing taboo ideas and nothing can possibly go wrong.
when you talk to them in private they get everything 100% right.
I’m happy for them, but I thought the point of having taxpayer-funded academic departments was so that people who aren’t insider experts can have accurate information with which to inform decisions? Getting the right answer in private can only help those you talk to in private.
I also don’t think we live in the world where everyone has infinite amounts of slack to burn endorsing taboo ideas and nothing can possibly go wrong.
Can you think of any ways something could possibly go wrong if our collective map of how humans work fails to reflect the territory?
(I drafted a vicious and hilarious comment about one thing that could go wrong, but I fear that site culture demands that I withhold it.)
“Take it to /r/TheMotte, you guys” is not that onerous of a demand, and it’s a demand I’m happy to support
I’d agree having political discussions in some other designated place online is much less harmful than having them here, but on the other hand, a quick look at what’s being posted on the Motte doesn’t support the idea that rationalist politics discussion has any importance for sanity on more general topics. If none of it had been posted, as far as I can tell, the rationalist community wouldn’t have been any more wrong on any major issue.
real-world floating-point numbers (which your standard desk calculator uses)
Not that it matters, but I expect (and Google seems to confirm) most calculators will use something else, mostly fixed-point, decimal-based arithmetic. I don’t offhand know if that’s associative.
Well this is awkward.
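For what it’s worth, both halves of this exchange are easy to spot-check in Python (a sketch added here, not from the thread): binary floating-point addition really is non-associative, while exact decimal arithmetic at these magnitudes is associative (decimal arithmetic rounded to a fixed precision can still fail, though).

```python
# Spot-checking the associativity claims above (illustration added here, not from the thread).
from decimal import Decimal

# Binary floating point: not associative.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Exact decimal arithmetic: associative for these values.
a, b, c = Decimal("0.1"), Decimal("0.2"), Decimal("0.3")
print((a + b) + c == a + (b + c))  # True
```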
In the analogy, it’s only possible to build a calculator that outputs the right answer on non-13 numbers because you already understand the true nature of addition. It might be more difficult if you were confused about addition, and were trying to come up with a general theory by extrapolating from known cases—then, thinking 6 + 7 = 15 could easily send you down the wrong path.
In the real world, we’re similarly confused about human preferences, mind architecture, the nature of politics, etc., but some of the information we might want to use to build a general theory is taboo. I think that some of these questions are directly relevant to AI—e.g. the nature of human preferences is relevant to building an AI to satisfy those preferences, the nature of politics could be relevant to reasoning about what the lead-up to AGI will look like, etc.