I do indeed consider that your evidence (“Eliezer Yudkowsky called Richard Loosemore an idiot!”) is not good enough to establish the claim you were making (“we should expect LW people to shoot the messenger if someone reports a refutation of an idea that’s been important here”).
However, the point isn’t that “very strong evidence is needed”, the point is that the evidence you offered is very weak.
(Maybe you disagree and think the evidence you offered is not very weak. If so, maybe you’d like to explain why. I’ve explained in more detail why I think it such weak evidence, elsewhere in this thread. Your defence of it has mostly amounted to “it is so an ad hominem”, as if the criticism had been “TAG says it was an ad hominem but it wasn’t”; again, I’ve explained elsewhere in the thread why I think that entirely misses the point.)
“If Loosemore had called Yudkowsky an idiot, you would not be saying ‘maybe he is’.”
For what it’s worth, I think it’s possible that he is, in the relevant sense. As I said elsewhere, the most likely scenario in which EY is wrong about RL being an “idiot” (by which, to repeat, I take it he meant “person obstinately failing to grasp an essential point”) is one in which on the relevant point RL is right and EY wrong, in which case EY would indeed be an “idiot”.
But let’s suppose you’re right. What of it? I thought the question here was whether LW people shoot the messenger, not whether my opinions of Eliezer Yudkowsky and Richard Loosemore are perfectly symmetrical.
In common-sense terms, telling an audience that the messenger is an idiot who shouldn’t be listened to because he’s an idiot is shooting the messenger. It’s about as central and classic an example as you can get. What else would it be?
Unfortunately some messengers are idiots (we have already established that most likely either Yudkowsky or Loosemore is an idiot, in this particular scenario). Saying that someone is an idiot isn’t shooting the messenger in any culpable sense if in fact they are an idiot, nor if the person making the accusation has reasonable grounds for thinking they are.
So I guess maybe we actually have to look at the substance of Loosemore’s argument with Yudkowsky. So far as I can make out, it goes like this.
Yudkowsky says: superintelligent AI could well be dangerous, because despite our efforts to arrange for it to do things that suit us (e.g., trying to program it to do things that make us happy) a superintelligent AI might decide to do things that in fact are very bad for us, and if it’s superintelligent then it might well also be super-powerful (on account of being super-persuasive, or super-good at acquiring money via the stock market, or super-good at understanding physics, etc.).
Loosemore says: this is ridiculous, because if an AI were really superintelligent in any useful sense then it would be smart enough to see that (e.g.) wireheading all the humans isn’t really what we wanted; if it isn’t smart enough to understand that then it isn’t smart enough to (e.g.) pass the Turing test, to convince us that it’s smart, or to be an actual threat; for that matter, the researchers working on it would have turned it off long before, because its behaviour would necessarily have been bizarrely erratic in other domains besides human values.
The usual response to this by LW-ish people is along the lines of “you’re assuming that a hypothetical AI, on finding an inconsistency between its actual values and the high-level description of ‘doing things that suit its human creators’, would realise that its actual values are crazy and adjust them to match that high-level description better; but that is no more inevitable than that humans, on finding inconsistencies between our actual values and the high-level description of ‘doing things that lead us to have more surviving descendants’, would abandon our actual values in order to better serve the values of Evolution”. To me this seems sufficient to establish that Loosemore has not shown that a hypothetical AI couldn’t behave in clearly-intelligent ways that mostly work towards a given broad goal, but in some cases diverge greatly from it.
There’s clearly more to be said here, but this comment is already rather long, so I’ll skip straight to my conclusion: maybe there’s some version of Loosemore’s argument that’s salvageable as an argument against Yudkowsky-type positions in general, but it’s not clear to me that there is, and while I personally wouldn’t have been nearly as rude as Yudkowsky was I think it’s very much not clear that Yudkowsky was wrong. (With, again, the understanding that “idiot” here doesn’t mean e.g. “person scoring very badly in IQ tests” but something like “person who obstinately fails to grasp a fundamental point of the topic under discussion”.)
I don’t think it’s indefensible to say that Yudkowsky was shooting the messenger in this case. But, please note, your original comment was not about what Yudkowsky would do; it was about what the LW community in general would do. What did the LW community in general think about Yudkowsky’s response to Loosemore? They downvoted it to hell, and several of them continued to discuss things with Loosemore.
One rather prominent LWer (Kaj Sotala, who I think is an admin or a moderator or something of the kind here) wrote a lengthy post in which he opined that Loosemore (in the same paper that was being discussed when Yudkowsky called Loosemore an idiot) had an important point. (I think, though, that he would agree with me that Loosemore has not demonstrated that Yudkowsky-type nightmare scenarios are anything like impossible, contra Loosemore’s claim in that paper that “this entire class of doomsday scenarios is found to be logically incoherent at such a fundamental level that they can be dismissed”, which I think is the key question here. Sotala does agree with Loosemore that some concrete doomsday scenarios are very implausible.) He made a linkpost for that here on LW. How did the community respond? Well, that post is at +23, and there are a bunch of comments discussing it in what seem to me like constructive terms.
So, I reiterate: it seems to me that you’re making a large and unjustified leap from “Yudkowsky called Loosemore an idiot” to “LW should be expected to shoot the messenger”. Y and L had a history of repeatedly-unproductive interactions in the past; L’s paper pretty much called Y an idiot anyway (by implication, not as frankly as Y called L an idiot); there’s a pretty decent case to be made that L was an idiot in the relevant sense; other LWers did not shoot Loosemore even when EY did, and when his objections were brought up again a few years later there was no acrimony.
[EDITED to add:] And of course this is only one case; even if Loosemore were a 100% typical example of someone making an objection to EY’s arguments, and even if we were interested only in EY’s behaviour and not anyone else, the inference from “EY was obnoxious to RL” to “EY generally shoots the messenger” is still pretty shaky.
My reconstruction of Loosemore’s point is that an AI wouldn’t have two sets of semantics: one for interpreting verbal commands, and another for negotiating the world and doing things.
My reconstruction of Yudkowsky’s argument is that it depends on what I’ve been calling the Ubiquitous Utility Function. If you think of any given AI as having a separate module where its goals or values are hard-coded, then the idea that they were hard-coded wrong, but the AI is helpless to change them, is plausible.
Actual AI researchers don’t believe in ubiquitous UFs, because only a few architectures have them. EY believes in them for reasons unconnected with empirical evidence about AI architectures.
If Loosemore’s point is only that an AI wouldn’t have separate semantics for those things, then I don’t see how it can possibly lead to the conclusion that concerns about disastrously misaligned superintelligent AIs are absurd.
I do not think Yudkowsky’s arguments assume that an AI would have a separate module in which its goals are hard-coded. Some of his specific intuition-pumping thought experiments are commonly phrased in ways that suggest that, but I don’t think it’s anything like an essential assumption in any case.
E.g., consider the “paperclip maximizer” scenario. You could tell that story in terms of a programmer who puts something like “double objective_function() { return count_paperclips(DESK_REGION); }” in their AI’s code. But you could equally tell it in terms of someone who makes an AI that does what it’s told, and whose creator says “Please arrange for there to be as many paperclips as possible on my desk three hours from now.”.
(I am not claiming that any version of the “paperclip maximizer” scenario is very realistic. It’s a nice simple example to suggest the kind of thing that could go wrong, that’s all.)
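For concreteness, here is a minimal toy sketch of the two framings above. Everything in it (the function names, the “parsing”) is hypothetical, invented for this comment rather than taken from anyone’s actual design; the point is just that both framings bottom out in the same literal objective:

```python
# Hypothetical toy sketch: two framings of the paperclip scenario.
# Not a real AI design; just illustrating that both framings can
# induce the same literal objective.

def count_paperclips(region: str) -> int:
    """Stand-in for however the system estimates the paperclip count."""
    return 0  # placeholder sensor reading

# Framing 1: the goal is hard-coded as an explicit objective function.
def hardcoded_objective() -> float:
    return float(count_paperclips("desk"))

# Framing 2: the goal arrives as a natural-language request that the
# system itself turns into an objective. A literal-minded translation
# keeps the explicit target and drops the implicit constraints
# ("...without doing anything I'd object to").
def objective_from_request(request: str):
    if "paperclips" in request and "desk" in request:
        return lambda: float(count_paperclips("desk"))
    raise ValueError("request not understood")

goal = objective_from_request(
    "Please arrange for there to be as many paperclips as possible "
    "on my desk three hours from now."
)
assert goal() == hardcoded_objective()  # same literal quantity either way
```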
Loosemore would say: this is a stupid scenario, because understanding human language in particular implies understanding that that isn’t really a request to maximize paperclips at literally any cost, and an AI that lacks that degree of awareness won’t be any good at navigating the world. I would say: that’s a reasonable hope but I don’t think we have anywhere near enough understanding of how AIs could possibly work to be confident of that; e.g., some humans are unusually bad at that sort of contextual subtlety, and some of those humans are none the less awfully good at making various kinds of things happen.
Loosemore claims that Yudkowsky-type nightmare scenarios are “logically incoherent at a fundamental level”. If all that’s actually true is that an AI triggering such a scenario would have to be somewhat oddly designed, or would have to have a rather different balance of mental capabilities than an average human being, then I think his claim is very very wrong.
If Loosemore’s point is only that an AI wouldn’t have separate semantics for those things, then I don’t see how it can possibly lead to the conclusion that concerns about disastrously misaligned superintelligent AIs are absurd.
If there’s one principal argument that an ASI is highly likely to be an existential threat, then refuting it refutes claims about ASI and existential threat.
Maybe you think there are other arguments.
E.g., consider the “paperclip maximizer” scenario. You could tell that story in terms of a programmer who puts something like “double objective_function() { return count_paperclips(DESK_REGION); }” in their AI’s code. But you could equally tell it in terms of someone who makes an AI that does what it’s told, and whose creator says “Please arrange for there to be as many paperclips as possible on my desk three hours from now.”.
If it obeys verbal commands, you could tell it to stop at any time. That’s not a strong likelihood of existential threat. How could it kill us all in three hours?
Loosemore claims that Yudkowsky-type nightmare scenarios are “logically incoherent at a fundamental level”. If all that’s actually true is that an AI triggering such a scenario would have to be somewhat oddly designed, …
I’ll say! It’s logically possible to design a car without brakes or a steering wheel, but it’s not likely. Now you don’t have an argument in favour of there being a strong likelihood of existential threat from ASI.
If Loosemore’s point is only that an AI wouldn’t have separate semantics for “interpreting commands” and for “navigating the world and doing things”, then he hasn’t refuted “one principal argument” for ASI danger; he hasn’t refuted any argument for it that doesn’t actually assume that an AI must have separate semantics for those things. I don’t think any of the arguments actually made for ASI danger make that assumption.
I think the first version of the paperclip-maximizer scenario I encountered had the hapless AI programmer give the AI its instructions (“as many paperclips as possible by tomorrow morning”) and then go to bed, or something along those lines.
You seem to be conflating “somewhat oddly designed” with “so stupidly designed that no one could possibly think it was a good idea”. I don’t think Loosemore has made anything resembling a strong case for the latter; it doesn’t look to me as if he’s even really tried.
For Yudkowskian concerns about AGI to be worth paying attention to, it isn’t necessary that there be a “strong likelihood” of disaster if that means something like “at least a 25% chance”. Suppose it turns out that, say, there are lots of ways to make something that could credibly be called an AGI, and if you pick a random one that seems like it might work then 99% of the time you get something that’s perfectly safe (maybe for Loosemore-type reasons) but 1% of the time you get disaster. It seems to me that in this situation it would be very reasonable to have Yudkowsky-type concerns. Do you think Loosemore has given good reason to think that things are much better than that?
Here’s what seems to me the best argument that he has (but, of course, this is just my attempt at a steelman, and maybe your views are quite different): “Loosemore argues that if you really want to make an AGI then you would have to be very foolish to do it in a way that’s vulnerable to Yudkowsky-type problems, even if you weren’t thinking about safety at all. So potential AGI-makers fall into two classes: the stupid ones, and the ones who are taking approaches that are fundamentally immune to the failure modes Yudkowsky worries about. Yudkowsky hopes for intricate mathematical analyses that will reveal ways to build AGI safely, but the stupid potential AGI engineers won’t be reading those analyses, won’t be able to understand them, and won’t be able to follow their recommendations, and the not-stupid ones won’t need them. So Yudkowsky’s wasting his time.”
The main trouble with this is that I don’t see that Loosemore has made a good argument that if you really want to make an AGI then you’d be stupid to do it in a way that’s vulnerable to Yudkowsky-type concerns. Also, I think Yudkowsky hopes to find ways of thinking about AI that both make something like provable safety achievable and clarify what’s needed for AI in a way that makes it easier to make an AI at all, in which case, it might not matter what everyone else is doing.
In any case, this is all a bit of a sidetrack. The point is: Loosemore claimed that the sort of thing Yudkowsky worries about is “logically incoherent at [] a fundamental level”, but even being maximally generous to his arguments I think it’s obvious that he hasn’t shown that; there is a reasonable case to be made that he simply hasn’t understood some of what Yudkowsky has been saying; that is what Y meant by calling L a “permanent idiot”; whether or not detailed analysis of Y’s and L’s arguments ends up favouring one or the other, this is sufficient to suggest that (at worst) what we have here is a good ol’ academic feud where Y has a specific beef with L, which is not at all the same thing as a general propensity for messenger-shooting.
And, to repeat the actually key point: what Yudkowsky did on one occasion is not strong evidence for what the Less Wrong community at large should be expected to do on a future occasion, and I am still waiting (with little hope) for you to provide some of the actual examples you claim to have where the Less Wrong community at large responded with messenger-shooting to refutations of their central ideas. As mentioned elsewhere in the thread, my attempts to check your claims have produced results that point in the other direction; the nearest things I found to at-all-credibly-claimed refutations of central LW ideas met with positive responses from LW: upvotes, reasonable discussion, no messenger-shooting.
“(I haven’t downvoted this question nor any of Haziq’s others; but my guess is that this one was downvoted because it’s only a question worth asking if Halpern’s counterexample to Cox’s theorem is a serious problem, which johnswentworth already gave very good reasons for thinking it isn’t in response to one of Haziq’s other questions; so readers may reasonably wonder whether he’s actually paying any attention to the answers his questions get. Haziq did engage with johnswentworth in that other question—but from this question you’d never guess that any of that had happened.)”
Sorry, haven’t checked LW in a while. I actually came across this comment when I was trying to delete my LW account due to the “shoot the messenger” phenomenon that TAG was describing.
I do not think that johnswentworth’s answer is satisfactory. In his response to my previous question, he claims that Cox’s theorem holds under very specific conditions which don’t happen in most cases. He also claims that probability as extended logic is justified by empirical evidence. I don’t think this is a good justification unless he happens to have an ACME plausibility-o-meter.
David Chapman, another messenger you (meaning LW) were too quick to shoot, explains the issues with subjective Bayesianism here:
https://metarationality.com/probabilism-applicability
https://metarationality.com/probability-limitations
I do agree that this framework is useful, but only in the same sense that frequentism is useful. I consider myself a “pragmatic statistician” who doesn’t hesitate to use frequentist or Bayesian methods, as long as they are useful, because the justifications for either seem to be equally weak.
‘It might turn out that the way Cox’s theorem is wrong is that the requirements it imposes for a minimally-reasonable belief system need strengthening, but in ways that we would regard as reasonable. In that case there would still be a theorem along the lines of “any reasonable way of structuring your beliefs is equivalent to probability theory with Bayesian updates”.’
I find this statement to be quite disturbing because it seems to me that you are assuming Jaynes-Cox theory to be true first and then trying to find a proof for it. Sounds very much like confirmation bias. Van Horn’s paper could potentially revive Cox’s theorem but nobody’s talking about it because they are not ready to accept that Cox’s theorem has any issues in the first place.
I think the messenger-shooting is quite apparent in LW. It’s the reason why posts that oppose or criticise the “tenets of LW”, that LW members adhere to in a cult-like fashion, are so scarce. For instance, Chapman’s critique of LW seems to have been ignored altogether.
“The most dangerous ideas in a society are not the ones being argued, but the ones that are assumed.”
— C. S. Lewis
I guess there’s not that much point responding to this, since Haziq has apparently now deleted his account, but it seems worth saying a few words.
- Haziq says he’s deleting his account because of LW’s alleged messenger-shooting, but I don’t see any sign that he was ever “shot” in any sense beyond this: one of his several questions received a couple of downvotes.
- What johnswentworth’s answer about Cox’s theorem says isn’t at all that it “holds under very specific conditions which don’t happen in most cases”.
- You’ll get no objection from me to the idea of being a “pragmatic statistician”.
- No, I am not at all “assuming Jaynes-Cox theory to be true first and then trying to find a proof for it”. I am saying: the specific scenario you describe (there’s a hole in the proof of Cox’s theorem) might play out in various ways and here are some of them. Some of them would mean something a bit like “Less Wrong is dead” (though, I claim, not exactly that); some of them wouldn’t. I mentioned some of both.
- I can’t speak for anyone else here, but for me Cox’s theorem isn’t foundational in the sort of way it sounds as if you think it is (or should be?). If Cox’s theorem turns out to be disastrously wrong, that would be very interesting, but rather little of my thinking depends on Cox’s theorem. It’s a bit as if you went up to a Christian and said “Is Christianity dead if the ontological argument is invalid?”; most Christians aren’t Christians because they were persuaded by the ontological argument, and I think most Bayesians (in the sense in which LW folks are mostly Bayesians) aren’t Bayesians because they were persuaded by Cox’s theorem.
- I do not know what it would mean to adhere to, say, Cox’s theorem “in a cult-like fashion”.
The second-last bullet point there is maybe the most important, and warrants a bit more explanation.
Whether anything “similar enough” to Cox’s theorem is true or not, the following things are (I think) rather uncontroversial:
- We should hold most (maybe all) of our opinions with some degree of uncertainty.
- One possible way of thinking about opinions-with-uncertainty is as probabilities.
- If we think about our beliefs this way, then there are some theorems telling us how to adjust them in the light of new evidence, how our beliefs about various logically-related propositions should be related, etc. (There’s a worked example of such an adjustment below.)
- No obviously-better general approach to quantifying strength of beliefs is known.
To be clear, this doesn’t mean that nothing is known that is ever better at anything than probability theory with Bayesian updates. E.g., “probably approximately correct” (PAC) learning isn’t (so far as I know) equivalent to anything Bayesian, and it gives some nice guarantees that (so far as I know) no Bayesian approach is known to give. So when what you want is what PAC learning gives you, you should be using PAC learning.
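To make the third bullet above concrete, here is the kind of adjustment those theorems license: a single Bayes update, with arbitrary numbers chosen purely for illustration.

```python
# One Bayesian update: revise credence in a hypothesis H given evidence E.
prior = 0.30            # P(H): credence in H before seeing E
p_e_given_h = 0.80      # P(E | H)
p_e_given_not_h = 0.20  # P(E | not-H)

# Bayes' theorem:
# P(H | E) = P(E|H) P(H) / (P(E|H) P(H) + P(E|not-H) P(not-H))
posterior = (p_e_given_h * prior) / (
    p_e_given_h * prior + p_e_given_not_h * (1 - prior)
)
print(round(posterior, 3))  # 0.632: E was more expected under H, so credence rises
```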
For me, these are sufficient to justify a generally-Bayes-flavoured approach, by which I mean:
- Of course I don’t literally attach numerical probabilities to all my beliefs. Nor do I think it’s obvious that any real reasoner, given the finite resources we inevitably have, should be explicitly probabilistic about everything.
- But if for some reason I need to think clearly about how plausible I find various possibilities, I generally do it in terms of probabilities. (Taking care, e.g., to notice when there is danger of double-counting things, which is an easy way to go wrong when applying Bayesian probability naively.)
- If I notice that some element of how I think is outright inconsistent with this way of quantifying uncertainty, I consider whether it’s mistaken (albeit possibly a useful approximation). E.g., most people’s intuitive judgements of how likely things are produce instances of the (poorly named) “conjunction fallacy”, plausibly often because we tend to apply the “representativeness heuristic”; on reflection I think this genuinely does indicate places where our intuitive judgements are mistaken, and trying to notice it happening and do something cleverer is of some value. (There’s a tiny numeric check of the conjunction rule just after this list.)
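On the conjunction point: for any events A and B, P(A and B) ≤ P(A), however “representative” the conjunction sounds. A trivial numeric check (the sample space and event sizes here are arbitrary):

```python
import random

# Check P(A and B) <= P(A) on random events over a small sample space.
# This is the rule that "Linda is a feminist bank teller"-style
# intuitive judgements violate.
random.seed(0)
space = range(1000)

for _ in range(5):
    a = set(random.sample(space, 400))  # event A
    b = set(random.sample(space, 400))  # event B
    p_a = len(a) / 1000
    p_a_and_b = len(a & b) / 1000
    assert p_a_and_b <= p_a
    print(f"P(A)={p_a:.2f}  P(A and B)={p_a_and_b:.2f}")
```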
(None of this feels very cultish to me.)
Finding a bug in the proof of Cox’s theorem doesn’t do anything to invalidate any of the above. Finding a concrete case where some structure other than probabilities-with-Bayesian-updates does better (in some sense of “better”) on a problem resembling actual real-world reasoning absolutely might; in particular, it might make it false that “no obviously-better general approach to quantifying strength of beliefs is known”. Halpern’s counterexample to Cox is not, so far as I can tell, like that; it depends essentially on a sort of “sparsity” that doesn’t hold if what you’re trying to assign credences to is all the propositions you might consider about how the world is. I think (indeed, I think Halpern pointed this out, but I may be misremembering) you can fix up the proof by adding an assumption saying that this sort of sparsity doesn’t occur, and although that assumption is ugly and technical I think it is a reasonable assumption in real life; and in the most obvious regime where you can’t make that assumption—where everything is finite—the van Horn approach seems to yield essentially the same conclusions with a pretty modest set of assumptions that are reasonable there.
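For anyone wondering what this “sparsity” attaches to: as I understand it (this is a paraphrase from memory, not a verbatim statement of any published version of the theorem), the step of Cox’s proof at issue is the functional-equation step, roughly:

```latex
% Rough shape of the step in Cox's argument where density matters
% (a paraphrase, not a verbatim statement of any published version).
% Assume the plausibility of a conjunction is some function F of the
% component plausibilities:
\[
  \operatorname{pl}(A \wedge B \mid C)
    = F\bigl(\operatorname{pl}(A \mid C),\ \operatorname{pl}(B \mid A \wedge C)\bigr).
\]
% Associativity of conjunction then forces the functional equation
\[
  F\bigl(x, F(y, z)\bigr) = F\bigl(F(x, y), z\bigr),
\]
% and, given a rich enough (dense) set of attainable plausibility values,
% this forces $F(x, y) = g^{-1}\bigl(g(x)\, g(y)\bigr)$ for some monotone
% $g$, i.e. a rescaling under which plausibilities obey the product rule.
% Halpern's finite domains are exactly ones with too few attainable
% values for this step to go through.
```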
So far as I can tell by introspection, I’m not saying all this because I am determined not to admit the possibility that Cox’s theorem might be all wrong. It looks to me like it isn’t all wrong, because I’ve looked at the alleged issues and some alleged patches for them and I think the patches work and the issues are purely technical. But it could be all wrong, and that possibility feels more interesting than upsetting to me.
I don’t think any of the arguments actually made for ASI danger make that assumption.
I do not think there’s actually a great variety of arguments for existential threat from AI. The arguments other than Dopamine Drip don’t add up to existential threat.
You seem to be conflating “somewhat oddly designed” with “so stupidly designed that no one could possibly think it was a good idea”
Who would have the best idea of what a stupid design is... the person who has designed AIs or the person who hasn’t? If this were any other topic, you would allow that practical experience counts.
The main trouble with this is that I don’t see that Loosemore has made a good argument
That’s irrelevant. The question is whether his argument is so bad it can be dismissed without being addressed.
Also, I think Yudkowsky hopes to find ways of thinking about AI that both make something like provable safety achievable and clarify what’s needed for AI in a way that makes it easier to make an AI at all, in which case, it might not matter what everyone else is doing.
If pure armchair reasoning works, then it doesn’t matter what everyone else is doing. But why would it work? There’s never been a proof of that—just a reluctance to discuss it.
Even the “dopamine drip” argument does not make that assumption, even if some ways of presenting it do.
Loosemore hasn’t designed actually-intelligent AIs, any more than Yudkowsky has. In fact, I don’t see any sign that he’s designed any sort of AIs any more than Yudkowsky has. Both of them are armchair theorists with abstract ideas about how AI ought or ought not to work. Am I missing something? Has Loosemore produced any actual things that could reasonably be called AIs?
No one was dismissing Loosemore’s argument without addressing it. Yudkowsky dismissed Loosemore after having argued with him about AI for years.
I don’t know what your last paragraph means. I mean, connotationally it’s clear enough: it means “boo, Yudkowsky and his pals are dilettantes who don’t know anything and haven’t done anything valuable”. But beyond that I can’t make enough sense of it to engage with it.
“If pure armchair reasoning works …”—what does that actually mean? Any sort of reasoning can work or not work. Reasoning that’s done from an armchair (so to speak) has some characteristic failure modes, but it doesn’t always fail.
“Why would it work?”—what does that actually mean? It works if Yudkowsky’s argument is sound. You can’t tell that by looking at whether he’s sitting in an armchair; it depends on whether its (explicit and implicit) premises are true and whether the logic holds; Loosemore says there’s an implicit premise along the lines of “AI systems will have such-and-such structure” which is false; I say no one really knows much about the structure of actual human-level-or-better AI because no one is close to building one yet, I don’t see where Yudkowsky’s argument actually assumes what Loosemore says it does, and Loosemore’s counterargument is more or less “any human-or-better AI will have to work the way I want it to work, and that’s just obvious” and it isn’t obvious.
“There’s never been a proof of that”—a proof of what, exactly? A proof that armchair reasoning works? (Again, what would that even mean? Some armchair reasoning works, some doesn’t.)
“Just a reluctance to discuss it”—seems to me there’s been a fair bit of discussion of Loosemore’s claims on LW. (Including in the very discussion where Yudkowsky called him an idiot.) And, as I understand it, there was a fair bit of discussion between Yudkowsky and Loosemore, but by the time of that discussion Yudkowsky had decided Loosemore wasn’t worth arguing with. This doesn’t look to me like a “reluctance to discuss” in any useful sense. Yudkowsky discussed Loosemore’s ideas with Loosemore for a while and got fed up of doing so. Other LW people discussed Loosemore’s ideas (with Loosemore and I think with one another) and didn’t get particularly fed up. What exactly is the problem here, other than that Yudkowsky was rude?