See something I’ve written which you disagree with? I’m experimenting with offering cash prizes of up to US$1000 to anyone who changes my mind about something I consider important. Message me about our disagreement and I’ll tell you how much I’ll pay if you change my mind, plus details :-) (EDIT: I’m not logging into Less Wrong very often now, so it might take me a while to see your message—I’m still interested though)
John_Maxwell
Thoughts on designing policies for oneself
System Administrator Appreciation Day—Thanks Trike!
Applying Behavioral Psychology on Myself
The Case for a Bigger Audience
Response to Glen Weyl on Technocracy and the Rationalist Community
How to Make Billions of Dollars Reducing Loneliness
Why GPT wants to mesa-optimize & how we might change this
Thanks for sharing your contrarian views, both with this post and with your previous posts. Part of me is disappointed that you didn’t write more… it feels like you have several posts’ worth of objections to Less Wrong here, and at times you are just vaguely gesturing towards a larger body of objections to some popular LW position. I wouldn’t mind seeing those objections fleshed out into long, well-researched posts. Of course you aren’t obliged to put in the time & effort to write more posts, but it might be worth your while to fix the specific flaws you see in the LW community, given that it consists of many smart people interested in maximizing their positive impact on the far future.
I’ll preface this by stating some points of general agreement:
I haven’t bothered to read the quantum physics sequence (I figure if I want to take the time to learn that topic, I’ll learn from someone who researches it full-time).
I’m annoyed by the fact that the sequences in practice seem to constitute a relatively static document that doesn’t get updated in response to critiques people have written up. I think it’s worth taking them with a grain of salt for that reason. (I’m also annoyed by the fact that they are extremely wordy and mostly without citations. Given the choice of getting LWers to either read the sequences or read Thinking Fast and Slow, I would prefer they read the latter; it’s a fantastic book, thoroughly backed up by citations. No intellectually serious person should go without reading it IMO, and it’s definitely a better return on time. Caveat: I personally haven’t read the sequences through and through, although I’ve read lots of individual posts, some of which were quite insightful. Also, there is surprisingly little overlap between the two works, so it’s likely worthwhile to read both.)
And here are some points of disagreement :P
You talk about how Less Wrong encourages the mistake of reasoning by analogy. I searched for “site:lesswrong.com reasoning by analogy” on Google and came up with these 4 posts: 1, 2, 3, 4. Posts 1, 2, and 4 argue against reasoning by analogy, while post 3 claims the situation is a bit more nuanced. In this comment here, I argue that reasoning by analogy is a bit like taking the outside view: analogous phenomena can be considered part of the same (weak) reference class. So...
Insofar as there is an explicit “LW consensus” about whether reasoning by analogy is a good idea, it seems like you’ve diagnosed it incorrectly (although maybe there are implicit cultural norms that go against professed best practices).
It seems useful to know the answer to questions like “how valuable are analogies”, and the discussions I linked to above seem like discussions that might help you answer that question. These discussions are on LW.
Finally, it seems you’ve been unable to escape a certain amount of reasoning by analogy in your post. You state that experimental investigation of asteroid impacts was useful, so by analogy, experimental investigation of AI risks should be useful.
The steelman of this argument would be something like “experimentally, we find that investigators who take experimental approaches tend to do better than those who take theoretical approaches”. But first, this isn’t obviously true… mathematicians, for instance, have found theoretical approaches to be more powerful. (I’d guess that the developer of Bitcoin took a theoretical rather than an empirical approach to creating a secure cryptocurrency.) And second, I’d say that even this argument is analogy-like in its structure, since the reference class of “people investigating things” seems sufficiently weak to start pushing into analogy territory. See my above point about how reasoning by analogy at its best is reasoning from a weak reference class. (Do people think this is worth a toplevel post?)
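(To make the “weak reference class” framing slightly more concrete, here’s a minimal sketch, assuming the inside view and the reference class can each be modeled as an independent normal estimate; this model is my own illustration, not anything from the linked discussions:)

```latex
% Inside view: estimate $x$ with variance $\sigma^2$.
% Outside view (reference class): mean $\mu$ with variance $\tau^2$.
% The standard normal-normal combination is the precision-weighted mean:
\[
  \hat{\theta} = \frac{x/\sigma^2 + \mu/\tau^2}{1/\sigma^2 + 1/\tau^2}
\]
% A "weak" reference class is one where $\tau^2$ is large: the weight on
% $\mu$ shrinks toward zero and $\hat{\theta}$ stays close to the inside
% view $x$. On this picture, analogies are just the large-$\tau^2$ end of
% the reference-class spectrum.
```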
This brings me to what I think is my most fundamental point of disagreement with you. Viewed from a distance, your argument goes something like “Philosophy is a waste of time! Resolve your disagreements experimentally! There’s no need for all this theorizing!” And my rejoinder would be: Resolving disagreements experimentally is great… when it’s possible. We’d love to do a randomized controlled trial of whether universes with a Machine Intelligence Research Institute are more likely to have a positive singularity, but unfortunately we don’t currently know how to do that.
There are a few issues with too much emphasis on experimentation over theory. The first issue is that you may be tempted to prefer experimentation even for problems that theory is better suited for (e.g. empirically testing prime number conjectures). The second issue is that you may fall prey to the streetlight effect and prioritize areas of investigation that look tractable from an experimental point of view, ignoring questions that are both very important and not very tractable experimentally.
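(To illustrate the prime-conjecture example: here’s a minimal Python sketch, mine rather than anything from the original discussion, of what “empirically testing” a conjecture like Goldbach’s looks like. No finite check can settle a claim quantified over all even numbers, whereas a proof would settle every case at once.)

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: sieve[k] is True iff k is prime."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return sieve

def test_goldbach(limit):
    """Check that every even number in 4..limit is a sum of two primes.

    Passing is evidence, not proof: a counterexample above `limit`
    is never ruled out, no matter how large `limit` gets.
    """
    sieve = primes_up_to(limit)
    for n in range(4, limit + 1, 2):
        if not any(sieve[p] and sieve[n - p] for p in range(2, n // 2 + 1)):
            return n  # counterexample found
    return None

print(test_goldbach(10_000))  # None: holds below 10,000 (which proves nothing)
```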
You write:
Well, much of our uncertainty about the actions of an unfriendly AI could be resolved if we were to know more about how such agents construct their thought models, and relatedly what language was used to construct their goal systems.
This would seem to depend on the specifics of the agent in question, but it’s a potentially interesting line of inquiry. My impression is that MIRI thinks most possible AGI architectures wouldn’t meet its standards for safety, so given that their ideal architecture is so safety-constrained, they’re focused on developing the safety stuff first before working on constructing thought models etc. This seems like a pretty reasonable approach for an organization with limited resources, if it is in fact MIRI’s approach. But I could believe that value could be added by looking at lots of budding AGI architectures and trying to figure out how one might make them safer on the margin.
We could also stand to benefit from knowing more practical information (experimental data) about in what ways AI boxing works and in what ways it does not, and how much that is dependent on the structure of the AI itself.
Sure… but note that Eliezer Yudkowsky from MIRI was the one who invented the AI box experiment and ran the first few experiments, and FHI wrote this paper collecting ideas for what an AI box might consist of. (The other weakness of empiricism I didn’t mention is that empiricism doesn’t tell you which hypotheses might be useful to test. Knowing which hypotheses to test is especially valuable when testing them is expensive.)
I could believe that there are fruitful lines of experimental inquiry that are neglected in the AI safety space. Overall it looks kinda like crypto to me in the sense that theoretical investigation seems more likely to pan out. But I’m supportive of people thinking hard about specific useful experiments that someone could run. (You could survey all the claims in Bostrom’s Superintelligence and try to estimate what fraction could be cheaply tested experimentally. Remember that just because a claim can’t be tested experimentally doesn’t mean it’s not an important claim worth thinking about...)
Chapter 7 in this book had a few good thoughts on getting critical feedback from subordinates, specifically in the context of avoiding disasters. The book claims that merely encouraging subordinates to give critical feedback is often insufficient, and offers ideas for other things to do.
PSA: Learn to code
Yeah, this is a great book. Curious to see how the rise of adblockers/anti-addiction tech will change things. There’s a counterintuitive argument that things will become worse: As sophisticated folks tune out, unsophisticated folks become the most lucrative audience and the race to the bottom accelerates. As wise people leave the conversation, the conversation that remains is even more crazy. And unfortunately, even a small number of well-coordinated crazy people can do a lot of damage.
I actually think leaving comments online is a more scalable strategy than people realize. I leave a lot of comments on LW, the EA Forum, etc., and I’m now no longer surprised when I meet someone IRL and they recognize my name. It took me a while to internalize how skewed reader/writer ratios are online and how many lurkers there really are. I suspect the lurker numbers for any kind of culture war discussion are even higher. It’s like two people having a shouting match on public transportation: everyone wants to watch, no one wants to participate. But the size of the audience means that if you do choose to participate, you massively amplify your influence.
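(Rough arithmetic on the lurker point, using the folk “1% rule” for online participation; the exact fraction is an assumption, not a measured constant:)

```python
# Folk "90-9-1" rule of thumb: ~90% of an online audience only lurks,
# ~9% comments occasionally, ~1% writes most of the content.

def implied_readers(visible_commenters, commenter_fraction=0.01):
    """Estimate the total audience implied by a count of visible commenters."""
    return visible_commenters / commenter_fraction

# If the heuristic roughly holds, a thread where 20 distinct people
# argue may have on the order of 2,000 silent readers:
print(implied_readers(20))  # -> 2000.0
```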
I once did an experiment where I registered a throwaway Twitter account, searched for the trending hashtag controversy du jour, replied to people’s tweets, and tried to talk them down from their extreme positions. I was surprised by how few tweets there were, how little time it took for me to respond to all of them, how many people engaged with me, and how successful I was at getting people to moderate their positions a bit. I got the impression that if I had the money to hire 100 people to use Twitter full time and mediate every ugly discussion they saw, I’d have a nontrivial chance of moving the needle for a nation of 330 million.
I used to feel that getting offended was useless and counterproductive, but a friend pointed out that if people are not treating you with respect, that can be a genuinely problematic situation.
Wei Dai suggests that offense is experienced when people feel they are being treated as being low status. So if you feel offended, a good first question might be “do I care that this person is treating me as low status?” If there is no one else around, and you don’t expect to see the person again, then your answer may be no. If there are others around, or you expect to see the person again, then things may be more difficult. Yes, you can politely ask people to be more considerate of you, but that’s not exactly a high-status move.
So, I don’t feel that “never act offended” passes the “rationalists should win” test as a group norm. It might actually be good that “That’s offensive” represents a high-status way to say “You’re treating me as low status. Stop.”
It might even be worthwhile to expand the concept of offense. Currently it’s only acceptable to be offended when people treat you as low status in certain narrow ways. If someone says something nasty about your nose, “That’s offensive” is not nearly as high-status a response as it would be if someone said something about your race. (Theory: “That’s racist” works as a high-status response because you’re implicitly invoking the coalition of all the people who think racist statements are bad.) But nasty statements about your nose can still be pretty nasty.
To expand the concept of offense to all nasty statements, you might have to create a widespread social norm against nasty statements in general, to give people a coalition to invoke. Though, perhaps “Gee, you sound like someone who has a lot of friends” or similar would act as an effective stand-in.
(As you point out, it’s not too hard to fake offense, so we don’t necessarily disagree on anything.)
Karma me!
Marketplace Transactions Open Thread
You started with an intent to associate SIAI with self delusion
I see, he must be one of those innately evil enemies of ours, eh?
My current model of aaronsw is something like this: He’s a fairly rational person who’s a fan of Givewell. He’s read about SI and thinks the singularity is woo, but he’s self-skeptical enough to start reading SI’s website. He finds a question in their FAQ where they fail to address points made by those who disagree, reinforcing the woo impression. At this point he could just say “yeah, they’re woo like I thought”. But he’s heard they run a blog on rationality, so he makes a post pointing out the self-skepticism failure in case there’s something he’s missing.
The FAQ on the website is not the place to signal humility and argue against your own conclusions.
Why not? I think it’s an excellent place to do that. Signalling humility and arguing against your own conclusions is a good way to be taken seriously.
Overall, I thought aaronsw’s post had a much higher information-to-accusations ratio than your comment, for whatever that’s worth. As criticism goes, his is pretty polite and intelligent.
Also, aaronsw is not the first person I’ve seen on the internet complaining about lack of self-skepticism on LW, and I agree with him that it’s something we could stand to work on. Or at least signalling self-skepticism; it’s possible that we’re already plenty self-skeptical and all we need to do is project typical self-skeptical attitudes.
For example, Eliezer Yudkowsky seems to think that the rational virtue of “humility” is about “taking specific actions in anticipation of your own errors”, not actually acting humble. (Presumably self-skepticism counts as humility by this definition.) But I suspect that observing how humble someone seems is a typical way to gauge the degree to which they take specific actions in anticipation of their own errors. If this is the case, it’s best for signalling purposes to actually act humble as well.
(I also suspect that acting humble makes it easier to publicly change your mind, since the status loss for doing so becomes lower. So that’s another reason to actually act humble.)
(Yes, I’m aware that I don’t always act humble. Unfortunately, acting humble by always using words like “I suspect” everywhere makes my comments harder to read and write. I’m not sure what the best solution to this is.)
6 Tips for Productive Arguments
Donated $400.
How a game theorist buys a car (on the phone with the dealer):
“Hello, my name is Bruce Bueno de Mesquita. I plan to buy the following car [list the exact model and features] today at five P.M. I am calling all of the dealerships within a fifty-mile radius of my home and I am telling each of them what I am telling you. I will come in and buy the car today at five P.M. from the dealer who gives me the lowest price. I need to have the all-in price, including taxes, dealer prep [I ask them not to prep the car and not charge me for it, since dealer prep is little more than giving you a washed car with plastic covers and paper floormats removed, usually for hundreds of dollars], everything, because I will make out the check to your dealership before I come and will not have another check with me.”
From The Predictioneer’s Game, page 7.
Other car-buying tips from Bueno de Mesquita, in case you’re about to buy a car:
Figure out exactly what car you want to buy by searching online before making any contact with dealerships.
Don’t be afraid to purchase a car from a distant dealership—the manufacturer provides the warranty, not the dealer.
Be sure to tell each dealer you will be sharing the price they quote you with subsequent dealers.
Don’t take shit from dealers who tell you “you can’t buy a car over the phone” or who do anything other than quote you a number. If a dealer is stonewalling, make it quite clear that you’re willing to get what you want elsewhere.
Arrive at the lowest-price dealer just before 5:00 PM to close the deal. In the unlikely event that the dealer changes their terms, go for the next-best price. (A toy simulation of why this mechanism works is sketched below.)
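(Here’s a toy Python simulation of why the script works; the dealer costs and the undercutting rule are invented for illustration. Once every dealer knows the best quote will be shared, undercutting continues until the price sits at roughly the second-lowest dealer’s cost, like a reverse English auction:)

```python
def phone_auction(dealer_costs, step=50, opening_markup=2000):
    """Round-robin undercutting until nobody will beat the best quote.

    Each dealer beats the current best quote by `step` whenever the
    new price still covers their cost; otherwise they pass.
    """
    best_price = dealer_costs[0] + opening_markup  # first dealer opens high
    best_dealer = 0
    improved = True
    while improved:
        improved = False
        for i, cost in enumerate(dealer_costs):
            if i != best_dealer and best_price - step >= cost:
                best_price -= step  # undercut, still profitable
                best_dealer = i
                improved = True
    return best_price

costs = [22_400, 21_100, 23_000, 20_500, 21_900]  # dealers' private costs
print(phone_auction(costs))  # -> 21100: the second-lowest cost sets the price
# The buyer's haggling skill is irrelevant; competition between the two
# cheapest dealers drives the quote down to the runner-up's cost.
```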
My summary (now with endorsement by Eliezer!):
SI can be a valuable organization even if Tool AI turns out to be the right approach:
Skills/organizational capabilities for safe Tool AI are similar to those for Friendly AI.
EY seems to imply that much of SI’s existing body of work can be reused.
Offhand remark that seemed important: Superintelligent Tool AI would be more difficult, since it would have to be developed in such a way that it would not recursively self-improve.
Tool AI is nontrivial:
The number of possible plans is way too large for an AI to realistically evaluate all of them. Heuristics will have to be used to find suboptimal but promising plans (see the sketch after this list).
The reasoning behind the plan the AI chooses might be way beyond the comprehension of the user. It’s not clear how best to deal with this, given that the AI is only approximating the user’s wishes and can’t really be trusted to choose plans without supervision.
Constructing a halfway decent approximation of the user’s utility function and having a model good enough to make plans with are also far from solved problems.
Potential Tool AI gotcha: The AI might give you a self-fulfilling negative prophecy that the AI didn’t realize would harm you.
These are just examples. Point is, saying “but the AI will just do this!” is far removed from specifying the AI in a rigorous formal way and proving it will do that.
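(A cartoon of the plan-explosion point, with a made-up scoring function: with b actions per step and d steps there are b^d complete plans, so a realistic planner must prune, e.g. with a beam search that keeps only the k most promising partial plans at each step:)

```python
ACTIONS = range(10)  # b = 10 possible actions per step
DEPTH = 20           # d = 20 steps per plan

print(len(ACTIONS) ** DEPTH)  # 10**20 complete plans: hopeless to enumerate

def heuristic_score(plan):
    """Stand-in for a planner's evaluation of a partial plan."""
    return sum(a * (i + 1) for i, a in enumerate(plan))

def beam_search(actions, depth, beam_width=5):
    """Keep only the `beam_width` best partial plans at each step.

    Cheap (~ b * k * d evaluations instead of b**d), but the greedy
    pruning can discard the prefix of the true optimum: you get a
    promising plan, not a guaranteed-best one.
    """
    beam = [()]
    for _ in range(depth):
        candidates = [plan + (a,) for plan in beam for a in actions]
        beam = sorted(candidates, key=heuristic_score, reverse=True)[:beam_width]
    return beam[0]

print(beam_search(ACTIONS, DEPTH))  # one good plan out of 10**20 candidates
```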
Tool AI is not obviously the way AGI should or will be developed:
Many leading AGI thinkers have their own pet idea about what AGI should do. Few to none endorse Tool AI. If it were obvious, all the leading AGI thinkers would endorse it.
Actually, most modern AI applications don’t involve human input, so it’s not obvious that AGI will develop along Tool AI lines.
Full-time Friendliness researchers are worth having:
If nothing else, they’re useful for evaluating proposals like Holden’s Tool AI one to figure out if they are really sound.
Friendliness philosophy would be difficult to program an AI to do. Even if we thought we had a program that could do it, how would we know the answers from that program were correct? So we probably need humans.
Friendliness researchers need to have a broader domain of expertise than Holden gives them credit for. They need to have expertise in whatever happens to be necessary to ensure safe AI.
The problems of Friendliness are tricky, so laypeople should beware of jumping to conclusions about Friendliness.
Holden’s estimate of a 90% chance of doom, even given a 100-person FAI team approving the design, is overly pessimistic:
EY is aware it’s extremely difficult to know what properties about a prospective FAI need to be formally proved, and plans to put a lot of effort into figuring this out.
The difficulty of Friendliness is finite. The difficulties are big and subtle, but not unending.
Where did 90% come from? Lots of uncertainty here...
Holden made other good points not addressed here.