See something I’ve written which you disagree with? I’m experimenting with offering cash prizes of up to US$1000 to anyone who changes my mind about something I consider important. Message me our disagreement and I’ll tell you how much I’ll pay if you change my mind + details :-) (EDIT: I’m not logging into Less Wrong very often now, it might take me a while to see your message—I’m still interested though)
John_Maxwell
Thanks for sharing your contrarian views, both with this post and with your previous posts. Part of me is disappointed that you didn’t write more… it feels like you have several posts’ worth of objections to Less Wrong here, and at times you are just vaguely gesturing towards a larger body of objections you have towards some popular LW position. I wouldn’t mind seeing those objections fleshed out into long, well-researched posts. Of course you aren’t obliged to put in the time & effort to write more posts, but it might be worth your time to fix specific flaws you see in the LW community, given that it consists of many smart people interested in maximizing their positive impact on the far future.
I’ll preface this by stating some points of general agreement:
I haven’t bothered to read the quantum physics sequence (I figure if I want to take the time to learn that topic, I’ll learn from someone who researches it full-time).
I’m annoyed by the fact that the sequences in practice seem to constitute a relatively static document that doesn’t get updated in response to critiques people have written up. I think it’s worth taking them with a grain of salt for that reason. (I’m also annoyed by the fact that they are extremely wordy and mostly without citation. Given the choice of getting LWers to either read the sequences or read Thinking, Fast and Slow, I would prefer they read the latter; it’s a fantastic book, thoroughly backed up by citations. No intellectually serious person should go without reading it IMO, and it’s definitely a better return on time. Caveat: I personally haven’t read the sequences through and through, although I’ve read lots of individual posts, some of which were quite insightful. Also, there is surprisingly little overlap between the two works and it’s likely worthwhile to read both.)
And here are some points of disagreement :P
You talk about how Less Wrong encourages the mistake of reasoning by analogy. I searched for “site:lesswrong.com reasoning by analogy” on Google and came up with these 4 posts: 1, 2, 3, 4. Posts 1, 2, and 4 argue against reasoning by analogy, while post 3 claims the situation is a bit more nuanced. In this comment here, I argue that reasoning by analogy is a bit like taking the outside view: analogous phenomena can be considered part of the same (weak) reference class. So...
Insofar as there is an explicit “LW consensus” about whether reasoning by analogy is a good idea, it seems like you’ve diagnosed it incorrectly (although maybe there are implicit cultural norms that go against professed best practices).
It seems useful to know the answer to questions like “how valuable are analogies”, and the discussions I linked to above seem like discussions that might help you answer that question. These discussions are on LW.
Finally, it seems you’ve been unable to escape a certain amount of reasoning by analogy in your post. You state that experimental investigation of asteroid impacts was useful, so by analogy, experimental investigation of AI risks should be useful.
The steelman of this argument would be something like “experimentally, we find that investigators who take experimental approaches tend to do better than those who take theoretical approaches”. But first, this isn’t obviously true… mathematicians, for instance, have found theoretical approaches to be more powerful. (I’d guess that the developer of Bitcoin took a theoretical rather than an empirical approach to creating a secure cryptocurrency.) And second, I’d say that even this argument is analogy-like in its structure, since the reference class of “people investigating things” seems sufficiently weak to start pushing into analogy territory. See my above point about how reasoning by analogy at its best is reasoning from a weak reference class. (Do people think this is worth a top-level post?)
This brings me to what I think is my most fundamental point of disagreement with you. Viewed from a distance, your argument goes something like “Philosophy is a waste of time! Resolve your disagreements experimentally! There’s no need for all this theorizing!” And my rejoinder would be: Resolving disagreements experimentally is great… when it’s possible. We’d love to do a randomized controlled trial of whether universes with a Machine Intelligence Research Institute are more likely to have a positive singularity, but unfortunately we don’t currently know how to do that.
There are a few issues with too much emphasis on experimentation over theory. The first is that you may be tempted to prefer experimentation even for problems that theory is better suited for (e.g. empirically testing prime number conjectures). The second is that you may fall prey to the streetlight effect and prioritize areas of investigation that look tractable from an experimental point of view, ignoring questions that are both very important and not very tractable experimentally.
You write:
Well, much of our uncertainty about the actions of an unfriendly AI could be resolved if we were to know more about how such agents construct their thought models, and relatedly what language was used to construct their goal systems.
This would seem to depend on the specifics of the agent in question, but it does seem like a potentially interesting line of inquiry. My impression is that MIRI thinks most possible AGI architectures wouldn’t meet its standards for safety, so given that their ideal architecture is so safety-constrained, they’re focused on developing the safety stuff first before working on constructing thought models etc. This seems like a pretty reasonable approach for an organization with limited resources, if it is in fact MIRI’s approach. But I could believe that value could be added by looking at lots of budding AGI architectures and trying to figure out how one might make them safer on the margin.
We could also stand to benefit from knowing more practical information (experimental data) about in what ways AI boxing works and in what ways it does not, and how much that is dependent on the structure of the AI itself.
Sure… but note that Eliezer Yudkowsky from MIRI was the one who invented the AI box experiment and ran the first few experiments, and FHI wrote this paper with a bunch of ideas for how AI boxes might be constructed. (The other thing I didn’t mention as a weakness of empiricism is that empiricism doesn’t tell you which hypotheses might be useful to test. Knowing what hypotheses to test is especially valuable when testing hypotheses is expensive.)
I could believe that there are fruitful lines of experimental inquiry that are neglected in the AI safety space. Overall it looks kinda like crypto to me in the sense that theoretical investigation seems more likely to pan out. But I’m supportive of people thinking hard about specific useful experiments that someone could run. (You could survey all the claims in Bostrom’s Superintelligence and try to estimate what fraction could be cheaply tested experimentally. Remember that just because a claim can’t be tested experimentally doesn’t mean it’s not an important claim worth thinking about...)
Chapter 7 in this book had a few good thoughts on getting critical feedback from subordinates, specifically in the context of avoiding disasters. The book claims that merely encouraging subordinates to give critical feedback is often insufficient, and offers ideas for other things to do.
Yeah, this is a great book. Curious to see how the rise of adblockers/anti-addiction tech will change things. There’s a counterintuitive argument that things will become worse: As sophisticated folks tune out, unsophisticated folks become the most lucrative audience and the race to the bottom accelerates. As wise people leave the conversation, the conversation that remains is even more crazy. And unfortunately, even a small number of well-coordinated crazy people can do a lot of damage.
I actually think leaving comments online is a more scalable strategy than people realize. I leave a lot of comments on LW, the EA Forum, etc., and I’m now no longer surprised when I meet someone IRL and they recognize my name. It took me a while to internalize how skewed reader/writer ratios are online and how many lurkers there really are. I suspect the lurker numbers for any kind of culture war discussion are even higher. It’s like two people having a shouting match on public transportation: everyone wants to watch, no one wants to participate. But the size of the audience means that if you do choose to participate, you massively amplify your influence.
I once did an experiment where I registered a throwaway Twitter account, searched for the trending hashtag controversy du jour, replied to people’s tweets, and tried to talk them down from their extreme positions. I was surprised by how few tweets there were, how little time it took for me to respond to all of them, how many people engaged with me, and how successful I was at getting people to moderate their positions a bit. I got the impression that if I had the money to hire 100 people to use Twitter full time and mediate every ugly discussion they saw, I’d have a nontrivial chance of moving the needle for a nation of 330 million.
I used to feel that getting offended was useless and counterproductive, but a friend pointed out that if people are not treating you with respect, that can be a genuinely problematic situation.
Wei Dai suggests that offense is experienced when people feel they are being treated as low status. So if you feel offended, a good first question might be “do I care that this person is treating me as low status?” If there is no one else around, and you don’t expect to see the person again, then your answer may be no. If there are others around, or you expect to see the person again, then things may be more difficult. Yes, you can politely ask people to be more considerate of you, but that’s not exactly a high-status move.
So, I don’t feel that “never act offended” passes the “rationalists should win” test as a group norm. It might actually be good that “That’s offensive” represents a high-status way to say “You’re treating me as low status. Stop.”
It might even be worthwhile to expand the concept of offense. Currently it’s only acceptable to be offended when people treat you as low status in certain narrow ways. If someone says something nasty about your nose, “That’s offensive” is not nearly as high-status a response as it would be if someone said something about your race. (Theory: “That’s racist” works as a high-status response because you’re implicitly invoking the coalition of all the people who think racist statements are bad.) But nasty statements about your nose can still be pretty nasty.
To expand the concept of offense to all nasty statements, you might have to create a widespread social norm against nasty statements in general, to give people a coalition to invoke. Though, perhaps “Gee, you sound like someone who has a lot of friends” or similar would act as an effective stand-in.
(As you point out, it’s not too hard to fake offense, so we don’t necessarily disagree on anything.)
Karma me!
You started with an intent to associate SIAI with self delusion
I see, he must be one of those innately evil enemies of ours, eh?
My current model of aaronsw is something like this: He’s a fairly rational person who’s a fan of Givewell. He’s read about SI and thinks the singularity is woo, but he’s self-skeptical enough to start reading SI’s website. He finds a question in their FAQ where they fail to address points made by those who disagree, reinforcing the woo impression. At this point he could just say “yeah, they’re woo like I thought”. But he’s heard they run a blog on rationality, so he makes a post pointing out the self-skepticism failure in case there’s something he’s missing.
The FAQ on the website is not the place to signal humility and argue against your own conclusions.
Why not? I think it’s an excellent place to do that. Signalling humility and arguing against your own conclusions is a good way to be taken seriously.
Overall, I thought aaronsw’s post had a much higher information to accusations ratio than your comment, for whatever that’s worth. As criticism goes his is pretty polite and intelligent.
Also, aaronsw is not the first person I’ve seen on the internet complaining about lack of self-skepticism on LW, and I agree with him that it’s something we could stand to work on. Or at least signalling self-skepticism; it’s possible that we’re already plenty self-skeptical and all we need to do is project typical self-skeptical attitudes.
For example, Eliezer Yudkowsky seems to think that the rational virtue of “humility” is about “taking specific actions in anticipation of your own errors”, not actually acting humble. (Presumably self-skepticism counts as humility by this definition.) But I suspect that observing how humble someone seems is a typical way to gauge the degree to which they take specific actions in anticipation of their own errors. If this is the case, it’s best for signalling purposes to actually act humble as well.
(I also suspect that acting humble makes it easier to publicly change your mind, since the status loss for doing so becomes lower. So that’s another reason to actually act humble.)
(Yes, I’m aware that I don’t always act humble. Unfortunately, acting humble by always using words like “I suspect” everywhere makes my comments harder to read and write. I’m not sure what the best solution to this is.)
Donated $400.
How a game theorist buys a car (on the phone with the dealer):
“Hello, my name is Bruce Bueno de Mesquita. I plan to buy the following car [list the exact model and features] today at five P.M. I am calling all of the dealerships within a fifty-mile radius of my home and I am telling each of them what I am telling you. I will come in and buy the car today at five P.M. from the dealer who gives me the lowest price. I need to have the all-in price, including taxes, dealer prep [I ask them not to prep the car and not charge me for it, since dealer prep is little more than giving you a washed car with plastic covers and paper floormats removed, usually for hundreds of dollars], everything, because I will make out the check to your dealership before I come and will not have another check with me.”
From The Predictioneer’s Game, page 7.
Other car-buying tips from Bueno de Mesquita, in case you’re about to buy a car:
Figure out exactly what car you want to buy by searching online before making any contact with dealerships.
Don’t be afraid to purchase a car from a distant dealership—the manufacturer provides the warranty, not the dealer.
Be sure to tell each dealer you will be sharing the price they quote you with subsequent dealers.
Don’t take shit from dealers who tell you “you can’t buy a car over the phone” or do anything other than give you their number. If a dealer is stonewalling, make it quite clear that you’re willing to get what you want elsewhere.
Arrive at the lowest-price dealer just before 5:00 PM to close the deal. In the unlikely event that the dealer changes their terms, go for the next best price.
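In auction-theory terms, the script turns a series of one-on-one haggles into a single sealed-bid reverse auction: each dealer knows it only wins by beating every rival’s all-in price, so quotes get pushed down toward each dealer’s cost floor. Here’s a minimal sketch of that dynamic (all the dollar figures and the margin model are invented for illustration):

```python
import random

random.seed(0)

# Hypothetical cost floors for the same car at five dealerships
# (invoice plus overhead); every number here is made up.
floors = [26200, 26500, 25900, 26800, 26350]

def quote(floor, rivals):
    """Crude model: a dealer's price padding shrinks as competitive pressure grows."""
    return floor + random.uniform(0, 2000) / (1 + rivals)

# Bueno de Mesquita's script: every dealer knows it's bidding against
# all the others, so each quote lands near that dealer's floor.
quotes = [quote(f, rivals=len(floors) - 1) for f in floors]
print(f"best all-in quote: ${min(quotes):,.0f}")
print(f"typical lone-dealer quote: ${quote(floors[0], rivals=0):,.0f}")
```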
Eliezer Yudkowsky and Paul Graham have a lot in common. They’re both well-known bloggers who write about rationality and are influential in Silicon Valley. They’re both known for Bayesian stuff (Graham was a pioneer of Bayesian spam filtering). They both played a role in creating discussion sites which are, in my opinion, among the best on the internet (Less Wrong for Eliezer, Hacker News for Paul Graham). And they’ve both stopped posting to the sites they created, but they both still post to… Twitter, which is, in my opinion, one of the worst discussion sites on the internet. (Here is one of many illustrations.)
It seems like having so many celebrities, scientists, and politicians is a major asset for Twitter. What is it about Twitter which makes big names want to post there? How could a rival site attract big names without also importing Twitter’s pathologies?
There has been lots of discussion of this. This is probably at least the tenth thread on why/how to fix LW.
http://lesswrong.com/lw/kbc/meta_the_decline_of_discussion_now_with_charts/
http://lesswrong.com/r/discussion/lw/nf2/lesswrong_potential_changes/
http://lesswrong.com/lw/n0l/lesswrong_20/
http://lesswrong.com/lw/n9b/upcoming_lw_changes/
https://wiki.lesswrong.com/index.php?title=Less_Wrong_2016_strategy_proposal
http://lesswrong.com/lw/nkw/2016_lesswrong_diaspora_survey_results/
http://lesswrong.com/lw/mbd/lesswrong_effective_altruism_forum_and_slate_star/
http://lesswrong.com/lw/mcv/effectively_less_altruistically_wrong_codex/
http://lesswrong.com/lw/m7g/open_thread_may_18_may_24_2015/cdfe
http://lesswrong.com/lw/kzf/should_people_be_writing_more_or_fewer_lw_posts/
http://lesswrong.com/lw/not/revitalizing_less_wrong_seems_like_a_lost_purpose/
http://lesswrong.com/lw/np2/revitalising_less_wrong_is_not_a_lost_purpose/
http://lesswrong.com/lw/o7b/downvotes_temporarily_disabled/
http://lesswrong.com/lw/oho/thoughts_on_operation_make_less_wrong_the_single/
(These are just the ones I recall, and they don’t include all the posts Eugene generated or the discussion in Slack.)
I used to do this a lot on Less Wrong; then I started thinking I should do work that was somehow “more important”. In hindsight, I think I undervalued the importance of pointing out minor reasoning/content errors on Less Wrong. “Someone is wrong on less wrong” seems to me to be a problem actually worth fixing; it seems like that’s how we make a community that is capable of vetting arguments.
Participating in online discussions tends to reduce one’s attention span. There’s the variable reinforcement factor. There’s also the fact that a person who comes to a discussion earlier gets more visibility. This incentivizes checking for new discussions frequently. (These two factors exacerbate one another.)
These effects are so strong that if I stay away from the internet for a few days (“internet fast”), my attention span increases dramatically. And if I’ve posted comments online yesterday, it’s hard for me to focus today—there’s always something in the back of my mind that wants to check & see if anyone’s responded. I need to refrain from making new comments for several days before I can really focus.
Lots of people have noticed that online discussions sap their productivity this way. And due to the affect heuristic, they downgrade the importance & usefulness of online discussions in general. I think this inspired Patri’s Self-Improvement or Shiny Distraction post. Like video games, Less Wrong can be distracting… so if video games are a distracting waste of time, Less Wrong must also be, right?
Except that doesn’t follow. Online content can be really valuable to read. Bloggers don’t have an incentive to pad their ideas the way book authors do. And they write simply instead of unnecessarily obfuscating like academics. (Some related discussion.)
Participating in discussions online is often high leverage. The ratio of readers to participants in online discussions can be quite high. Some numbers from the LW-sphere that back this up:
In 2010, Kevin created a thread where he asked lurkers to say hi. The thread generated 617 comments.
77% of respondents to the Less Wrong survey have never posted a comment. (And this is a population of readers who were sufficiently engaged to take the survey!)
Here’s a relatively obscure comment of mine that was voted to +2. But it was read by at least 135 logged-in users. Since 54+% of the LW readership has never registered an account, this obscure comment was likely read by 270+ people. A similar case study—deeply threaded comment posted 4 days after a top-level post, read by at least 22 logged-in users.
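The arithmetic behind that last estimate, using the survey figure above:

```python
logged_in_viewers = 135    # users the site recorded as having read the comment
unregistered_share = 0.54  # fraction of LW readers with no account (survey figure)

# Logged-in viewers are at most 46% of all readers, so total readership
# is at least 135 / 0.46, i.e. roughly double the logged-in count.
print(round(logged_in_viewers / (1 - unregistered_share)))  # ~293
```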
Based on this line of reasoning, I’m currently working on the problem of preserving focus while participating in online discussions. I’ve got some ideas, but I’d love to hear thoughts from anyone who wants to spend a minute brainstorming.
It’s worth pointing out that the $100 you “donate” probably goes much, much less far than a $100 donation to an effective charity. So it might be better to think in terms of shifting $100 in federal funds. That makes it seem like a lot less of a slam dunk to me. Would I take half an hour out of my day to move $100 from an ineffective government agency to an effective one? Meh. Feels like my other attempts at altruism probably have a much higher expected impact.
I’m also worried about getting called up for jury duty if I re-register to vote now that I’m living in a different county.
On the whole though, fairly persuasive.
Here are some tentative guesses about this whole rationality and success business.
Let’s set aside “rationality” for a minute and talk about mental habits. Everyone seems to agree that having the right habits is key to success, perhaps most famously the author of The 7 Habits of Highly Effective People. But if you look at the 7 habits Covey identifies (“Be Proactive”, “Begin with the End in Mind”, “Put First Things First”, “Think Win/Win”, “Seek First to Understand, Then Be Understood”, “Synergize”, and “Sharpen the Saw”), they don’t look much like what gets discussed on Less Wrong. So what gives?
I think part of the problem is the standard pattern-matching trap. Perhaps books like Covey’s genuinely do address the factors that the vast majority of people need to work on in order to be more successful. But analytical folks tend not to read these books because
they’re part of a genre that’s sullied its reputation by overpromising
even when they don’t overpromise, analytical people are rarely part of the target audience, and the books do things like give incorrect folk explanations for stuff that actually happens to work (but the analytical people don’t try the stuff because they can tell the stated explanation for why it works is bogus)
they tend to distrust their emotions, and a good part of how the books work, when they do, is by manipulating your emotions to elevate your mood and make it easier for you to get to work or implement changes
analytical people typically manage to figure out stuff up to the level discussed in popular books for themselves, and don’t do further careful study because they’ve written off the genre
So in the same way that pure math grad students are smarter than psychology grad students, even though good psychology research is probably higher-value than good pure math research, Less Wrong has focused on a particular set of mental habits that have the right set of superficial characteristics: mental habits related to figuring out what’s true. But figuring out what’s true isn’t always that important for success. See Goals for which Less Wrong does (and doesn’t) help. (Although the focus has gradually drifted towards more generally useful mental habits since the site’s creation, I think.)
A big problem with addressing these more generally useful habits through the internet is that people who get good enough at applying them are liable to decide that surfing the internet is a waste of time and leave the conversation. I’m quite interested if anyone has any suggestions for dealing with this problem.
So when Holden Karnofsky says something like “rationality is a strong (though not perfect) predictor of success”, maybe he is claiming that mental habits that make you better at figuring out what’s true are actually quite useful in practice. (Or maybe by “rationality” he means “instrumental rationality”, in which case his statement would be true by definition.) Perhaps the reason Stephen Covey doesn’t write about that stuff is that it’s too advanced or controversial for him or his audience?
(Disclaimer: I haven’t read The Seven Habits of Highly Effective People, although I did read the version for teenagers when I was a teenager.)
I’m glad you are thinking about this. I am very optimistic about AI alignment research along these lines. However, I’m inclined to think that the strong form of the natural abstraction hypothesis is pretty much false. Different languages and different cultures, and even different academic fields within a single culture (or different researchers within a single academic field), come up with different abstractions. See for example lsusr’s posts on the color blue or the flexibility of abstract concepts. (The Whorf hypothesis might also be worth looking into.)
This is despite humans having pretty much identical cognitive architectures (and creating a de novo AGI with a cognitive architecture as similar to a human brain as human brains are to each other seems unrealistic). Perhaps you could argue that some human-generated abstractions are “natural” and others aren’t, but that leaves the problem of ensuring that the human operating our AI is making use of the correct, “natural” abstractions in their own thinking. (Some ancient cultures lacked a concept of the number 0. From our perspective, and that of a superintelligent AGI, 0 is a “natural” abstraction. But there could be ways in which the superintelligent AGI invents “natural” abstractions that we haven’t yet invented, such that we are living in a “pre-0 culture” with respect to those abstractions, and this would cause an ontological mismatch between us and our AGI.)
But I’m still optimistic about the overall research direction. One reason is if your dataset contains human-generated artifacts, e.g. pictures with captions written in English, then many unsupervised learning methods will naturally be incentivized to learn English-language abstractions to minimize reconstruction error. (For example, if we’re using self-supervised learning, our system will be incentivized to correctly predict the English-language caption beneath an image, which essentially requires the system to understand the picture in terms of English-language abstractions. This incentive would also arise for the more structured supervised learning task of image captioning, but the results might not be as robust.)
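Here’s a toy numpy sketch of that incentive (the data, dimensions, and “dog” concept are all invented): a purely reconstructive objective chases high-variance nuisance structure in the pixels, while an objective that must also predict a caption word gets pulled toward the English-language concept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 300 samples x 50 pixels. Pixel 0 carries the concept an
# English caption would mention ("dog" vs "cat"); the other 49 pixels are
# high-variance nuisance structure (lighting, background, ...).
dogness = rng.normal(size=(300, 1))
nuisance = 5.0 * rng.normal(size=(300, 49))
X = np.hstack([dogness, nuisance])
caption_says_dog = (dogness[:, 0] > 0).astype(float)

# Reconstruction-only objective: the best 1-dim linear abstraction is the
# top principal component, which lives in the nuisance subspace.
pca_dir = np.linalg.svd(X - X.mean(0), full_matrices=False)[2][0]

# Caption-prediction objective: the least-squares direction for predicting
# the caption word concentrates its weight on the concept pixel.
w, *_ = np.linalg.lstsq(X, caption_says_dog - caption_says_dog.mean(), rcond=None)
caption_dir = w / np.linalg.norm(w)

print("weight on concept pixel, reconstruction-only:", round(abs(pca_dir[0]), 3))
print("weight on concept pixel, caption-driven:     ", round(abs(caption_dir[0]), 3))
```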
This is the natural abstraction hypothesis in action: across the sciences, we find that low-dimensional summaries of high-dimensional systems suffice for broad classes of “far-away” predictions, like the speed of a sled.
Social sciences are a notable exception here. And I think social sciences (or even humanities) may be the best model for alignment—‘human values’ and ‘corrigibility’ seem related to the subject matter of these fields.
Anyway, I had a few other comments on the rest of what you wrote, but I realized what they all boiled down to was me having a different set of abstractions in this domain than the ones you presented. So as an object lesson in how people can have different abstractions (heh), I’ll describe my abstractions (as they relate to the topic of abstractions) and then explain how they relate to some of the things you wrote.
I’m thinking in terms of minimizing some sort of loss function that looks vaguely like

`reconstruction_error + other_stuff`

where `reconstruction_error` is a measure of how well we’re able to recreate observed data after running it through our abstractions, and `other_stuff` is the part that is supposed to induce our representations to be “useful” rather than just “predictive”. You keep talking about conditional independence as the be-all-end-all of abstraction, but from my perspective, it is an interesting (potentially novel!) option for the `other_stuff` term in the loss function, the same way dropout was once an interesting and novel `other_stuff` which helped supervised learning generalize better (making neural nets “useful” rather than just “predictive” on their training set).

The most conventional choice for `other_stuff` would probably be some measure of the complexity of the abstraction. E.g. a clustering algorithm’s complexity can be controlled through the number of centroids, or an autoencoder’s complexity can be controlled through the number of latent dimensions. Marcus Hutter seems to be as enamored with compression as you are with conditional independence, to the point where he created the Hutter Prize, which offers half a million dollars to the person who can best compress a 1GB file of Wikipedia text. Another option for `other_stuff` would be denoising, as we discussed here.

You speak of an experiment to “run a reasonably-detailed low-level simulation of something realistic; see if info-at-a-distance is low-dimensional”. My guess is that if the `other_stuff` in your loss function consists only of conditional independence things, your representation won’t be particularly low-dimensional: it will see no reason to avoid the use of 100 practically-redundant dimensions when one would do the job just as well.

Similarly, you speak of “a system which provably learns all learnable abstractions”, but I’m not exactly sure what this would look like, seeing as how for pretty much any abstraction, I expect you can add a bit of junk code that marginally decreases the reconstruction error by overfitting some aspect of your training set. Or even junk code that never gets run / other functional equivalences.
The right question in my mind is how much info at a distance you can get for how many additional dimensions. There will probably be some number of dimensions N such that giving your system more than N dimensions to play with for its representation will bring diminishing returns. However, that doesn’t mean the returns will go to 0, e.g. even after you have enough dimensions to implement the ideal gas law, you can probably gain a bit more predictive power by checking for wind currents in your box. See the elbow method (though, the existence of elbows isn’t guaranteed a priori).
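To make both points concrete, here’s a toy sketch (all data synthetic) of the `reconstruction_error + other_stuff` tradeoff, with a per-dimension complexity penalty standing in for `other_stuff`: reconstruction error drops sharply up to the true latent dimension, keeps improving a little after that, and the penalized loss bottoms out at the elbow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 3 strong latent dimensions embedded in 20 observed ones,
# plus weak residual structure (the "wind currents in your box").
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 20))
X += 0.3 * rng.normal(size=(1000, 20))
Xc = X - X.mean(0)

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
lam = 0.05  # weight on other_stuff: here, a complexity penalty per dimension

for k in range(1, 8):
    X_hat = (U[:, :k] * S[:k]) @ Vt[:k]  # best k-dim linear abstraction of the data
    reconstruction_error = ((Xc - X_hat) ** 2).mean()
    loss = reconstruction_error + lam * k
    print(f"k={k}  reconstruction={reconstruction_error:.3f}  loss={loss:.3f}")
```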
(I also think that an algorithm to “provably learn all learnable abstractions”, if practical, is a hop and a skip away from a superintelligent AGI. Much of the work of science is learning the correct abstractions from data, and this algorithm sounds a lot like an uberscientist.)
Anyway, in terms of investigating convergence, I’d encourage you to think about the inductive biases induced by both your loss function and also your learning algorithm. (We already know that learning algorithms can have different inductive biases than humans, e.g. it seems that the input-output surfaces for deep neural nets aren’t as biased towards smoothness as human perceptual systems, and this allows for adversarial perturbations.) You might end up proving a theorem which has required preconditions related to the loss function and/or the algorithm’s inductive bias.
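(To illustrate the adversarial-perturbation point with the simplest possible case, here’s a toy linear score function rather than a deep net; the mechanism, a small coordinated step against the gradient, is the same one FGSM uses:)

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy linear score function over 784 made-up "pixels". The gradient of the
# score w.r.t. the input is just w, so moving every pixel by at most eps
# against the score's sign swings the output by eps * ||w||_1, which is
# typically enough to flip the prediction.
w = rng.normal(size=784)
x = rng.normal(size=784)
eps = 0.1

score = w @ x
x_adv = x - eps * np.sign(w) * np.sign(score)
print(f"score before: {score:+.1f}, after: {w @ x_adv:+.1f}, "
      f"max pixel change: {eps}")
```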
Another riff on this bit:
This is the natural abstraction hypothesis in action: across the sciences, we find that low-dimensional summaries of high-dimensional systems suffice for broad classes of “far-away” predictions, like the speed of a sled.
Maybe we could differentiate between the ‘useful abstraction hypothesis’ and the stronger ‘unique abstraction hypothesis’. This statement supports the ‘useful abstraction hypothesis’, but the ‘unique abstraction hypothesis’ is the one where alignment becomes way easier because we and our AGI are using the same abstractions. (Even though I’m only a believer in the useful abstraction hypothesis, I’m still optimistic because I tend to think we can have our AGI cast a net wide enough to capture enough useful abstractions that ours are in there somewhere, and this number will be manageable enough to find the right abstractions from within that net—or something vaguely like that.) In terms of science, the ‘unique abstraction hypothesis’ doesn’t just say scientific theories can be useful, it also says there is only one ‘natural’ scientific theory for any given phenomenon, and the existence of competing scientific schools sorta seems to disprove this.
Anyway, the aspect of your project that I’m most optimistic about is this one:
This raises another algorithmic problem: how do we efficiently check whether a cognitive system has learned particular abstractions? Again, this doesn’t need to be fully general or arbitrarily precise. It just needs to be general enough to use as a tool for the next step.
Since I don’t believe in the “unique abstraction hypothesis”, checking whether a given abstraction corresponds to a human one seems important to me. The problem seems tractable, and a method that’s abstract enough to work across a variety of different learning algorithms/architectures (including stuff that might get invented in the future) could be really useful.
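One concrete, architecture-agnostic version of that check is a linear probe: fit a simple readout from the system’s internal representation to human labels for the abstraction, and see how decodable the concept is. A sketch on synthetic data (the representation and the concept are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend Z is a learned 16-dim representation for 1000 inputs, and y is a
# human label for the abstraction we're checking (say, "contains a dog").
# Here we bake the concept into Z along a random direction, for illustration.
concept_direction = rng.normal(size=16)
Z = rng.normal(size=(1000, 16))
y = (Z @ concept_direction > 0).astype(float)

# Linear probe: least-squares readout from representation to label.
w, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
accuracy = ((Z @ w > 0) == (y > 0)).mean()
print(f"probe accuracy: {accuracy:.0%}  (high => abstraction linearly decodable)")
```

In practice you’d fit the probe on held-out data and compare against a probe trained on shuffled labels, so that high accuracy reflects the representation rather than the probe itself.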
I’ve previously talked about how I think Less Wrong’s culture seems to be on a gradual trajectory towards posting less stuff and posting it in less visible places. For example, six years ago a post like this qualified as a featured post in Main. Nowadays it’s the sort of thing that would go in an Open Thread. Vaniver’s recent discussion post is the kind of thing that would have been a featured Main post in 2010.
Less Wrong is one of the few forums on the internet that actually discourages posting content. This is a feature of the culture that manifests in several ways:
One of the first posts on the site explained why it’s important to downvote people. The post repeatedly references experiences with Usenet to provide support for this. But I think the internet has evolved a lot since Usenet. Subtle site mechanics have the potential to affect the culture of your community a lot. (I don’t think it’s a coincidence that Tumblr and 4chan have significantly different site mechanics and also significantly different cultures and even significantly different politics. Tumblr’s “replies go to the writer’s followers” mechanic leads to a concern with social desirability that 4chan’s anonymity totally lacks.)
On reddit, if your submission is downvoted, it’s downvoted into obscurity. On Less Wrong, downvoted posts remain on the Discussion page, creating a sort of public humiliation for people who are downvoted.
The Main/Discussion/Open Thread distinction invites snippy comments about whether your thing would have been more appropriate for some other tier. On most social sites, readers decide how much visibility a post should get (by upvoting, sharing, etc.) Less Wrong is one of the few that leaves it down to the writer. This has advantages and disadvantages. One advantage is that important but boring scholarly work can get visibility more easily.
Upvotes substitute for praise: instead of writing “great post” type comments, readers will upvote you, which is less of a motivator.
My experience of sitting down to write a Less Wrong post is as follows:
I have some interesting idea for a Less Wrong post. I sit down and excitedly start writing it out.
A few paragraphs in, I think of some criticism of my post that users are likely to make. I try to persevere for a while anyway.
Within an hour, I have thought of so many potential criticisms or reasons that my post might come across as lame that I am totally demoralized. I save my post as a draft, close the tab, and never return to it.
Contrast the LW model with the “conversational blogging” model where you sit down, scribble some thoughts out, hit post, and see what your readers think. Without worrying excessively about what readers think, you’re free to write in open mode and have creative ideas you wouldn’t have when you’re feeling self-critical.
Anyway, now that I’ve described the problem, here are some offbeat solution ideas:
LW users move away from posting on LW and post on Medium.com instead. There aren’t upvotes or downvotes, so there’s little fear of being judged. Bad posts are “punished” by being ignored, not downvoted. And Medium.com gives you a built-in audience so you don’t need to build up a following the way you would with an independent blog. (I haven’t actually used Medium.com that much; maybe it has problems.)
The EA community pays broke postdocs to create peer-reviewed, easily understandable blog posts on topics of interest to the EA community at large (e.g. an overview of the literature on how to improve the quality of group discussions, motivation hacking, rationality stuff, whatever). This goes on its own site. After establishing a trusted brand, we could branch out in to critiquing science journalism in order to raise the sanity waterline or other cool stuff like that.
Someone makes it their business to read everything that gets written on every blog in the EA-sphere and create a “Journal of Effective Altruism”: a continually updated list of links to the very best writing in the EA-sphere. This gives boring scholarly stuff a chance to get high visibility. This “Editor-in-Chief” figure could also provide commentary, link to related posts that they remember, etc. I’ll bet it wouldn’t be more than a part-time job. Ideally it would be a high-status, widely trusted person in the EA community who has a good memory for related ideas.
Some of these are solutions that make more sense if the EA movement grows significantly beyond its current scope, but it can’t hurt to start kicking them around.
The top tier quality for actually read posting is dominated by one individual (a great one, but still)
Are we talking about LW proper here? Arguably this has been true over a good chunk of the site’s history: at one time it was Eliezer, then Yvain, then Lukeprog, etc.
- Power makes you dumb; stay humble.
- Tell everyone in the organization that safety is their responsibility and that everyone’s views are important.
- Try to be accessible and not intimidating; admit that you make mistakes.
- Schedule regular chats with underlings so they don’t have to take initiative to flag potential problems. (If you think such chats aren’t a good use of your time, another idea is to contract someone outside of the organization to do periodic informal safety chats. Chapter 9 is about how organizational outsiders are uniquely well-positioned to spot safety problems. Among other things, it seems workers are sometimes more willing to share concerns frankly with an outsider than they are with their boss.)
- Accept that not all of the critical feedback you get will be good quality.

The book disrecommends anonymous surveys on the grounds that they communicate the subtext that sharing your views openly is unsafe. I think anonymous surveys might be a good idea in the EA community though—retaliation against critics seems fairly common here (i.e. the culture of fear didn’t come about by chance). Anyone who’s been around here long enough will have figured out that sharing your views openly isn’t safe. (See also the “People are pretty justified in their fears of critiquing EA leadership/community norms” bullet point here, and the last paragraph in this comment.)
I’m not sure it’s good for this comment to get a lot of attention? OpenAI is more altruism-oriented than a typical AI research group, and this is essentially a persuasive essay for why other groups should compete with them.
Some miscellaneous thoughts:
Online community design is an important subfield of group rationality, which is arguably more important than individual rationality. It’s hard to deny that many of the biggest group rationality failures are happening online nowadays.
A great thing about online communities is they let you aggregate the work of a variety of sporadic contributors. People have heard of Yvain because he writes good stuff on a consistent schedule. Imagine alternate universe Yvain whose blog has two posts, spaced 6 months apart: Meditations on Moloch and The Control Group Is Out Of Control. Since alternate universe Yvain does not write on a consistent schedule, few people have heard of his blog and his insights aren’t read by many people.
I think the “Self-Improvement or Shiny Distraction” post is wrong, which is unfortunate because I suspect it played a big role in killing LW.
Let’s rewind to the dawn of the internet era. We’re having coffee with Tim Berners-Lee and talking about his new invention, the World Wide Web. Speculatively we can see the Web disrupting many industries, but predicting that the Web will disrupt academia seems downright unimaginative. Heck, Tim is using the Web to share physics research already. After all, the Web means
An end to credentialism. Now any amateur physicist can contribute in their spare time.
Smoother, better peer review processes.
Cheap, universal distribution.
Academia could use a shakeup anyway: much academic writing stinks, and philosophy in particular has gone astray.
Now fast forward to the present. The academic utopia we envisioned has happened to some degree—see Wikipedia and the AskHistorians subreddit, for instance. But it hasn’t happened to the degree we hoped. Why not? I can think of a few reasons:
Financial incentives and prestige inertia that benefit established systems. See e.g. Bryan Caplan on this.
Lack of a profit motive. The Web revolutionized areas it was possible to get rich revolutionizing. Revolutionizing academia has much less profit potential. (Revolutionizing credentialing might make someone rich, but academia serves valuable roles for society that aren’t credentialing and are hard to make money from. For example, it certifies smart people as high status topic experts. If you’ve attended high school you know that smart people are not high status by default. We’re lucky to live in a world where journalists are more likely to interview college professors for trend pieces than celebrities. If colleges went away and cons + Mensa became the primary places smart people gathered, that might change.)
The acceleration of addictiveness. The Web is selecting for addictive stimuli. Blogs are a more addictive version of personal websites. Twitter and Facebook are more addictive versions of blogs. If the web-based version of academia is optimizing for something other than addictiveness, it’s likely to get crowded out. (I suspect this is playing a role in Wikipedia’s decline.)
All of these factors seem surmountable, and indeed LW made decent progress despite them. They haven’t been surmounted due to a combination of apathy and this problem not being on people’s radar.
That’s the research side of academia. Now let’s look at the teaching side.
Imagine you’re a professor teaching a critical thinking class. Out of all the classes in the general education curriculum, the case for your class actually helping the lives of your students is among the strongest. You’re a really good teacher, and your students are so engaged with your assigned readings that they are putting off homework for other classes to do them. Sounds great right?
That’s basically the problem Patri’s post complained about. It’s a “first world” problem by professorial standards. If your students are really having issues with their other classes because they are so excited about the readings for your class, maybe do the readings during class so they aren’t a distraction while doing other homework, prevent students from reading ahead, or something like that.
The higher education bubble is likely going to “pop” eventually. (Maybe when employers realize that taking Coursera classes is a positive signal of conscientiousness, curiosity, and having the wisdom to avoid debt… Google’s HR guy is already on record saying people who make their way without college are “exceptional human beings”.) The market will provide a new solution for credentialing because there’s money in that. There’s less money in the other stuff academia does, and it’d be great if we could start laying the foundation for that now. Stretch goal: bake EA principles in from the start.
I’ve heard that CFAR is already trying to move in the direction of being self-sustaining by charging higher fees and stuff. I went to a 4-day CFAR workshop and was relatively unimpressed; my feeling about CFAR is that they are providing a service to individuals for money and it’s probably not a terrible idea to let the market determine if their services are worth the amount they charge. (In other words, if they’re not able to make a sustainable business or at least a university-style alum donor base out of what they’re doing, I’m skeptical that propping them up as a non-alum is an optimal use of your funds.)
FHI states that they are interested in using marginal donations to increase the amount of public outreach they do. It seems like FHI would have a comparative advantage over MIRI in doing outreach, given that they are guys with PhDs from Oxford and thus would have a higher level of baseline credibility with the media, etc. So it’s kind of disappointing that MIRI seems to be the more outreach-focused of the two, but it seems like the fact that FHI gets most of its funding from grants means they’re restricted in what they can spend money on. FHI strikes me as more underfunded than MIRI, given that they are having to do a collaboration with an insurance company to stay afloat, whereas MIRI has maxed out all of their fundraisers to date. (Hence my decision to give to FHI this year.)
If you do want to donate to MIRI, it seems like the obvious thing to do would be to email them and tell them that you want to be a matching funds provider for one of their fundraisers, since they’re so good at maxing those out. (I think Malo would be the person to contact; you can find his email on this page.)
My summary (now with endorsement by Eliezer!):
SI can be a valuable organization even if Tool AI turns out to be the right approach:
Skills/organizational capabilities for safe Tool AI are similar to those for Friendly AI.
EY seems to imply that much of SI’s existing body of work can be reused.
Offhand remark that seemed important: Superintelligent Tool AI would be more difficult, since it would have to be developed in such a way that it would not recursively self-improve.
Tool AI is nontrivial:
The number of possible plans is way too large for an AI to realistically evaluate all of them. Heuristics will have to be used to find suboptimal but promising plans. (See the sketch after this list.)
The reasoning behind the plan the AI chooses might be way beyond the comprehension of the user. It’s not clear how best to deal with this, given that the AI is only approximating the user’s wishes and can’t really be trusted to choose plans without supervision.
Constructing a halfway decent approximation of the user’s utility function and having a model good enough to make plans with are also far from solved problems.
Potential Tool AI gotcha: The AI might give you a self-fulfilling negative prophecy that the AI didn’t realize would harm you.
These are just examples. Point is, saying “but the AI will just do this!” is far removed from specifying the AI in a rigorous formal way and proving it will do that.
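On the “too many plans” point, the arithmetic is stark even for tiny action vocabularies (the numbers below are arbitrary):

```python
# Exhaustively evaluating every plan of length L built from A primitive
# actions takes A**L evaluations; these action counts are arbitrary examples.
for actions, length in [(10, 5), (10, 10), (100, 10)]:
    print(f"{actions} actions, plan length {length}: {actions ** length:.2e} plans")
```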
Tool AI is not obviously the way AGI should or will be developed:
Many leading AGI thinkers have their own pet idea about what AGI should do. Few to none endorse Tool AI. If it were obvious, all the leading AGI thinkers would endorse it.
Actually, most modern AI applications don’t involve human input, so it’s not obvious that AGI will develop along Tool AI lines.
Full-time Friendliness researchers are worth having:
If nothing else, they’re useful for evaluating proposals like Holden’s Tool AI one to figure out if they are really sound.
Friendliness philosophy would be difficult to program an AI to do. Even if we thought we had a program that could do it, how would we know the answers from that program were correct? So we probably need humans.
Friendliness researchers need to have a broader domain of expertise than Holden gives them credit for. They need to have expertise in whatever happens to be necessary to ensure safe AI.
The problems of Friendliness are tricky, so laypeople should beware of jumping to conclusions about Friendliness.
Holden’s estimate of a 90% chance of doom, even given a 100-person FAI team approving the design, is overly pessimistic:
EY is aware it’s extremely difficult to know what properties about a prospective FAI need to be formally proved, and plans to put a lot of effort into figuring this out.
The difficulty of Friendliness is finite. The difficulties are big and subtle, but not unending.
Where did 90% come from? Lots of uncertainty here...
Holden made other good points not addressed here.