It seems quite possible to me that the philosophical stance + mathematical taste you’re describing aren’t “natural kinds” (e.g. the topics you listed don’t actually have a ton in common, besides being popular MIRI-sphere topics).
If that’s the case, selecting for people with the described philosophical stance + mathematical taste could basically be selecting for “people with little resistance to MIRI’s organizational narrative” (people who have formed their opinions about math + philosophy based on opinions common in/around MIRI).
selection on philosophical competence, and thus, by proxy, philosophical agreement
It sounds like you’re saying that at MIRI, you approximate a potential hire’s philosophical competence by checking to see how much they agree with you on philosophy. That doesn’t seem great for group epistemics?
I’ve been following MIRI for many years now. I’ve sometimes noticed conversations that I’m tempted to summarize as “patting each other on the back for how right we all are”. (“No one else is actually trying” has that flavor. Here is a comment I wrote recently which might also be illustrative. You could argue this is a universal human tendency, but when I look back at different organizations where I’ve worked, I don’t think any of them had it nearly as bad as MIRI does. Or at least, how bad MIRI used to have it. I believe it’s gotten a bit better in recent years.)
I think MIRI is doing important work on important problems. I also think it would be high value of information for MIRI to experiment with trying to learn from people who don’t share the “typical MIRI worldview”—people interested in topics that MIRI-sphere people don’t talk about much, people who have a somewhat different philosophical stance, etc. I think this could make MIRI’s research significantly stronger. The marginal value of talking to / hiring a researcher who’s already ~100% in agreement with you seems low compared to the marginal value of talking to / hiring a researcher who brings something new to the table.
If you’re still in the mode of “searching for more promising paths”, I think this sort of exploration strategy could be especially valuable. Perhaps you could establish some sort of visiting scholars program. This could maximize your exposure to diverse worldviews, and also encourage new researchers to be candid in their disagreements, if their employment is for a predetermined, agreed-upon length. (I know that SIAI had a visiting fellows program in years past that wasn’t that great. If you want me to help you think about how to run something better I’m happy to do that.)
Another thought is it might be helpful to try & articulate precisely what makes MIRI different from other AI safety organizations, and make sure your hiring selects for that and nothing else. When I think about what makes MIRI different from other AI safety orgs, there are some broad things that come to mind:
But there are also some much more specific things, like the ones you mentioned—interest in specific, fairly narrow mathematical & philosophical topics. From the outside it looks kinda like MIRI suffers from “not invented here” syndrome.
My personal guess is that MIRI would be a stronger org, and the AI safety ecosystem as a whole would be stronger, if MIRI expanded their scope to the bullet points I listed above and tried to eliminate the influence of “not invented here” on their hiring decisions. (My reasoning is partially based on the fact that I can’t think of AI safety organizations besides MIRI which match the bullet points I listed. I think this proposal would be an expansion into neglected research territory. I’d appreciate a correction if there are orgs I’m unaware of / not remembering.)
It seems quite possible to me that the philosophical stance + mathematical taste you’re describing aren’t “natural kinds” (e.g. the topics you listed don’t actually have a ton in common, besides being popular MIRI-sphere topics).
So, I believe that the philosophical stance is a natural kind. I can try to describe it better, but note that I won’t be able to point at it perfectly:
I would describe it as “taking seriously the idea that you are a computation [Edit: an algorithm].” (As opposed to a collection of atoms, or a location in spacetime, or a Christian soul, or any number of other things you could identify with.)
I think that most of the selection for this philosophical stance happens not in MIRI hiring, but instead in being in the LW community. I think that the sequences are actually mostly about the consequences of this philosophical stance, and that the sequences pipeline is largely creating a selection for this philosophical stance.
One can have this philosophical stance without a bunch of math ability (many LessWrongers do), but when the philosophical stance is combined with math ability, it leads to a lot of agreement in taste in math-philosophy models, which is what you see in MIRI employees.
To make a specific (but hard to verify) claim, I think that if you were to take MIRI employees, intervene on them before they found LessWrong, and show them a lot of things like UDASSA, TDT, and reflective oracles, they would be very interested in them relative to other math/philosophy ideas. Further, if you were to take people in 2000, before the existence of LW, and filter on being interested in some of these ideas, you would find people interested in many of these ideas.
(I listed ideas that came from MIRI, but there are many ideas that did not come from MIRI that people with this philosophical stance (and math ability) tend to be interested in: Logic, Probability, Game Theory, Information Theory, Algorithmic Information Theory.)
(I used to not believe this. When I first started working at MIRI, I felt like I was lucky to have all of these mathematical and philosophical interests converge to the same place. I attributed it to a coincidence, but now think it has a common natural cause.)
(I think that this philosophical stance is really not enough to cause people to converge on many strategic questions. For example, I think Eliezer Yudkowsky, Jessica Taylor, Paul Christiano, and Andrew Critch all score very highly on this philosophical stance, and have a wide range of different views on timelines, probability of doom, and the strategic landscape.)
The most natural shared interest for a group united by “taking seriously the idea that you are a computation” seems like computational neuroscience, but that’s not on your list, nor do I recall it being covered in the sequences. If we were to tell 5 random philosophically inclined STEM PhD students to write a lit review on “taking seriously the idea that you are a computation” (giving them that phrase and nothing else), I’m quite doubtful we would see any sort of convergence towards the set of topics you allude to (Haskell, anthropics, mathematical logic).
As a way to quickly sample the sequences, I went to Eliezer’s userpage, sorted by score, and checked the first 5 sequence posts:
IMO very little of the content of these 5 posts fits strongly into the theme of “taking seriously the idea that you are a computation”. I think this might be another one of these rarity narrative things (computers have been a popular metaphor for the brain for decades, but we’re the only ones who take this seriously, same way we’re the only ones who are actually trying).
the sequences pipeline is largely creating a selection for this philosophical stance
I think the vast majority of people who bounce off the sequences do so either because it’s too longwinded or they don’t like Eliezer’s writing style. I predict that if you ask someone involved in trying to popularize the sequences, they will agree.
I’ve written about how “science” is inherently public...
But that’s only one vision of the future. In another vision, the knowledge we now call “science” is taken out of the public domain—the books and journals hidden away, guarded by mystic cults of gurus wearing robes, requiring fearsome initiation rituals for access—so that more people will actually study it.
I assume this has motivated a lot of the stylistic choices in the sequences and Eliezer’s other writing: the 12 virtues of rationality, the litany of Gendlin/Tarski/Hodgell, parables and fables, Jeffreyssai and his robes/masks/rituals.
I find the sequences to be longwinded and repetitive. I think Eliezer is a smart guy with interesting ideas, but if I wanted to learn quantum mechanics (or any other academic topic the sequences cover), I would learn it from someone who has devoted their life to understanding the subject and is widely recognized as a subject matter expert.
From my perspective, the question is how anyone gets through all 1800+ pages of the sequences. My answer is that the post I linked is right. The mystical presentation, where Eliezer plays the role of your sensei who throws you to the mat out of nowhere if you forgot to keep your center of gravity low, really resonates with some people (and really doesn’t resonate with others). By the time someone gets through all 1800+ pages, they’ve invested a significant chunk of their ego in Eliezer and his ideas.
I agree that the phrase “taking seriously the idea that you are a computation” does not directly point at the cluster, but I still think it is a natural cluster. I think that computational neuroscience is in fact high up on the list of things I expect LessWrongers to be interested in. To the extent that they are not as interested in it as other things, I think it is because it is too hard to actually get much that feels like algorithmic structure from neuroscience.
I think that the interest in anthropics is related to the fact that computations are the kind of thing that can be multiply instantiated. I think logic is a computational-like model of epistemics. I think that Haskell is not really that much about this philosophy, and is more about mathematical elegance. (I think that liking elegance/simplicity is mostly different from the “I am a computation” philosophy, and is also selected for at MIRI.)
I think that a lot of the sequences (including the first and third and fourth posts in your list) are about thinking about the computation that you are running in contrast and relation to an ideal (AIXI-like) computation.
I think that “That Alien Message” is directly about getting the reader to imagine being a subprocess inside an AI, and thinking about what they would do in that situation.
I think that the politics post is not that representative of the sequences, and it bubbled to the top by karma because politics gets lots of votes.
(It does feel a little like I am justifying the connection in a way that could be used to justify false connections. I still believe that there is a cluster, very roughly described as “taking seriously the idea that you are a computation,” that is a natural class of ideas and is the heart of the sequences.)
I think the vast majority of people who bounce off the sequences do so either because it’s too longwinded or they don’t like Eliezer’s writing style. I predict that if you ask someone involved in trying to popularize the sequences, they will agree.
I agree, but I think that the majority of people who love the sequences do so because they deeply share this philosophical stance, and don’t find it much elsewhere, more so than because they e.g. find a bunch of advice in it that actually works for them.
I think the effect you describe is also part of why people like the sequences, but I think that a stronger effect is that there are a bunch of people who had a certain class of thoughts prior to reading the sequences, didn’t see thoughts of this type before finding LessWrong, and then saw these thoughts in the sequences. (I especially believe this about the kind of people who get hired at MIRI.) Prior to the sequences, they were intellectually lonely, not having people to talk to who shared this philosophical stance, which is a large part of their worldview.
I view the sequences as a collection of thoughts similar to things that I was already thinking, which was then used as a flag to connect me with people who were also already thinking the same things, more so than something that taught me a bunch of stuff. I predict a large portion of karma-weighted LessWrongers will say the same thing. (This isn’t inconsistent with your theory, but I think it would be evidence for mine.)
My theory about why people like the sequences is very entangled with the philosophical stance actually being a natural cluster, and thus something that many different people would have independently.
I think that MIRI selects for the kind of person who likes the sequences, which under my theory is a philosophical stance related to being a computation, and under your theory seems entangled with little mental resistance to (some kinds of) narratives.
I notice I like “you are an algorithm” better than “you are a computation”, since “computation” feels like it could point to a specific instantiation of an algorithm, and I think that algorithm as opposed to instantiation of an algorithm is an important part of it.
To be slightly more precise, I think I historically felt like I identified with about 60% of framings in the general MIRI cluster (at least the way it appears in public outputs), and now I’m at 80%+. Part of the difference here was that I already was pretty into stuff like empiricism, materialism, Bayesianism, etc., but I previously (not very reflectively) had opinions and intuitions in the direction of thinking of myself as a computational instance, and these days I can understand the algorithmic framing much better (even though it’s still not very intuitive/natural to me).
Datapoint: I’ve read the sequences and am familiar with lots of MIRI-related math and philosophy, and very much think humans are atoms. I think this is compatible with 95%+ (but not 100%) of Eliezer’s writing.
Interesting. I just went and looked at some old survey results hoping I would find a question like this one. I did not find a similar question. (The lack of a question about this is itself evidence against my theory.)
(Agreement among LessWrongers is not that crux-y for my belief that it is both a natural cluster and highly selected for at MIRI, but I am still interested in the question about LW.)
It sounds like you’re saying that at MIRI, you approximate a potential hire’s philosophical competence by checking to see how much they agree with you on philosophy. That doesn’t seem great for group epistemics?
I did not mean to imply that MIRI does this any more than e.g. philosophy academia.
When you don’t have sufficient objective things to use to judge competence, you end up having to use agreement as a proxy for competence. This is because when you understand a mistake, you can filter for people who do not make that mistake, but when you do not understand a mistake you are making, it is hard to filter for people that do not make that mistake.
Sometimes, you interact with someone who disagrees with you, and you talk to them, and you learn that you were making a mistake that they did not make, and this is a very good sign for competence, but you can only really get this positive signal about as often as you change your mind, which isn’t often.
Sometimes, you can also disagree with someone, and see that their position is internally consistent, which is another way you can observe some competence without agreement.
I think that personally, I use a proxy that is something like “How much do I feel like I learn(/like where my mind goes) when I am talking to the person,” which I think selects for some philosophical agreement (their concepts are not so far from my own that I can’t translate), but also some philosophical disagreement (their concepts are better than my own at making at least one thing less confusing). (This condition does not feel necessary for me. I feel like having a coherent plan is also a great sign, even if I do not feel like I learn when I am talking to the person.)
If that’s the case, selecting for people with the described philosophical stance + mathematical taste could basically be selecting for “people with little resistance to MIRI’s organizational narrative”
So, I do think that MIRI hiring does select for people with “little resistance to MIRI’s organizational narrative,” through the channel of “You have less mental resistance to narratives you agree with” and “You are more likely to work for an organization when you agree with their narrative.”
I think that, additionally, people have a score on “mental resistance to organizational narratives” in general, and I was arguing that MIRI does not select against this property (very strongly). (Indeed, I think they select for it, but not as strongly as they select for philosophy.) I think that when the OP was thinking about how much to trust her own judgement, this is the more relevant variable, and the variable they were referring to.
I don’t want to speak for/about MIRI here, but I think that I personally do the “patting each other on the back for how right we all are” more than I endorse doing it. I think the “we” is less likely to be MIRI, and more likely to be a larger group that includes people like Paul.
I agree that it would be really really great if MIRI can interact with and learn from different views. I think mostly everyone agrees with this, and has tried, and in practice, we keep hitting “inferential distance” shaped walls, and become discouraged, and (partially) give up. To be clear, there are a lot of people/ideas where I interact with them and conclude “There probably isn’t much for me to learn here,” but there are also a lot of people/ideas where I interact with them and become sad because I think there is something for me to learn there, and communicating across different ontologies is very hard.
I agree with your bullet points descriptively, but they are not exhaustive.
I agree that MIRI has a strong (statistical) bias towards things that were invented internally. It is currently not clear to me how much of this statistical bias is a mistake vs. the correct reaction to how well internally invented things seem to fit our needs, and how hard it is to find the good stuff that exists externally when it exists. (I think there are a lot of great ideas out there that I really wish I had, but I don’t have a great method for filtering for them in the sea of irrelevant stuff.)
Strong-upvoted for this paragraph in particular, for pointing out that the strategy of “seeking out disagreement in order to learn” (which obviously isn’t how hg00 actually worded it, but seems to me descriptive of their general suggested attitude/approach) has real costs, which can sometimes be prohibitively high.
I often see this strategy contrasted with a group’s default behavior, and when this happens it is often presented as [something like] a Pareto improvement over said default behavior, with little treatment (or even acknowledgement) given to the tradeoffs involved. I think this occurs because the strategy in question is viewed as inherently virtuous (which in turn I fundamentally see as a consequence of epistemic learned helplessness run rampant, leaking past the limits of any particular domain and seeping into a general attitude towards anything considered sufficiently “hard” [read: controversial]), and attributing “virtuousness” to something often has the effect of obscuring the real costs and benefits thereof.
which in turn I fundamentally see as a consequence of epistemic learned helplessness run rampant
Not sure I follow. It seems to me that the position you’re pushing, that learning from people who disagree is prohibitively costly, is the one that goes with learned helplessness. (“We’ve tried it before, we encountered inferential distances, we gave up.”)
Suppose there are two execs at an org on the verge of building AGI. One says “MIRI seems wrong for many reasons, but we should try and talk to them anyways to see what we learn.” The other says “Nah, that’s epistemic learned helplessness, and the costs are prohibitive. Turn this baby on.” Which exec do you agree with?
This isn’t exactly hypothetical, I know someone at a top AGI org (I believe they “take seriously the idea that they are a computation/algorithm”) who reached out to MIRI and was basically ignored. It seems plausible to me that MIRI is alienating a lot of people this way, in fact. I really don’t get the impression they are spending excessive resources engaging people with different worldviews.
Anyway, one way to think about it is talking to people who disagree is just a much more efficient way to increase the accuracy of your beliefs. Suppose the population as a whole is 50/50 pro-Skub and anti-Skub. Suppose you learn that someone is pro-Skub. This should cause you to update in the direction that they’ve been exposed to more evidence for the pro-Skub position than the anti-Skub position. If they’re trying to learn facts about the world as quickly as possible, their time is much better spent reading an anti-Skub book than a pro-Skub book, since the pro-Skub book will have more facts they already know. An anti-Skub book also has more decision-relevant info. If they read a pro-Skub book, they’ll probably still be pro-Skub afterwards. If they read an anti-Skub book, they might change their position and therefore change their actions.
Talking to an informed anti-Skub in person is even more efficient, since the anti-Skub person can present the very most relevant/persuasive evidence that is the very most likely to change their actions.
Applying this thinking to yourself, if you’ve got a particular position you hold, that’s evidence you’ve been disproportionately exposed to facts that favor that position. If you want to get accurate beliefs quickly you should look for the strongest disconfirming evidence you can find.
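The “more facts they already know” step in the Skub argument is just arithmetic once you fix some numbers. Here is a toy version; the pool size, book size, and how many facts on each side the reader already knows are all made-up assumptions, as is the idea that a book’s facts are drawn uniformly from its side’s pool.

```python
from fractions import Fraction

POOL_SIZE = 100  # hypothetical: facts available on each side of the Skub debate
BOOK_SIZE = 20   # hypothetical: facts per book, drawn uniformly from one side's pool

# Hypothetical pro-Skub reader: their position is evidence that they have
# seen more pro-Skub facts than anti-Skub facts.
known = {"pro": 80, "anti": 30}


def expected_new_facts(side: str) -> Fraction:
    """Expected number of facts in a `side` book that the reader has not seen.

    Each fact in the book is unseen with probability unseen / POOL_SIZE, so by
    linearity of expectation the whole book contributes BOOK_SIZE times that.
    """
    unseen = POOL_SIZE - known[side]
    return BOOK_SIZE * Fraction(unseen, POOL_SIZE)


for side in ("pro", "anti"):
    print(side, expected_new_facts(side))
```

Under these assumptions the anti-Skub book carries 14 new facts in expectation versus 4 for the pro-Skub one. The gap comes entirely from the asymmetry in what the reader already knows, which is the asymmetry that learning they are pro-Skub is evidence for.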
None of this discussion even accounts for confirmation bias, groupthink, or information cascades! I’m getting a scary “because we read a website that’s nominally about biases, we’re pretty much immune to bias” vibe from your comment. Knowing about a bias and having implemented an effective, evidence-based debiasing intervention for it are very different.
BTW this is probably the comment that updated me the most in the direction that LW will become / already is a cult.
So I think my orientation on seeking out disagreement is roughly as follows. (This is going to be a rant I write in the middle of the night, so might be a little incoherent.)
There are two distinct tasks: 1) Generating new useful hypotheses/tools, and 2) Selecting between existing hypotheses/filtering out bad hypotheses.
There are a bunch of things that make people good at both these tasks simultaneously. Further, each of these tasks is partially helpful for doing the other. However, I still think of them as mostly distinct tasks.
I think skill at these tasks is correlated in general, but possibly anti-correlated after you filter on enough g correlates, in spite of the fact that they are each common subtasks of the other.
I don’t think this (anti-correlated given g) very confidently, but I do think it is good to track your own and others’ skill in the two tasks separately, because it is possible to have very different scores, and because judging generators on reliability might make them less generative (out of fear of being wrong), and similarly vice versa.
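One standard statistical mechanism that produces “correlated in general, anti-correlated after filtering” is selection on a score that rewards either skill (Berkson’s paradox). The sketch below only illustrates that mechanism under made-up assumptions, not how g or hiring actually work: the two skills are standard normals with a hypothetical population correlation `rho = 0.3`, and the hypothetical filter keeps the top 2% ranked by their sum. (If the filter were on a common cause alone, the correlation would shrink rather than flip; the flip needs a filter that either skill can satisfy.)

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n=100_000, rho=0.3, keep_fraction=0.02, seed=0):
    """Return (population correlation, correlation among the selected few).

    Hypothetical model: generation and selection skill are standard normal
    with correlation `rho`; the filter keeps the top `keep_fraction` of people
    ranked by generation + selection (a stand-in for "g correlates").
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        generation = z1
        selection = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # corr with z1 is rho
        people.append((generation, selection))

    overall = pearson([p[0] for p in people], [p[1] for p in people])

    # Keep only the people with the highest combined score.
    people.sort(key=lambda p: p[0] + p[1], reverse=True)
    top = people[: int(n * keep_fraction)]
    within = pearson([p[0] for p in top], [p[1] for p in top])
    return overall, within


overall, within = simulate()
print(f"whole population: {overall:+.2f}, after filtering: {within:+.2f}")
```

With these parameters the first number comes out positive and the second negative; how negative depends on `rho` and how aggressive `keep_fraction` is.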
I think that seeking out disagreement is especially useful for the selection task, and less useful for the generation task. I think that echo chambers are especially harmful for the selection task, but can sometimes be useful for the generation task. Working with someone who agrees with you on a bunch of stuff and shares your ontology allows you to build deeply faster. Someone with a lot of disagreement with you can cause you to get stuck on the basics and not get anywhere. (Sometimes disagreement can also be actively helpful for generation, but it is definitely not always helpful.)
I spend something like 90+% of my research time focused on the generation task. Sometimes I think my colleagues are seeing something that I am missing, and I seek out disagreement so that I can get a new perspective, but the goal is to get a slightly different perspective on the thing I am working on, not really to filter based on which view is more true. I also sometimes do things like double-crux with people with fairly different worldviews, but even there, it feels like the goal is to collect new ways to think, rather than to change my mind. I think that for this task a small amount of focusing on people who disagree with you is pretty helpful, but even then, I think I get the most out of people who disagree with me a little bit, because I am more likely to be able to actually pick something up. Further, my focus is not really on actually understanding the other person; I just want to find new ways to think, so I will often translate things to something nearby in my ontology, and thus learn a lot, but still not be able to pass an ideological Turing test.
On the other hand, when you are not trying to find new stuff, but instead e.g. evaluate various different hypotheses about AI timelines, I think it is very important to try to understand views that are very far from your own, and take steps to avoid echo chamber effects. It is important to understand the view, the way the other person understands it, not just the way that conveniently fits with your ontology. This is my guess at the relevant skills, but I do not actually identify as especially good at this task. I am much better at generation, and I do a lot of outside-view style thinking here.
However, I think that currently, AI safety disagreements are not about two people having mostly the same ontology and disagreeing on some important variables, but rather trying to communicate across very different ontologies. This means that we have to build bridges, and the skills start to look more like generation skill. It doesn’t help to just say, “Oh, this other person thinks I am wrong, I should be less confident.” You actually have to turn that into something more productive, which means building new concepts, and a new ontology in which the views can productively dialogue. Actually talking to the person you are trying to bridge to is useful, but I think so is retreating to your echo chamber, and trying to make progress on just becoming less confused yourself.
For me, there is a handful of people who I think of as having very different views from me on AI safety, but who are still close enough that I feel like I can understand them at all. When I think about how to communicate, I mostly think about bridging the gap to these people (which already feels like an impossibly hard task), and not as much the people that are really far away. Most of these people I would describe as sharing the philosophical stance I said MIRI selects for, but probably not all.
If I were focusing on resolving strategic disagreements, I would try to interact a lot more than I currently do with people who disagree with me. Currently, I am choosing to focus more on just trying to figure out how minds work in theory, which means I only interact with people who disagree with me a little. (Indeed, I currently also only interact with people who agree with me a little bit, and so am usually in an especially strong echo chamber, which is my own head.)
However, I feel pretty doomy about my current path, and might soon go back to trying to figure out what I should do, which means trying to leave the echo chamber. Often when I do this, I neither produce anything great nor change my mind, and eventually give up and go back to doing the doomy thing where at least I make some progress (at the task of figuring out how minds work in theory, which may or may not end up translating to AI safety at all).
Basically, I already do quite a bit of the “Here are a bunch of people who are about as smart as I am, and have thought about this a bunch, and have a whole bunch of views that differ from mine and from each other. I should be not that confident” (although I should often take actions that are indistinguishable from confidence, since that is how you work with your inside view). But learning from disagreements more than that is just really hard, and I don’t know how to do it, and I don’t think spending more time with them fixes it on its own. I think this would be my top priority if I had a strategy I was optimistic about, but I don’t, and so instead, I am trying to figure out how minds work, which seems like it might be useful for a bunch of different paths. (I feel like I have some learned helplessness here, but I think everyone else (not just MIRI) is also failing to learn (new ontologies, rather than just noticing mistakes) from disagreements, which makes me think it is actually pretty hard.)
Not sure I follow. It seems to me that the position you’re pushing, that learning from people who disagree is prohibitively costly, is the one that goes with learned helplessness. (“We’ve tried it before, we encountered inferential distances, we gave up.”)
I believe they are saying that cheering for seeking out disagreement is learned helplessness as opposed to doing a cost-benefit analysis about seeking out disagreement. I am not sure I get that part either.
I was also confused reading the comment, thinking that maybe they copied the wrong paragraph, and meant the 2nd paragraph.
I am interested in the fact that you find the comment so cult-y though, because I didn’t pick that up.
It’s a fairly incoherent comment which argues that we shouldn’t work to overcome our biases or engage with people outside our group, with strawmanning that seems really flimsy… and it has a bunch of upvotes. Seems like curiosity, argument, and humility are out, and hubris is in.
which in turn I fundamentally see as a consequence of epistemic learned helplessness run rampant
I don’t understand this, but for some reason I’m interested. Could you say a couple sentences more? How does rampant learned helplessness about having correct beliefs make it more appealing to seek new information by seeking disagreement? Are you saying that there’s learned helplessness about a different strategy for relating to potential sources of information?
So, my model is that “epistemic learned helplessness” essentially stems from an inability to achieve high confidence in one’s own (gears-level) models. Specifically, by “high confidence” here I mean a level of confidence substantially higher than one would attribute to an ambient hypothesis in a particular space—if you’re not strongly confident that your model [in some domain] is better than the average competing model [in that domain], then obviously you’d prefer to adopt an exploration-based strategy (that is to say: one in which you seek out disagreeing hypotheses in order to increase the variance of your information intake) with respect to that domain.
I think this is correct, so far as it goes, as long as we are in fact restricting our focus to some domain or set of domains. That is to say: as humans, naturally it’s impossible to explore every domain in sufficient depth that we can form and hold high confidence in a gears-level model for said domain, which in turn means there will obviously be some domains in which “epistemic learned helplessness” is simply the correct attitude to take. (And indeed, the original blog post in which Scott introduced the concept of “epistemic learned helplessness” does in fact contextualize it using history books as an example.)
Where I think this goes wrong, however, is when the proponent of “epistemic learned helplessness” fails to realize that this attitude’s appropriateness is actually a function of one’s confidence in some particular domain, and instead allows the attitude to seep into every domain. Once that happens, “inability to achieve confidence in one’s own models” ceases to be a rational reaction to a lack of knowledge, and instead turns into an omnipresent fog clouding over everything you think and do. (And the exploration-based strategy I outlined above ceases to be a rational reaction to a lack of confidence, and instead turns into a strategy that’s always correct and virtuous.)
This is the sense in which I characterized the result as
a consequence of epistemic learned helplessness run rampant, leaking past the limits of any particular domain and seeping into a general attitude towards anything considered sufficiently “hard”
(Note the importance of the qualifier “hard”. For example, I’ve yet to encounter anyone whose “epistemic learned helplessness” is so extreme that they stop to question e.g. whether they are in fact capable of driving a car. But that in itself is not particularly reassuring, especially when domains we care about include stuff labeled “hard”.)
Now for the rub: I think anyone working on AI alignment (or any technical question of comparable difficulty) mustn’t exhibit this attitude with respect to [the thing they’re working on]. If you have a problem where you’re not able to achieve high confidence in your own models of something (relative to competing ambient models), you’re not going to be able to follow your own thoughts far enough to do good work—not without being interrupted by thoughts like “But if I multiply the probability of this assumption being true, by the probability of that assumption being true, by the probability of that assumption being true...” and “But [insert smart person here] thinks this assumption is unlikely to be true, so what probability should I assign to it really?”
I think this is very bad. And since I think it’s very bad, naturally I will strongly oppose attempts to increase pressure in that particular direction—especially since I think pressure to think this way in this particular community is already ALARMINGLY HIGH. I think “epistemic learned helplessness” (which sometimes goes by more innocuous names as well, like fox epistemology or modest epistemology) is epistemically corrosive once it has breached quarantine, and by and large I think it has breached quarantine for a dismayingly large number of people (though thankfully my impression is that this has largely not occurred at MIRI).
It seems like you wanted me to respond to this comment, so I’ll write a quick reply.
Now for the rub: I think anyone working on AI alignment (or any technical question of comparable difficulty) mustn’t exhibit this attitude with respect to [the thing they’re working on]. If you have a problem where you’re not able to achieve high confidence in your own models of something (relative to competing ambient models), you’re not going to be able to follow your own thoughts far enough to do good work—not without being interrupted by thoughts like “But if I multiply the probability of this assumption being true, by the probability of that assumption being true, by the probability of that assumption being true...” and “But [insert smart person here] thinks this assumption is unlikely to be true, so what probability should I assign to it really?”
This doesn’t seem true for me. I think through details of exotic hypotheticals all the time.
Maybe others are different. But it seems like maybe you’re proposing that people self-deceive in order to get themselves confident enough to explore the ramifications of a particular hypothesis. I think we should be a bit skeptical of intentional self-deception. And if self-deception is really necessary, let’s make it a temporary suspension of belief sort of thing, as opposed to a life belief that leads you to not talk to those with other views.
It’s been a while since I read Inadequate Equilibria. But I remember the message of the book being fairly nuanced. For example, it seems pretty likely to me that there’s no specific passage which contradicts the statement “hedgehogs make better predictions on average than foxes”.
I support people trying to figure things out for themselves, and I apologize if I unintentionally discouraged anyone from doing that—it wasn’t my intention. I also think people consider learning from disagreement to be virtuous for a good reason, not just due to “epistemic learned helplessness”. Also, learning from disagreement seems importantly different from generic deference—especially if you took the time to learn about their views and found yourself unpersuaded. Basically, I think people should account for both known unknowns (in the form of people who disagree whose views you don’t understand) and unknown unknowns, but it seems OK to not defer to the masses / defer to authorities if you have a solid grasp of how they came to their conclusion (this is my attempt to restate the thesis of Inadequate Equilibria as I remember it).
I don’t deny that learning from disagreement has costs. Probably some people do it too much. I encouraged MIRI to do it more on the margin, but it could be that my guess about their current margin is incorrect, who knows.
But it seems like maybe you’re proposing that people self-deceive in order to get themselves confident enough to explore the ramifications of a particular hypothesis. I think we should be a bit skeptical of intentional self-deception.
I want to clarify that this is not my proposal, and to the extent that it had been someone’s proposal, I would be approximately as wary about it as you are. I think self-deception is quite bad on average, and even on occasions when it’s good, that fact isn’t predictable in advance, making choosing to self-deceive pretty much always a negative expected-value action.
The reason I suspect you interpreted this as my proposal is that you’re speaking from a frame where “confidence in one’s model” basically doesn’t happen by default, so to get there people need to self-deceive, i.e. there’s no way for someone [in a sufficiently “hard” domain] to have a model and be confident in that model without doing [something like] artificially inflating their confidence higher than it actually is.
I think this is basically false. I claim that having (real, not artificial) confidence in a given model (even of something “hard”) is entirely possible, and moreover happens naturally, as part of the process of constructing a gears-level model to begin with. If your gears-level model actually captures some relevant fraction of the problem domain, I claim it will be obviously the case that it does so—and therefore a researcher holding that model would be very much justified in placing high confidence in [that part of] their model.
How much should such a researcher be swayed by the mere knowledge that other researchers disagree? I claim the ideal answer is “not at all”, for the same reason that argument screens off authority. And I agree that, from the perspective of somebody on the outside (who only has access to the information that two similarly-credentialed researchers disagree, without access to the gears in question), this can look basically like self-deception. But (I claim) from the inside the difference is very obvious, and not at all reminiscent of self-deception.
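“Argument screens off authority” can be checked with a small Bayes computation. The sketch below is a hypothetical illustration, not part of the original comment: it assumes a chain T → A → E, where whether a claim is true (T) affects whether the argument for it checks out (A), and an authority's endorsement (E) depends on T only through A. All probabilities are made up. Under that assumption, once you have evaluated the argument yourself, learning whether the authority agrees does not move your posterior at all; the endorsement is only informative when A is unobserved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical numbers for a toy chain T -> A -> E.
P_T = Fraction(1, 2)                                            # prior that the claim is true
P_A_GIVEN_T = {True: Fraction(9, 10), False: Fraction(2, 10)}   # P(argument checks out | T)
P_E_GIVEN_A = {True: Fraction(8, 10), False: Fraction(3, 10)}   # P(authority endorses | A)


def joint(t: bool, a: bool, e: bool) -> Fraction:
    """Joint probability P(T=t, A=a, E=e) under the chain assumption."""
    pt = P_T if t else 1 - P_T
    pa = P_A_GIVEN_T[t] if a else 1 - P_A_GIVEN_T[t]
    pe = P_E_GIVEN_A[a] if e else 1 - P_E_GIVEN_A[a]
    return pt * pa * pe


def posterior_truth(a=None, e=None) -> Fraction:
    """P(T=True | observed evidence); None means unobserved (summed out)."""
    num = den = Fraction(0)
    for t, av, ev in product([True, False], repeat=3):
        if (a is not None and av != a) or (e is not None and ev != e):
            continue
        p = joint(t, av, ev)
        den += p
        if t:
            num += p
    return num / den


if __name__ == "__main__":
    print(posterior_truth(a=True, e=True))   # 9/11
    print(posterior_truth(a=True, e=False))  # 9/11: E adds nothing once A is known
    print(posterior_truth(e=True))           # 15/23: without A, E is informative
    print(posterior_truth(e=False))          # 5/17
```

The screening-off result depends entirely on the chain assumption: if the authority had evidence beyond the argument (an extra edge T → E), their endorsement would still carry information after you had evaluated A.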
(Some fields do not admit good gears-level models at all, and therefore it’s basically impossible to achieve the epistemic state described above in them. People in such fields might plausibly imagine that all fields have this property. But this isn’t the case; in fact, I would argue that the division of the sciences into “harder” and “softer” is actually pointing at precisely this distinction: the “hardness” attributed to a field is in fact a measure of how possible it is to form a strong gears-level model.)
Does this mean “learning from disagreement” is useless? Not necessarily; gears-level models can also be wrong and/or incomplete, and one entirely plausible (and sometimes quite useful) mechanism by which to patch up incomplete models is to exchange gears with someone else, who may not be working with quite the same toolbox as you. But (I claim) for this process to actually help, it should be done in a targeted way: ideally, you’re going into the conversation already with some idea of what you hope to get out of it, having picked your partner beforehand for their likeliness to have gears you personally are missing. If you’re merely “seeking out disagreement” for the purpose of fulfilling a quota, that (I claim) is unlikely to lead anywhere productive. (And I view your exhortations for MIRI to “seek out more disagreement on the margin” as proposing essentially just such a quota.)
(Standard disclaimer: I am not affiliated with MIRI, and my views do not necessarily reflect their views, etc.)
I think mostly everyone agrees with this, and has tried, and in practice, we keep hitting “inferential distance” shaped walls, and become discouraged, and (partially) give up.
I’ve found an unexpected benefit of trying to explain my thinking and overcome the inferential distance is that I think of arguments which change my mind. Just having another person to bounce ideas off of causes me to look at things differently, which sometimes produces new insights. See also the book passage I quoted here.
Note that I think inferential distance often takes the form of trying to communicate across different ontologies. Sometimes a person will even correctly understand the arguments of their discussion partner, to the point where they can internally inhabit that point of view, but it is still hard to get the argument into productive dialogue with their other views, because the two viewpoints have such different ontologies.
It seems quite possible to me that the philosophical stance + mathematical taste you’re describing aren’t “natural kinds” (e.g. the topics you listed don’t actually have a ton in common, besides being popular MIRI-sphere topics).
If that’s the case, selecting for people with the described philosophical stance + mathematical taste could basically be selecting for “people with little resistance to MIRI’s organizational narrative” (people who have formed their opinions about math + philosophy based on opinions common in/around MIRI).
It sounds like you’re saying that at MIRI, you approximate a potential hire’s philosophical competence by checking to see how much they agree with you on philosophy. That doesn’t seem great for group epistemics?
I’ve been following MIRI for many years now. I’ve sometimes noticed conversations that I’m tempted to summarize as “patting each other on the back for how right we all are”. (“No one else is actually trying” has that flavor. Here is a comment I wrote recently which might also be illustrative. You could argue this is a universal human tendency, but when I look back at different organizations where I’ve worked, I don’t think any of them had it nearly as bad as MIRI does. Or at least, how bad MIRI used to have it. I believe it’s gotten a bit better in recent years.)
I think MIRI is doing important work on important problems. I also think it would be high value of information for MIRI to experiment with trying to learn from people who don’t share the “typical MIRI worldview”—people interested in topics that MIRI-sphere people don’t talk about much, people who have a somewhat different philosophical stance, etc. I think this could make MIRI’s research significantly stronger. The marginal value of talking to / hiring a researcher who’s already ~100% in agreement with you seems low compared to the marginal value of talking to / hiring a researcher who brings something new to the table.
If you’re still in the mode of “searching for more promising paths”, I think this sort of exploration strategy could be especially valuable. Perhaps you could establish some sort of visiting scholars program. This could maximize your exposure to diverse worldviews, and also encourage new researchers to be candid in their disagreements, if their employment is for a predetermined, agreed-upon length. (I know that SIAI had a visiting fellows program in years past that wasn’t that great. If you want me to help you think about how to run something better I’m happy to do that.)
Another thought is it might be helpful to try & articulate precisely what makes MIRI different from other AI safety organizations, and make sure your hiring selects for that and nothing else. When I think about what makes MIRI different from other AI safety orgs, there are some broad things that come to mind:
A focus on conceptual research over applied research
Relative disinterest in deep learning; greater willingness to invent stuff from the ground up
Higher paranoia
Below average credentialism
But there are also some much more specific things, like the ones you mentioned—interest in specific, fairly narrow mathematical & philosophical topics. From the outside it looks kinda like MIRI suffers from “not invented here” syndrome.
My personal guess is that MIRI would be a stronger org, and the AI safety ecosystem as a whole would be stronger, if MIRI expanded their scope to the bullet points I listed above and tried to eliminate the influence of “not invented here” on their hiring decisions. (My reasoning is partially based on the fact that I can’t think of AI safety organizations besides MIRI which match the bullet points I listed. I think this proposal would be an expansion into neglected research territory. I’d appreciate a correction if there are orgs I’m unaware of / not remembering.)
So, I believe that the philosophical stance is a natural kind. I can try to describe it better, but note that I won’t be able to point at it perfectly:
I would describe it as “taking seriously the idea that you are a computation [Edit: an algorithm].” (As opposed to a collection of atoms, or a location in spacetime, or a Christian soul, or any number of other things you could identify with.)
I think that most of the selection for this philosophical stance happens not in MIRI hiring, but instead in being in the LW community. I think that the sequences are actually mostly about the consequences of this philosophical stance, and that the sequences pipeline is largely creating a selection for this philosophical stance.
One can have this philosophical stance without a lot of math ability (many LessWrongers do), but when the philosophical stance is combined with math ability, it leads to a lot of agreement in taste in math-philosophy models, which is what you see in MIRI employees.
To make a specific (but hard to verify) claim, I think that if you were to take MIRI employees, intervene on them before they found LessWrong, and show them a lot of things like UDASSA, TDT, and reflective oracles, they would be very interested in them relative to other math/philosophy ideas. Further, if you were to take people in 2000, before the existence of LW, and filter on being interested in some of these ideas, you would find people interested in many of these ideas.
(I listed ideas that came from MIRI, but there are many ideas that did not come from MIRI that people with this philosophical stance (and math ability) tend to be interested in: Logic, Probability, Game Theory, Information Theory, Algorithmic Information Theory)
(I used to not believe this. When I first started working at MIRI, I felt like I was lucky to have all of these mathematical and philosophical interests converge to the same place. I attributed it to a coincidence, but now think it has a common natural cause.)
(I think that this philosophical stance is really not enough to cause people to converge on many strategic questions. For example, I think Eliezer Yudkowsky, Jessica Taylor, Paul Christiano, and Andrew Critch all score very highly on this philosophical stance, and have a wide range of different views on timelines, probability of doom, and the strategic landscape.)
The most natural shared interest for a group united by “taking seriously the idea that you are a computation” seems like computational neuroscience, but that’s not on your list, nor do I recall it being covered in the sequences. If we were to tell 5 random philosophically inclined STEM PhD students to write a lit review on “taking seriously the idea that you are a computation” (giving them that phrase and nothing else), I’m quite doubtful we would see any sort of convergence towards the set of topics you allude to (Haskell, anthropics, mathematical logic).
As a way to quickly sample the sequences, I went to Eliezer’s userpage, sorted by score, and checked the first 5 sequence posts:
https://www.lesswrong.com/posts/a7n8GdKiAZRX86T5A/making-beliefs-pay-rent-in-anticipated-experiences
https://www.lesswrong.com/posts/5wMcKNAwB6X4mp9og/that-alien-message
https://www.lesswrong.com/posts/RcZCwxFiZzE6X7nsv/what-do-we-mean-by-rationality-1
https://www.lesswrong.com/posts/HLqWn5LASfhhArZ7w/expecting-short-inferential-distances
https://www.lesswrong.com/posts/6hfGNLf4Hg5DXqJCF/a-fable-of-science-and-politics
IMO very little of the content of these 5 posts fits strongly into the theme of “taking seriously the idea that you are a computation”. I think this might be another one of these rarity narrative things (computers have been a popular metaphor for the brain for decades, but we’re the only ones who take this seriously, same way we’re the only ones who are actually trying).
I think the vast majority of people who bounce off the sequences do so either because it’s too longwinded or they don’t like Eliezer’s writing style. I predict that if you ask someone involved in trying to popularize the sequences, they will agree.
In this post Eliezer wrote:
I assume this has motivated a lot of the stylistic choices in the sequences and Eliezer’s other writing: the 12 virtues of rationality, the litany of Gendlin/Tarski/Hodgell, parables and fables, Jeffreyssai and his robes/masks/rituals.
I find the sequences to be longwinded and repetitive. I think Eliezer is a smart guy with interesting ideas, but if I wanted to learn quantum mechanics (or any other academic topic the sequences cover), I would learn it from someone who has devoted their life to understanding the subject and is widely recognized as a subject matter expert.
From my perspective, the question is how anyone gets through all 1800+ pages of the sequences. My answer is that the post I linked is right. The mystical presentation, where Eliezer plays the role of your sensei who throws you to the mat out of nowhere if you forgot to keep your center of gravity low, really resonates with some people (and really doesn’t resonate with others). By the time someone gets through all 1800+ pages, they’ve invested a significant chunk of their ego in Eliezer and his ideas.
I agree that the phrase “taking seriously the idea that you are a computation” does not directly point at the cluster, but I still think it is a natural cluster. I think that computational neuroscience is in fact high up on the list of things I expect LessWrongers to be interested in. To the extent that they are not as interested in it as other things, I think it is because it is too hard to actually get much that feels like algorithmic structure from neuroscience.
I think that the interest in anthropics is related to the fact that computations are the kind of thing that can be multiply instantiated. I think logic is a computational-like model of epistemics. I think that Haskell is not really that much about this philosophy, and is more about mathematical elegance. (I think that liking elegance/simplicity is mostly different from the “I am a computation” philosophy, and is also selected for at MIRI.)
I think that a lot of the sequences (including the first and third and fourth posts in your list) are about thinking about the computation that you are running in contrast and relation to an ideal (AIXI-like) computation.
I think that That Alien Message is directly about getting the reader to imagine being a subprocess inside an AI, and thinking about what they would do in that situation.
I think that the politics post is not that representative of the sequences, and it bubbled to the top by karma because politics gets lots of votes.
(It does feel a little like I am justifying the connection in a way that could be used to justify false connections. I still believe that there is a cluster very roughly described as “taking seriously the idea that you are a computation” that is a natural class of ideas that is the heart of the sequences.)
I agree, but I think that the majority of people who love the sequences do so because they deeply share this philosophical stance, and don’t find it much elsewhere, more so than because they e.g. find a bunch of advice in it that actually works for them.
I think the effect you describe is also part of why people like the sequences, but I think that a stronger effect is that there are a bunch of people who had a certain class of thoughts prior to reading the sequences, didn’t see thoughts of this type before finding LessWrong, and then saw these thoughts in the sequences. (I especially believe this about the kind of people who get hired at MIRI.) Prior to the sequences, they were intellectually lonely, not having people to talk to who shared this philosophical stance, which is a large part of their worldview.
I view the sequences as a collection of thoughts similar to things that I was already thinking, which was then used as a flag to connect me with people who were also already thinking the same things, more so than something that taught me a bunch of stuff. I predict a large portion of karma-weighted LessWrongers will say the same thing. (This isn’t inconsistent with your theory, but I think it would be evidence for mine.)
My theory about why people like the sequences is very entangled with the philosophical stance actually being a natural cluster, and thus something that many different people would have independently.
I think that MIRI selects for the kind of person who likes the sequences, which under my theory is a philosophical stance related to being a computation, and under your theory seems entangled with little mental resistance to (some kinds of) narratives.
I notice I like “you are an algorithm” better than “you are a computation”, since “computation” feels like it could point to a specific instantiation of an algorithm, and I think that algorithm as opposed to instantiation of an algorithm is an important part of it.
This sounds right to me. FDT feels more natural when I think of myself as an algorithm than when I think of myself as a computation, for example.
To be slightly more precise, I think I historically felt like I identified with something like 60% of framings in the general MIRI cluster (at least the way it appears in public outputs), and now I’m at 80%+. Part of the difference is that I was already pretty into stuff like empiricism, materialism, Bayesianism, etc., but I previously (not very reflectively) had opinions and intuitions in the direction of thinking of myself as a computational instance, and these days I can understand the algorithmic framing much better (even though it’s still not very intuitive/natural to me).
(Numbers made up and not well thought out)
Datapoint: I’ve read the sequences and am familiar with lots of MIRI-related math and philosophy, and very much think humans are atoms. I think this is compatible with 95%+ (but not 100%) of Eliezer’s writing.
Interesting. I just went and looked at some old survey results hoping I would find a question like this one. I did not find a similar question. (The lack of a question about this is itself evidence against my theory.)
(Agreement among LessWrongers is not that crux-y for my belief that it is both a natural cluster and highly selected for at MIRI, but I am still interested in the question as it applies to LW.)
I did not mean to imply that MIRI does this any more than e.g. philosophy academia.
When you don’t have sufficient objective things to use to judge competence, you end up having to use agreement as a proxy for competence. This is because when you understand a mistake, you can filter for people who do not make that mistake, but when you do not understand a mistake you are making, it is hard to filter for people that do not make that mistake.
Sometimes, you interact with someone who disagrees with you, and you talk to them, and you learn that you were making a mistake that they did not make, and this is a very good sign for competence, but you can only really get this positive signal about as often as you change your mind, which isn’t often.
Sometimes, you can also disagree with someone, and see that their position is internally consistent, which is another way you can observe some competence without agreement.
I think that personally, I use a proxy that is something like “How much do I feel like I learn(/like where my mind goes) when I am talking to the person,” which I think selects for some philosophical agreement (their concepts are not so far from my own that I can’t translate), but also some philosophical disagreement (their concepts are better than my own at making at least one thing less confusing). (This condition does not feel necessary for me. I feel like having a coherent plan is also a great sign, even if I do not feel like I learn when I am talking to the person.)
So, I do think that MIRI hiring does select for people with “little resistance to MIRI’s organizational narrative,” through the channel of “You have less mental resistance to narratives you agree with” and “You are more likely to work for an organization when you agree with their narrative.”
I think that additionally people have a score on “mental resistance to organizational narratives” in general, and I was arguing that MIRI does not select against this property (very strongly). (Indeed, I think they select for it, but not as strongly as they select for philosophy.) I think that when the OP was thinking about how much to trust her own judgement, this is the more relevant variable, and the variable they were referring to.
I don’t want to speak for/about MIRI here, but I think that I personally do the “patting each other on the back for how right we all are” more than I endorse doing it. I think the “we” is less likely to be MIRI, and more likely to be a larger group that includes people like Paul.
I agree that it would be really really great if MIRI can interact with and learn from different views. I think mostly everyone agrees with this, and has tried, and in practice, we keep hitting “inferential distance” shaped walls, and become discouraged, and (partially) give up. To be clear, there are a lot of people/ideas where I interact with them and conclude “There probably isn’t much for me to learn here,” but there are also a lot of people/ideas where I interact with them and become sad because I think there is something for me to learn there, and communicating across different ontologies is very hard.
I agree with your bullet points descriptively, but they are not exhaustive.
I agree that MIRI has a strong (statistical) bias towards things that were invented internally. It is currently not clear to me how much of this statistical bias is a mistake vs. the correct reaction to how well internally invented things seem to fit our needs, and how hard it is to find the good stuff that exists externally when it exists. (I think there are a lot of great ideas out there that I really wish I had, but I don’t have a great method for filtering for them in the sea of irrelevant stuff.)
Strong-upvoted for this paragraph in particular, for pointing out that the strategy of “seeking out disagreement in order to learn” (which obviously isn’t how hg00 actually worded it, but seems to me descriptive of their general suggested attitude/approach) has real costs, which can sometimes be prohibitively high.
I often see this strategy contrasted with a group’s default behavior, and when this happens it is often presented as [something like] a Pareto improvement over said default behavior, with little treatment (or even acknowledgement) given to the tradeoffs involved. I think this occurs because the strategy in question is viewed as inherently virtuous (which in turn I fundamentally see as a consequence of epistemic learned helplessness run rampant, leaking past the limits of any particular domain and seeping into a general attitude towards anything considered sufficiently “hard” [read: controversial]), and attributing “virtuousness” to something often has the effect of obscuring the real costs and benefits thereof.
Not sure I follow. It seems to me that the position you’re pushing, that learning from people who disagree is prohibitively costly, is the one that goes with learned helplessness. (“We’ve tried it before, we encountered inferential distances, we gave up.”)
Suppose there are two execs at an org on the verge of building AGI. One says “MIRI seems wrong for many reasons, but we should try and talk to them anyways to see what we learn.” The other says “Nah, that’s epistemic learned helplessness, and the costs are prohibitive. Turn this baby on.” Which exec do you agree with?
This isn’t exactly hypothetical, I know someone at a top AGI org (I believe they “take seriously the idea that they are a computation/algorithm”) who reached out to MIRI and was basically ignored. It seems plausible to me that MIRI is alienating a lot of people this way, in fact. I really don’t get the impression they are spending excessive resources engaging people with different worldviews.
Anyway, one way to think about it is that talking to people who disagree is just a much more efficient way to increase the accuracy of your beliefs. Suppose the population as a whole is 50/50 pro-Skub and anti-Skub. Suppose you learn that someone is pro-Skub. This should cause you to update in the direction that they’ve been exposed to more evidence for the pro-Skub position than the anti-Skub position. If they’re trying to learn facts about the world as quickly as possible, their time is much better spent reading an anti-Skub book than a pro-Skub book, since the pro-Skub book will have more facts they already know. An anti-Skub book also has more decision-relevant info. If they read a pro-Skub book, they’ll probably still be pro-Skub afterwards. If they read an anti-Skub book, they might change their position and therefore change their actions.
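The Skub argument can be made concrete with a toy model. This is a hypothetical sketch, not part of the original comment, and every number in it is made up. It assumes (a) each person sees an odd number of facts, each independently pro- or anti-Skub with probability 1/2, and sides with the majority; and (b), separately, that a book's facts are drawn uniformly from a pool of facts on one side, of which the reader already knows some fixed number.

```python
from fractions import Fraction
from math import comb


def expected_pro_exposures_given_pro(n_exposures: int) -> Fraction:
    """E[number of pro-Skub facts seen | the person ended up pro-Skub].

    Hypothetical toy model: each of n_exposures facts is independently pro or
    anti with probability 1/2, and the person adopts the majority side.
    n_exposures must be odd so there are no ties.
    """
    if n_exposures % 2 == 0:
        raise ValueError("n_exposures must be odd")
    majority = n_exposures // 2 + 1
    # Every exposure sequence is equally likely, so conditioning on "pro-Skub"
    # weights each count k of pro facts by the binomial coefficient C(n, k).
    weights = {k: comb(n_exposures, k) for k in range(majority, n_exposures + 1)}
    return Fraction(sum(k * w for k, w in weights.items()), sum(weights.values()))


def expected_new_facts(book_size: int, pool_size: int, already_known: int) -> Fraction:
    """Expected number of facts in a book that the reader has not seen yet.

    Assumes the book's facts are drawn uniformly (without replacement) from a
    one-sided pool of pool_size facts, already_known of which the reader knows.
    By linearity of expectation, each fact is new with probability
    1 - already_known / pool_size.
    """
    return book_size * (1 - Fraction(already_known, pool_size))


if __name__ == "__main__":
    n = 5
    pro_seen = expected_pro_exposures_given_pro(n)
    print("pro facts seen by a pro-Skub person:", pro_seen)       # 55/16
    print("anti facts seen by a pro-Skub person:", n - pro_seen)  # 25/16
    # Hypothetical reader: knows 60 of 100 pro facts and 20 of 100 anti facts.
    print("new facts in a 30-fact pro book:", expected_new_facts(30, 100, 60))   # 12
    print("new facts in a 30-fact anti book:", expected_new_facts(30, 100, 20))  # 24
```

In this model, learning that someone is pro-Skub raises the expected number of pro facts they have seen (55/16 vs. 25/16 anti facts with five exposures), and for the hypothetical reader the anti-Skub book carries twice as many unseen facts. Exact fractions are used so the arithmetic is not muddied by floating-point rounding.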
Talking to an informed anti-Skub in person is even more efficient, since the anti-Skub person can present the very most relevant/persuasive evidence that is the very most likely to change their actions.
Applying this thinking to yourself, if you’ve got a particular position you hold, that’s evidence you’ve been disproportionately exposed to facts that favor that position. If you want to get accurate beliefs quickly you should look for the strongest disconfirming evidence you can find.
None of this discussion even accounts for confirmation bias, groupthink, or information cascades! I’m getting a scary “because we read a website that’s nominally about biases, we’re pretty much immune to bias” vibe from your comment. Knowing about a bias and having implemented an effective, evidence-based debiasing intervention for it are very different.
BTW this is probably the comment that updated me the most in the direction that LW will become / already is a cult.
So I think my orientation on seeking out disagreement is roughly as follows. (This is going to be a rant I write in the middle of the night, so might be a little incoherent.)
There are two distinct tasks: 1) generating new useful hypotheses/tools, and 2) selecting between existing hypotheses/filtering out bad hypotheses.
There are a bunch of things that make people good at both these tasks simultaneously. Further, each of these tasks is partially helpful for doing the other. However, I still think of them as mostly distinct tasks.
I think skill at these tasks is correlated in general, but possibly anti-correlated after you filter on enough g correlates, in spite of the fact that they are each common subtasks of the other.
I don’t believe this (anti-correlation given g) very confidently, but I do think it is good to track your own and others’ skill at the two tasks separately, because it is possible to have very different scores. (Also, judging generators on reliability might have the side effect of making them less generative, because they become afraid of being wrong, and similarly vice versa.)
I think that seeking out disagreement is especially useful for the selection task, and less useful for the generation task. I think that echo chambers are especially harmful for the selection task, but can sometimes be useful for the generation task. Working with someone who agrees with you on a bunch of stuff and shares your ontology lets you go deep faster. Someone who disagrees with you a lot can cause you to get stuck on the basics and not get anywhere. (Sometimes disagreement can also be actively helpful for generation, but it is definitely not always helpful.)
I spend something like 90+% of my research time focused on the generation task. Sometimes I think my colleagues are seeing something that I am missing, and I seek out disagreement so that I can get a new perspective, but the goal is to get a slightly different perspective on the thing I am working on, not really to filter based on which view is more true. I also sometimes do things like double-crux with people with fairly different worldviews, but even there, it feels like the goal is to collect new ways to think, rather than to change my mind. I think that for this task a small amount of focusing on people who disagree with you is pretty helpful, but even then, I get the most out of people who disagree with me a little bit, because I am more likely to be able to actually pick something up. Further, my focus is not really on understanding the other person; I just want to find new ways to think, so I will often translate things into something nearby in my ontology, and thus learn a lot, but still not be able to pass an ideological Turing test.
On the other hand, when you are not trying to find new stuff, but are instead, e.g., evaluating various hypotheses about AI timelines, I think it is very important to try to understand views that are very far from your own, and to take steps to avoid echo chamber effects. It is important to understand the view the way the other person understands it, not just the way that conveniently fits with your ontology. This is my guess at the relevant skills, but I do not actually identify as especially good at this task. I am much better at generation, and I do a lot of outside-view style thinking here.
However, I think that currently, AI safety disagreements are not about two people having mostly the same ontology and disagreeing on some important variables, but rather trying to communicate across very different ontologies. This means that we have to build bridges, and the skills start to look more like generation skill. It doesn’t help to just say, “Oh, this other person thinks I am wrong, I should be less confident.” You actually have to turn that into something more productive, which means building new concepts, and a new ontology in which the views can productively dialogue. Actually talking to the person you are trying to bridge to is useful, but I think so is retreating to your echo chamber, and trying to make progress on just becoming less confused yourself.
For me, there is a handful of people whom I think of as having very different views from mine on AI safety, but who are still close enough that I feel like I can understand them at all. When I think about how to communicate, I mostly think about bridging the gap to these people (which already feels like an impossibly hard task), and not as much about the people who are really far away. Most of these people I would describe as sharing the philosophical stance I said MIRI selects for, but probably not all.
If I were focusing on resolving strategic disagreements, I would try to interact a lot more than I currently do with people who disagree with me. Currently, I am choosing to focus more on just trying to figure out how minds work in theory, which means I mostly interact with people who disagree with me only a little. (Indeed, I currently interact only a little even with people who agree with me, and so am usually in an especially strong echo chamber: my own head.)
However, I feel pretty doomy about my current path, and might soon go back to trying to figure out what I should do, which means trying to leave the echo chamber. Often when I do this, I neither produce anything great nor change my mind, and eventually give up and go back to doing the doomy thing where at least I make some progress (at the task of figuring out how minds work in theory, which may or may not end up translating to AI safety at all).
Basically, I already do quite a bit of the “Here are a bunch of people who are about as smart as I am, who have thought about this a bunch, and who have a whole bunch of views that differ from mine and from each other’s. I should not be that confident” (although I should often take actions that are indistinguishable from confidence, since that is how you work with your inside view). But learning from disagreements beyond that is just really hard, I don’t know how to do it, and I don’t think spending more time with those people fixes it on its own. I think this would be my top priority if I had a strategy I was optimistic about, but I don’t, and so instead I am trying to figure out how minds work, which seems like it might be useful for a bunch of different paths. (I feel like I have some learned helplessness here, but I think everyone else, not just MIRI, is also failing to learn from disagreements, in the sense of learning new ontologies rather than just noticing mistakes, which makes me think it is actually pretty hard.)
I believe they are saying that cheering for seeking out disagreement is learned helplessness as opposed to doing a cost-benefit analysis about seeking out disagreement. I am not sure I get that part either.
I was also confused reading the comment, thinking that maybe they copied the wrong paragraph, and meant the 2nd paragraph.
I am interested in the fact that you find the comment so cult-y though, because I didn’t pick that up.
It’s a fairly incoherent comment which argues that we shouldn’t work to overcome our biases or engage with people outside our group, with strawmanning that seems really flimsy… and it has a bunch of upvotes. Seems like curiosity, argument, and humility are out, and hubris is in.
+1
I don’t understand this, but for some reason I’m interested. Could you say a couple sentences more? How does rampant learned helplessness about having correct beliefs make it more appealing to seek new information by seeking disagreement? Are you saying that there’s learned helplessness about a different strategy for relating to potential sources of information?
So, my model is that “epistemic learned helplessness” essentially stems from an inability to achieve high confidence in one’s own (gears-level) models. Specifically, by “high confidence” here I mean a level of confidence substantially higher than one would attribute to an ambient hypothesis in a particular space—if you’re not strongly confident that your model [in some domain] is better than the average competing model [in that domain], then obviously you’d prefer to adopt an exploration-based strategy (that is to say: one in which you seek out disagreeing hypotheses in order to increase the variance of your information intake) with respect to that domain.
I think this is correct, so far as it goes, as long as we are in fact restricting our focus to some domain or set of domains. That is to say: as humans, naturally it’s impossible to explore every domain in sufficient depth that we can form and hold high confidence in a gears-level model for said domain, which in turn means there will obviously be some domains in which “epistemic learned helplessness” is simply the correct attitude to take. (And indeed, the original blog post in which Scott introduced the concept of “epistemic learned helplessness” does in fact contextualize it using history books as an example.)
Where I think this goes wrong, however, is when the proponent of “epistemic learned helplessness” fails to realize that this attitude’s appropriateness is actually a function of one’s confidence in some particular domain, and instead allows the attitude to seep into every domain. Once that happens, “inability to achieve confidence in one’s own models” ceases to be a rational reaction to a lack of knowledge, and instead turns into an omnipresent fog clouding over everything you think and do. (And the exploration-based strategy I outlined above ceases to be a rational reaction to a lack of confidence, and instead turns into a strategy that’s always correct and virtuous.)
This is the sense in which I characterized the result as
(Note the importance of the disclaimer “hard”. For example, I’ve yet to encounter anyone whose “epistemic learned helplessness” is so extreme that they stop to question e.g. whether they are in fact capable of driving a car. But that in itself is not particularly reassuring, especially when domains we care about include stuff labeled “hard”.)
Now for the rub: I think anyone working on AI alignment (or any technical question of comparable difficulty) mustn’t exhibit this attitude with respect to [the thing they’re working on]. If you have a problem where you’re not able to achieve high confidence in your own models of something (relative to competing ambient models), you’re not going to be able to follow your own thoughts far enough to do good work—not without being interrupted by thoughts like “But if I multiply the probability of this assumption being true, by the probability of that assumption being true, by the probability of that assumption being true...” and “But [insert smart person here] thinks this assumption is unlikely to be true, so what probability should I assign to it really?”
I think this is very bad. And since I think it’s very bad, naturally I will strongly oppose attempts to increase pressure in that particular direction—especially since I think pressure to think this way in this particular community is already ALARMINGLY HIGH. I think “epistemic learned helplessness” (which sometimes goes by more innocuous names as well, like fox epistemology or modest epistemology) is epistemically corrosive once it has breached quarantine, and by and large I think it has breached quarantine for a dismayingly large number of people (though thankfully my impression is that this has largely not occurred at MIRI).
It seems like you wanted me to respond to this comment, so I’ll write a quick reply.
This doesn’t seem true for me. I think through details of exotic hypotheticals all the time.
Maybe others are different. But it seems like maybe you’re proposing that people self-deceive in order to get themselves confident enough to explore the ramifications of a particular hypothesis. I think we should be a bit skeptical of intentional self-deception. And if self-deception is really necessary, let’s make it a temporary suspension of belief sort of thing, as opposed to a life belief that leads you to not talk to those with other views.
It’s been a while since I read Inadequate Equilibria. But I remember the message of the book being fairly nuanced. For example, it seems pretty likely to me that there’s no specific passage which contradicts the statement “hedgehogs make better predictions on average than foxes”.
I support people trying to figure things out for themselves, and I apologize if I unintentionally discouraged anyone from doing that—it wasn’t my intention. I also think people consider learning from disagreement to be virtuous for a good reason, not just due to “epistemic learned helplessness”. Also, learning from disagreement seems importantly different from generic deference—especially if you took the time to learn about their views and found yourself unpersuaded. Basically, I think people should account for both known unknowns (in the form of people who disagree whose views you don’t understand) and unknown unknowns, but it seems OK to not defer to the masses / defer to authorities if you have a solid grasp of how they came to their conclusion (this is my attempt to restate the thesis of Inadequate Equilibria as I remember it).
I don’t deny that learning from disagreement has costs. Probably some people do it too much. I encouraged MIRI to do it more on the margin, but it could be that my guess about their current margin is incorrect, who knows.
Thanks for the reply.
I want to clarify that this is not my proposal, and to the extent that it had been someone’s proposal, I would be approximately as wary about it as you are. I think self-deception is quite bad on average, and even on occasions when it’s good, that fact isn’t predictable in advance, making choosing to self-deceive pretty much always a negative expected-value action.
The reason I suspect you interpreted this as my proposal is that you’re speaking from a frame where “confidence in one’s model” basically doesn’t happen by default, so to get there people need to self-deceive, i.e. there’s no way for someone [in a sufficiently “hard” domain] to have a model and be confident in that model without doing [something like] artificially inflating their confidence higher than it actually is.
I think this is basically false. I claim that having (real, not artificial) confidence in a given model (even of something “hard”) is entirely possible, and moreover happens naturally, as part of the process of constructing a gears-level model to begin with. If your gears-level model actually captures some relevant fraction of the problem domain, I claim it will be obviously the case that it does so—and therefore a researcher holding that model would be very much justified in placing high confidence in [that part of] their model.
How much should such a researcher be swayed by the mere knowledge that other researchers disagree? I claim the ideal answer is “not at all”, for the same reason that argument screens off authority. And I agree that, from the perspective of somebody on the outside (who only has access to the information that two similarly-credentialed researchers disagree, without access to the gears in question), this can look basically like self-deception. But (I claim) from the inside the difference is very obvious, and not at all reminiscent of self-deception.
(Some fields do not admit good gears-level models at all, and therefore it’s basically impossible to achieve the epistemic state described above in them. People in such fields might plausibly imagine that all fields have this property. But this isn’t the case—and in fact, I would argue that the division of the sciences into “harder” and “softer” is actually pointing at precisely this distinction: the “hardness” attributed to a field is in fact a measure of how possible it is to form a strong gears-level model.)
Does this mean “learning from disagreement” is useless? Not necessarily; gears-level models can also be wrong and/or incomplete, and one entirely plausible (and sometimes quite useful) mechanism by which to patch up incomplete models is to exchange gears with someone else, who may not be working with quite the same toolbox as you. But (I claim) for this process to actually help, it should be done in a targeted way: ideally, you’re going into the conversation already with some idea of what you hope to get out of it, having picked your partner beforehand for their likeliness to have gears you personally are missing. If you’re merely “seeking out disagreement” for the purpose of fulfilling a quota, that (I claim) is unlikely to lead anywhere productive. (And I view your exhortations for MIRI to “seek out more disagreement on the margin” as proposing essentially just such a quota.)
(Standard disclaimer: I am not affiliated with MIRI, and my views do not necessarily reflect their views, etc.)
Thanks, this is encouraging.
I’ve found an unexpected benefit of trying to explain my thinking and overcome the inferential distance is that I think of arguments which change my mind. Just having another person to bounce ideas off of causes me to look at things differently, which sometimes produces new insights. See also the book passage I quoted here.
Note that I think inferential distance often takes the form of trying to communicate across different ontologies. Sometimes a person will even correctly understand their discussion partner’s arguments to the point where they can internally inhabit that point of view, but it is still hard to get those arguments to dialogue productively with their other views, because the two viewpoints have such different ontologies.