Originality vs. Correctness

I talk with Alkjash about valuing original thinking vs getting things right. We discuss a few main threads:

  1. What are the benefits of epistemic specialization? What about generalism?

  2. How much of the action in an actual human mind is in tweaking your distribution over hypotheses, and how much is in making sure you’re considering good hypotheses?

  3. If your hope is to slot into an epistemic process that figures out what’s correct in part by you coming up with novel ideas, will processes that are out to get you make you waste your life?

Intellectual generals vs supersoldiers

alkjash

Over time I’ve noticed that I care less and less about epistemic rationality—i.e. being correct—and more and more about being original. Of course the final goal is to produce thoughts that are original AND correct, but I find the originality condition more stringent and worth optimizing for. This might be a feature of working in mathematics where verifying correctness is cheap and reliable.

habryka

Huh, that feels like an interesting take. I don’t have a super strong take on originality vs. correctness, but I do think I live my life with more of a “if you don’t understand the big picture and your environment well, you’ll get got, and also the most important things are 10000x more important than the median important thing, so you really need to be able to notice those opportunities, which requires an accurate map of how things work in-aggregate” mindset. Which, like, isn’t in direct conflict with what you are saying, though maybe is.

habryka

I think I have two big sets of considerations that make me hesitant to optimize for originality over correctness (and also a bunch the other way around, but I’ll argue for one side here first):

  1. The world itself is really heavy-tailed, and having a good understanding of how most of the world works, while sacrificing deeper understanding of how a narrower slice of the world works, seems worth it, since behind every part of reality that you haven’t considered, a crucial consideration might lurk that completely shifts what you want to be doing with your life.

    1. The obvious example from an LW perspective is encountering the arguments for AI Risk vs. not and some related considerations around “living in the most important century”. But also broader things like encountering the tools of proof and empirical science and learning how to program.

  2. The world is adversarial in the sense that if you are smart and competent, there are large numbers of people and institutions optimizing to get you to do things that are advantageous to them, ignoring your personal interests. Most smart people “get got” and end up orienting their lives around some random thing they don’t even care about that much, because they’ve gotten their OODA loop captured by some social environment that makes it hard for them to understand what is going on or learn much about what they actually want to do with their lives. I think navigating an adversarial environment like this requires situational awareness and broad maps of the world, and prioritizing originality over correctness IMO makes one substantially more susceptible to a large set of attacks.

alkjash

Some quick gut reactions that I’ll reflect/expand on:

  1. I think the world is not as heavy-tailed for most human utility functions as you claim. Revealed preferences suggest that saving the world is probably within an OOM as good (to me and most other people) as living ten years longer, or something like this. Same with the difference between $1m and $1b.

  2. One of the core heuristics I have is that your perspective (which is one that seems predominant on LW) is one of very low trust in “the intellectual community,” leading to every individual doing all the computations from the ground up for themselves. It feels to me that this is an extremely wasteful state compared to a small number of people looking at “high-level considerations” and most people training to be narrowly-focused supersoldiers. It feels to me like most LWers “want to be generals” and nobody “wants to be a good soldier,” and good intellectual work requires more good soldiers than good generals.

habryka

I think the world is not as heavy-tailed for most human utility functions as you claim. Revealed preferences suggest that saving the world is probably within an OOM as good (to me and most other people) as living ten years longer, or something like this. Same with the difference between $1m and $1b.

I mean, ultimately I do think people can care about whatever, but I think I currently don’t believe this, or I would only believe it about a very impoverished sense of self that’s very local and time-sensitive.

My model is that from the perspective of a future where scientific progress continues and we align AI or more broadly gain much greater control over the universe, we will deeply regret the tradeoffs we made in favor of short-term goals, and of course especially in as much as those might have jeopardized the future itself (but also otherwise).

And yeah, maybe it makes sense to model those future selves as different people who I don’t need to optimize for, but I think overall these kinds of decisions are usually clearer in hindsight, and having deeply regretted something one way or another is pretty decent evidence that it was indeed the wrong call from my current perspective.

I do agree that there are lots of fake unbounded goals that people adopt that do seem pathological to me. Like I do think I mostly buy it for the $1M and $1B thing. Money doesn’t really make things that much better after you reach a certain scale, for most people’s goals (my guess is about half of the people trying to get very rich are doing so for a reason that makes sense because they have non-impoverished goals that they would actually reflectively endorse, and about half do it because of something closer to a pathological interest in a kind of status game that I don’t think they would endorse).

alkjash

I think I basically agree with you on this line and don’t think it is my core objection.

alkjash

My heuristic is something like this: imagine two worlds, one where everyone spends a huge chunk of their time trying to find and lay claim to the next big heavy-tailed thing, and one where a small number of people do that and the rest all randomly assign themselves to some narrow craft to perfect. It seems to me that the second world results in the big heavy-tailed things being done better in the end, even if many individuals missed out by being captured in the wrong niche.

habryka

It feels to me like most LWers “want to be generals” and nobody “wants to be a good soldier,” and good intellectual work requires more good soldiers than good generals.

I do think this is totally true and I’ve experienced a lot of struggle related to it. Like, not just personally, but also, I think that, as people go, I’ve been much more “heads-down” on building things than most other people in the LW community, and have had trouble finding other people to join me. Everyone wants to be a macrostrategy researcher.

But also, at the same time, one of the most common pieces of advice I give to young people in AI Alignment is to just try to write some blogposts that explain what you think the AI Alignment problem is, and where the hard parts are, and what you think might be ways to tackle it, in your own language and from your own perspective. And almost every time I’ve seen someone actually do that, I think they went on to do pretty great work. But almost no one does it, which is confusing.

I think maybe my explanation for this contradiction is more of a status-game one, where people want to do the thing that is socially most rewarded within this community oriented around existential risk, and that usually happens to be working at one of the prestigious research organizations doing something that reliably and predictably makes you feel like you are making progress, or something (these days usually mechanistic interpretability), but I feel overall confused here.

habryka

Hmm, I find this contrast interesting, in that I feel like in some sense you are saying you want people to choose a narrow craft to perfect, and indeed that is what I am seeing with tons of people going into mechanistic interpretability or trying to make AI Alignment evaluations, but I feel like that is also really not optimizing for originality. Like, I think a lot of people are going there because other people are going there, and because they will be able to do work in a clearly established framework, and doing the big-picture things would give rise to much more original thought.

alkjash

Hmm...I have to say that AI Alignment overall confuses me and I remain unconvinced that there is a narrow craft at all to be perfected here. It is sort of like saying you will become the best calculator of zeros of the Riemann Zeta function. Alignment is already “only one problem” to me, and to break that down into tiny subfields feels kind of absurd.

habryka

Interesting. Probably a tangent, but alignment feels very diverse to me. More like building a train or a rocket than calculating the zeros of the Riemann Zeta function.

alkjash

Like, it seems to me that I learned more/studied more content in graduate school as a single person than there exists overall in Alignment currently. I don’t really see a point to specialize further when the field is this young.

habryka

Hmm, I feel like there is some map-territory confusion going on here. Like, yeah, I think the field is young, but that doesn’t make the territory less rich. I think you can spend a lifetime studying the degree to which biological evolution can be modeled as a coherent optimization process, and how one might control it, and to try to develop an ontology and understanding of how natural selection would give rise to various things like self-awareness and goal-directedness under different conditions. And that’s just a small part of the problem to study.

alkjash

I tend to think of research progress as breadth-first search—of course eventually the field will get deeper and there will be a place for someone to specialize as you say. But in 1700 one did not specialize in analytic number theory, one just worked in number theory.

habryka

Ok, I mean, that seems reasonable to me, but then I guess I am saying “in the great big map of knowledge, we are still in the breadth-first search stage”. Like, why go into analytic number theory now when there is a whole field of ML to discover (I mean, the answer to that is because the ML might eat you, but ignore that part).

alkjash

Hmmm...I think there are more answers than that. Different people are optimized for different kinds of learning environments. I know lots of mathematicians who do amazing work who would flounder in more open-ended research environments. They are “very good soldiers.” Possibly one day when the alignment tide reaches them, some of them will instantly solve some hard AI-related problem that borders on their expertise.

habryka

Ok, but that feels more like a skill issue :P

Maybe it’s just deeply inherent to these people, but inasmuch as one might have a choice of being someone who could thrive in open-ended research environments, it seems like you want to make that choice (to be clear, I am more following a line of argument here; I think my overall position is a bit different than this, but I currently find myself curious where this line of argument leads).

alkjash

I think I’m saying something else, which is that you’re talking about the question of “what should one individual do to maximize their life” and I’m saying something like, everyone acting that way (at least naively) leads to a very bad place coordination-wise, because everyone wants to be Peter Singer and nobody wants to solve differential equations.

habryka

But, I mean, aren’t there really a lot of people solving differential equations and not many people being Peter Singer?

habryka

But also, I feel a bit like we’ve somehow inverted positions here. Weren’t you arguing for originality vs. correctness? I feel like Peter Singer is kind of an example of being really quite original, and maybe also kind of correct, but I mostly model him as being quite interestingly original.

Bayes optimality vs Discovering new hypothesis classes

alkjash

Haha, that’s kind of true. Let me maybe switch gears to a different heuristic about originality versus correctness which might be more fruitful. There’s a feeling I’ve had over time that a lot of the “epistemic rationality” stuff is the easy side of truth-seeking. It’s the part where someone somewhere already knows the truth and the goal is for the community to converge on that truth. The hard side of truth-seeking is getting the first person to come up with the right sentence (E = mc^2, or something like this), which you can’t do with calibration games and Aumann etc. I guess in principle there’s a way to view coming up with new hypotheses as Solomonoff induction or whatever, but I think this is not a great way of viewing how that kind of truth-seeking works?

alkjash

Like, at least in math and hard sciences like physics and CS, it seems to me that the part about converging to the truth once someone knows it is really straightforward compared to the part involving coming up with the truth in the first place.

habryka

So, I am actually genuinely curious about this topic. I agree that there is kind of a straw version of mathematics where that is true, but like, I feel like usually before a mathematician is anywhere close to proving a novel conjecture, they have acquired high personal confidence that the conjecture is correct for reasons short of rigorous proof. And then like kind of the whole field of mathematics is just obsessively dedicated to achieving consensus and that’s what everyone works towards.

alkjash

I think this strongly disagrees with my personal experience, but perhaps because of the kinds of questions I tend to work on. There are several theorems I’ve proved where I was absolutely surprised by the direction of the answer right up to the moment of the proof coming together. It is very common in my community for mathematicians to spend serious work on both sides (true and false) of a conjecture because they have no idea what is true.

alkjash

I agree that what you said is also a common story, but not exclusively so.

habryka

I feel like when I’ve talked to people in theoretical CS this is a very common story. You really feel pretty confident you know that some hash function is hard to reverse, or the complexity class of some algorithm, and yeah, you might be wrong, but mostly you try to prove it, and that then sometimes is really hard.

habryka

I might have also been particularly exposed to these datapoints recently because Paul Christiano’s current work is on trying to formalize the kind of arguments that tend to convince people that conjectures are true before they can prove them.

alkjash

I also want to say that to an extent, in the story you are describing I identify the statement of the conjecture as the interesting and important “hard part of truth-seeking.” I agree that there are many cases where when the correct conjecture is stated and the correct informal reasoning laid out for it, most of the work is done.

alkjash

Like, there’s sort of two mental pictures of truth-seeking that clash for me. One is [each person is holding their own little Metaculus portfolio and they go around acquiring information and updating all the little numbers by Bayes rule for each piece of info] and the other is [looking at the world and coming up with a mathematical structure that explains it] and … I’m not convinced that the first thing is really worth doing for most people except on occasion. And like, ok, the Solomonoff induction thing is still a thing and you could say that the latter thing is just doing the former thing for the set of all possible hypotheses on the world, but is that actually how minds work?

alkjash

The first thing is what I was rounding to “correctness” and the second to “originality.”
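
(To make the contrast concrete, here is a minimal sketch, in Python, of the first picture: a fixed, hand-picked set of hypotheses whose probabilities get updated by Bayes’ rule as evidence arrives. The hypotheses and numbers are invented for illustration; the relevant feature is that nothing in the loop ever adds a new hypothesis to the set.)

    # The "Metaculus portfolio" picture: update a fixed hypothesis set by Bayes' rule.
    # Hypotheses, priors, and likelihoods below are made up purely for illustration.

    def bayes_update(prior, likelihoods):
        """Return the posterior P(h | e) given prior P(h) and likelihoods P(e | h)."""
        unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
        total = sum(unnormalized.values())
        return {h: p / total for h, p in unnormalized.items()}

    # A fixed hypothesis set -- the loop below can reweight it but never extend it.
    beliefs = {"rain today": 0.3, "no rain today": 0.7}

    # Each piece of evidence is summarized by P(evidence | hypothesis).
    evidence_stream = [
        {"rain today": 0.8, "no rain today": 0.3},  # dark clouds
        {"rain today": 0.9, "no rain today": 0.4},  # falling pressure
    ]

    for likelihoods in evidence_stream:
        beliefs = bayes_update(beliefs, likelihoods)

    print(beliefs)  # the numbers shift, but the hypothesis list never grows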

habryka

Ok, I mean, when you’re trying to plan your next vacation, or choosing which job to take, or developing a product feature for your web-app, or choosing who to hire, or choosing your romantic partner, I really feel like you don’t want to only do the mathematical structure thing :P

alkjash

Ok sorry, you are completely right, when I say most people I meant something else altogether, which is “people trying to make hard intellectual progress.”

alkjash

So I think the thing I’m arguing for is that in a community of researchers, each individual researcher should be less concerned with “correctness” and outsource that to the community, and more concerned with simply coming up with original sentences worth checking.

habryka

Ok, I do think that is probably overfitting from specifically mathematics, where I feel like discovery and proof are really very close.

Like, even for a physicist, the lifetime of their hypothesis is like 20% coming up with the right one, and then like 80% trying to design and execute the experiment that proves it. And verification is costly enough and requires a lot of context, so probably you can’t get someone else to just verify it for you; you probably should go and do it yourself, or at the very least you need like 80% of your people working on verification (or like, 99% in the case of modern physics, where like one experiment requires constructing a $20B particle collider).

alkjash

Ok, that seems basically right, let me think about to what extent that changes my view on how people should approach research outside of math.

alkjash

I do think that I am overfitting a little on mathematics, but I also seem to experience that a lot of intellectual progress in other areas really hinges on some smart person writing down the right hypothesis to check.

habryka

So, I think I agree with this, but when I concretely visualize this, I do feel like “writing down the right hypothesis” is actually very hard in most fields that aren’t mathematics.

Like, let’s say you are a historian and you would like to understand “what caused the industrial revolution?”. Well, I think in order to explain almost any hypothesis that could come close to “the truth” here, you will probably need a bunch of knowledge in microeconomics and macroeconomics and probably also appeal to a bunch of common structures found in present-day companies, and a bunch of facts about human psychology, and probably also a bunch of facts about industrial manufacturing.

alkjash

Sorry, I am not arguing that “writing down the right hypothesis” is at all easy; it obviously does require lots of domain knowledge and study. I think I’m arguing that the practices promoted by early LW, at least (the epistemic rationality stuff in the Sequences, for example), are only tangentially helpful for the core skills needed to do this hard work. And this kind of work is what I would call “the meat of truth-seeking.”

habryka

Ah, I see. And you are more like “if you want to understand the causes of the industrial revolution, then you better get ready to spend really a huge amount of time reading existing works and digesting data about stuff that is really primarily relevant to just the industrial revolution.

Like, yeah, the tools of LW rationality might help you a bit along the way, but most of the skills and costs here will be paid in the pursuit of just this question and likely won’t help you build a much better general map of the world”?

alkjash

Hmm...I think that’s part of it, but it’s closer to “the tools of LW rationality are better at helping you converge towards the boundary of what is already known and worse at getting past that.”

habryka

So less like “you will need to read everything about the industrial revolution” and more like the following:

look, in order to get to a point where you have a shot at actually identifying the right hypothesis, you will need to be willing to go quite deep and far away from where other people are at.

Like, all throughout this journey, your subjective probability that your current best guess is the right one will probably be quite low, but what matters is actually fleshing out the hypothesis enough, and plucking anything out of the huge space of hypotheses that has any plausibility

?

alkjash

Hmm...I sense you’re getting closer but still kind of not saying the same thing. Let me give another analogy. Like, what I liked growing up about the Sequences was that it presented a pretty cut-and-dried algorithm for truth-seeking. You go around collecting information and there’s a formula you can apply to update your beliefs and this works great on things like “who will be elected President in 6 months.”

The thing I’ve found disappointing is that for the hard part, plucking anything out of the huge space of possible hypotheses, the Sequences don’t really have an algorithmic prescription of any substance comparable to the above. Like, read widely, think hard, and look for patterns?

habryka

I mean, science famously also doesn’t have any prescription here. The classical Popperian view of science is that you magically pluck a hypothesis out of hypothesis space, and then you test it. How you come up with your hypothesis is totally your call. I don’t know, have dreams in which god reveals the periodic table to you, I guess?

But I do think this is actually one domain where I have come around to seeing more value in academia than I used to. Like, there is the formal structure of academia, but there is also the apprentice-like relationship of PhD students to their advisors, and as far as I can tell, that relationship does genuinely impart a lot of skills that are relevant to hypothesis discovery in a way that abstract writing often has trouble with.

alkjash

Well, the thing I’d hope to be true is that, humans are just algorithms, some of these algorithms are good at consistently generating original true thoughts, we could write down a mathematical description of what they’re doing under the hood, and whatever we wrote down would be the core thing to call “truth-seeking.”

alkjash

re: academia, I think the advisor-student relationship is pretty much the core of the whole enterprise. Another great feature is that there is one person (your PhD advisor) who shares in your reputational gains even though the whole institution is responsible for your “upbringing.”

habryka

Yeah, from a “framework of thinking” point of view, I also think that the bayesian world-view has embeddedness as its obvious big hole, and I think that hole also correlates with not really admitting good explanations for hypothesis discovery. Like, a bayesian agent never discovers any hypotheses.

alkjash

What do you mean by embeddedness?

habryka

Embeddedness referring here to the thing covered in the Embedded Agency papers/comics. Stuff like the grain of truth problem (very relevant here), the inability to model any hypothesis that includes yourself (since that would require being able to predict your own actions), the problems of trying to separate reality into “environment” and “agents”, etc.

habryka

(Grain of truth problem being specifically the problem in bayesian inference that when people try to construct a formal prior, very frequently the correct hypothesis is not at all in their prior, because reality is so messy and complicated that you can’t really represent it in your prior)
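
(A toy illustration of that, with hypotheses and numbers invented for the example: the true coin below has P(heads) = 0.7, but the agent’s prior only contains “fair coin” and “always heads”. The Bayes updates are all done correctly, yet the posterior can only ever concentrate on the least-wrong hypothesis that was in the prior to begin with.)

    import random

    # Grain-of-truth toy example: the real process is absent from the prior,
    # so correct Bayesian updating still can't find it. All numbers are illustrative.

    random.seed(0)

    TRUE_P_HEADS = 0.7  # the real world -- not represented in the prior below

    hypotheses = {"fair coin": 0.5, "always heads": 1.0}  # name -> P(heads | h)
    beliefs = {name: 0.5 for name in hypotheses}          # prior over hypotheses

    for _ in range(100):
        heads = random.random() < TRUE_P_HEADS
        for name, p_heads in hypotheses.items():
            beliefs[name] *= p_heads if heads else 1.0 - p_heads
        total = sum(beliefs.values())
        beliefs = {name: b / total for name, b in beliefs.items()}

    print(beliefs)  # essentially all mass ends up on "fair coin", the least-wrong option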

alkjash

Ah yes, very relevant.

alkjash

Hmmm...I think I am treading old ground now, it feels like returning to the thoughts that generated Babble and Prune.

habryka

Oh, yeah, that makes sense. You have sure written a bunch about this. I hadn’t made the connection to the current situation before you said it.

Correctness as defense against the dark arts

habryka

I am not sure how interested you are in this, but I do feel like I have a lot of interest in the adversarial branch of my opening argument. Like, I do think a lot of my relationship to originality and correctness is the result of a fear that if I just “become a cog in a machine”, well, then I have to trust the machine, and I find it hard to trust a machine without having a broad understanding of all the things the machine does.

alkjash

Ok, I am pretty interested in that line of thought as well, let’s switch gears to there for a moment. I think there is a really difficult tension between “I’m a young person trying to find a narrow niche to specialize in” and “People are out to get me and the only way to prevent that is to become knowledgeable in everything.”

alkjash

Is this a reasonable way of conceptualizing it?

habryka

Yeah, approximately.

alkjash

So, I think I have a partial answer to begin with, which is that while the world itself is very complicated, there are not that many core skills that young people need to build. Like, there’s specialized domain knowledge that doesn’t transfer, but there’s a bunch of other things that transfer. So it seems honestly OK to me to just throw a dart at the board and say, I’m going to spend 5 years skill-building within this hierarchy of skill, keeping my eyes open in the meantime.

habryka

Well, I feel like the thing that I expect is that at the end of those 5 years I will have learned a bunch of the important skills, and also have somehow internalized some ideology about how my tribe is the best one, or how the only way to reason about the world is some narrow intellectual framework, or society needs to be structured a certain way, or why loyalty to the company is the greatest moral virtue, etc.

alkjash

So, I have a confession, which is that basically I tried very hard to avoid this sort of capture by keeping one foot outside of academia, blogging on LW, and it helped me not internalize a bunch of the nonsense. I think my basic answer is that it is hard to be captured by two cults at the same time.

habryka

Yeah, I do think the advice of “join at least two kind-of-functional communities and sanity-check the prescriptions of one intellectual framework with the other one” is really good, though I personally really struggle with it.

alkjash

There is a way in which it makes one feel like an outsider in both worlds...

alkjash

The other mindset I have is something like, as long as I act in such a way that an alternative version of me who landed in the right community does great, that would be enough? Like, if I throw a dart at the dartboard and hit wrong, but I act in such a way that I would want everyone who happened to work on the right things to act, then I don’t have to worry too much about getting good at darts.

alkjash

I am not 100% sure where this falls in the spectrum between fun wishful thinking and good game theory.

alkjash

(the worry here is that getting good at darts is a really tough proposition and the cost of everyone getting good at darts is too much to pay)

habryka

I have really been giving this advice a lot recently, and maybe it just genuinely works, but man, I sure feel personally really incapable of implementing it.

Like, there is something about this kind of intellectual pluralism that prevents being principled or something. Like, maybe in the arena of intellectual ideas, and viewing things through the lens of originality, this seems great. But in the arena of justice and being able to notice evil and stand up to it, I somehow expect this to produce people without spines and unable to act with conviction on anything that isn’t obvious.

alkjash

It does remind me of the classic example that playing randomly is the only Nash equilibrium in rock-paper-scissors. If indeed the environment is out to get you, it seems like any deliberate process a naive young person can implement to make better decisions will just be countered by a more advanced adversarial exploit.
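
(A quick sanity check of that example, with the payoff bookkeeping invented for illustration: against a uniformly random player, every pure counter-strategy has expected payoff zero, so there is nothing left for an adversary to exploit.)

    # Rock-paper-scissors: uniform random play is unexploitable.
    # Payoff convention (chosen for this sketch): +1 for a win, -1 for a loss, 0 for a tie,
    # from the opponent's point of view.

    MOVES = ["rock", "paper", "scissors"]
    BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

    def payoff(my_move, their_move):
        if my_move == their_move:
            return 0
        return 1 if BEATS[my_move] == their_move else -1

    for opponent_move in MOVES:
        # Opponent plays a fixed pure strategy against our uniform random play.
        expected = sum(payoff(opponent_move, ours) for ours in MOVES) / len(MOVES)
        print(opponent_move, expected)  # 0.0 every time: no counter-strategy gains anything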

habryka

That is an interesting analogy.

alkjash

Can you say more about how this relates to justice and spinelessness? I don’t really follow the leap here.

habryka

Like, when I think about integrity, I think about the degree to which you have made your principles self-consistent. Like, you had some intuitions about how important it is to help people, and you realize that there are people far away who you can help a lot, so you think hard about that relationship, and maybe you update that you should really support a lot of people far away from you, because that’s what the two principles of “help people in need” and “people far away aren’t less important than people close by” imply.

And there is something about the mental motion of trying to have feet in multiple worldviews that is kind of intentionally trying to prevent this kind of consistency. Like, you have the virtues and values of the academic community, and that one really values rigor and being smart and citing people properly, and then you have the virtues and values of the LW community, which really values looking at the big picture and doing Fermi estimates, etc.

And those sets of virtues are just really far apart in conceptual space, and you probably can’t unify them, and they will in many ways “cover the same ground”. Like, academia has a principle of citing previous work and the LW community has a principle of “reasoning transparency/sharing your true rejection” and those are kind of related in what they are trying to achieve, but connecting them is a lot more work than is usually feasible for a person.

And I think overall this kind of implies that if I expect someone to be in a position where it would be good for them to do something hard, like speak openly about some corruption going on, which would be implied by making their principles consistent and acting with integrity, I expect a person who is grounded in multiple worldviews to only really be compelled to do that when all their worldviews demand that action as good, which I think is just quite rare.

But more importantly, I think the presence of contradictory principles in their everyday lives will have trained the instinct to act with integrity out of them (though this feels maybe like a thing a person could overcome by continuing to make each of their sets of moral axioms coherent, and being cognizant of the gap between their sets likely being hard to bridge)

alkjash

It feels like what you are pointing to is a real failure mode, but is also kind of typical-minding and surprisingly deontological. I suspect many people are capable of more or less separating their values from the values of their community, and I usually see value misalignments between myself and academia as an opportunity and not as a tension.

habryka

Yeah, I think you are right in pointing out the distinction between your values and the values of your community, which I think I kind of conflated above.

habryka

Ok, going back up a level, I was basically like “man, but I feel like situational awareness and having accurate models about what is going on is really important for not getting got by the cults of modern society”. And then you were like “here is this one weird trick that cults hate: just join two of them”, and then I was like “yeah, okay, that does actually seem like it would help a lot”.

And idk, maybe I am kind of convinced? Like, consistency checks are a really powerful tool and if I imagine a young person being like “I will just throw myself into intellectual exploration and go deep wherever I feel like without trying to orient too much to what is going on at large, but I will make sure to do this in two uncorrelated ways”, then I do notice I feel a lot less stressed about the outcome

alkjash

Hahahaha...hmmm...I think I want to say something like: you are not paranoid enough. If modern society is indeed out to get you then there is no escape that you can systematically recommend to young people, and so you might as well throw your hands up. Like, there are probably plenty of people like me that are natively contrarian enough to never be captured regardless, but a lot of young people (I see this more as I teach more undergrads) are just lambs to the slaughter.

habryka

I mean, to be clear, I do expect the default outcome of me trying to tell people this to be that they pretty quickly end up not being grounded in multiple communities. And I expect that getting someone to the point of having the situational awareness to robustly understand why it is so important to be grounded in multiple communities would require going quite broad: a lot of rationality skills, avoiding a ton of map-territory confusions, understanding various things about collective epistemology, etc.

alkjash

I want to make a point now about complexity creep because I love talking about complexity creep. It seems to me a kind of civilizational problem similar to cost disease. The original (I think) version is that games like Magic the Gathering must get progressively more complicated to keep the interest of invested players, making the gulf between new players and older players larger and larger until it is essentially impossible for new players to enter and the game dies out. It seems to me that the kinds of “situational awareness” demands there are now on young people to make good decisions are suffering the same kind of complexity creep.

habryka

Huh, say more? I think I need more spelling out of the analogy between the incentives of Wizards of the Coast (who make Magic the Gathering) and whatever creates the demands for situational awareness of young people.

alkjash

Hmmm...it doesn’t feel 100% the same but let me steelman it and see where it goes. The idea is that adults (non-new-players in this analogy) have been engaged in this game of signaling good causes and healthy communities against toxic causes and cults since the dawn of time and the game is running away in complexity.

alkjash

I think in the sense that WotC is responsible for complexity creep, the analogy falls apart. However, in the sense that the players themselves are responsible for complexity creep (e.g. the meta and required game knowledge growing ever more complicated and the margins ever narrower even as the literal rules of the game stay constant), then the analogy holds?

habryka

Ah, ok. Yeah, I think I see the analogy now. I think this seems approximately correct in that I sure feel like each year I discover another layer of memes and counter-memes embedded in the cultures I act in. Like, it’s a very common experience for me to be like “man, why does this crazy norm exist?” and then for the answer to be like “well, because at some point someone came up with this clever counterintuitive strategy and this was kind of the only way to block it”.

Related to recent events, I’ve been thinking a bunch about Casus Belli (OpenAI actually, not Israel stuff, which I haven’t thought much about). Like, for nation states in the modern world, there is a thing where you need a reason to go to war with another nation. And in some sense that reason is likely to actually be quite unrelated to the actual reason why you would like to go to war with them (which often is something like “I would like your oil reserves”). And so a naive version of me used to be tempted to be like “man, this casus belli thing feels pretty arbitrary, like the reason why countries say they are going to war clearly doesn’t explain most of the actual variance in why they want to go to war. This meme makes it much harder to talk about why the war is actually happening, because everyone keeps insisting they are invading a foreign country because it has terrorists or whatever, and not because of its oil reserves”.

Making this more concrete to a situation where I think it’s less than clear cut, I was thinking of the recent OpenAI situation and was thinking about it through a similar lens of “did the OpenAI board have a casus belli for firing Sam?”.

And the important thing in both of these cases is that like, it turns out that in adversarial situations where people try to attack each other, we developed messy and confusing norms about when an act of aggression is justified, and the shape of those lines is often quite complex and messy and historically contingent. But the lines really matter in that not having a casus belli against a country you would really like to invade for your own self-interest really does actually stop you from invading that country, and creating a casus belli causes wars.

alkjash

So it seems like you’re suggesting that there are lots and lots of different norms of “civilization and good behavior” that society can equilibriate towards, and it’s frustrating to trace out the history of the one set of norms we happened to land in, especially when it’s clearly arbitrary in some way.

habryka

But also, in that the casus belli thing in some sense is developed not only as a cooperative tool to avoid collateral damage, but also as a tool to control the populations of the relevant nations, or something. Like, you don’t say both that you would really like to invade the other country for their oil, and also now you finally have a casus belli.

You insist to everyone that the casus belli is the real reason. And the set of acceptable casus belli are selected for the kinds of things that you can use to raise an army.

alkjash

It feels to me that we’re getting sidetracked into a direction I am interested in—but checking if you’re feeling the same way?

habryka

I also find myself interested, but let me see whether I can tie this back into the adversarialness conversation, and depending on how that goes we might want to continue.

The complexity creep point seems apt to me in that I do think that the institutions and cultures and groups trying to recruit competent people to their side are in a kind of arms race that makes it quite hard for a new player (i.e. a teenager or someone in their early 20s) to orient towards what’s going on and to not “get got”.

The casus belli point is one illustration of the kind of quite complicated multi-layer game that a young person needs to learn how to navigate, and if they mess up, they will face pretty bad personal consequences (maybe by being recruited to be a soldier in a war against a nation on false pretenses, or by escalating a war too early because they thought the casus belli thing was fake and clearly didn’t correspond to reality).

Tying this back to the correctness vs. originality point, this is all important because one of the reasons for correctness over originality in my worldview is that exactly this kind of complicated maneuvering and counter-maneuvering requires a really robust grounding in your own sanity, the ability to notice when things are off, a lot of situational awareness, and a map that is large enough to capture the relevant considerations.

habryka

(Summarizing it this way, I kind of like the framing of “correctness vs. originality” as maybe something like “how big is your map” vs. “how detailed is your map”, though not sure how much of the variance it captures.)

alkjash

I think I agree with your assessment of the territory but funnily enough arrive at the opposite conclusion: that situational awareness and having a good map is less and less worth it.

habryka

Oh, interesting. Hmm. Like, the story here is something like:

look man, this game is fucked. Have you ever seen a modern Yu-Gi-Oh card? That thing is so off the rails there is no hope in learning it, or the investment is not worth it.

Try to play a different game. Especially don’t try to join this game as a live player trying to build the kind of thing that would try to recruit young players to their side. Instead, try to find the dimensions and aspects of the world that are orthogonal to this kind of conflict. Yes, ideologies are fighting and recruiting and subverting each other all the time, but a thing like mathematics, or the hard sciences, doesn’t really care. The wheel of mathematics ticks forwards, and it does not really care about the random motions and squabbles that happen all around it.

alkjash

Hmm...I’m not saying anything so detailed as that. Simply that the costs of playing the game are going up and up while the payouts are staying relatively constant, so naturally the incentive to play the game goes down. Perhaps it’s still important enough to participate anyway.

alkjash

Like, one frame I have for this kind of conversation is that bad things happen when people drift off into living in social reality exclusively.

alkjash

I have a model of mathematics that there is object reality you live in, there is the mathematical reality that you work with in your head, and then there is the social reality that the mathematical community plays in. A big failure mode is that mathematical reality and social reality tap into the same brain parts too much and people tend to become unmoored as a result, living too much in social reality.

habryka

Yeah, I totally agree with that. Ah, and one of the points of originality in this context is something like “Ground yourself in things that you yourself have discovered. Yes, the knowledge in other people’s brains is nicely digested and easily accessible, but it’s also messy and kind of itself adversarial. When you look at reality where no one else has looked, or you don’t care who else has looked, that is the place where you can also get unmediated training data for the laws of reasoning and build out the foundations of your map”

alkjash

Yes, I think that’s right. Receiving information from others has a bunch of hidden elements in social reality that you have to be careful about in a way that tinkering with programming languages yourself doesn’t.

habryka

Well, until you make ChatGPT with that programming language :P

habryka

But yeah, that’s interesting. Maybe tying it back to the earlier “join two cults” point. One of the best ways to test, see flaws in, and cross-validate the implicit assumptions given to you by your social environment is to go and apply them to a place that no one has looked before.

alkjash

That’s right, yeah. And one of the big benefits is that many of the norms and assumptions in one culture will seem like strokes of genius imported to the other.

habryka

Reminds me of the original Royal Society motto which I keep feeling kinship towards like once a week: “Nullius in verba” (“on no one’s words”)

alkjash

Explain?

alkjash

Like, don’t take anyone’s word for things, verify yourself?

habryka

I always interpreted it as a rejection of social reality, yeah.

habryka

Maybe it’s a non-crazy time to wrap things up? I feel pretty good about what we wrote.

habryka

Thank you! I really enjoyed it, especially as I’ve been having some rough days and this has felt like a fun exploration of ideas that made me more optimistic about things.

alkjash

I’m glad! It’s certainly a good break for me from taking care of the baby. :)

habryka

Nice