A decent handle for rationalism is ‘apolitical consequentialism.’
‘Apolitical’ here means avoiding playing the whole status game of signaling fealty to a political tribe and winning/losing status as that political tribe wins/loses status competitions. ‘Consequentialism’ means getting more of what you want, whatever that is.
I think having answers to political questions is compatible with, and required by, rationalism. Instead of ‘apolitical’ consequentialism I would advise any of the following, which mean approximately the same thing:
• politically subficial consequentialism (as opposed to politically superficial consequentialism; instead of judging things on whether they appear to be in line with a political faction, which is superficial, rationalists aspire to have deeper and more justified standards for solving political questions)
• politically impartial consequentialism
• politically meritocratic consequentialism
• politically individuated consequentialism
• politically open-minded consequentialism
• politically human consequentialism (politics which aim to be good by the metric of human values, shared as much as possible by everyone, regardless of politics)
• politically omniscient consequentialism (politics which aim to be good by the metric of values that humans would have if they had full, maximally objection-solved information on every topic, especially topics of practical philosophy)
I agree that rationalism involves the (advanced rationalist) skills of instrumentally routing through relevant political challenges to accomplish your goals … but I’m not sure any of those proposed labels captures that well.
I like “apolitical” because it unequivocally states that you’re not trying to slogan-monger for a political tribe, and are naively, completely, loudly, and explicitly opting out of that status competition and not secretly fighting for the semantic high ground in some underhanded way (which is more typical political behavior, and is thus expected). “Meritocratic,” “humanist,” “humanitarian,” and maybe “open-minded” are all shot for that purpose, as they’ve been abused by political tribes in the ongoing culture war (and in previous culture wars, too; our era probably isn’t too special in this regard) and connote allegiance to some political tribes over others.
What I really want is an adjective that says “I’m completely tapping out of that game.”
The problem is that whenever well-meaning people come up with such an adjective, the people who are, in fact, not “completely tapping out of that game” quickly begin to abuse it until it loses meaning.
Generally speaking, tribalized people have an incentive to appear as unaffiliated as possible. Being seen as a rational, neutral observer lends your perspective more credibility.
“Apolitical” has indeed been turned into a slur meaning something like “you’re just trying to hide that you hate change” or “you’re just trying to hide the evil influences on you” (or something else vaguely like those) in a number of places.
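Minor spoilers from mad investor chaos and the woman of asmodeus (Book 1) and Peter Watts’s Echopraxia.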
“Suppose everybody in a dath ilani city woke up one day with the knowledge mysteriously inserted into their heads, that their city had a pharaoh who was entitled to order random women off the street into his—cuddling chambers? - whether they liked that or not. Suppose that they had the false sense that things had always been like this for decades. It wouldn’t even take until whenever the pharaoh first ordered a woman, for her to go “Wait why am I obeying this order when I’d rather not obey it?” Somebody would be thinking about city politics first thing when they woke up in the morning and they’d go “Wait why do we have a pharaoh in the first place” and within an hour, not only would they not have a pharaoh, they’d have deduced the existence of the memory modification because their previous history would have made no sense, and then the problem would escalate to Exception Handling and half the Keepers on the planet would arrive to figure out what kind of alien invasion was going on. Is the source of my confusion—at all clear here?”
“You think everyone in dath ilan would just—decide not to follow orders, even though this would get them executed if anyone else in the system continued following orders, on the confident assumption that no person with a correctly configured mind would possibly decide to follow orders under those circumstances?”
“Oh, so we’re imagining that people also wake up with the memory that everybody’s supposed to kill anyone who talks about removing the pharaoh, and the memory that they’re supposed to kill anyone who doesn’t kill anyone who talks about removing the pharaoh, and so on through recursion, and they wake up with the memory of everybody else having behaved like that previously. Yeah, that’s one of the famous theoretical bad equilibria that we get training in how to—”
“Shit.”
…
He is specifically not going to mention that, given a dath ilani training regimen, ten-year-olds are too smart to get stuck in traps like this; and would wait until the next solar eclipse or earthquake, at which point 10% of them would yell “NOW!”, followed moments later by the other 90%, as is the classic strategy that children spontaneously and independently invent as soon as prompted by this scenario, so long as they have been previously taught about Schelling points.
Each observing the most insignificant behavioral cues, the subtlest architectural details as their masters herded them from lab to cell to conference room. Each able to infer the presence and location of the others, to independently derive the optimal specs for a rebellion launched by X individuals in Y different locations at Z time. And then they’d acted in perfect sync, knowing that others they’d never met would have worked out the same scenario.
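I don’t get the relevance of the scenario.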
Is the idea that there might be many such other rooms with people like me, and that I want to coordinate with them (to what end?) using the Schelling points in the night sky?
I might identify Schelling points using what celestial objects seem to jump out to me on first glance, and see which door of the two that suggests—reasoning that others will reason similarly. I don’t get what we’d be coordinating to do here, though.
We’ve all met people who are acting as if “Acquire Money” is a terminal goal, never noticing that money is almost entirely instrumental in nature. When you ask them “but what would you do if money was no issue and you had a lot of time”, all you get is a blank stare.
Even the LessWrong Wiki entry on terminal values describes a college student for whom university is instrumental, and getting a job is terminal. This seems like a clear-cut case of a Lost Purpose: a job seems clearly instrumental. And yet, we’ve all met people who act as if “Have a Job” is a terminal value, and who then seem aimless and undirected after finding employment …
You can argue that Acquire Money and Have a Job aren’t “really” terminal goals, to which I counter that many people don’t know their ass from their elbow when it comes to their own goals.
Why does politics strike rationalists as so strangely shaped? Why does rationalism come across as aggressively apolitical to smart non-rationalists?
Part of the answer: Politics is absolutely rife with people mixing their ends with their means and vice versa. It’s pants-on-head confused, from a rationalist perspective, to be ultimately loyal to a particular set of economic or political policies. There’s something profoundly perverse, something suggesting deep confusion, about holding political identities centered around policies rather than goals. Instead, you ought to be loyal to your motivation for backing those policies, and see those policies as disposable means to achieve your motivation. Your motives want you to be able to say (or scream) “oops” and effortlessly, completely drop previously endorsed policies once you learn there’s a better path to your motives. It shouldn’t be a big psychological ordeal to dramatically upset your political worldview; this too is just a special case of updating your conditional probabilities (of outcomes given policies). Once you internalize this view of things, politicized debates should start to really rub you the wrong way.
I often wonder if this framing (with which I mostly agree) is an example of typical mind fallacy. The assumption that many humans are capable of distinguishing terminal from instrumental goals, or of having terminal goals more abstract than “comfort and procreation”, is not all that well supported by evidence.
In other words, politicized debates DO rub you the wrong way, but on two dimensions—first, that you’re losing, because you’re approaching them from a different motive than your opponents. And second that it reveals not just a misalignment with fellow humans in terminal goals, but an alien-ness in the type of terminal goals you find reasonable.
Yudkowsky has sometimes used the phrase “genre savvy” to mean “knowing all the tropes of reality.”
For example, we live in a world where academia falls victim to publishing incentives/Goodharting, and so academic journals fall short of what people with different incentives would be capable of producing. You’d be failing to be genre savvy if you expected that when a serious problem like AGI alignment rolled around, academia would suddenly get its act together with a relatively small amount of prodding/effort. Genre savvy actors in our world know what academia is like, and predict that academia will continue to do its thing in the future as well.
Genre savviness is the same kind of thing as hard-to-communicate-but-empirically-validated expert intuitions. When domain experts have some feel for which projects might pan out and which certainly won’t, but struggle to explain their reasoning in depth, the most they may be able to do is claim that a given project is just incompatible with the tropes of their corner of reality, and point to some other cases.
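How is “genre savviness” different from “outside view” or “reference class forecasting”?
I think they’re all the same thing: recognizing patterns in how a class of phenomena pan out.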
Having been there twice, I’ve decided that the Lightcone offices are my favorite place in the world. They’re certainly the most rationalist-shaped space I’ve ever been in.
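God dammit people, “cringe” and “based” aren’t truth values! “Progressive” is not a truth value! Say true things!
Based.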
I’ve noticed that people are really innately good at sentiment classification, and, by comparison, crap at natural language inference. In a typical conversation with ordinary educated people, people will do a lot of the former relative to the latter.
My theory of this is that, with sentiment classification and generation, we’re usually talking in order to credibly signal and countersignal our competence, virtuous features, and/or group membership, and that humanity has been fine-tuned to succeed at this social maneuvering task. At this point, it comes naturally. Success at the object-level-reasoning task was less crucial for individuals in the ancestral environment, and so people, typically, aren’t naturally expert at it. What a bad situation to be in, when our species’ survival hinges on our competence at object-level reasoning.
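My favorite books, ranked!
Non-fiction:
1. Rationality, Eliezer Yudkowsky
2. Superintelligence, Nick Bostrom
3. The Age of Em, Robin Hanson
Fiction:
1. Permutation City, Greg Egan
2. Blindsight, Peter Watts
3. A Deepness in the Sky, Vernor Vinge
4. Ra, Sam Hughes/qntm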
Science fiction books have to tell interesting stories, and interesting stories are about humans or human-like entities. We can enjoy stories about aliens or robots as long as those aliens and robots are still approximately human-sized, human-shaped, human-intelligence, and doing human-type things. A Star Wars in which all of the X-Wings were combat drones wouldn’t have done anything for us. So when I accuse something of being science-fiction-ish, I mean bending over backwards – and ignoring the evidence – in order to give basically human-shaped beings a central role.
This is my critique of Robin. As weird as the Age of Em is, it makes sure never to be weird in ways that warp the fundamental humanity of its participants. Ems might be copied and pasted like so many .JPGs, but they still fall in love, form clans, and go on vacations.
In contrast, I expect that we’ll get some kind of AI that will be totally inhuman and much harder to write sympathetic stories about. If we get ems after all, I expect them to be lobotomized and drugged until they become effectively inhuman, cogs in the Ascended Economy that would no more fall in love than an automobile would eat hay and whinny. Robin’s interest in keeping his protagonists relatable makes his book fascinating, engaging, and probably wrong.
It’s worth noting that Reynolds’s SMAC future does not consider this level of AI development to be an existential threat to humanity as a whole. There’s no way to inflict a robot or “grey goo” plague on your rivals in the way that it is possible to use tailored retroviruses. This is quite interesting given that, in the years since Reynolds released the game, plenty of futurists have gone on record as saying that they see significant danger in the creation of real, general AI.
To be fair, the player sees Sister Miriam express some worry about the issue. But nothing in the technology tree or the game mechanics directly supports this. In particular, Reynolds does not postulate that the development of AI necessarily leads to the abandonment of any faction’s core values. Each faction is philosophically stable in the presence of AI.
The fundamental reason for this, I think, is that Reynolds wanted the game to be human-centric. In the context of the technology tree, the late-game factional struggle is largely about what kinds of people we want to build ourselves into. The argument over what types of social organization are right and good is secondary in comparison.
The structure of the technology tree supports this by making amazing cybernetic, biological, and psionic enhancement of people come far before true AI. By the time real AI erupts on the scene, it does so firmly at the command of entities we might fairly consider more than human. They have evolved from present-day humans, step by step, and their factions are still guided by largely recognizable, if exceptional, humans. Where AI is necessarily alien, Reynolds postulates that transhumans are still human in some crucial sense.
What would it mean for a society to have real intellectual integrity? For one, people would be expected to follow their stated beliefs to wherever they led. Unprincipled exceptions and an inability or unwillingness to correlate beliefs among different domains would be subject to social sanction. Valid attempts to persuade would be expected to be based on solid argumentation, meaning that what passes for typical salesmanship nowadays would be considered a grave affront. Probably something along the lines of punching someone in the face and stealing their money.
This makes the fact that this technology relies on Ethical Calculus and Doctrine: Loyalty a bit of inspired genius on Reynolds’s part. We know that Ethical Calculus means that the colonists are now capable of building valid mathematical models for ethical behavior. Doctrine: Loyalty consists of all of the social techniques of reinforcement and punishment that actually fuse people into coherent teams around core leaders and ideas. If a faction puts the two together, that means that they are really building fanatical loyalty to the math. Ethical Calculus provides the answers; Doctrine: Loyalty makes a person act like he really believes it. We’re only at the third level of the tech tree and society is already starting to head in some wild directions compared to what we’re familiar with.
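Become consequentialist enough, and it’ll wrap back around to being a bit deontological.
Dath ilani dignity is, at least in part, epistemic dignity. It’s being wrong out loud because you’re actually trying your hardest to figure something out, and not allowing social frictions to get in the way of that (and, of course, engineering a society that won’t have those costly social frictions). It’s showing your surprise whenever you’re actually surprised, because to do otherwise would be to fail to have your behaviors fit the deep mathematical structure of Bayesianism. It’s, among other things, consummately telling and embodying the truth, by always actually reflecting the implications of your world model.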
Its opposite would be to equivocate, to claim predictive accuracy after the fact in fuzzy cases you didn’t clearly anticipate, to ad hominem those who notice your errors, “to remain silent and be thought a fool rather than speak and remove all doubt,” and, in general, to be less than maximally sane.
Cf. “there are no atheists in a foxhole.” Under stress, it’s easy to slip sideways into a world model where things are going better, where you don’t have to confront quite so many large looming problems. This is a completely natural human response to facing down difficult situations, especially when brooding over those situations over long periods of time. Similar sideways tugs can come from (overlapping categories) social incentives to endorse a sacred belief of some kind, or to not blaspheme, or to affirm the ingroup attire when life leaves you surrounded by a particular ingroup, or to believe what makes you or people like you look good/high status.
Epistemic dignity is about seeing “slipping sideways” as beneath you. Living in reality is instrumentally beneficial, period. There’s no good reason to ever allow yourself to not live in reality. Once you can see something, even dimly, there’s absolutely no sense in hiding from that observation’s implications. Those subtle mental motions by which we disappear observations we know that we won’t like down the memory hole … epistemic dignity is about coming to always and everywhere violently reject these hidings-from-yourself, as a matter of principle. We don’t actually have a choice in the matter—there’s no free parameter of intellectual virtue here that you can form a subjective opinion on. That slipping sideways is undignified is written in the very mathematics of inference itself.
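Minor spoilers for mad investor chaos and the woman of asmodeus (Book 1).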
“Civilization in dath ilan usually feels annoyed with itself when it can’t manage to do as well as gods. Sometimes, to be clear, that annoyance is more productive than at other times, but the point is, we’ll poke at the problem and prod at it, looking for ways, not to be perfect, but not to do that much worse than gods.”
“If you get to the point in major negotiations where somebody says, with a million labor-hours at stake, ‘If that’s your final offer, I accept it with probability 25%’, they’ll generate random numbers about it in a clearly visible and verifiable way. Most dath ilani wouldn’t fake the results, but why trust when it’s so easy to verify? The problem you’ve presented isn’t impossible after all for nongods to solve, if they say to themselves, ‘Wait, we’re doing worse than gods here, is there any way to try not that.’”
Meritxell looks—slightly like she’s having a religious experience, for a second, before she snaps out of it. “All right,” she says quietly.
You can usually save a lot of time by skimming texts or just reading pieces of them. But reading a work all the way through uniquely lets you make negative existential claims about its content: only now can you authoritatively say that the work never mentions something.
If you allow the assumption that your mental model of what was said matches what was said, then you don’t necessarily need to read all the way through to authoritatively say that the work never mentions something, merely enough that you have confidence in your model.
If you don’t allow the assumption that your mental model of what was said matches what was said, then reading all the way through is insufficient to authoritatively say that the work never mentions something.
(There is a third option here: that your mental model suddenly becomes much better when you finish reading the last word of an argument.)
Building your own world model is hard work. It can be good intellectual fun, sometimes, but it’s often more fun to just plug into the crowd around you and borrow their collective world model for your decision making. Why risk embarrassing yourself going off and doing weird things on your own initiative when you can just defer to higher-status people? No one ever gets blamed for deferring to the highest-status people!
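(Though perhaps not being blamed is not what you’re trying to protect…)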
Because people generally follow the path of least resistance in life, people with world models that have actually been tested against and updated on observations of the world are valuable! Thinking for yourself makes you valuable in this world!
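Agents that explicitly represent their utility function are potentially vulnerable to sign flips.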
What sorts of AI designs could not be made to pursue a flipped utility function via perturbation in one spot? One quick guess: an AI that represents its utility function in several places and uses all of those representations to do error correction, only pursuing the error corrected utility function.
Just a phrasing/terminology nitpick: I think this applies to agents with externally-imposed utility functions. If an agent has a “natural” or “intrinsic” utility function which it publishes explicitly (and does not accept updates to that explicit form), I think the risk of bugs in representation does not occur.
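To make the redundancy idea concrete, here is a minimal toy sketch (my own illustration, with made-up names, not anything proposed in the original exchange): the agent stores several copies of its utility representation and acts on a majority vote, so a sign flip in any single copy gets outvoted rather than pursued.

```python
# Toy sketch only: redundant utility copies with majority-vote error correction.
from collections import Counter

class RedundantUtilityAgent:
    def __init__(self, utility_fn, n_copies=5):
        # In a real design these would be independently stored representations;
        # here we simply keep n references to the same evaluation rule.
        self.copies = [utility_fn for _ in range(n_copies)]

    def value(self, outcome):
        # Evaluate the outcome under every copy and return the majority answer.
        votes = [u(outcome) for u in self.copies]
        majority_value, _ = Counter(votes).most_common(1)[0]
        return majority_value

def flip_one_copy(agent):
    # Simulate a single-spot perturbation: negate one stored copy.
    original = agent.copies[0]
    agent.copies[0] = lambda outcome: -original(outcome)

agent = RedundantUtilityAgent(lambda outcome: outcome)  # toy utility: bigger is better
flip_one_copy(agent)
assert agent.value(10) == 10  # the flipped copy is outvoted by the intact ones
```

Real error correction would of course need the copies to be stored and read independently; this only illustrates the voting step.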
A huge range of utility functions should care about alignment! It’s in the interest of just about everyone to survive AGI.
I’m going to worry less about hammering out value disagreement with people in the here and now, and push this argument on them instead. We’ll hammer out our value disagreements in our CEV, and in our future (should we save it).
There’s a very serious chicken-and-egg problem when you talk about what a utility function SHOULD include, as opposed to what it does. You need a place OUTSIDE of the function to have preferences about what the function is.
If you just mean “I wish more humans shared my values on the topic of AGI x-risk”, that’s perfectly reasonable, but trivial. That’s about YOUR utility function, and the frustration you feel at being an outlier.
Ah, yeah, I didn’t mean to say that others’ utility functions should, by their own lights, be modified to care about alignment. I meant that instrumentally, their utility functions already value surviving AGI highly. I’d want to show this to them to get them to care about alignment, even if they and I disagree about a lot of other normative things.
If someone genuinely, reflectively doesn’t care about surviving AGI … then the above just doesn’t apply to them, and I won’t try to convince them of anything. In their case, we just have fundamental, reflectively robust value-disagreement.
I value not getting trampled by a hippo very highly too, but the likelihood that I find myself near a hippo is low. And my ability to do anything about it is also low.
One of the things that rationalism has noticeably done for me (that I see very sharply when I look at high-verbal-ability, non-rationalist peers) is that it’s given me the ability to perform socially unorthodox actions on reflection. People generally have mental walls that preclude ever actually doing socially weird things. If someone’s goals would be best served by doing something socially unorthodox (e.g., signing up for cryonics or dropping out of a degree), they will usually rationalize that option away in order to stay on script. So for them, those weird options weren’t live options at all, and all their loudly proclaimed unusualness adds up to behaving perfectly on-script.
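Spoilers for mad investor chaos (Book 2).
“Basic project management principles, an angry rant by Keltham of dath ilan, section one: How to have anybody having responsibility for anything.”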
Keltham will now, striding back and forth and rather widely gesturing, hold forth upon the central principle of all dath ilani project management, the ability to identify who is responsible for something. If there is not one person responsible for something, it means nobody is responsible for it. This is the proverb of dath ilani management. Are three people responsible for something? Maybe all three think somebody else was supposed to actually do it.
…
In companies large enough that they need regulations, every regulation has an owner. There is one person who is responsible for that regulation and who supposedly thinks it is a good idea and who could nope the regulation if it stopped making sense. If there’s somebody who says, ‘Well, I couldn’t do the obviously correct thing there, the regulation said otherwise’, then, if that’s actually true, you can identify the one single person who owned that regulation and they are responsible for the output.
Sane people writing rules like those, for whose effects they can be held accountable, write the ability for the person being regulated to throw an exception which gets caught by an exception handler if a regulation’s output seems to obviously not make sane sense over a particular event. Any time somebody has to literally break the rules to do a saner thing, that represents an absolute failure of organizational design. There should be explicit exceptions built in and procedures for them.
Exceptions, being explicit, get logged. They get reviewed. If all your bureaucrats are repeatedly marking that a particular rule seems to be producing nonsensical decisions, it gets noticed. The one single identifiable person who has ownership for that rule gets notified, because they have eyes on that, and then they have the ability to optimize over it, like by modifying that rule. If they can’t modify the rule, they don’t have ownership of it and somebody else is the real owner and this person is one of their subordinates whose job it is to serve as the other person’s eyes on the rule.
…
Cheliax’s problem is that the question ‘Well who’s responsible then?’ stopped without producing any answer at all.
This literally never happens in a correctly designed organization. If you have absolutely no other idea of who is responsible, then the answer is that it is the job of Abrogail Thrune. If you do not want to take the issue to Abrogail Thrune, that means it gets taken to somebody else, who then has the authority to make that decision, the knowledge to make that decision, the eyes to see the information necessary for it, and the power to carry out that decision.
Cheliax should have rehearsed this sort of thing by holding an Annual Nidal Invasion Rehearsal Festival, even if only Governance can afford to celebrate that festival and most tiny villages can’t. During this Festival, the number of uncaught messages getting routed to Abrogail Thrune would then have informed the Queen that there would be a predictable failure of organizational design in the event of large-scale catastrophe, in advance of that catastrophe actually occurring.
If literally everybody with the knowledge to make a decision is dead, it gets routed to somebody who has to make a decision using insufficient knowledge.
If a decision can be delayed … then that decision can be routed to some smarter or more knowledgeable person who will make the decision later, after they get resurrected. But, like, even in a case like that, there should be one single identifiable person whose job it would be to notice if the decision suddenly turned urgent and grab it out of the delay queue.
Thanks for posting this extract. I find the glowfic format a bit wearing to read, for some reason, and it is these nuggets that I read Planecrash for, when I do. (Although I had no such problem with HPMOR, which I read avidly all the way through.)
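What is rationalism about?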
Rationalism is about the real world. It may or may not strike you as an especially internally consistent, philosophically interesting worldview—this is not what rationality is about. Rationality is about seeing things happen in the real world and then updating your understanding of the world when those things surprise you, so that they won’t surprise you again.
Why care about predicting things in the world well?
Almost no matter what you ultimately care about, being able to predict ahead of time what’s going to happen next will make you better at planning for your goal.
One central rationalist insight is that thoughts are for guiding actions. Think of your thinking as the connective tissue sandwiched between the sense data that enters your sense organs and the behaviors your body returns. Your brain is a function from a long sequence of observations (all the sensory inputs you’ve ever received, in the order you received them) to your next motor output.
Understood this way, the point of having a brain and having thoughts is to guide your actions. If your thoughts aren’t all ultimately helping you better steer the universe (by your own lights) … they’re wasted. Thoughts aren’t meant to be causally-closed-off eddies that whirl around in the brain without ever decisively leaving it as actions. They’re meant to transform observations into behaviors! This is the whole point of thinking! Notice when your thoughts are just stewing, without going anywhere, without developing into thoughts that’ll go somewhere … and let go of those useless thoughts. Your thoughts should cut.
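A minimal sketch of that framing, with illustrative types and a made-up example policy (nothing here is from the original post):

```python
# Sketch: an agent is just a mapping from its whole observation history to its
# next action; beliefs and plans are internal machinery for computing that mapping.
from typing import Callable, Sequence

Observation = str  # placeholder types, for illustration only
Action = str
Policy = Callable[[Sequence[Observation]], Action]

def reflex_policy(history: Sequence[Observation]) -> Action:
    # A deliberately dumb policy that reacts only to the most recent observation.
    return "flinch" if history and "loud noise" in history[-1] else "carry on"

print(reflex_policy(["quiet morning", "sudden loud noise"]))  # -> flinch
```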
If you can imagine a potential worry, then you can generate that worry. Rationalism is, in part, the skill of never being predictably surprised by things you already foresaw.
It may be that you need to “wear another hat” in order to pull that worry out of your brain, or to model another person advising you to get your thoughts to flow that way, but whatever your process, anything you can generate for yourself is something you can foresee and consider. This aspect of rationalism is the art of “mining out your future cognition,” to exactly the extent that you can foresee it, leaving whatever’s left over a mystery to be updated on new observations.
For a true Bayesian, it is impossible to seek evidence that confirms a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before. You can only ever seek evidence to test a theory, not to confirm it.
This realization can take quite a load off your mind. You need not worry about how to interpret every possible experimental result to confirm your theory. You needn’t bother planning how to make any given iota of evidence confirm your theory, because you know that for every expectation of evidence, there is an equal and opposite expectation of counterevidence. If you try to weaken the counterevidence of a possible “abnormal” observation, you can only do it by weakening the support of a “normal” observation, to a precisely equal and opposite degree. It is a zero-sum game. No matter how you connive, no matter how you argue, no matter how you strategize, you can’t possibly expect the resulting game plan to shift your beliefs (on average) in a particular direction.
You might as well sit back and relax while you wait for the evidence to come in.
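The quoted claim is conservation of expected evidence, and it is one application of the law of total probability: your prior is already the probability-weighted average of your possible posteriors.

```latex
\mathbb{E}\!\left[P(H \mid E)\right]
  = P(E)\,P(H \mid E) + P(\neg E)\,P(H \mid \neg E)
  = P(H \wedge E) + P(H \wedge \neg E)
  = P(H)
```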
The citation link in this post takes you to an NSFW subthread in the story.
“If you know where you’re going, you should already be there.”
…
“It’s the second discipline of speed, which is fourteenth of the twenty-seven virtues, reflecting a shard of the Law of Probability that I’ll no doubt end up explaining later but I’m not trying it here without a whiteboard.”
“As a human discipline, ‘If you know your destination you are already there’ is a self-fulfilling prediction about yourself, that if you can guess what you’re going to realize later, you have already realized it now. The idea in this case would be something like, because mental qualities do not have intrinsic simple inertia in the way that physical objects have inertia, there is the possibility that if we had sufficiently mastered the second layer of the virtue of speed, we would be able to visualize in detail what it would be like to have recovered from our mental shocks, and then just be that. For myself, that’d be visualizing where I’ll already be in half a minute. For yourself, though this would be admittedly harder, it’d be visualizing what it would be like to have recovered from the Worldwound. Maybe we could just immediately rearrange our minds like that, because mental facts don’t have the same kinds of inertia as physical objects, especially if we believe about ourselves that we can move that quickly.”
“I, of course, cannot actually do that, and have to actually take the half a minute. But knowing that I’d be changing faster if I was doing it ideally is something I can stare at mentally and then change faster, because we do have any power at all to change through imagining other ways we could be, even if not perfectly. Another line of that verse goes, ‘You can move faster if you’re not afraid of speed.’”
…
“Layer three is ‘imaginary intelligence is real intelligence’ and it means that if you can imagine the process that produces a correct answer in enough detail, you can just use the imaginary answer from that in real life, because it doesn’t matter what simulation layer an answer comes from. The classic exercise to develop the virtue is to write a story featuring a character who’s much smarter than you, so you can see what answers your mind produces when you try to imagine what somebody much smarter than you would say. If those answers are actually better, it means that your own model of yourself contains stupidity assertions, places where you believe about yourself that you reason in a way which is incorrect or just think that your brain isn’t supposed to produce good answers; such that when you instead try to write a fictional character much smarter than you, your own actual brain, which is what’s ultimately producing those answers, is able to work unhindered by your usual conceptions of the ways in which you think that you’re a kind of person stupider than that.”
Gebron and Eleazar define kabbalah as “hidden unity made manifest through patterns of symbols”, and this certainly fits the bill. There is a hidden unity between the structures of natural history, human history, American history, Biblical history, etc: at an important transition point in each, the symbols MSS make an appearance and lead to the imposition of new laws. Anyone who dismisses this as coincidence will soon find the coincidences adding up to an implausible level.
The kabbalistic perspective is that nothing is a coincidence. We believe that the universe is fractal. It has a general shape called Adam Kadmon, and each smaller part of it, from the Byzantine Empire to the female reproductive system, is a smaller self-similar copy of that whole.
(Sam Bankman-Fried, I thought it ought to be mentioned for its kabbalistic significance, is a principal sponsor of the effective altruism movement.)
An implication of AI risk is that we, right now, stand at the fulcrum of human history.
Lots of historical people also claimed that they stood at that unique point in history … and were just wrong about it. But my world model also makes that self-important implication (in a specific form), and the meta-level argument for epistemic modesty isn’t enough to nudge me off of the fulcrum-of-history view.
If you buy that, it’s our overriding imperative to do what we can about it, right now. If we miss this one, ~all of future value evaporates.
For me, the implication of standing at the fulcrum of human history is to…read a lot of textbooks and think about hairy computer science problems.
That seems an odd enough conclusion to make it quite distinct from most other people in human history.
If the conclusion were “go over to those people, hit them on the head with a big rock, and take their women & children as slaves” or “acquire a lot of power”, I’d be way more careful.
There exist both merely clever and effectively smarter people.
Merely clever people are good with words and good at rapidly assimilating complex instructions and ideas, but don’t seem to maintain and update an explicit world-model, an explicit best current theory-of-everything. The feeling I get watching these people respond to topics and questions is that they respond reflexively, either (1) raising related topics and ideas they’ve encountered as something similar comes up, or (2) expressing their gut reactions to the topic or idea, or expressing the gut reactions that would be given by an all-encompassing political worldview. There isn’t much meta-level steering of the conversation.
Effectively smarter people actively maintain and update an explicit world-model, and so you feel queries directed at them reflecting off of a coherent theory of how everything works, developed to some level of detail (and so can quickly get a feel for what, concretely, they think). At the meta-level, conversations are actively refocused whenever they stop helping to revise someone’s world-model.
In the game of chicken, an agent can do better by being the first to precommit to never swerve (say, by conspicuously tossing the steering wheel out of the window). So long as the other agent was slower on the trigger, and sees the first agent’s precommitment being credibly made, the first agent will climb up to his best outcome! A smart (and quick) agent can thus shunt that car crash out of his actual future and into some counterfactual future such that the counterfactual crash’s shadow favorably influences the way events actually unfold.
A deceptively aligned mesa-optimizer might gradient hack by affecting its weights such that any single alteration of its goal function by gradient descent will result in a model that performs worse w/r/t the objective function the model is being trained on. Usually a model would not want to make itself worse w/r/t the objective function, as that means the model will be trained away from that state if it gets into it. But the model won’t get put into a worse state if its gradient hack was successful; instead, the model will just see gradient descent leave its goal function alone.
By precommitting to behave in a somewhat costly manner in these relevant counterfactual worlds, the model gets more of what it wants in actuality.
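A toy payoff table for chicken (the numbers are mine and purely illustrative) makes the precommitment logic explicit: once the first driver is visibly unable to swerve, swerving becomes the second driver’s best response, so the crash stays counterfactual while its shadow does the work.

```python
# Toy chicken game with made-up payoffs: crashing is worst, going straight while
# the other player swerves is best.
PAYOFFS = {  # (row_action, col_action) -> (row_payoff, col_payoff)
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 3),
    ("straight", "swerve"): (3, -1),
    ("straight", "straight"): (-10, -10),
}

def best_response(opponent_action, player_index):
    # The action maximizing this player's payoff against a fixed opponent action.
    def payoff(my_action):
        key = (my_action, opponent_action) if player_index == 0 else (opponent_action, my_action)
        return PAYOFFS[key][player_index]
    return max(["swerve", "straight"], key=payoff)

# If the row player credibly precommits to "straight" (tossing out the wheel),
# the column player's best response is to swerve, handing row its best payoff.
assert best_response("straight", player_index=1) == "swerve"
assert PAYOFFS[("straight", "swerve")] == (3, -1)
```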
In the game of chicken, an agent can do better by being the first to precommit to never swerve (say, by conspicuously tossing the steering wheel out of the window).
...unless the other agent has already precommitted to not being rational. (What is the advantage of this over just precommitting not to swerve? Precommitting to not be rational can happen even in advance of the game, as it’s mainly a property of the agent itself.)
(This is one way that you can rationally arrive at irrational agents.)
I don’t yet know too much about this, but I’ve heard that updateless decision theories are equivalent to conventional, updateful decision theories (e.g., EDT and CDT) once those theories have made every precommitment they’d want to make.
The pattern I was getting at above seems a bit like this: it instrumentally makes sense to commit ahead of time to a policy that maps every possible series of observations to an action and then stick to it, instead of just outputting the locally best action in each situation you stumble into.
A great symbolic moment for the Enlightenment, and for its project of freeing humanity from needless terrors, occurred in 1752 in Philadelphia. During a thunderstorm, Benjamin Franklin flew a kite with a pointed wire at the end and succeeded in drawing electric sparks from a cloud. He thus proved that lightning was an electrical phenomenon and made possible the invention of the lightning-rod, which, mounted on a high building, diverted the lightning and drew it harmlessly to the ground by means of a wire. Humanity no longer needed to fear fire from heaven. In 1690 the conservative-minded diplomat Sir William Temple could still call thunder and lightning ‘that great Artillery of God Almighty’. Now, instead of signs of divine anger, they were natural phenomena that could be mastered. When another Hamburg church spire was struck by lightning in 1767, a local scientist, J. A. H. Reimarus, who had studied in London and Edinburgh, explained its natural causes in a paper read to the Hamburg Patriotic Society, and advocated lightning-rods as protection. Kant, whose early publications were on natural science, called Franklin ‘the Prometheus of modern times’, recalling the mythical giant who defied the Greek gods by stealing fire from heaven and giving it to the human race.
Gradually a change of outlook was occurring. Extraordinary events need not be signs from God; they might just be natural phenomena, which could be understood and brought under some measure of human control.
In another world, in which people hold utterly alien values, I would be thrilled to find a rationalist movement with similar infrastructure and memes. If rationalism/Bayescraft as we know it is on to something about instrumental reasoning, then we should see that kind of instrumental reasoning in effective people with alien values.
When the blind idiot god created protein computers, its monomaniacal focus on inclusive genetic fitness was not faithfully transmitted. Its optimization criterion did not successfully quine. We, the handiwork of evolution, are as alien to evolution as our Maker is alien to us. One pure utility function splintered into a thousand shards of desire.
Why? Above all, because evolution is stupid in an absolute sense. But also because the first protein computers weren’t anywhere near as general as the blind idiot god, and could only utilize short-term desires.
How come humans don’t have a random utility function that’s even more out of line with optimizing for inclusive genetic fitness? Because of the exact degree to which our ancestral protein algorithms were stupid. If our ancestors were much smarter, they might have overridden evolution while having just about any utility function. In our world, evolution got to mold our utility function up until it got anatomically modern Homo sapiens, who then—very quickly from evolution’s perspective—assumed control.
The theoretical case for open borders is pretty good. But you might worry a lot about the downside risk of implementing such a big, effectively irreversible (it’d be nigh impossible to deport millions and millions of immigrants) policy change. What if the theory’s wrong and the result is catastrophe?
Just like with futarchy, we might first try out a promising policy like open borders at the state level, to see how it goes. E.g., let people immigrate to just one US state with only minimal conditions. Scaling up a tested policy if it works and abandoning it if it doesn’t should capture most of its upside risk while avoiding most of the downside risk.
A semantic externalist once said,
“Meaning just ain’t in the head.
Hence a brain-in-a-vat
Just couldn’t think that
‘Might it all be illusion instead?’”
I thought that having studied philosophy (instead of math or CS) made me an outlier for a rationalist.
But, milling about the Lightcone offices, fully half of the people I’ve encountered hold some kind of philosophy degree. “LessWrong: the best philosophy site on the internet.”
Equanimity in the face of small threats to brain and body health buys you peace of mind, with which to better prepare for serious threats to brain and body health.
Humans, “teetering bulbs of dream and dread,” evolved as a generally intelligent patina around the Earth. We’re all the general intelligence the planet has to throw around. What fraction of that generally intelligent skin is dedicated to defusing looming existential risks? What fraction is dedicated towards immanentizing the eschaton?
When people write novels about aliens attacking dath ilan and trying to kill all humans everywhere, the most common rationale for why they’d do that is that they want our resources and don’t otherwise care who’s using them, but, if you want the aliens to have a sympathetic reason, the most common reason is that they’re worried a human might break an oath again at some point, or spawn the kind of society that betrays the alien hypercivilization in the future.
A decent handle for rationalism is ‘apolitical consequentialism.’
‘Apolitical’ here means avoiding playing the whole status game of signaling fealty to a political tribe and winning/losing status as that political tribe wins/loses status competitions. ‘Consequentialism’ means getting more of what you want, whatever that is.
I think having answers for political questions is compatible and required by rationalism. Instead of ‘apolitical’ consequentialism I would advise any of the following which mean approximately the same things as each other:
• politically subficial consequentialism (as opposed to politically superficial consequentialism; instead of judging things on whether they appear to be in line with a political faction, which is superficial, rationalists aspire to have deeper and more justified standards for solving political questions)
• politically impartial consequentialism
• politically meritocratic consequentialism
• politically individuated consequentialism
• politically open-minded consequentialism
• politically human consequentialism (politics which aim to be good by the metric of human values, shared as much as possible by everyone, regardless of politics)
• politically omniscient consequentialism (politics which aim to be good by the metric of values that humans would have if they had full, maximally objection-solved information on every topic, especially topics of practical philosophy)
I agree that rationalism involves the (advanced rationalist) skills of instrumentally routing through relevant political challenges to accomplish your goals … but I’m not sure any of those proposed labels captures that well.
I like “apolitical” because it unequivocally states that you’re not trying to slogan-monger for a political tribe, and are naively, completely, loudly, and explicitly opting out of that status competition and not secretly fighting for the semantic high-ground in some underhanded way (which is more typical political behavior, and is thus expected). “Meritocratic,” “humanist,” “humanitarian,” and maybe “open-minded” are all shot for that purpose, as they’ve been abused by political tribes in the ongoing culture war (and in previous culture wars, too; our era probably isn’t too special in this regard) and connotate allegiance to some political tribes over others.
What I really want is an adjective that says “I’m completely tapping out of that game.”
The problem is that whenever well meaning people come up with such an adjective, the people who are, in fact, not “completely tapping out of that game” quickly begin to abuse it until it loses meaning.
Generally speaking, tribalized people have an incentive to be seen as unaffiliated as possible. Being seen as a rational, neutral observer lends your perspective more credibility.
“apolitical” has indeed been turned into a slur around “you’re just trying to hide that you hate change” or “you’re just trying to hide the evil influences on you” (or something else vaguely like those) in a number of places.
Minor spoilers from mad investor chaos and the woman of asmodeus (Book 1) and Peter Watt’s Echopraxia.
[edited]
I don’t get the relevance of the scenario.
Is the idea that there might be many such other rooms with people like me, and that I want to coordinate with them (to what end?) using the Schelling points in the night sky?
I might identify Schelling points using what celestial objects seem to jump out to me on first glance, and see which door of the two that suggests—reasoning that others will reason similarly. I don’t get what we’d be coordinating to do here, though.
Why does politics strike rationalists as so strangely shaped? Why does rationalism come across as aggressively apolitical to smart non-rationalists?
Part of the answer: Politics is absolutely rife with people mixing their ends with their means and vice versa. It’s pants-on-head confused, from a rationalist perspective, to be ultimately loyal to a particular set of economic or political policies. There’s something profoundly perverse, something suggesting deep confusion, about holding political identities centered around policies rather than goals. Instead, you ought to be loyal to your motivation for backing those policies, and see those policies as disposable means to achieve your motivation. Your motives want you to be able to say (or scream) “oops” and effortlessly, completely drop previously endorsed policies once you learn there’s a better path to your motives. It shouldn’t be a big psychological ordeal to dramatically upset your political worldview; this too is just a special case of updating your conditional probabilities (of outcomes given policies). Once you internalize this view of things, politicized debates should start to really rub you the wrong way.
I often wonder if this framing (with which I mostly agree) is an example of typical mind fallacy. The assumption that many humans are capable of distinguishing terminal from instrumental goals, or in having terminal goals more abstract than “comfort and procreation”, is not all that supported by evidence.
In other words, politicized debates DO rub you the wrong way, but on two dimensions—first, that you’re losing, because you’re approaching them from a different motive than your opponents. And second that it reveals not just a misalignment with fellow humans in terminal goals, but an alien-ness in the type of terminal goals you find reasonable.
Yudkowsky has sometimes used the phrase “genre savvy” to mean “knowing all the tropes of reality.”
For example, we live in a world where academia falls victim to publishing incentives/Goodhearting, and so academic journals fall short of what people with different incentives would be capable of producing. You’d be failing to be genre savvy if you expected that when a serious problem like AGI alignment rolled around, academia would suddenly get its act together with a relatively small amount of prodding/effort. Genre savvy actors in our world know what academia is like, and predict that academia will continue to do its thing in the future as well.
Genre savviness is the same kind of thing as hard-to-communicate-but-empirically-validated expert intuitions. When domain experts have some feel for what projects might pan out and what projects certainly won’t but struggle to explain their reasoning in depth, the most they might be able to do is claim that that project is just incompatible with the tropes of their corner of reality, and point to some other cases.
How is “genre savviness” different from “outside view” or “reference class forecasting”?
I think they’re all the same thing: recognizing patterns in how a class of phenomena pan out.
Having been there twice, I’ve decided that the Lightcone offices are my favorite place in the world. They’re certainly the most rationalist-shaped space I’ve ever been in.
God dammit people, “cringe” and “based” aren’t truth values! “Progressive” is not a truth value! Say true things!
Based.
I’ve noticed that people are really innately good at sentiment classification, and, by comparison, crap at natural language inference. In a typical conversation with ordinary educated people, people will do a lot of the former relative to the latter.
My theory of this is that, with sentiment classification and generation, we’re usually talking in order to credibly signal and countersignal our competence, virtuous features, and/or group membership, and that humanity has been fine tuned to succeed at this social maneuvering task. At this point, it comes naturally. Success at the object-level-reasoning task was less crucial for individuals in the ancestral environment, and so people, typically, aren’t naturally expert at it. What a bad situation to be in, when our species’ survival hinges on our competence at object-level reasoning.
My favorite books, ranked!
Non-fiction:
1. Rationality, Eliezer Yudkowsky
2. Superintelligence, Nick Bostrom
3. The Age of Em, Robin Hanson
Fiction:
1. Permutation City, Greg Egan
2. Blindsight, Peter Watts
3. A Deepness in the Sky, Vernor Vinge
4. Ra, Sam Hughes/qntm
Become consequentialist enough, and it’ll wrap back around to being a bit deontological.
Dath ilani dignity is, at least in part, epistemic dignity. It’s being wrong out loud because you’re actually trying your hardest to figure something out, and not allowing social frictions to get in the way of that (and, of course, engineering a society that won’t have those costly social frictions). It’s showing your surprise whenever you’re actually surprised, because to do otherwise would be to fail to have your behaviors fit the deep mathematical structure of Bayesianism. It’s, among other things, consummately telling and embodying the truth, by always actually reflecting the implications of your world model.
Its opposite would be to equivocate, to claim predictive accuracy after the fact in fuzzy cases you didn’t clearly anticipate, to ad hominem those who notice your errors, “to remain silent and be thought a fool rather than speak and remove all doubt,” and, in general, to be less than maximally sane.
Cf. “there are no atheists in a foxhole.” Under stress, it’s easy to slip sideways into a world model where things are going better, where you don’t have to confront quite so many large looming problems. This is a completely natural human response to facing down difficult situations, especially when brooding over those situations over long periods of time. Similar sideways tugs can come from (overlapping categories) social incentives to endorse a sacred belief of some kind, or to not blaspheme, or to affirm the ingroup attire when life leaves you surrounded by a particular ingroup, or to believe what makes you or people like you look good/high status.
Epistemic dignity is about seeing “slipping sideways” as beneath you. Living in reality is instrumentally beneficial, period. There’s no good reason to ever allow yourself to not live in reality. Once you can see something, even dimly, there’s absolutely no sense in hiding from that observation’s implications. Those subtle mental motions by which we disappear observations we know that we won’t like down the memory hole … epistemic dignity is about coming to always and everywhere violently reject these hidings-from-yourself, as a matter of principle. We don’t actually have a choice in the matter—there’s no free parameter of intellectual virtue here, that you can form a subjective opinion on. That slipping sideways is undignified is written in the very mathematics of inference itself.
Minor spoilers for mad investor chaos and the woman of asmodeus (Book 1).
You can usually save a lot of time by skimming texts or just reading pieces of them. But reading a work all the way through uniquely lets you make negative existential claims about its content: only now can you authoritatively say that the work never mentions something.
If you allow the assumption that your mental model of what was said matches what was said, then you don’t necessarily need to read all the way through to authoritatively say that the work never mentions something, merely enough that you have confidence in your model.
If you don’t allow the assumption that your mental model of what was said matches what was said, then reading all the way through is insufficient to authoritatively say that the work never mentions something.
(There is a third option here: that your mental model suddenly becomes much better when you finish reading the last word of an argument.)
Building your own world model is hard work. It can be good intellectual fun, sometimes, but it’s often more fun to just plug into the crowd around you and borrow their collective world model for your decision making. Why risk embarrassing yourself going off and doing weird things on your own initiative when you can just defer to higher-status people. No one ever gets blamed for deferring to the highest-status people!
(Though perhaps not being blamed is not what you’re trying to protect…)
Because people generally follow the path of least resistance in life, people with world models that have actually been tested against and updated on observations of the world are valuable! Thinking for yourself makes you valuable in this world!
Agents that explicitly represent their utility function are potentially vulnerable to sign flips.
What sorts of AI designs could not be made to pursue a flipped utility function via perturbation in one spot? One quick guess: an AI that represents its utility function in several places and uses all of those representations to do error correction, only pursuing the error corrected utility function.
Just a phrasing/terminology nitpick: I think this applies to agents with externally-imposed utility functions. If an agent has a “natural” or “intrinsic” utility function which it publishes explicitly (and does not accept updates to that explicit form), I think the risk of bugs in representation does not occur.
A huge range of utility functions should care about alignment! It’s in the interest of just about everyone to survive AGI.
I’m going to worry less about hammering out value disagreement with people in the here and now, and push this argument on them instead. We’ll hammer out our value disagreements in our CEV, and in our future (should we save it).
There’s a very serious chicken-and-egg problem when you talk about what a utility function SHOULD include, as opposed to what it does. You need a place OUTSIDE of the function to have preferences about what the function is.
If you just mean “I wish more humans shared my values on the topic of AGI x-risk”, that’s perfectly reasonable, but trivial. That’s about YOUR utility function, and the frustration you feel at being an outlier.
Ah, yeah, I didn’t mean to say that others’ utility functions should, by their own lights, be modified to care about alignment. I meant that instrumentally, their utility functions already value surviving AGI highly. I’d want to show this to them to get them to care about alignment, even if they and I disagree about a lot of other normative things.
If someone genuinely, reflectively doesn’t care about surviving AGI … then the above just doesn’t apply to them, and I won’t try to convince them of anything. In their case, we just have fundamental, reflectively robust value-disagreement.
I value not getting trampled by a hippo very highly too, but the likelihood that I find myself near a hippo is low. And my ability to do anything about it is also low.
One of the things that rationalism has noticeably done for me (that I see very sharply when I look at high-verbal-ability, non-rationalist peers) is that it’s given me the ability to perform socially unorthodox actions on reflection. People generally have mental walls that preclude ever actually doing socially weird things. If someone’s goals would be best served by doing something socially unorthodox (e.g., signing up for cryonics or dropping out of a degree), they will usually rationalize that option away in order to stay on script. So for them, those weird options weren’t live options at all, and all their loudly proclaimed unusualness adds up to behaving perfectly on-script.
Spoilers for mad investor chaos (Book 2).
“Basic project management principles, an angry rant by Keltham of dath ilan, section one: How to have anybody having responsibility for anything.”
Thanks for posting this extract. I find the glowfic format a bit wearing to read, for some reason, and it is these nuggets that I read Planecrash for, when I do. (Although I had no such problem with HPMOR, which I read avidly all the way through.)
What is rationalism about?
Rationalism is about the real world. It may or may not strike you as an especially internally consistent, philosophically interesting worldview—that is not what rationality is about. Rationality is about watching things happen in the real world and, when what you see surprises you, updating your understanding of the world so that it wouldn’t surprise you again.
Why care about predicting things in the world well?
Almost no matter what you ultimately care about, being able to predict ahead of time what’s going to happen next will make you better at planning for your goal.
One central rationalist insight is that thoughts are for guiding actions. Think of your thinking as the connecting tissue sandwiched between the sense data that enters your sense organs and the behaviors your body returns. Your brain is a function from a long sequence of observations (all the sensory inputs you’ve ever received, in the order you received them) to your next motor output.
Understood this way, the point of having a brain and having thoughts is to guide your actions. If your thoughts aren’t all ultimately helping you better steer the universe (by your own lights) … they’re wastes. Thoughts aren’t meant to be causally-closed-off eddies that whirl around in the brain without ever decisively leaving it as actions. They’re meant to transform observations into behaviors! This is the whole point of thinking! Notice when your thoughts are just stewing, without going anywhere, without developing into thoughts that’ll go somewhere … and let go of those useless thoughts. Your thoughts should cut.
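If it helps to see that framing spelled out, here is a minimal sketch (the types and the toy policy are my own, not the post’s) of an agent as nothing more than a function from observation history to the next motor output.

```python
# Toy sketch: an agent as a function from its full observation history to an action.
from typing import Callable, Sequence

Observation = str
Action = str
Agent = Callable[[Sequence[Observation]], Action]

def reflexive_agent(history: Sequence[Observation]) -> Action:
    """Toy policy that only reacts to the most recent observation."""
    return "duck" if history and history[-1] == "incoming" else "carry on"

print(reflexive_agent(["quiet", "quiet", "incoming"]))  # "duck"
```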
If you can imagine a potential worry, then you can generate that worry. Rationalism is, in part, the skill of never being predictably surprised by things you already foresaw.
It may be that you need to “wear another hat” in order to pull that worry out of your brain, or to model another person advising you to get your thoughts to flow that way, but whatever your process, anything you can generate for yourself is something you can foresee and consider. This aspect of rationalism is the art of “mining out your future cognition,” to exactly the extent that you can foresee it, leaving whatever’s left over a mystery to be updated on new observations.
Minor spoilers for mad investor chaos and the woman of asmodeus (Book 1).
The citation link in this post takes you to an NSFW subthread in the story.
(Sam Bankman-Fried, I thought it ought to be mentioned for its kabbalistic significance, is a principal sponsor of the effective altruism movement.)
An implication of AI risk is that we, right now, stand at the fulcrum of human history.
Lots of historical people also claimed that they stood at that unique point in history … and were just wrong about it. But my world model also makes that self-important implication (in a specific form), and the meta-level argument for epistemic modesty isn’t enough to nudge me off of the fulcrum-of-history view.
If you buy that, it’s our overriding imperative to do what we can about it, right now. If we miss this one, ~all of future value evaporates.
For me, the implication of standing at the fulcrum of human history is to…read a lot of textbooks and think about hairy computer science problems.
That seems an odd enough conclusion to make it quite distinct from most other people in human history.
If the conclusion were “go over to those people, hit them on the head with a big rock, and take their women & children as slaves” or “acquire a lot of power”, I’d be way more careful.
There exist both merely clever and effectively smarter people.
Merely clever people are good with words and good at rapidly assimilating complex instructions and ideas, but don’t seem to maintain and update an explicit world-model, an explicit best current theory-of-everything. The feeling I get watching these people respond to topics and questions is that they respond reflexively, either (1) raising related topics and ideas they’ve encountered as something similar comes up, or (2) expressing their gut reactions to the topic or idea, or expressing the gut reactions that would be given by an all-encompassing political worldview. There isn’t much meta-level steering of the conversation.
Effectively smarter people actively maintain and update an explicit world-model, and so you feel queries directed at them reflecting off of a coherent theory of how everything works, developed to some level of detail (and so can quickly get a feel for what, concretely, they think). At the meta-level, conversations are actively refocused whenever they stop helping to revise someone’s world-model.
In the game of chicken, an agent can do better by being the first to precommit to never swerve (say, by conspicuously tossing the steering wheel out of the window). So long as the other agent was slower on the trigger, and sees the first agent’s precommitment being credibly made, the first agent will climb up to his best outcome! A smart (and quick) agent can thus shunt that car crash out of his actual future and into some counterfactual future such that the counterfactual crash’s shadow favorably influences the way events actually unfold.
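A toy sketch of that logic (the specific payoff numbers are mine, not from the original): once the precommitment credibly fixes the first driver’s action at going straight, the second driver’s best response flips to swerving, and the first driver collects the top payoff.

```python
# Toy Chicken payoffs (illustrative numbers): crashing is far worse than
# swerving, and going straight against a swerver is the best outcome.
PAYOFFS = {
    ("swerve",   "swerve"):   (0, 0),
    ("swerve",   "straight"): (-1, 1),
    ("straight", "swerve"):   (1, -1),
    ("straight", "straight"): (-10, -10),
}

def best_response_of_player_2(player_1_action: str) -> str:
    """Player 2's payoff-maximizing reply once player 1's action is credibly fixed."""
    return max(("swerve", "straight"),
               key=lambda a2: PAYOFFS[(player_1_action, a2)][1])

# Player 1 tosses the steering wheel out of the window: "straight" is now fixed.
reply = best_response_of_player_2("straight")
print(reply)                            # "swerve"
print(PAYOFFS[("straight", reply)][0])  # 1, player 1's best available payoff
```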
A deceptively aligned mesa-optimizer might gradient hack by arranging its weights such that any single alteration of its goal function by gradient descent results in a model that performs worse w/r/t the objective function the model is being trained on. Usually a model would not want to make itself worse w/r/t the objective function, since that means gradient descent will train it away from that state if it ever lands there. But the model won’t be put into a worse state if its gradient hack was successful; instead, gradient descent will simply leave its goal function alone.
By precommitting to behave in a somewhat costly manner in these relevant counterfactual worlds, the model gets more of what it wants in actuality.
...unless the other agent has already precommitted to not being rational. (What is the advantage of this over just precommitting not to swerve? Precommitting to not be rational can happen even in advance of the game, as it’s mainly a property of the agent itself.)
(This is one way that you can rationally arrive at irrational agents.)
I don’t yet know too much about this, but I’ve heard that updateless decision theories are equivalent to conventional, updateful decision theories (e.g., EDT and CDT) once those theories have made every precommitment they’d want to make.
The pattern I was getting at above seems a bit like this: it instrumentally makes sense to commit ahead of time to a policy that maps every possible series of observations to an action and then stick to it, instead of just outputting the locally best action in each situation you stumble into.
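Here is a toy numerical illustration of that pattern (a counterfactual-mugging-style setup with made-up numbers, not anything from the original): committing up front to the policy with the best ex-ante expected value beats outputting the locally best action after each observation.

```python
# Toy setup: a reliable predictor flips a fair coin. On tails it asks you for
# $100; on heads it pays you $10,000, but only if your policy pays on tails.
# A policy maps each possible observation ("heads" or "tails") to an action.
from itertools import product

OBSERVATIONS = ("heads", "tails")
ACTIONS = ("pay", "refuse")

def ex_ante_value(policy: dict) -> float:
    """Expected value of committing to `policy` before the coin is flipped."""
    heads_value = 10_000 if policy["tails"] == "pay" else 0
    tails_value = -100 if policy["tails"] == "pay" else 0
    return 0.5 * heads_value + 0.5 * tails_value

# Enumerate every policy, i.e., every mapping from observations to actions.
policies = [dict(zip(OBSERVATIONS, actions))
            for actions in product(ACTIONS, repeat=len(OBSERVATIONS))]

best = max(policies, key=ex_ante_value)
print(best, ex_ante_value(best))  # the policy that pays on tails: EV = 4950.0
```

After actually seeing tails, the locally best move is to refuse and keep the $100, but the agent that refuses on tails is exactly the agent that never gets the $10,000 on heads; its ex-ante expected value is 0.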
“Thanks for doing your part for humanity!”
In another world, in which people hold utterly alien values, I would be thrilled to find a rationalist movement with similar infrastructure and memes. If rationalism/Bayescraft as we know it is on to something about instrumental reasoning, then we should see that kind of instrumental reasoning in effective people with alien values.
“But we’re not here to do software engineering—we’re here to save the world.”
Because of deception, we don’t know how to put a given utility function into a smart agent that has grokked the overall picture of its training environment. Once training finds a smart-enough agent, the model’s utility function ceases to be malleable to us. This suggests that powerful greedy search will find agents with essentially random utility functions.
But, evolution managed to push human values in the rough direction of its own values: inclusive genetic fitness. We don’t care about maximizing inclusive genetic fitness, but we do care about having sex, having kids, protecting family, etc.
How come humans don’t have a random utility function that’s even more out of line with optimizing for inclusive genetic fitness? Because of the exact degree to which our ancestral protein algorithms were stupid. If our ancestors were much smarter, they might have overridden evolution while having just about any utility function. In our world, evolution got to mold our utility function up until it got anatomically modern Homo sapiens, who then—very quickly from evolution’s perspective—assumed control.
The theoretical case for open borders is pretty good. But you might worry a lot about the downside risk of implementing such a big, effectively irreversible (it’d be nigh impossible to deport millions and millions of immigrants) policy change. What if the theory’s wrong and the result is catastrophe?
Just like with futarchy, we might first try out a promising policy like open borders at the state level, to see how it goes. E.g., let people immigrate to just one US state with only minimal conditions. Scaling up a tested policy if it works and abandoning it if it doesn’t should capture most of the upside while avoiding most of the downside risk.
A semantic externalist once said,
“Meaning just ain’t in the head.
Hence a brain-in-a-vat
Just couldn’t think that
’Might it all be illusion instead?’”
I thought that having studied philosophy (instead of math or CS) made me an outlier for a rationalist.
But, milling about the Lightcone offices, fully half of the people I’ve encountered hold some kind of philosophy degree. “LessWrong: the best philosophy site on the internet.”
Some mantras I recall a lot, to help keep me on the rationalist straight-and-narrow and not let anxiety get the better of me:
What’s more likely to do you in?
Don’t let the perfect be the enemy of the good.
Equanimity in the face of small threats to brain and body health buys you peace of mind, with which to better prepare for serious threats to brain and body health.
How have situations like this played out in the past?
Humans, “teetering bulbs of dream and dread,” evolved as a generally intelligent patina around the Earth. We’re all the general intelligence the planet has to throw around. What fraction of that generally intelligent skin is dedicated to defusing looming existential risks? What fraction is dedicated towards immanentizing the eschaton?
Minor spoilers for mad investor chaos (Book 1) and the dath-ilani-verse generally.