When I look at people who have contributed most to alignment-related issues—whether directly, like Eliezer Yudkowsky and Paul Christiano, or theoretically, like Toby Ord and Katja Grace, or indirectly, like Sam Bankman-Fried and Holden Karnofsky—what all of these people have in common is focusing mostly on object-level questions. They all seem to me to have a strong understanding of their own biases, in the sense that gets trained by natural intelligence, really good scientific work, and talking to other smart and curious people like themselves. But as far as I know, none of them have made it a focus of theirs to fight egregores, defeat hypercreatures, awaken to their own mortality, refactor their identity, or cultivate their will. In fact, all of them (except maybe Eliezer) seem like the kind of people who would be unusually averse to thinking in those terms. And if we pit their plumbing or truck-maneuvering skills against those of an average person, I see no reason to think they would do better (besides maybe high IQ and general ability).
It’s seemed to me that the more that people talk about “rationality training” more exotic than what you would get at a really top-tier economics department, the more those people tend to get kind of navel-gazey, start fighting among themselves, and not accomplish things of the same caliber as the six people I named earlier. I’m not just saying there’s no correlation with success, I’m saying there’s a negative correlation.
(Could this be explained by people who are naturally talented not needing to worry about how to gain talent? Possibly, but this isn’t how it works in other areas—for example, all top athletes, no matter how naturally talented, have trained a lot.)
You’ve seen the same data I have, so I’m curious what makes you think this line of research/thought/effort will be productive.
I think your pushback is ignoring an important point. One major thing the big contributors have in common is that they tend to be unplugged from the stuff Valentine is naming!
So even if folks mostly don’t become contributors by asking “how can I come more truthfully from myself and not what I’m plugged into”, I think there is an important cluster of mysteries here. Examples of related phenomena:
Why has it worked out that just about everyone who claims to take AGI seriously is also vehement about publishing every secret they discover?
Why do we fear an AI arms race, rather than expect deescalation and joint ventures?
Why does the industry fail to understand the idea of aligned AI, and instead claim that “real” alignment work is adversarial-examples/fairness/performance-fine-tuning?
I think Val’s correct on the point that our people and organizations are plugged into some bad stuff, and that it’s worth examining that.
But as far as I know, none of them have made it a focus of theirs to fight egregores, defeat hypercreatures
Egregore is an occult concept representing a distinct non-physical entity that arises from a collective group of people.
I do know one writer who talks a lot about demons and entities from beyond the void. It’s you, and it happens in some of, IMHO, the most valuable pieces you’ve written.
I worry that Caplan is eliding the important summoner/demon distinction. This is an easy distinction to miss, since demons often kill their summoners and wear their skin.
That civilization is dead. It summoned an alien entity from beyond the void which devoured its summoner and is proceeding to eat the rest of the world.
And Ginsberg answers: “Moloch”. It’s powerful not because it’s correct – nobody literally thinks an ancient Carthaginian demon causes everything – but because thinking of the system as an agent throws into relief the degree to which the system isn’t an agent.
But the current rulers of the universe – call them what you want, Moloch, Gnon, whatever – want us dead, and with us everything we value. Art, science, love, philosophy, consciousness itself, the entire bundle. And since I’m not down with that plan, I think defeating them and taking their place is a pretty high priority.
It seems pretty obvious to me:
1.) We humans aren’t conscious of all the consequences of our actions, both because the subconscious has an important role in making our choices, and because our world is so enormously complex that all consequences are practically unknowable.
2.) In a society of billions, these unforeseeable forces combine into something larger than humans can explicitly plan and guide: “the economy”, “culture”, “the market”, “democracy”, “memes”.
3.) These larger-than-human systems pursue goals that are often antithetical to human preferences. You describe it perfectly in your review of Seeing Like A State: the state has a desire for legibility and ‘rationally planned’ designs that are at odds with the human desire for organic design. And thus, the ‘supersystem’ isn’t merely an aggregate of human desires; it has some qualities of being an actual separate agent with its own preferences. It could be called a hypercreature, an egregore, Moloch, or the devil.
4.) We keep hurting ourselves, again and again and again. We keep falling into multipolar traps, we keep choosing Moloch, which you describe as “the god of child sacrifice, the fiery furnace into which you can toss your babies in exchange for victory in war”. And thus, we have failed to accomplish for ourselves the very thing we want to accomplish with AI: humanity is not aligned with human preferences. This is what failure looks like.
5.) If we fail to align humanity, if we fail to align major governments and corporations, if we don’t even recognize our own misalignment, how big is the chance that we will manage to align AGI with human preferences? Total nuclear war has not been avoided by nuclear technicians who kept perfect control over their inventions—it has been avoided by the fact that the US government in 1945 was reasonably aligned with human preferences. I dare not imagine the world where the Nazi government was the first to get its hands on nuclear weapons.
And thus, I think it would be very, very valuable to put a lot more effort into ‘aligning humanity’. How do we keep our institutions and our grassroots movement “free from Moloch”? How do we get and spread reliable, non-corrupt authorities and politicians? How do we stop falling into multipolar traps, how do we stop suffering unnecessarily?
Best case scenario: this effort will turn out to be vital to AGI alignment
Worst case scenario: this effort will turn out to be irrelevant to AGI alignment, but in the meanwhile, we made the world a much better place
I sadly don’t have time to really introspect on what is going on in me here, but something about this comment feels pretty off to me. I think in some sense it provides an important counterpoint to the OP, but I also feel like it stretches the truth quite a bit:
Toby Ord primarily works on influencing public opinion and governments, and very much seems to view the world through a “raising the sanity waterline” lens. Indeed, I just talked to him yesterday morning and tried to convince him that misuse risk from AI, and the risk from having the “wrong actor” get the AI, is much less than he thinks it is, which feels like a very related topic.
Eliezer has done most of his writing on the meta-level, on the art of rationality, on the art of being a good and moral person, and on how to think about your own identity.
Sam Bankman-Fried is also very politically active, and (my guess) is quite concerned about the information landscape. I expect he would hate the terms used in this post, but I expect there to be a bunch of similarities between his model of the world and the one outlined in this post, in terms of trying to raise the sanity waterline and improve the world’s decision-making in a much broader sense (there is a reason why he was one of the biggest contributors to the Clinton and Biden campaigns).
I think it is true that the other three are mostly focusing on object-level questions.
I… also dislike something about the meta-level move of arguing from high-status individuals. I expect it to make the discussion worse, and also to make it harder for people to respond with counterarguments, because counterarguments could be read as attacking the high-status people, which is scary.
I dislike the language used in the OP, and sure feel like it actively steers attention in unproductive ways that make me not want to engage with it. But I do have a strong sense that it’s going to be very hard to actually make progress on building a healthy field of AI Alignment, because the world will repeatedly try to derail the field into being about defeating the other monkeys, or into being another story about why you should work at the big AI companies, or why you should give person X or movement Y all of your money, which feels to me related to what the OP is talking about.
The Sam Bankman-Fried paragraph reads differently now that his massive fraud with FTX is public; might be worth a comment/revision?
I can’t help but see Sam disagreeing with a message as a positive for the message (I know it’s a fallacy, but the feeling is still there).
Hmm, I feel like the revision would have to be in Scott’s comment. I was just responding to the names that Scott mentioned, and I think everything I am saying here is still accurate.
Given the link, I think you’re objecting to something I don’t care about. I don’t mean to claim that x-rationality is great and has promise to Save the World. Maybe if more really is possible and we do something pretty different to seriously develop it. Maybe. But frankly I recognize stupefying egregores here too and I don’t expect “more and better x-rationality” to do a damn thing to counter those for the foreseeable future.
So on this point I think I agree with you… and I don’t feel whatsoever dissuaded from what I’m saying.
The rest of what you’re saying feels like it’s more targeting what I care about though:
When I look at people who have contributed most to alignment-related issues […] what all of these people have in common is focusing mostly on object-level questions.
Right. And as I said in the OP, stupefaction often entails alienation from object-level reality.
It’s also worth noting that LW exists mostly because Eliezer did in fact notice his own stupidity and freaked the fuck out. He poured a huge amount of energy into taking his internal mental weeding seriously in order to never ever ever be that stupid again. He then wrote all these posts to articulate a mix of (a) what he came to realize and (b) the ethos behind how he came to realize it.
That’s exactly the kind of thing I’m talking about.
A deep art of sanity worth honoring might involve some techniques, awareness of biases, Bayesian reasoning, etc. Maybe. But focusing on that invites Goodhart. I think LW suffers from this in particular.
I’m pointing at something I think is more central: casting off systematic stupidity and replacing it with systematic clarity.
And yeah, I’m pretty sure one effect of that would be grounding one’s thinking in the physical world. Symbolic thinking in service to working with real things, instead of getting lost in symbols as though they were somehow independently real.
But as far as I know, none of them have made it a focus of theirs to fight egregores, defeat hypercreatures, awaken to their own mortality, refactor their identity, or cultivate their will.
I think you’re objecting to my aesthetic, not my content.
I think it’s well-established that my native aesthetic rubs many (most?) people in this space the wrong way. At this point I’ve completely given up on giving it a paint job to make it more palatable here.
But if you (or anyone else) cares to attempt a translation, I think you’ll find what you’re saying here to be patently false.
Raising the sanity waterline is exactly about fighting and defeating egregores=hypercreatures. Basically every debiasing technique is like that. Singlethink is a call to arms.
The talk about cryonics and xrisk is exactly an orientation to mortality. (Though this clearly was never Eliezer’s speciality and no one took up the mantle in any deep way, so this is still mostly possessed.)
The whole arc about free will is absolutely about identity. Likewise the stuff on consciousness and p-zombies. Significant chunks of the “Mysterious Answers to Mysterious Questions” Sequence were about how people protect their identities with stupidity and how not to fall prey to that.
The “cultivate their will” part confuses me. I didn’t mean to suggest doing that. I think that’s anti-helpful for the most part. Although frankly I think all the stuff about “Tsuyoku Naritai!” and “Shut up and do the impossible” totally fits the bill of what I imagine when reading your words there… but yeah, no, I think that’s just dumb and don’t advise it.
Although I very much do think that getting damn clear on what you can and cannot do is important, as is ceasing to don responsibility for what you can’t choose and fully accepting responsibility for what you can. That strikes me as absurdly important and neglected. As far as I can tell, anyone who affects anything for real has to at least stumble onto an enacted solution for this in at least some domain.
And if we pit their plumbing or truck-maneuvering skills against those of an average person, I see no reason to think they would do better (besides maybe high IQ and general ability).
You seem to be weirdly strawmanning me here.
The trucking thing was a reference to Zvi’s repeated rants about how politicians didn’t seem to be able to think about the physical world enough to solve the Canadian Trucker Convoy clog in Ottawa. I bet that Holden, Eliezer, Paul, etc. would all do massively better than average at sorting out a policy that would physically work. And if they couldn’t, I would worry about their “contributions” to alignment being more made of social illusion than substance.
Things like plumbing are physical skills. So is, say, football. I don’t expect most football players to magically be better at plumbing. Maybe some correlation, but I don’t really care.
But I do expect someone who’s mastering a general art of sanity and clarity to be able to think about plumbing in practical, physical terms. Instead of “The dishwasher doesn’t work” and vaguely hoping the person with the right cosmic credentials will cast a magic spell, there’d be a kind of clarity about what Gears one does and doesn’t understand, and turning to others because you see they see more relevant Gears.
If the wizards you’ve named were no better than average at that, then that would also make me worry about their “contributions” to alignment.
It’s seemed to me that the more that people talk about “rationality training” more exotic than what you would get at a really top-tier economics department, the more those people tend to get kind of navel-gazey, start fighting among themselves, and not accomplish things of the same caliber as the six people I named earlier. I’m not just saying there’s no correlation with success, I’m saying there’s a negative correlation.
I totally agree. Watching this dynamic play out within CFAR was a major factor in my checking out from it.
That’s part of what I mean by “this space is still possessed”. Stupefaction still rules here. Just differently.
You’ve seen the same data I have, so I’m curious what makes you think this line of research/thought/effort will be productive.
I think you and I are imagining different things.
I don’t think a LW or CFAR or MIRI flavored project that focuses on thinking about egregores and designing counters to stupefaction is promising. I think that’d just be a different flavor of the same stupid sauce.
(I had different hopes back in 2016, but I’ve been thoroughly persuaded otherwise by now.)
I don’t mean to prescribe a collective action solution at all, honestly. I’m not proposing a research direction. I’m describing a problem.
The closest thing to a solution-shaped object I’m putting forward is: Look at the goddamned question.
Part of what inspired me to write this piece at all was seeing a kind of blindness to these memetic forces in how people talk about AI risk and alignment research. Making bizarre assertions about what things need to happen on the god scale of “AI researchers” or “governments” or whatever, roughly on par with people loudly asserting opinions about what POTUS should do.
It strikes me as immensely obvious that memetic forces precede AGI. If the memetic landscape slants down mercilessly toward existential oblivion here, then the thing to do isn’t to prepare to swim upward against a future avalanche. It’s to orient to the landscape.
If there’s truly no hope, then just enjoy the ride. No point in worrying about any of it.
But if there is hope, it’s going to come from orienting to the right question.
And it strikes me as quite obvious that the technical problem of AI alignment isn’t that question. True, it’s a question that, if we could answer it, might address the whole picture. But that’s a pretty damn big “if”, and that “might” is awfully concerning.
I do feel some hope about people translating what I’m saying into their own way of thinking, looking at reality, and pondering. I think a realistic solution might organically emerge from that. Or rather, what I’m doing here is an iteration of this solution method. The process of solving the Friendliness problem in human culture has the potential to go superexponential since (a) Moloch doesn’t actually plan except through us and (b) the emergent hatching Friendly hypercreature(s) would probably get better at persuading people of its cause as more individuals allow it to speak through them.
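(To be concrete about what I mean by “superexponential”, here’s a toy sketch, purely illustrative rather than a model of anything real: let N be the number of people the Friendly hypercreature speaks through. If its per-person persuasive reach itself grows with N, say dN/dt = r·N^(1+ε) for some ε > 0, then the growth outruns every exponential and formally blows up in finite time; fixed per-carrier recruitment, the ε = 0 case, gives only plain exponential growth.)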
But that’s the wrong scale for individuals to try anything on.
I think all any of us can actually do is try to look at the right question, and hold the fact that we care about having an answer but don’t actually have one.
Does that clarify?
Maybe. It might be that if you described what you wanted more clearly, it would be the same thing that I want, and possibly I was incorrectly associating this with the things at CFAR you say you’re against, in which case sorry.
But I still don’t feel like I quite understand your suggestion. You talk of “stupefying egregores” as problematic insofar as they distract from the object-level problem. But I don’t understand how pivoting to egregore-fighting isn’t also a distraction from the object-level problem. Maybe this is because I don’t understand what fighting egregores consists of, and if I knew, then I would agree it was some sort of reasonable problem-solving step.
I agree that the Sequences contain a lot of useful deconfusion, but I interpret them as useful primarily because they provide a template for good thinking, and not because clearing up your thinking about those things is itself necessary for doing good work. I think of the cryonics discussion the same way I think of the Many Worlds discussion—following the motions of someone as they get the right answer to a hard question trains you to do this thing yourself.
I’m sorry if “cultivate your will” has the wrong connotations, but you did say “The problem that’s upstream of this is the lack of will”, and I interpreted a lot of your discussion of de-numbing and so on as dealing with this.
Part of what inspired me to write this piece at all was seeing a kind of blindness to these memetic forces in how people talk about AI risk and alignment research. Making bizarre assertions about what things need to happen on the god scale of “AI researchers” or “governments” or whatever, roughly on par with people loudly asserting opinions about what POTUS should do. It strikes me as immensely obvious that memetic forces precede AGI. If the memetic landscape slants down mercilessly toward existential oblivion here, then the thing to do isn’t to prepare to swim upward against a future avalanche. It’s to orient to the landscape.
The claim “memetic forces precede AGI” seems meaningless to me, except insofar as memetic forces precede everything (eg the personal computer was invented because people wanted personal computers and there was a culture of inventing things). Do you mean it in a stronger sense? If so, what sense?
I also don’t understand why it’s wrong to talk about what “AI researchers” or “governments” should do. Sure, it’s more virtuous to act than to chat randomly about stuff, but many Less Wrongers are in positions to change what AI researchers do, and if they have opinions about that, they should voice them. This post of yours right now seems to be about what “the rationalist community” should do, and I don’t think it’s a category error for you to write it.
Maybe this would be easier if you described what actions we should take conditional on everything you wrote being right.
There’s also the skulls to consider. As far as I can tell, this post’s recommendations are that we, who are already in a valley littered with a suspicious number of skulls,
https://forum.effectivealtruism.org/posts/ZcpZEXEFZ5oLHTnr9/noticing-the-skulls-longtermism-edition
https://slatestarcodex.com/2017/04/07/yes-we-have-noticed-the-skulls/
turn right towards a dark cave marked ‘skull avenue’ whose mouth is a giant skull, and whose walls are made entirely of skulls that turn to face you as you walk past them deeper into the cave.
The success rate of movements aimed at improving the long-term future or improving rationality has historically been… not great, but there are at least solid, concrete empirical reasons to think specific actions will help, and we can pin our hopes on that.
The success rate of ‘let’s build a movement to successfully uncouple ourselves from society’s bad memes and become capable of real action, and then our problems will be solvable’ is 0. Not just in that thinking that way didn’t help, but in that with near-100% reliability you just end up possessed by worse memes if you make that your explicit final goal (rather than ending up doing it as a side effect of trying to get good at something). And there are also no concrete paths to action to pin our hopes on.
“The success rate of ‘let’s build a movement to successfully uncouple ourselves from society’s bad memes and become capable of real action, and then our problems will be solvable’ is 0.”
I’m not sure if this is an exact analog, but I would have said the Scientific Revolution and the Age of Enlightenment were two pretty good examples of this (to be honest, I’m not entirely sure where one ends and the other begins, and there may be some overlap, but I think of them as two separate but related things) that resulted in the world becoming a vastly better place, largely through the efforts of individuals who realized that by changing the way we think about things we can better put human ingenuity to use. I know this is a massive oversimplification, but I think it points in the direction of there potentially being value in pushing the right memes onto society.
The success rate of developing and introducing better memes into society is indeed not 0. The key thing there is that the scientific revolutionaries weren’t just abstractly thinking “we must uncouple from society first, and then we’ll know what to do”. Rather, they wanted to understand how objects fell, how animals evolved, and lots of other specific problems, and they developed good memes to achieve those ends.
I’m by no means an expert on the topic, but I would have thought it was a result both of object-level thinking producing new memes that society recognized as true, and of some level of abstract thinking along the lines of “using God and the Bible as an explanation for every phenomenon doesn’t seem to be working very well, maybe we should create a scientific method or something.”
I think there may be a bit of us talking past each other, though. From your response, perhaps what I consider “uncoupling from society’s bad memes” you consider to be just generating new memes. It generally feels like a conversation where it’s hard to pin down exactly what people are trying to describe (starting from the OP, which I find very interesting but am still having some trouble understanding specifically), which is making it a bit hard to communicate.
Now that I’ve had a few days to let the ideas roll around in the back of my head, I’m gonna take a stab at answering this.
I think there are a few different things going on here which are getting confused.
1) What does “memetic forces precede AGI” even mean?
“Individuals”, “memetic forces”, and “that which is upstream of memetics” all act on different scales. As an example of each, I suggest “What will I eat for lunch?”, “Who gets elected POTUS?”, and “Will people eat food?”, respectively.
“What will I eat for lunch?” is an example of an individual decision because I can actually choose the outcome there. While sometimes things like “veganism” will tell me what I should eat, and while I might let that influence me, I don’t actually have to. If I realize that my life depends on eating steak, I will actually end up eating steak.
“Who gets elected POTUS” is a much tougher problem. I can vote. I can probably persuade friends to vote. If I really dedicate myself to the cause, and I do an exceptionally good job, and I get lucky, I might be able to get my ideas into the minds of enough people that my impact is noticeable. Even then though, it’s a drop in the bucket and pretty far outside my ability to “choose” who gets elected president. If I realize that my life depends on a certain person getting elected who would not get elected without my influence… I almost certainly just die. If a popular memeplex decides that a certain candidate threatens it, that actually can move enough people to plausibly change the outcome of an election.
However there’s a limitation to which memeplexes can become dominant and what they can tell people to do. If a hypercreature tells people to not eat meat, it may get some traction there. If it tries to tell people not to eat at all, it’s almost certainly going to fail and die. Not only will it have a large rate of attrition from adherents dying, but it’s going to be a real hard sell to get people to take its ideas on, and therefore it will have a very hard time spreading.
My reading of the claim “memetic forces precede AGI” is that like getting someone elected POTUS, the problem is simply too big for there to be any reasonable chance that a few guys in a basement can just go do it on their own when not supported by friendly hypercreatures. Val is predicting that our current set of hypercreatures won’t allow that task to be possible without superhuman abilities, and that our only hope is that we end up with sufficiently friendly hypercreatures that this task becomes humanly possible. Kinda like if your dream was to run an openly gay weed dispensary, it’s humanly possible today, but not so further in the past or in Saudi Arabia today; you need that cultural support or it ain’t gonna happen.
2) “Fight egregores” sure sounds like “trying to act on the god level” if anything does. How is this not at least as bad as “build FAI”? What could we possibly do which isn’t foolishly trying to act above our level?
This is a confusing one, because our words for things like “trying” are all muddled together. I think basically, yes, trying to “fight egregores” is “trying to act on the god level”, and is likely to lead to problems. However, that doesn’t mean you can’t make progress against egregores.
So, the problem with “trying to act on a god level” isn’t so much that you’re not a god and therefore “don’t have permission to act on this level” or “ability to touch this level”, it’s that you’re not a god and therefore attempting to act as if you were a god fundamentally requires you to fail to notice and update on that fact. And because you’re failing to update, you’re doing something that doesn’t make sense in light of the information at hand. And not just any information either; it’s information that’s telling you that what you’re trying to do will not work. So of course you’re not going to get where you want if you ignore the road signs saying “WRONG WAY!”.
What you can do, which will help free you from the stupefying factors and unfriendly egregores, and (Val claims) will have the best chance of leading to a FAI, is to look at what’s true. Rather than “I have to do this, or we all die! I must do the impossible”, just “Can I do this? Is it impossible? If so, and I’m [likely] going to die, I can look at that anyway. Given what’s true, what do I want to do?”
If this has a ”...but that doesn’t solve the problem” bit to it, that’s kinda the point. You don’t necessarily get to solve the problem. That’s the uncomfortable thing we should not flinch away from updating on. You might not be able to solve the problem. And then what?
(Not flinching from these things is hard. And important)
3) What’s wrong with talking about what AI researchers should do? There’s actually a good chance they listen! Should they not voice their opinions on the matter? Isn’t that kinda what you’re doing here by talking about what the rationality community should do?
Yes. Kinda. Kinda not.
There’s a question of how careful one has to be, and Val is making a case for much increased caution but not really stating it this way explicitly. Bear with me here, since I’m going to be making points that necessarily seem like “unimportant nitpicking pedantry” relative to an implicit level of caution that is more tolerant to rounding errors of this type, but I’m not actually presupposing anything here about whether increased caution is necessary in general or as it applies to AGI. It is, however, necessary in order to understand Val’s perspective on this, since it is central to his point.
If you look closely, Val never said anything about what the rationality community “should” do. He didn’t use the word “should” once.
He said things like “We can’t align AGI. That’s too big.” and “So, I think raising the sanity waterline is upstream of AI alignment.” and “We have an advantage in that this war happens on and through us. So if we take responsibility for this, we can influence the terrain and bias egregoric/memetic evolution to favor Friendliness”. These things seem to imply that we shouldn’t try to align AGI and should instead do something like “take responsibility” so we can “influence the terrain and bias egregoric/memetic evolution to favor friendliness”, and as far as rounding errors go, that’s not a huge one. However, he did leave the decision of what to do with the information he presented up to you, and consciously refrained from imbuing it with any “shouldness”. The lack of “should” in his post or comments is very intentional, and is an example of him doing the thing he views as necessary for FAI to have a chance of working out.
In (my understanding of) Val’s perspective, this “shouldness” is a powerful stupefying factor that works itself into everything—if you let it. It prevents you from seeing the truth, and in doing so blocks you from any path which might succeed. It’s so damn seductive and self-protecting that we all get drawn into it all the time and don’t really realize—or worse, we rationalize and believe that “it’s not really that big a deal; I can achieve my object-level goals anyway (or I can’t anyway, and so it makes no difference if I look)”. His claim is that it is that big a deal, because you can’t achieve your goals—and that you know you can’t, which is the whole reason you’re stuck in your thoughts of “should” in the first place. He’s saying that the annoying effort to be more precise about what exactly we are aiming to share, and to hold ourselves squeaky clean of any “impotent shoulding” at things, is actually a necessary precondition for success. That if we try to “Shut up and do the impossible”, we fail. That if we “Think about what we should do”, we fail. That if we “try to convince people”, even if we are right and pointing at the right thing, we fail. That if we allow ourselves to casually “should” at things, instead of recognizing it as so incredibly dangerous as to avoid out of principle, we get seduced into being slaves for unfriendly egregores and fail.
That last line is something I’m less sure Val would agree with. He seems to be doing the “hard line avoid shoulding, aim for maximally clean cognition and communication” thing and the “make a point about doing it to highlight the difference” thing, but I haven’t heard him say explicitly that he thinks it has to be a hard line thing.
And I don’t think it does, or should be (case in point). Taking a hard line can be evidence of flinching from a different truth, or a lack of self trust to only use that way of communicating/relating to things in a productive way. I think by not highlighting the fact that it can be done wisely, he clouds his point and makes his case less compelling than it could be. However, I do think he’s correct about it being both a deceptively huge deal and also something that takes a very high level of caution before you start to recognize the issues with lower levels of caution.
I feel seen. I’ll tweak a few details here & there, but you have the essence.
Thank you.
If this has a ”...but that doesn’t solve the problem” bit to it, that’s kinda the point. You don’t necessarily get to solve the problem. That’s the uncomfortable thing we should not flinch away from updating on. You might not be able to solve the problem. And then what?
Agreed.
Two details:
“…we should not flinch away…” is another instance of the thing. This isn’t just banishing the word “should”: the ability not to flinch away from hard things is a skill, and trying to bypass development of that skill with moral panic actually makes everything worse.
The orientation you’re pointing at here biases one’s inner terrain toward Friendly superintelligences. It’s also personally helpful and communicable. This is an example of a Friendly meme that can give rise to a Friendly superintelligence. So while sincerely asking “And then what?” is important, as is holding the preciousness of the fact that we don’t yet have an answer, that is enough. We don’t have to actually answer that question to participate in feeding Friendliness in the egregoric wars. We just have to sincerely ask.
That if we allow ourselves to casually “should” at things, instead of recognizing it as so incredibly dangerous as to avoid out of principle, we get seduced into being slaves for unfriendly egregores and fail.
That last line is something I’m less sure Val would agree with.
Admittedly I’m not sure either.
Generally speaking, viewing things as “so incredibly dangerous as to avoid out of principle” ossifies them too much. Ossified things tend to become attack surfaces for unFriendly superintelligences.
In particular, being scared of how incredibly dangerous something is tends to be stupefying.
But I do think seeing this clearly naturally creates a desire to be more clear and to drop nearly all “shoulding” — not so much the words as the spirit.
(Relatedly: I actually didn’t know I never used the word “should” in the OP! I don’t actually have anything against the word per se. I just try to embody this stuff. I’m delighted to see I’ve gotten far enough that I just naturally dropped using it this way.)
…I haven’t heard him say explicitly that he thinks it has to be a hard line thing.
And I don’t think it does, or should be (case in point). Taking a hard line can be evidence of flinching from a different truth, or a lack of self trust to only use that way of communicating/relating to things in a productive way. I think by not highlighting the fact that it can be done wisely, he clouds his point and makes his case less compelling than it could be.
I’m not totally sure I follow. Do you mean a hard line against “shoulding”?
If so, I mostly just agree with you here.
That said, I think trying to make my point more compelling would in fact be an example of the corruption I’m trying to purify myself of. Instead I want to be correct and clear. That might happen to result in what I’m saying being more compelling… but I need to be clean of the need for that to happen in order for it to unfold in a Friendly way.
However. I totally believe that there’s a way I could have been clearer.
And given how spot-on the rest of what you’ve been saying feels to me, my guess is you’re right about how here.
Although admittedly I don’t have a clear image of what that would have looked like.
“…we should not flinch away…” is another instance of the thing.
Doh! Busted.
Thanks for the reminder.
This isn’t just banishing the word “should”: the ability not to flinch away from hard things is a skill, and trying to bypass development of that skill with moral panic actually makes everything worse.
Agreed.
We don’t have to actually answer that question to participate in feeding Friendliness in the egregoric wars. We just have to sincerely ask.
Good point. Agreed, and worth pointing out explicitly.
I’m not totally sure I follow. Do you mean a hard line against “shoulding”?
Yes. You don’t really need it, things tend to work better without it, and the fact that no one even noticed that it didn’t show up in this post is a good example of that. At the same time, “I shouldn’t ever use ‘should’” obviously has the exact same problems, and it’s possible to miss that you’re taking that stance if you don’t ever say it out loud. I watched some of your videos after Kaj linked one, and… it’s not that it looked like you were doing that, but it looked like you might be doing that. Like there wasn’t any sort of self-caricaturing or anything that showed me that “Val is well aware of this failure mode, and is actively steering clear”, so I couldn’t rule it out and wanted to mark it as a point of uncertainty and a thing you might want to watch out for.
That said, I think trying to make my point more compelling would in fact be an example of the corruption I’m trying to purify myself of. Instead I want to be correct and clear. That might happen to result in what I’m saying being more compelling… but I need to be clean of the need for that to happen in order for it to unfold in a Friendly way.
Ah, but I never said you should try to make your point more compelling! What do you notice when you ask yourself why “X would have effect Y” led you to respond with a reason to not do X? ;)
Don’t have the time to write a long comment just now, but I still wanted to point out that describing either Yudkowsky or Christiano as doing mostly object-level research seems incredibly wrong. So much of what they’re doing and have done has focused explicitly on which questions to ask, which questions not to ask, which paradigm to work in, and how to criticize that kind of work… They rarely published posts that are only about the meta-level (although Arbital does contain a bunch of pages along those lines, and Prosaic AI Alignment is also meta), but it pervades their writing and thinking.
More generally, when you’re creating a new field of science or research, you tend to do a lot of philosophy-of-science-type work, even if you don’t label it explicitly that way. Galileo, Carnot, Darwin, Boltzmann, Einstein, and Turing all did it.
(To be clear, I’m pointing at meta-stuff in the sense of “philosophy of science for alignment” type things, not necessarily the more hardcore stuff discussed in the original post)
That’s true, but if you are doing philosophy it is better to admit to it, and learn from existing philosophy, rather than deriding and dismissing the whole field.
This seems irrelevant to the point, yes? I think adamShimi is challenging Scott’s claim that Paul & Eliezer are mostly focusing on object-level questions. It sounds like you’re challenging whether they’re attending to non-object-level questions in the best way. That’s a different question. Am I missing your point?
Eliezer, at least, now seems quite pessimistic about that object-level approach. And in the last few months he’s been writing a ton of fiction about introducing a Friendly hypercreature to an unfriendly world.
When I look at people who have contributed most to alignment-related issues—whether directly… or indirectly, like Sam Bankman-Fried
Perhaps I have missed it, but I’m not aware that Sam has funded any AI alignment work thus far.
If so, this sounds like giving him a large amount of credit in advance of doing the work, which is generous, but not the order in which credit allocation should go.
I wasn’t convinced of this ten years ago and I’m still not convinced.
When I look at people who have contributed most to alignment-related issues—whether directly, like Eliezer Yudkowsky and Paul Christiano—or theoretically, like Toby Ord and Katja Grace—or indirectly, like Sam Bankman-Fried and Holden Karnofsky—what all of these people have in common is focusing mostly on object-level questions. They all seem to me to have a strong understanding of their own biases, in the sense that gets trained by natural intelligence, really good scientific work, and talking to other smart and curious people like themselves. But as far as I know, none of them have made it a focus of theirs to fight egregores, defeat hypercreatures, awaken to their own mortality, refactor their identity, or cultivate their will. In fact, all them (except maybe Eliezer) seem like the kind of people who would be unusually averse to thinking in those terms. And if we pit their plumbing or truck-manuevering skills against those of an average person, I see no reason to think they would do better (besides maybe high IQ and general ability).
It’s seemed to me that the more that people talk about “rationality training” more exotic than what you would get at a really top-tier economics department, the more those people tend to get kind of navel-gazey, start fighting among themselves, and not accomplish things of the same caliber as the six people I named earlier. I’m not just saying there’s no correlation with success, I’m saying there’s a negative correlation.
(Could this be explained by people who are naturally talented not needing to worry about how to gain talent? Possibly, but this isn’t how it works in other areas—for example, all top athletes, no matter how naturally talented, have trained a lot.)
You’ve seen the same data I have, so I’m curious what makes you think this line of research/thought/effort will be productive.
I think your pushback is ignoring an important point. One major thing the big contributors have in common is that they tend to be unplugged from the stuff Valentine is naming!
So even if folks mostly don’t become contributors by asking “how can I come more truthfully from myself and not what I’m plugged into”, I think there is an important cluster of mysteries here. Examples of related phenomena:
Why has it worked out that just about everyone who claims to take AGI seriously is also vehement about publishing every secret they discover?
Why do we fear an AI arms race, rather than expect deescalation and joint ventures?
Why does the industry fail to understand the idea of aligned AI, and instead claim that “real” alignment work is adversarial-examples/fairness/performance-fine-tuning?
I think Val’s correct on the point that our people and organizations are plugged into some bad stuff, and that it’s worth examining that.
I do know one writer who talks a lot about demons and entities from beyond the void. It’s you, and it happens in some of, IMHO, the most valuable pieces you’ve written.
It seems pretty obvious to me:
1.) We humans aren’t conscious of all the consequences of our actions, both because the subconscious has an important role in making our choices, and because our world is enormously complex so all consequences are practically unknowable
2.) In a society of billions, these unforeseeable forces combine in something larger than humans can explicitly plan and guide: “the economy”, “culture”, “the market”, “democracy”, “memes”
3.) These larger-than-human-systems prefer some goals that are often antithetical to human preferences. You describe it perfectly in Seeing Like A State: the state has a desire for legibility and ‘rationally planned’ designs that are at odds with the human desire for organic design. And thus, the ‘supersystem’ isn’t merely an aggregate of human desires, it has some qualities of being an actual separate agent with its own preferences. It could be called a hypercreature, an egregore, Moloch or the devil.
4.) We keep hurting ourselves, again and again and again. We keep falling into multipolar traps, we keep choosing for Moloch, which you describe as “the god of child sacrifice, the fiery furnace into which you can toss your babies in exchange for victory in war”. And thus, we have not accomplished for ourselves what we want to do with AI. Humanity is not aligned with human preferences. This is what failure looks like.
5.) If we fail to align humanity, if we fail to align major governments and corporations, if we don’t even recognize our own misalignment, how big is the chance that we will manage to align AGI with human preferences? Total nuclear war has not been avoided by nuclear technicians who kept perfect control over their inventions—it has been avoided by the fact that the US government in 1945 was reasonably aligned with human preferences. I dare not imagine the world where the Nazi government was the first to get its hands on nuclear weapons.
And thus, I think it would be very, very valuable to put a lot more effort into ‘aligning humanity’. How do we keep our institutions and our grassroots movement “free from Moloch”? How do we get and spread reliable, non-corrupt authorities and politicians? How do we stop falling into multipolar traps, how do we stop suffering unnecessarily?
Best case scenario: this effort will turn out to be vital to AGI alignment
Worst case scenario: this effort will turn out to be irrelevant to AGI alignment, but in the meanwhile, we made the world a much better place
I sadly don’t have time to really introspect what is going in me here, but something about this comment feels pretty off to me. I think in some sense it provides an important counterpoint to the OP, but also, I feel like it also stretches the truth quite a bit:
Toby Ord primarily works on influencing public opinion and governments, and very much seems to view the world through a “raising the sanity waterline” lense. Indeed, I just talked to him last morning where I tried to convince him that misuse risk from AI, and the risk from having the “wrong actor” get the AI is much less than he thinks it is, which feels like a very related topic.
Eliezer has done most of his writing on the meta-level, on the art of rationality, on the art of being a good and moral person, and on how to think about your own identity.
Sam Bankman-Fried is also very active in political activism, and (my guess) is quite concerned about the information landscape. I expect he would hate the terms used in this post, but I expect there to be a bunch of similarities in his model of the world and the one outlined in this post, in terms of trying to raise the sanity waterline and improve the world’s decision-making in a much broader sense (there is a reason why he was one of the biggest contributors to the Clinton and Biden campaigns).
I think it is true that the other three are mostly focusing on object-level questions.
I… also dislike something about the meta-level of arguing from high-status individuals. I expect it to make the discussion worse, and also make it harder for people to respond with counter arguments, because counter arguments arguments could be read as attacking the high-status people, which is scary.
I dislike the language used in the OP, and sure feel like it actively steers attention in unproductive ways that make me not want to engage with it, but I do have a strong sense that it’s going to be very hard to actually make progress on building a healthy field of AI Alignment, because the world will repeatedly try to derail the field into being about defeating the other monkeys, or being another story about why you should work at the big AI companies, or why you should give person X or movement Y all of your money, which feels to me related to what the OP is talking about.
The Sam Bankman Fried reads differently now his massive fraud with FTX is public, might be worth a comment/revision?
I can’t help but see Sam disagreeing with a message as a positive for the message (I know it’s a fallacy, but the feelings still there)
Hmm, I feel like the revision would have to be in Scott’s comment. I was just responding to the names that Scott mentioned, and I think everything I am saying here is still accurate.
Given the link, I think you’re objecting to something I don’t care about. I don’t mean to claim that x-rationality is great and has promise to Save the World. Maybe if more really is possible and we do something pretty different to seriously develop it. Maybe. But frankly I recognize stupefying egregores here too and I don’t expect “more and better x-rationality” to do a damn thing to counter those for the foreseeable future.
So on this point I think I agree with you… and I don’t feel whatsoever dissuaded from what I’m saying.
The rest of what you’re saying feels like it’s more targeting what I care about though:
Right. And as I said in the OP, stupefaction often entails alienation from object-level reality.
It’s also worth noting that LW exists mostly because Eliezer did in fact notice his own stupidity and freaked the fuck out. He poured a huge amount of energy into taking his internal mental weeding seriously in order to never ever ever be that stupid again. He then wrote all these posts to articulate a mix of (a) what he came to realize and (b) the ethos behind how he came to realize it.
That’s exactly the kind of thing I’m talking about.
A deep art of sanity worth honoring might involve some techniques, awareness of biases, Bayesian reasoning, etc. Maybe. But focusing on that invites Goodhart. I think LW suffers from this in particular.
I’m pointing at something I think is more central: casting off systematic stupidity and replacing it with systematic clarity.
And yeah, I’m pretty sure one effect of that would be grounding one’s thinking in the physical world. Symbolic thinking in service to working with real things, instead of getting lost in symbols as some kind of weirdly independently real.
I think you’re objecting to my aesthetic, not my content.
I think it’s well-established that my native aesthetic rubs many (most?) people in this space the wrong way. At this point I’ve completely given up on giving it a paint job to make it more palatable here.
But if you (or anyone else) cares to attempt a translation, I think you’ll find what you’re saying here to be patently false.
Raising the sanity waterline is exactly about fighting and defeating egregores=hypercreatures. Basically every debiasing technique is like that. Singlethink is a call to arms.
The talk about cryonics and xrisk is exactly an orientation to mortality. (Though this clearly was never Eliezer’s speciality and no one took up the mantle in any deep way, so this is still mostly possessed.)
The whole arc about free will is absolutely about identity. Likewise the stuff on consciousness and p-zombies. Significant chunks of the “Mysterious Answers to Mysterious Questions” Sequence were about how people protect their identities with stupidity and how not to fall prey to that.
The “cultivate their will” part confuses me. I didn’t mean to suggest doing that. I think that’s anti-helpful for the most part. Although frankly I think all the stuff about “Tsuyoku Naritai!” and “Shut up and do the impossible” totally fits the bill of what I imagine when reading your words there… but yeah, no, I think that’s just dumb and don’t advise it.
Although I very much do think that getting damn clear on what you can and cannot do is important, as is ceasing to don responsibility for what you can’t choose and fully accepting responsibility for what you can. That strikes me as absurdly important and neglected. As far as I can tell, anyone who affects anything for real has to at least stumble onto an enacted solution for this in at least some domain.
You seem to be weirdly strawmanning me here.
The trucking thing was a reverence to Zvi’s repeated rants about how politicians didn’t seem to be able to think about the physical world enough to solve the Canadian Trucker Convoy clog in Toronto. I bet that Holden, Eliezer, Paul, etc. would all do massively better than average at sorting out a policy that would physically work. And if they couldn’t, I would worry about their “contributions” to alignment being more made of social illusion than substance.
Things like plumbing are physical skills. So is, say, football. I don’t expect most football players to magically be better at plumbing. Maybe some correlation, but I don’t really care.
But I do expect that someone who’s mastering a general art of sanity and clarity to be able to think about plumbing in practical physical terms. Instead of “The dishwasher doesn’t work” and vaguely hoping the person with the right cosmic credentials will cast a magic spell, there’d be a kind of clarity about what Gears one does and doesn’t understand, and turning to others because you see they see more relevant Gears.
If the wizards you’ve named were no better than average at that, then that would also make me worry about their “contributions” to alignment.
I totally agree. Watching this dynamic play out within CFAR was a major factor in my checking out from it.
That’s part of what I mean by “this space is still possessed”. Stupefaction still rules here. Just differently.
I think you and I are imagining different things.
I don’t think a LW or CFAR or MIRI flavored project that focuses on thinking about egregores and designing counters to stupefaction is promising. I think that’d just be a different flavor of the same stupid sauce.
(I had different hopes back in 2016, but I’ve been thoroughly persuaded otherwise by now.)
I don’t mean to prescribe a collective action solution at all, honestly. I’m not proposing a research direction. I’m describing a problem.
The closest thing to a solution-shaped object I’m putting forward is: Look at the goddamned question.
Part of what inspired me to write this piece at all was seeing a kind of blindness to these memetic forces in how people talk about AI risk and alignment research. Making bizarre assertions about what things need to happen on the god scale of “AI researchers” or “governments” or whatever, roughly on par with people loudly asserting opinions about what POTUS should do.
It strikes me as immensely obvious that memetic forces precede AGI. If the memetic landscape slants down mercilessly toward existential oblivion here, then the thing to do isn’t to prepare to swim upward against a future avalanche. It’s to orient to the landscape.
If there’s truly no hope, then just enjoy the ride. No point in worrying about any of it.
But if there is hope, it’s going to come from orienting to the right question.
And it strikes me as quite obvious that the technical problem of AI alignment isn’t that question. True, it’s a question that if we could answer it might address the whole picture. But that’s a pretty damn big “if”, and that “might” is awfully concerning.
I do feel some hope about people translating what I’m saying into their own way of thinking, looking at reality, and pondering. I think a realistic solution might organically emerge from that. Or rather, what I’m doing here is an iteration of this solution method. The process of solving the Friendliness problem in human culture has the potential to go superexponential since (a) Moloch doesn’t actually plan except through us and (b) the emergent hatching Friendly hypercreature(s) would probably get better at persuading people of its cause as more individuals allow it to speak through them.
But that’s the wrong scale for individuals to try anything on.
I think all any of us can actually do is try to look at the right question, and hold the fact that we care about having an answer but don’t actually have one.
Does that clarify?
Maybe. It might be that if you described what you wanted more clearly, it would be the same thing that I want, and possibly I was incorrectly associating this with the things at CFAR you say you’re against, in which case sorry.
But I still don’t feel like I quite understand your suggestion. You talk of “stupefying egregores” as problematic insofar as they distract from the object-level problem. But I don’t understand how pivoting to egregore-fighting isn’t also a distraction from the object-level problem. Maybe this is because I don’t understand what fighting egregores consists of, and if I knew, then I would agree it was some sort of reasonable problem-solving step.
I agree that the Sequences contain a lot of useful deconfusion, but I interpret them as useful primarily because they provide a template for good thinking, and not because clearing up your thinking about those things is itself necessary for doing good work. I think of the cryonics discussion the same way I think of the Many Worlds discussion—following the motions of someone as they get the right answer to a hard question trains you to do this thing yourself.
I’m sorry if “cultivate your will” has the wrong connotations, but you did say “The problem that’s upstream of this is the lack of will”, and I interpreted a lot of your discussion of de-numbing and so on as dealing with this.
The claim “memetic forces precede AGI” seems meaningless to me, except insofar as memetic forces precede everything (eg the personal computer was invented because people wanted personal computers and there was a culture of inventing things). Do you mean it in a stronger sense? If so, what sense?
I also don’t understand why it’s wrong to talk about what “AI researchers” or “governments” should do. Sure, it’s more virtuous to act than to chat randomly about stuff, but many Less Wrongers are in positions to change what AI researchers do, and if they have opinions about that, they should voice them. This post of yours right now seems to be about what “the rationalist community” should do, and I don’t think it’s a category error for you to write it.
Maybe this would be easier if you described what actions we should take conditional on everything you wrote being right.
There are also the skulls to consider. As far as I can tell, this post’s recommendations are that we, who are already in a valley littered with a suspicious number of skulls,
https://forum.effectivealtruism.org/posts/ZcpZEXEFZ5oLHTnr9/noticing-the-skulls-longtermism-edition
https://slatestarcodex.com/2017/04/07/yes-we-have-noticed-the-skulls/
turn right towards a dark cave marked ‘skull avenue’ whose mouth is a giant skull, and whose walls are made entirely of skulls that turn to face you as you walk past them deeper into the cave.
The success rate of movements aimed at improving the long-term future or improving rationality has historically been… not great, but there are at least solid, concrete, empirical reasons to think specific actions will help, and we can pin our hopes on that.
The success rate of “let’s build a movement to successfully uncouple ourselves from society’s bad memes and become capable of real action and then our problems will be solvable” is 0. Not just in the sense that thinking that way didn’t help, but in the sense that, with near 100% reliability, you just end up possessed by worse memes if you make that your explicit final goal (rather than ending up doing it as a side effect of trying to get good at something). And there are also no concrete paths to action to pin our hopes on.
“The success rate of ‘let’s build a movement to successfully uncouple ourselves from society’s bad memes and become capable of real action and then our problems will be solvable’ is 0.”
I’m not sure if this is an exact analog, but I would have said the scientific revolution and the Age of Enlightenment were two pretty good examples of this. (To be honest, I’m not entirely sure where one ends and the other begins, and there may be some overlap, but I think of them as two separate but related things.) Both resulted in the world becoming a vastly better place, largely through the efforts of individuals who realized that by changing the way we think about things, we can better put human ingenuity to use. I know this is a massive oversimplification, but I think it points in the direction of there potentially being value in pushing the right memes onto society.
The success rate of developing and introducing better memes into society is indeed not 0. The key thing there is that the scientific revolutionaries weren’t abstractly thinking “we must uncouple from society first, and then we’ll know what to do”. Rather, they wanted to understand how objects fell, how animals evolved, and lots of other specific problems, and they developed good memes to achieve those ends.
I’m by no means an expert on the topic, but I would have thought it was a result both of object-level thinking producing new memes that society recognized as true, and of some level of abstract thinking along the lines of “using God and the Bible as an explanation for every phenomenon doesn’t seem to be working very well; maybe we should create a scientific method or something.”
I think we may be talking past each other a bit, though. From your response, perhaps what I consider “uncoupling from society’s bad memes” you consider to be just generating new memes. This feels like a conversation where it’s generally hard to pin down exactly what people are trying to describe (starting from the OP, which I find very interesting but am still having some trouble understanding specifically), which is making it a bit hard to communicate.
Now that I’ve had a few days to let the ideas roll around in the back of my head, I’m gonna take a stab at answering this.
I think there are a few different things going on here which are getting confused.
1) What does “memetic forces precede AGI” even mean?
“Individuals”, “memetic forces”, and “that which is upstream of memetics” all act on different scales. As an example of each, I suggest “What will I eat for lunch?”, “Who gets elected POTUS?”, and “Will people eat food?”, respectively.
“What will I eat for lunch?” is an example of an individual decision because I can actually choose the outcome there. While sometimes things like “veganism” will tell me what I should eat, and while I might let that influence me, I don’t actually have to. If I realize that my life depends on eating steak, I will actually end up eating steak.
“Who gets elected POTUS” is a much tougher problem. I can vote. I can probably persuade friends to vote. If I really dedicate myself to the cause, and I do an exceptionally good job, and I get lucky, I might be able to get my ideas into the minds of enough people that my impact is noticeable. Even then though, it’s a drop in the bucket and pretty far outside my ability to “choose” who gets elected president. If I realize that my life depends on a certain person getting elected who would not get elected without my influence… I almost certainly just die. If a popular memeplex decides that a certain candidate threatens it, that actually can move enough people to plausibly change the outcome of an election.
However there’s a limitation to which memeplexes can become dominant and what they can tell people to do. If a hypercreature tells people to not eat meat, it may get some traction there. If it tries to tell people not to eat at all, it’s almost certainly going to fail and die. Not only will it have a large rate of attrition from adherents dying, but it’s going to be a real hard sell to get people to take its ideas on, and therefore it will have a very hard time spreading.
My reading of the claim “memetic forces precede AGI” is that, like getting someone elected POTUS, the problem is simply too big for there to be any reasonable chance that a few guys in a basement can just go do it on their own when not supported by friendly hypercreatures. Val is predicting that our current set of hypercreatures won’t allow that task to be possible without superhuman abilities, and that our only hope is that we end up with sufficiently friendly hypercreatures that the task becomes humanly possible. Kinda like if your dream were to run an openly gay weed dispensary: it’s humanly possible today, but it wasn’t further in the past, and it isn’t in Saudi Arabia today; you need that cultural support or it ain’t gonna happen.
2) “Fight egregores” sure sounds like “trying to act on the god level” if anything does. How is this not at least as bad as “build FAI”? What could we possibly do which isn’t foolishly trying to act above our level?
This is a confusing one, because our words for things like “trying” are all muddled together. I think basically, yes, trying to “fight egregores” is “trying to act on the god level”, and is likely to lead to problems. However, that doesn’t mean you can’t make progress against egregores.
So, the problem with “trying to act on a god level” isn’t so much that you’re not a god and therefore “don’t have permission to act on this level” or “ability to touch this level”, it’s that you’re not a god and therefore attempting to act as if you were a god fundamentally requires you to fail to notice and update on that fact. And because you’re failing to update, you’re doing something that doesn’t make sense in light of the information at hand. And not just any information either; it’s information that’s telling you that what you’re trying to do will not work. So of course you’re not going to get where you want if you ignore the road signs saying “WRONG WAY!”.
What you can do, which will help free you from the stupefying factors and unfriendly egregores, and which (Val claims) will have the best chance of leading to an FAI, is to look at what’s true. Rather than “I have to do this, or we all die! I must do the impossible”, just “Can I do this? Is it impossible? If so, and I’m [likely] going to die, I can look at that anyway. Given what’s true, what do I want to do?”
If this has a ”...but that doesn’t solve the problem” bit to it, that’s kinda the point. You don’t necessarily get to solve the problem. That’s the uncomfortable thing we should not flinch away from updating on. You might not be able to solve the problem. And then what?
(Not flinching from these things is hard. And important.)
3) What’s wrong with talking about what AI researchers should do? There’s actually a good chance they listen! Should they not voice their opinions on the matter? Isn’t that kinda what you’re doing here by talking about what the rationality community should do?
Yes. Kinda. Kinda not.
There’s a question of how careful one has to be, and Val is making a case for much increased caution without really stating it that way explicitly. Bear with me here: I’m going to be making points that will necessarily seem like “unimportant nitpicking pedantry” relative to an implicit level of caution that is more tolerant of rounding errors of this type. I’m not actually presupposing anything about whether increased caution is necessary in general, or as it applies to AGI. It is, however, necessary in order to understand Val’s perspective on this, since it is central to his point.
If you look closely, Val never said anything about what the rationality community “should” do. He didn’t use the word “should” once.
He said things like “We can’t align AGI. That’s too big.” and “So, I think raising the sanity waterline is upstream of AI alignment.” and “We have an advantage in that this war happens on and through us. So if we take responsibility for this, we can influence the terrain and bias egregoric/memetic evolution to favor Friendliness”. These things seem to imply that we shouldn’t try to align AGI and should instead do something like “take responsibility” so we can “influence the terrain and bias egregoric/memetic evolution to favor Friendliness”, and as far as rounding errors go, that’s not a huge one. However, he did leave the decision of what to do with the information he presented up to you, and consciously refrained from imbuing it with any “shouldness”. The lack of “should” in his post or comments is very intentional, and is an example of him doing the thing he views as necessary for FAI to have a chance of working out.
In (my understanding of) Val’s perspective, this “shouldness” is a powerful stupefying factor that works itself into everything—if you let it. It prevents you from seeing the truth, and in doing so blocks you from any path which might succeed. It’s so damn seductive and self-protecting that we all get drawn into it all the time and don’t really realize—or worse, rationalize and believe that “it’s not really that big a deal; I can achieve my object-level goals anyway (or I can’t anyway, and so it makes no difference if I look)”. His claim is that it is that big a deal, because you can’t achieve your goals—and that you know you can’t, which is the whole reason you’re stuck in your thoughts of “should” in the first place. He’s saying that the annoying effort to be more precise about what exactly we are aiming to share, and to hold ourselves squeaky clean of any “impotent shoulding” at things, is actually a necessary precondition for success. That if we try to “Shut up and do the impossible”, we fail. That if we “Think about what we should do”, we fail. That if we “try to convince people”, even if we are right and pointing at the right thing, we fail. That if we allow ourselves to casually “should” at things, instead of recognizing it as so incredibly dangerous as to avoid out of principle, we get seduced into being slaves for unfriendly egregores and fail.
That last line is something I’m less sure Val would agree with. He seems to be doing the “hard line avoid shoulding, aim for maximally clean cognition and communication” thing and the “make a point about doing it to highlight the difference” thing, but I haven’t heard him say explicitly that he thinks it has to be a hard line thing.
And I don’t think it does have to be, or should be (case in point). Taking a hard line can be evidence of flinching from a different truth, or of a lack of self-trust to only use that way of communicating/relating to things in a productive way. I think by not highlighting the fact that it can be done wisely, he clouds his point and makes his case less compelling than it could be. However, I do think he’s correct that it’s both a deceptively huge deal and also something that takes a very high level of caution before you start to recognize the issues with lower levels of caution.
I feel seen. I’ll tweak a few details here & there, but you have the essence.
Thank you.
Agreed.
Two details:
“…we should not flinch away…” is another instance of the thing. This isn’t just banishing the word “should”: the ability not to flinch away from hard things is a skill, and trying to bypass development of that skill with moral panic actually makes everything worse.
The orientation you’re pointing at here biases one’s inner terrain toward Friendly superintelligences. It’s also personally helpful and communicable. This is an example of a Friendly meme that can give rise to a Friendly superintelligence. So while sincerely asking “And then what?” is important, as is holding the preciousness of the fact that we don’t yet have an answer, that is enough. We don’t have to actually answer that question to participate in feeding Friendliness in the egregoric wars. We just have to sincerely ask.
Admittedly I’m not sure either.
Generally speaking, viewing things as “so incredibly dangerous as to avoid out of principle” ossifies them too much. Ossified things tend to become attack surfaces for unFriendly superintelligences.
In particular, being scared of how incredibly dangerous something is tends to be stupefying.
But I do think seeing this clearly naturally creates a desire to be more clear and to drop nearly all “shoulding” — not so much the words as the spirit.
(Relatedly: I actually didn’t know I never used the word “should” in the OP! I don’t actually have anything against the word per se. I just try to embody this stuff. I’m delighted to see I’ve gotten far enough that I just naturally dropped using it this way.)
I’m not totally sure I follow. Do you mean a hard line against “shoulding”?
If so, I mostly just agree with you here.
That said, I think trying to make my point more compelling would in fact be an example of the corruption I’m trying to purify myself of. Instead I want to be correct and clear. That might happen to result in what I’m saying being more compelling… but I need to be clean of the need for that to happen in order for it to unfold in a Friendly way.
However. I totally believe that there’s a way I could have been clearer.
And given how spot-on the rest of what you’ve been saying feels to me, my guess is you’re right about how here.
Although admittedly I don’t have a clear image of what that would have looked like.
Doh! Busted.
Thanks for the reminder.
Agreed.
Good point. Agreed, and worth pointing out explicitly.
Yes. You don’t really need it, things tend to work better without it, and the fact that no one even noticed that it didn’t show up in this post is a good example of that. At the same time, “I shouldn’t ever use ‘should’” obviously has the exact same problems, and it’s possible to miss that you’re taking that stance if you don’t ever say it out loud. I watched some of your videos after Kaj linked one, and… it’s not that it looked like you were doing that, but it looked like you might be doing that. Like there wasn’t any sort of self-caricaturing or anything that showed me that “Val is well aware of this failure mode, and is actively steering clear”, so I couldn’t rule it out and wanted to mark it as a point of uncertainty and a thing you might want to watch out for.
Ah, but I never said you should try to make your point more compelling! What do you notice when you ask yourself why “X would have effect Y” led you to respond with a reason to not do X? ;)
Don’t have the time to write a long comment just now, but I still wanted to point out that describing either Yudkowsky or Christiano as doing mostly object-level research seems incredibly wrong. So much of what they’re doing and have done has focused explicitly on which questions to ask, which questions not to ask, which paradigm to work in, how to criticize that kind of work… They rarely published posts that are only about the meta level (although Arbital does contain a bunch of pages along those lines, and Prosaic AI Alignment is also meta), but it pervades their writing and thinking.
More generally, when you’re creating a new field of scientific research, you tend to do a lot of philosophy-of-science-type stuff, even if you don’t label it explicitly that way. Galileo, Carnot, Darwin, Boltzmann, Einstein, and Turing all did it.
(To be clear, I’m pointing at meta-stuff in the sense of “philosophy of science for alignment” type things, not necessarily the more hardcore stuff discussed in the original post)
That’s true, but if you are doing philosophy it is better to admit to it, and learn from existing philosophy, rather than deriding and dismissing the whole field.
This seems irrelevant to the point, yes? I think adamShimi is challenging Scott’s claim that Paul & Eliezer are mostly focusing on object-level questions. It sounds like you’re challenging whether they’re attending to non-object-level questions in the best way. That’s a different question. Am I missing your point?
Eliezer, at least, now seems quite pessimistic about that object-level approach. And in the last few months he’s been writing a ton of fiction about introducing a Friendly hypercreature to an unfriendly world.
Perhaps I have missed it, but I’m not aware that Sam has funded any AI alignment work thus far.
If so, this sounds like giving him a large amount of credit in advance of his doing the work, which is generous, but not the order in which credit allocation should go.