We’re already in AI takeoff
Back in 2016, CFAR pivoted to focusing on xrisk. I think the magic phrase at the time was:
“Rationality for its own sake, for the sake of existential risk.”
I was against this move. I also had no idea how power works. I don’t know how to translate this into LW language, so I’ll just use mine: I was secret-to-me vastly more interested in being victimized at people/institutions/the world than I was in doing real things.
But the reason I was against the move is solid. I still believe in it.
I want to spell that part out a bit. Not to gripe about the past. The past makes sense to me. But because the idea still applies.
I think it’s a simple idea once it’s not cloaked in bullshit. Maybe that’s an illusion of transparency. But I’ll try to keep this simple-to-me and correct toward more detail when asked and I feel like it, rather than spelling out all the details in a way that turns out to have been unneeded.
Which is to say, this’ll be kind of punchy and under-justified.
The short version is this:
We’re already in AI takeoff. The “AI” is just running on human minds right now. Sorting out AI alignment in computers is focusing entirely on the endgame. That’s not where the causal power is.
Maybe that’s enough for you. If so, cool.
I’ll say more to gesture at the flesh of this.
What kind of thing is wokeism? Or Communism? What kind of thing was Nazism in WWII? Or the flat Earth conspiracy movement? QAnon?
If you squint a bit, you might see there’s a common type here.
In a Facebook post I argued that it’s fair to view these things as alive. Well, really, I just described them as living, which kind of is the argument. If your woo allergy keeps you from seeing that… well, good luck to you. But if you’re willing to just assume I mean something non-woo, you just might see something real there.
These hyperobject creatures are undergoing massive competitive evolution. Thanks Internet. They’re competing for resources. Literal things like land, money, political power… and most importantly, human minds.
I mean something loose here. Y’all are mostly better at details than I am. I’ll let you flesh those out rather than pretending I can do it well.
But I’m guessing you know this thing. We saw it in the pandemic, where friendships got torn apart because people got hooked by competing memes. Some “plandemic” conspiracy theorist anti-vax types, some blind belief in provably incoherent authorities, the whole anti-racism woke wave, etc.
This is people getting possessed.
And the… things… possessing them are highly optimizing for this.
To borrow a bit from fiction: It’s worth knowing that in their original vision for The Matrix, the Wachowski siblings wanted humans to be processors, not batteries. The Matrix was a way of harvesting human computing power. As I recall, they had to change it because someone argued that people wouldn’t understand their idea.
I think we’re in a scenario like this. Not so much the “in a simulation” part. (I mean, maybe. But for what I’m saying here I don’t care.) But yes with a functionally nonhuman intelligence hijacking our minds to do coordinated computations.
(And no, I’m not positing a ghost in the machine, any more than I posit a ghost in the machine of “you” when I pretend that you are an intelligent agent. If we stop pretending that intelligence is ontologically separate from the structures it’s implemented on, then the same thing that lets “superintelligent agent” mean anything at all says we already have several.)
We’re already witnessing orthogonality.
The talk of “late-stage capitalism” points at this. The way greenwashing appears, for instance, is intelligently weaponized Goodhart. It’s explicitly hacking people’s signals in order to extract what the hypercreature in question wants from people (usually profit).
The way China is drifting with a social credit system and facial recognition tech in its one party system, it appears to be threatening a Shriek. Maybe I’m badly informed here. But the point is the possibility.
In the USA, we have to file income taxes every year even though we have the tech to make it a breeze. Why? “Lobbying” is right, but that describes the action. What’s the intelligence behind the action? What agent becomes your intentional opponent if you try to change this? You might point at specific villains, but they’re not really the cause. The CEO of TurboTax doesn’t stay the CEO if he doesn’t serve the hypercreature’s hunger.
I’ll let you fill in other examples.
If the whole world were unified on AI alignment being an issue, it’d just be a problem to solve.
The problem that’s upstream of this is the lack of will.
Same thing with cryonics really. Or aging.
But AI is particularly acute around here, so I’ll stick to that.
The problem is that people’s minds aren’t clear enough to look at the problem for real. Most folk can’t orient to AI risk without going nuts or numb or spitting out gibberish platitudes.
I think this is part accidental and part hypercreature-intentional.
The accidental part is like how advertisements do a kind of DDOS attack on people’s sense of inherent self-worth. There isn’t even a single egregore to point at as the cause of that. It’s just that many, many such hypercreatures benefit from the deluge of subtly negative messaging and therefore tap into it in a sort of (for them) inverse tragedy of the commons. (Victory of the commons?)
In the same way, there’s a very particular kind of stupid that (a) is pretty much independent of g factor and (b) is super beneficial for these hypercreatures as a pathway to possession.
And I say “stupid” both because it’s evocative and because of ties to terms like “stupendous” and “stupefy”. I interpret “stupid” to mean something like “stunned”. Like the mind is numb and pliable.
It so happens that the shape of this stupid keeps people from being grounded in the physical world. Like, how do you get a bunch of trucks out of a city? How do you fix the plumbing in your house? Why six feet for social distancing? It’s easier to drift to supposed-to’s and blame minimization. A mind that does that is super programmable.
The kind of clarity that you need to de-numb and actually goddamn look at AI risk is pretty anti all this. It’s inoculation to zombiism.
So for one, that’s just hard.
But for two, once a hypercreature (of this type) notices this immunity taking hold, it’ll double down. Evolve weaponry.
That’s the “intentional” part.
This is where people — having their minds coopted for Matrix-like computation — will pour their intelligence into dismissing arguments for AI risk.
This is why we can’t get serious enough buy-in to this problem.
Which is to say, the problem isn’t a need for AI alignment research.
The problem is current hypercreature unFriendliness.
From what I’ve been able to tell, AI alignment folk for the most part are trying to look at this external thing, this AGI, and make it aligned.
I think this is doomed.
Not just because we’re out of time. That might be.
But the basic idea was already self-defeating.
Who is aligning the AGI? And to what is it aligning?
This isn’t just a cute philosophy problem.
A common result of egregoric stupefaction is identity fuckery. We get this image of ourselves in our minds, and then we look at that image and agree “Yep, that’s me.” Then we rearrange our minds so that all those survival instincts of the body get aimed at protecting the image in our minds.
How did you decide which bits are “you”? Or what can threaten “you”?
I’ll hop past the deluge of opinions and just tell you: It’s these superintelligences. They shaped your culture’s messages, probably shoved you through public school, gripped your parents to scar you in predictable ways, etc.
It’s like installing a memetic operating system.
If you don’t sort that out, then that OS will drive how you orient to AI alignment.
My guess is, it’s a fuckton easier to sort out Friendliness/alignment within a human being than it is on a computer. Because the stuff making up Friendliness is right there.
And by extension, I think it’s a whole lot easier to create/invoke/summon/discover/etc. a Friendly hypercreature than it is to solve digital AI alignment. The birth of science was an early example.
I’m pretty sure this alignment needs to happen in first person. Not third person. It’s not (just) an external puzzle, but is something you solve inside yourself.
A brief but hopefully clarifying aside:
Stephen Jenkinson argues that most people don’t know they’re going to die. Rather, they know that everyone else is going to die.
That’s what changes when someone gets a terminal diagnosis.
I mean, if I have a 100% reliable magic method for telling how you’re going to die, and I tell you “Oh, you’ll get a heart attack and that’ll be it”, that’ll probably feel weird but it won’t fill you with dread. If anything it might free you because now you know there’s only one threat to guard against.
But there’s a kind of deep, personal dread, a kind of intimate knowing, that comes when the doctor comes in with a particular weight and says “I’ve got some bad news.”
It’s immanent.
You can feel that it’s going to happen to you.
Not the idea of you. It’s not “Yeah, sure, I’m gonna die someday.”
It becomes real.
You’re going to experience it from behind the eyes reading these words.
From within the skin you’re in as you witness this screen.
When I talk about alignment being “first person and not third person”, it’s like this. How knowing your mortality doesn’t happen until it happens in first person.
Any kind of “alignment” or “Friendliness” or whatever that doesn’t put that first-person-ness at the absolute very center isn’t a thing worth crowing about.
I think that’s the core mistake anyway. Why we’re in this predicament, why we have unaligned superintelligences ruling the world, and why AGI looks so scary.
It’s in forgetting the center of what really matters.
It’s worth noting that the only scale that matters anymore is the hypercreature one.
I mean, one of the biggest things a single person can build on their own is a house. But that’s hard, and most people can’t do that. Mostly companies build houses.
Solving AI alignment is fundamentally a coordination problem. The kind of math/programming/etc. needed to solve it is literally superhuman, the way the four color theorem was (and still kind of is) superhuman.
“Attempted solutions to coordination problems” is a fine proto-definition of the hypercreatures I’m talking about.
So if the creatures you summon to solve AI alignment aren’t Friendly, you’re going to have a bad time.
And for exactly the same reason that most AGIs aren’t Friendly, most emergent egregores aren’t either.
As individuals, we seem to have some glimmer of ability to lean toward resonance with one hypercreature or another. Even just choosing what info diet you’re on can do this. (Although there’s an awful lot of magic in that “just choosing” part.)
But that’s about it.
We can’t align AGI. That’s too big.
It’s too big the way the pandemic was too big, and the Ukraine/Putin war is too big, and wokeism is too big.
When individuals try to act on the “god” scale, they usually just get possessed. That’s the stupid simple way of solving coordination problems.
So when you try to contribute to solving AI alignment, what egregore are you feeding?
If you don’t know, it’s probably an unFriendly one.
(Also, don’t believe your thoughts too much. Where did they come from?)
So, I think raising the sanity waterline is upstream of AI alignment.
It’s like we’ve got gods warring, and they’re threatening to step into digital form to accelerate their war.
We’re freaking out about their potential mech suits.
But the problem is the all-out war, not the weapons.
We have an advantage in that this war happens on and through us. So if we take responsibility for this, we can influence the terrain and bias egregoric/memetic evolution to favor Friendliness.
Anything else is playing at the wrong level. Not our job. Can’t be our job. Not as individuals, and it’s individuals who seem to have something mimicking free will.
Sorting that out in practice seems like the only thing worth doing.
Not “solving xrisk”. We can’t do that. Too big. That’s worth modeling, since the gods need our minds in order to think and understand things. But attaching desperation and a sense of “I must act!” to it is insanity. Food for the wrong gods.
Ergo why I support rationality for its own sake, period.
That, at least, seems to target a level at which we mere humans can act.
I wasn’t convinced of this ten years ago and I’m still not convinced.
When I look at people who have contributed most to alignment-related issues, whether directly (like Eliezer Yudkowsky and Paul Christiano), theoretically (like Toby Ord and Katja Grace), or indirectly (like Sam Bankman-Fried and Holden Karnofsky), what all of these people have in common is focusing mostly on object-level questions. They all seem to me to have a strong understanding of their own biases, in the sense that gets trained by natural intelligence, really good scientific work, and talking to other smart and curious people like themselves. But as far as I know, none of them have made it a focus of theirs to fight egregores, defeat hypercreatures, awaken to their own mortality, refactor their identity, or cultivate their will. In fact, all of them (except maybe Eliezer) seem like the kind of people who would be unusually averse to thinking in those terms. And if we pit their plumbing or truck-maneuvering skills against those of an average person, I see no reason to think they would do better (besides maybe high IQ and general ability).
It’s seemed to me that the more that people talk about “rationality training” more exotic than what you would get at a really top-tier economics department, the more those people tend to get kind of navel-gazey, start fighting among themselves, and not accomplish things of the same caliber as the six people I named earlier. I’m not just saying there’s no correlation with success, I’m saying there’s a negative correlation.
(Could this be explained by people who are naturally talented not needing to worry about how to gain talent? Possibly, but this isn’t how it works in other areas—for example, all top athletes, no matter how naturally talented, have trained a lot.)
You’ve seen the same data I have, so I’m curious what makes you think this line of research/thought/effort will be productive.
I think your pushback is ignoring an important point. One major thing the big contributors have in common is that they tend to be unplugged from the stuff Valentine is naming!
So even if folks mostly don’t become contributors by asking “how can I come more truthfully from myself and not what I’m plugged into”, I think there is an important cluster of mysteries here. Examples of related phenomena:
Why has it worked out that just about everyone who claims to take AGI seriously is also vehement about publishing every secret they discover?
Why do we fear an AI arms race, rather than expect deescalation and joint ventures?
Why does the industry fail to understand the idea of aligned AI, and instead claim that “real” alignment work is adversarial-examples/fairness/performance-fine-tuning?
I think Val’s correct on the point that our people and organizations are plugged into some bad stuff, and that it’s worth examining that.
I do know one writer who talks a lot about demons and entities from beyond the void. It’s you, and it happens in some of, IMHO, the most valuable pieces you’ve written.
It seems pretty obvious to me:
1.) We humans aren’t conscious of all the consequences of our actions, both because the subconscious has an important role in making our choices, and because our world is enormously complex so all consequences are practically unknowable
2.) In a society of billions, these unforeseeable forces combine in something larger than humans can explicitly plan and guide: “the economy”, “culture”, “the market”, “democracy”, “memes”
3.) These larger-than-human-systems prefer some goals that are often antithetical to human preferences. You describe it perfectly in Seeing Like A State: the state has a desire for legibility and ‘rationally planned’ designs that are at odds with the human desire for organic design. And thus, the ‘supersystem’ isn’t merely an aggregate of human desires, it has some qualities of being an actual separate agent with its own preferences. It could be called a hypercreature, an egregore, Moloch or the devil.
4.) We keep hurting ourselves, again and again and again. We keep falling into multipolar traps, we keep choosing for Moloch, which you describe as “the god of child sacrifice, the fiery furnace into which you can toss your babies in exchange for victory in war”. And thus, we have not accomplished for ourselves what we want to do with AI. Humanity is not aligned with human preferences. This is what failure looks like.
5.) If we fail to align humanity, if we fail to align major governments and corporations, if we don’t even recognize our own misalignment, how big is the chance that we will manage to align AGI with human preferences? Total nuclear war has not been avoided by nuclear technicians who kept perfect control over their inventions—it has been avoided by the fact that the US government in 1945 was reasonably aligned with human preferences. I dare not imagine the world where the Nazi government was the first to get its hands on nuclear weapons.
And thus, I think it would be very, very valuable to put a lot more effort into ‘aligning humanity’. How do we keep our institutions and our grassroots movement “free from Moloch”? How do we get and spread reliable, non-corrupt authorities and politicians? How do we stop falling into multipolar traps, how do we stop suffering unnecessarily?
Best case scenario: this effort will turn out to be vital to AGI alignment
Worst case scenario: this effort will turn out to be irrelevant to AGI alignment, but in the meanwhile, we made the world a much better place
I sadly don’t have time to really introspect on what is going on in me here, but something about this comment feels pretty off to me. I think in some sense it provides an important counterpoint to the OP, but I also feel like it stretches the truth quite a bit:
Toby Ord primarily works on influencing public opinion and governments, and very much seems to view the world through a “raising the sanity waterline” lens. Indeed, I just talked to him yesterday morning, when I tried to convince him that misuse risk from AI, and the risk from having the “wrong actor” get the AI, is much less than he thinks it is, which feels like a very related topic.
Eliezer has done most of his writing on the meta-level, on the art of rationality, on the art of being a good and moral person, and on how to think about your own identity.
Sam Bankman-Fried is also very active in political activism, and (my guess) is quite concerned about the information landscape. I expect he would hate the terms used in this post, but I expect there to be a bunch of similarities in his model of the world and the one outlined in this post, in terms of trying to raise the sanity waterline and improve the world’s decision-making in a much broader sense (there is a reason why he was one of the biggest contributors to the Clinton and Biden campaigns).
I think it is true that the other three are mostly focusing on object-level questions.
I… also dislike something about the meta-level of arguing from high-status individuals. I expect it to make the discussion worse, and also make it harder for people to respond with counterarguments, because counterarguments could be read as attacking the high-status people, which is scary.
I dislike the language used in the OP, and sure feel like it actively steers attention in unproductive ways that make me not want to engage with it, but I do have a strong sense that it’s going to be very hard to actually make progress on building a healthy field of AI Alignment, because the world will repeatedly try to derail the field into being about defeating the other monkeys, or being another story about why you should work at the big AI companies, or why you should give person X or movement Y all of your money, which feels to me related to what the OP is talking about.
The Sam Bankman-Fried mention reads differently now that his massive fraud with FTX is public; might be worth a comment/revision?
I can’t help but see Sam disagreeing with a message as a positive for the message (I know it’s a fallacy, but the feeling’s still there).
Hmm, I feel like the revision would have to be in Scott’s comment. I was just responding to the names that Scott mentioned, and I think everything I am saying here is still accurate.
Given the link, I think you’re objecting to something I don’t care about. I don’t mean to claim that x-rationality is great and has promise to Save the World. Maybe if more really is possible and we do something pretty different to seriously develop it. Maybe. But frankly I recognize stupefying egregores here too and I don’t expect “more and better x-rationality” to do a damn thing to counter those for the foreseeable future.
So on this point I think I agree with you… and I don’t feel whatsoever dissuaded from what I’m saying.
The rest of what you’re saying feels like it’s more targeting what I care about though:
Right. And as I said in the OP, stupefaction often entails alienation from object-level reality.
It’s also worth noting that LW exists mostly because Eliezer did in fact notice his own stupidity and freaked the fuck out. He poured a huge amount of energy into taking his internal mental weeding seriously in order to never ever ever be that stupid again. He then wrote all these posts to articulate a mix of (a) what he came to realize and (b) the ethos behind how he came to realize it.
That’s exactly the kind of thing I’m talking about.
A deep art of sanity worth honoring might involve some techniques, awareness of biases, Bayesian reasoning, etc. Maybe. But focusing on that invites Goodhart. I think LW suffers from this in particular.
I’m pointing at something I think is more central: casting off systematic stupidity and replacing it with systematic clarity.
And yeah, I’m pretty sure one effect of that would be grounding one’s thinking in the physical world. Symbolic thinking in service to working with real things, instead of getting lost in symbols as some kind of weirdly independently real.
I think you’re objecting to my aesthetic, not my content.
I think it’s well-established that my native aesthetic rubs many (most?) people in this space the wrong way. At this point I’ve completely given up on giving it a paint job to make it more palatable here.
But if you (or anyone else) cares to attempt a translation, I think you’ll find what you’re saying here to be patently false.
Raising the sanity waterline is exactly about fighting and defeating egregores=hypercreatures. Basically every debiasing technique is like that. Singlethink is a call to arms.
The talk about cryonics and xrisk is exactly an orientation to mortality. (Though this clearly was never Eliezer’s speciality and no one took up the mantle in any deep way, so this is still mostly possessed.)
The whole arc about free will is absolutely about identity. Likewise the stuff on consciousness and p-zombies. Significant chunks of the “Mysterious Answers to Mysterious Questions” Sequence were about how people protect their identities with stupidity and how not to fall prey to that.
The “cultivate their will” part confuses me. I didn’t mean to suggest doing that. I think that’s anti-helpful for the most part. Although frankly I think all the stuff about “Tsuyoku Naritai!” and “Shut up and do the impossible” totally fits the bill of what I imagine when reading your words there… but yeah, no, I think that’s just dumb and don’t advise it.
Although I very much do think that getting damn clear on what you can and cannot do is important, as is ceasing to don responsibility for what you can’t choose and fully accepting responsibility for what you can. That strikes me as absurdly important and neglected. As far as I can tell, anyone who affects anything for real has to at least stumble onto an enacted solution for this in at least some domain.
You seem to be weirdly strawmanning me here.
The trucking thing was a reference to Zvi’s repeated rants about how politicians didn’t seem to be able to think about the physical world enough to solve the Canadian Trucker Convoy clog in Ottawa. I bet that Holden, Eliezer, Paul, etc. would all do massively better than average at sorting out a policy that would physically work. And if they couldn’t, I would worry about their “contributions” to alignment being more made of social illusion than substance.
Things like plumbing are physical skills. So is, say, football. I don’t expect most football players to magically be better at plumbing. Maybe some correlation, but I don’t really care.
But I do expect someone who’s mastering a general art of sanity and clarity to be able to think about plumbing in practical physical terms. Instead of “The dishwasher doesn’t work” and vaguely hoping the person with the right cosmic credentials will cast a magic spell, there’d be a kind of clarity about what Gears one does and doesn’t understand, and turning to others because you see they see more relevant Gears.
If the wizards you’ve named were no better than average at that, then that would also make me worry about their “contributions” to alignment.
I totally agree. Watching this dynamic play out within CFAR was a major factor in my checking out from it.
That’s part of what I mean by “this space is still possessed”. Stupefaction still rules here. Just differently.
I think you and I are imagining different things.
I don’t think a LW or CFAR or MIRI flavored project that focuses on thinking about egregores and designing counters to stupefaction is promising. I think that’d just be a different flavor of the same stupid sauce.
(I had different hopes back in 2016, but I’ve been thoroughly persuaded otherwise by now.)
I don’t mean to prescribe a collective action solution at all, honestly. I’m not proposing a research direction. I’m describing a problem.
The closest thing to a solution-shaped object I’m putting forward is: Look at the goddamned question.
Part of what inspired me to write this piece at all was seeing a kind of blindness to these memetic forces in how people talk about AI risk and alignment research. Making bizarre assertions about what things need to happen on the god scale of “AI researchers” or “governments” or whatever, roughly on par with people loudly asserting opinions about what POTUS should do.
It strikes me as immensely obvious that memetic forces precede AGI. If the memetic landscape slants down mercilessly toward existential oblivion here, then the thing to do isn’t to prepare to swim upward against a future avalanche. It’s to orient to the landscape.
If there’s truly no hope, then just enjoy the ride. No point in worrying about any of it.
But if there is hope, it’s going to come from orienting to the right question.
And it strikes me as quite obvious that the technical problem of AI alignment isn’t that question. True, it’s a question that if we could answer it might address the whole picture. But that’s a pretty damn big “if”, and that “might” is awfully concerning.
I do feel some hope about people translating what I’m saying into their own way of thinking, looking at reality, and pondering. I think a realistic solution might organically emerge from that. Or rather, what I’m doing here is an iteration of this solution method. The process of solving the Friendliness problem in human culture has the potential to go superexponential since (a) Moloch doesn’t actually plan except through us and (b) the emergent hatching Friendly hypercreature(s) would probably get better at persuading people of its cause as more individuals allow it to speak through them.
But that’s the wrong scale for individuals to try anything on.
I think all any of us can actually do is try to look at the right question, and hold the fact that we care about having an answer but don’t actually have one.
Does that clarify?
Maybe. It might be that if you described what you wanted more clearly, it would be the same thing that I want, and possibly I was incorrectly associating this with the things at CFAR you say you’re against, in which case sorry.
But I still don’t feel like I quite understand your suggestion. You talk of “stupefying egregores” as problematic insofar as they distract from the object-level problem. But I don’t understand how pivoting to egregore-fighting isn’t also a distraction from the object-level problem. Maybe this is because I don’t understand what fighting egregores consists of, and if I knew, then I would agree it was some sort of reasonable problem-solving step.
I agree that the Sequences contain a lot of useful deconfusion, but I interpret them as useful primarily because they provide a template for good thinking, and not because clearing up your thinking about those things is itself necessary for doing good work. I think of the cryonics discussion the same way I think of the Many Worlds discussion—following the motions of someone as they get the right answer to a hard question trains you to do this thing yourself.
I’m sorry if “cultivate your will” has the wrong connotations, but you did say “The problem that’s upstream of this is the lack of will”, and I interpreted a lot of your discussion of de-numbing and so on as dealing with this.
The claim “memetic forces precede AGI” seems meaningless to me, except insofar as memetic forces precede everything (eg the personal computer was invented because people wanted personal computers and there was a culture of inventing things). Do you mean it in a stronger sense? If so, what sense?
I also don’t understand why it’s wrong to talk about what “AI researchers” or “governments” should do. Sure, it’s more virtuous to act than to chat randomly about stuff, but many Less Wrongers are in positions to change what AI researchers do, and if they have opinions about that, they should voice them. This post of yours right now seems to be about what “the rationalist community” should do, and I don’t think it’s a category error for you to write it.
Maybe this would be easier if you described what actions we should take conditional on everything you wrote being right.
There’s also the skulls to consider. As far as I can tell, this post’s recommendations are that we, who are already in a valley littered with a suspicious number of skulls, turn right towards a dark cave marked ‘skull avenue’ whose mouth is a giant skull, and whose walls are made entirely of skulls that turn to face you as you walk past them deeper into the cave.
https://forum.effectivealtruism.org/posts/ZcpZEXEFZ5oLHTnr9/noticing-the-skulls-longtermism-edition
https://slatestarcodex.com/2017/04/07/yes-we-have-noticed-the-skulls/
The success rate of movments aimed at improving the longterm future or improving rationality has historically been… not great but there’s at least solid concrete emperical reasons to think specific actions will help and we can pin our hopes on that.
The success rate of “let’s build a movement to successfully uncouple ourselves from society’s bad memes and become capable of real action, and then our problems will be solvable” is 0. Not just in that thinking that way didn’t help, but in that with near 100% success you just end up possessed by worse memes if you make that your explicit final goal (rather than ending up doing that as a side effect of trying to get good at something). And there are also no concrete paths to action to pin our hopes on.
“The success rate of, let’s build a movement to successfully uncouple ourselves from society’s bad memes and become capable of real action and then our problems will be solvable, is 0.”
I’m not sure if this is an exact analog, but I would have said the scientific revolution and the age of enlightenment were two (To be honest, I’m not entirely sure where one ends and the other begins, and there may be some overlap, but I think of them as two separate but related things) pretty good examples of this that resulted in the world becoming a vastly better place, largely through the efforts of individuals who realized that by changing the way we think about things we can better put to use human ingenuity. I know this is a massive oversimplification, but I think it points in the direction of there potentially being value in pushing the right memes onto society.
The success rate of developing and introducing better memes into society is indeed not 0. The key thing there is that the scientific revolutionaries weren’t just thinking in the abstract, “we must uncouple from society first, and then we’ll know what to do”. Rather, they wanted to understand how objects fell, how animals evolved, and lots of other specific problems, and they developed good memes to achieve those ends.
I’m by no means an expert on the topic, but I would have thought it was a result of both object-level thinking producing new memes that society recognized as true, but also some level of abstract thinking along the lines of “using God and the Bible as an explanation for every phenomenon doesn’t seem to be working very well, maybe we should create a scientific method or something.”
I think there may be a bit of us talking past each other, though. From your response, perhaps what I consider “uncoupling from society’s bad memes” you consider to be just generating new memes. It feels like generally a conversation where it’s hard to pin down what exactly people are trying to describe (starting from the OP, which I find very interesting, but am still having some trouble understanding specifically) which is making it a bit hard to communicate.
Now that I’ve had a few days to let the ideas roll around in the back of my head, I’m gonna take a stab at answering this.
I think there are a few different things going on here which are getting confused.
1) What does “memetic forces precede AGI” even mean?
“Individuals”, “memetic forces”, and “that which is upstream of memetics” all act on different scales. As an example of each, I suggest “What will I eat for lunch?”, “Who gets elected POTUS?”, and “Will people eat food?”, respectively.
“What will I eat for lunch?” is an example of an individual decision because I can actually choose the outcome there. While sometimes things like “veganism” will tell me what I should eat, and while I might let that influence me, I don’t actually have to. If I realize that my life depends on eating steak, I will actually end up eating steak.
“Who gets elected POTUS” is a much tougher problem. I can vote. I can probably persuade friends to vote. If I really dedicate myself to the cause, and I do an exceptionally good job, and I get lucky, I might be able to get my ideas into the minds of enough people that my impact is noticeable. Even then though, it’s a drop in the bucket and pretty far outside my ability to “choose” who gets elected president. If I realize that my life depends on a certain person getting elected who would not get elected without my influence… I almost certainly just die. If a popular memeplex decides that a certain candidate threatens it, that actually can move enough people to plausibly change the outcome of an election.
However there’s a limitation to which memeplexes can become dominant and what they can tell people to do. If a hypercreature tells people to not eat meat, it may get some traction there. If it tries to tell people not to eat at all, it’s almost certainly going to fail and die. Not only will it have a large rate of attrition from adherents dying, but it’s going to be a real hard sell to get people to take its ideas on, and therefore it will have a very hard time spreading.
My reading of the claim “memetic forces precede AGI” is that, like getting someone elected POTUS, the problem is simply too big for there to be any reasonable chance that a few guys in a basement can just go do it on their own when not supported by friendly hypercreatures. Val is predicting that our current set of hypercreatures won’t allow that task to be possible without superhuman abilities, and that our only hope is that we end up with sufficiently friendly hypercreatures that this task becomes humanly possible. Kinda like if your dream was to run an openly gay weed dispensary: it’s humanly possible today, but it wasn’t further back in the past and it isn’t in Saudi Arabia today; you need that cultural support or it ain’t gonna happen.
2) “Fight egregores” sure sounds like “trying to act on the god level” if anything does. How is this not at least as bad as “build FAI”? What could we possibly do which isn’t foolishly trying to act above our level?
This is a confusing one, because our words for things like “trying” are all muddled together. I think basically, yes, trying to “fight egregores” is “trying to act on the god level”, and is likely to lead to problems. However, that doesn’t mean you can’t make progress against egregores.
So, the problem with “trying to act on a god level” isn’t so much that you’re not a god and therefore “don’t have permission to act on this level” or “ability to touch this level”, it’s that you’re not a god and therefore attempting to act as if you were a god fundamentally requires you to fail to notice and update on that fact. And because you’re failing to update, you’re doing something that doesn’t make sense in light of the information at hand. And not just any information either; it’s information that’s telling you that what you’re trying to do will not work. So of course you’re not going to get where you want if you ignore the road signs saying “WRONG WAY!”.
What you can do, which will help free you from the stupefying factors and unfriendly egregores, and (Val claims) will have the best chance of leading to an FAI, is to look at what’s true. Rather than “I have to do this, or we all die! I must do the impossible”, just “Can I do this? Is it impossible? If so, and I’m [likely] going to die, I can look at that anyway. Given what’s true, what do I want to do?”
If this has a “...but that doesn’t solve the problem” bit to it, that’s kinda the point. You don’t necessarily get to solve the problem. That’s the uncomfortable thing we should not flinch away from updating on. You might not be able to solve the problem. And then what?
(Not flinching from these things is hard. And important)
3) What’s wrong with talking about what AI researchers should do? There’s actually a good chance they listen! Should they not voice their opinions on the matter? Isn’t that kinda what you’re doing here by talking about what the rationality community should do?
Yes. Kinda. Kinda not.
There’s a question of how careful one has to be, and Val is making a case for much increased caution but not really stating it this way explicitly. Bear with me here, since I’m going to be making points that necessarily seem like “unimportant nitpicking pedantry” relative to an implicit level of caution that is more tolerant to rounding errors of this type, but I’m not actually presupposing anything here about whether increased caution is necessary in general or as it applies to AGI. It is, however, necessary in order to understand Val’s perspective on this, since it is central to his point.
If you look closely, Val never said anything about what the rationality community “should” do. He didn’t use the word “should” once.
He said things like “We can’t align AGI. That’s too big.” and “So, I think raising the sanity waterline is upstream of AI alignment.” and “We have an advantage in that this war happens on and through us. So if we take responsibility for this, we can influence the terrain and bias egregoric/memetic evolution to favor Friendliness.” These things seem to imply that we shouldn’t try to align AGI and should instead do something like “take responsibility” so we can “influence the terrain and bias egregoric/memetic evolution to favor Friendliness”, and as far as rounding errors go, that’s not a huge one. However, he did leave the decision of what to do with the information he presented up to you, and consciously refrained from imbuing it with any “shouldness”. The lack of “should” in his post or comments is very intentional, and is an example of him doing the thing he views as necessary for FAI to have a chance of working out.
In (my understanding of) Val’s perspective, this “shouldness” is a powerful stupefying factor that works itself into everything, if you let it. It prevents you from seeing the truth, and in doing so blocks you from any path which might succeed. It’s so damn seductive and self-protecting that we all get drawn into it all the time and don’t really realize, or worse, rationalize and believe that “it’s not really that big a deal; I can achieve my object level goals anyway (or I can’t anyway, and so it makes no difference if I look)”. His claim is that it is that big a deal, because you can’t achieve your goals, and that you know you can’t, which is the whole reason you’re stuck in your thoughts of “should” in the first place. He’s saying that the annoying effort to be more precise about what exactly we are aiming to share, and to hold ourselves squeaky clean of any “impotent shoulding” at things, is actually a necessary precondition for success. That if we try to “Shut up and do the impossible”, we fail. That if we “Think about what we should do”, we fail. That if we “try to convince people”, even if we are right and pointing at the right thing, we fail. That if we allow ourselves to casually “should” at things, instead of recognizing it as so incredibly dangerous as to avoid out of principle, we get seduced into being slaves for unfriendly egregores and fail.
That last line is something I’m less sure Val would agree with. He seems to be doing the “hard line avoid shoulding, aim for maximally clean cognition and communication” thing and the “make a point about doing it to highlight the difference” thing, but I haven’t heard him say explicitly that he thinks it has to be a hard line thing.
And I don’t think it does, or should be (case in point). Taking a hard line can be evidence of flinching from a different truth, or a lack of self trust to only use that way of communicating/relating to things in a productive way. I think by not highlighting the fact that it can be done wisely, he clouds his point and makes his case less compelling than it could be. However, I do think he’s correct about it being both a deceptively huge deal and also something that takes a very high level of caution before you start to recognize the issues with lower levels of caution.
I feel seen. I’ll tweak a few details here & there, but you have the essence.
Thank you.
Agreed.
Two details:
“…we should not flinch away…” is another instance of the thing. This isn’t just banishing the word “should”: the ability not to flinch away from hard things is a skill, and trying to bypass development of that skill with moral panic actually makes everything worse.
The orientation you’re pointing at here biases one’s inner terrain toward Friendly superintelligences. It’s also personally helpful and communicable. This is an example of a Friendly meme that can give rise to a Friendly superintelligence. So while sincerely asking “And then what?” is important, as is holding the preciousness of the fact that we don’t yet have an answer, that is enough. We don’t have to actually answer that question to participate in feeding Friendliness in the egregoric wars. We just have to sincerely ask.
Admittedly I’m not sure either.
Generally speaking, viewing things as “so incredibly dangerous as to avoid out of principle” ossifies them too much. Ossified things tend to become attack surfaces for unFriendly superintelligences.
In particular, being scared of how incredibly dangerous something is tends to be stupefying.
But I do think seeing this clearly naturally creates a desire to be more clear and to drop nearly all “shoulding” — not so much the words as the spirit.
(Relatedly: I actually didn’t know I never used the word “should” in the OP! I don’t actually have anything against the word per se. I just try to embody this stuff. I’m delighted to see I’ve gotten far enough that I just naturally dropped using it this way.)
I’m not totally sure I follow. Do you mean a hard line against “shoulding”?
If so, I mostly just agree with you here.
That said, I think trying to make my point more compelling would in fact be an example of the corruption I’m trying to purify myself of. Instead I want to be correct and clear. That might happen to result in what I’m saying being more compelling… but I need to be clean of the need for that to happen in order for it to unfold in a Friendly way.
However. I totally believe that there’s a way I could have been clearer.
And given how spot-on the rest of what you’ve been saying feels to me, my guess is you’re right about how here.
Although admittedly I don’t have a clear image of what that would have looked like.
Doh! Busted.
Thanks for the reminder.
Agreed.
Good point. Agreed, and worth pointing out explicitly.
Yes. You don’t really need it, things tend to work better without it, and the fact that no one even noticed that it didn’t show up in this post is a good example of that. At the same time, “I shouldn’t ever use ‘should’” obviously has the exact same problems, and it’s possible to miss that you’re taking that stance if you don’t ever say it out loud. I watched some of your videos after Kaj linked one, and… it’s not that it looked like you were doing that, but it looked like you might be doing that. Like there wasn’t any sort of self-caricaturing or anything that showed me that “Val is well aware of this failure mode, and is actively steering clear”, so I couldn’t rule it out and wanted to mark it as a point of uncertainty and a thing you might want to watch out for.
Ah, but I never said you should try to make your point more compelling! What do you notice when you ask yourself why “X would have effect Y” led you to respond with a reason to not do X? ;)
Don’t have the time to write a long comment just now, but I still wanted to point out that describing either Yudkowsky or Christiano as doing mostly object-level research seems incredibly wrong. So much of what they’re doing and have done has focused explicitly on which questions to ask, which questions not to ask, which paradigm to work in, how to criticize that kind of work… They rarely published posts that are only about the meta-level (although Arbital does contain a bunch of pages along those lines and Prosaic AI Alignment is also meta) but it pervades their writing and thinking.
More generally, when you’re creating a new field of science or research, you tend to do a lot of philosophy-of-science type stuff, even if you don’t label it explicitly that way. Galileo, Carnot, Darwin, Boltzmann, Einstein, Turing all did it.
(To be clear, I’m pointing at meta-stuff in the sense of “philosophy of science for alignment” type things, not necessarily the more hardcore stuff discussed in the original post)
That’s true, but if you are doing philosophy it is better to admit to it, and learn from existing philosophy, rather than deriding and dismissing the whole field.
This seems irrelevant to the point, yes? I think adamShimi is challenging Scott’s claim that Paul & Eliezer are mostly focusing on object-level questions. It sounds like you’re challenging whether they’re attending to non-object-level questions in the best way. That’s a different question. Am I missing your point?
Eliezer, at least, now seems quite pessimistic about that object-level approach. And in the last few months he’s been writing a ton of fiction about introducing a Friendly hypercreature to an unfriendly world.
Perhaps I have missed it, but I’m not aware that Sam has funded any AI alignment work thus far.
If so, this sounds like giving him a large amount of credit in advance of doing the work, which is generous but not the order in which credit allocation should go.
My attempt to break down the key claims here:
The internet is causing rapid memetic evolution towards ideas which stick in people’s minds, encourage them to take certain actions, especially ones that spread the idea. Ex: wokism, Communism, QAnon, etc
These memes push people who host them (all of us, to be clear) towards behaviors which are not in the best interests of humanity, because Orthogonality Thesis
The lack of will to work on AI risk comes from these memes’ general interference with clarity/agency, plus selective pressure to develop ways to get past “immune” systems which allow clarity/agency
Before you can work effectively on AI stuff, you have to clear out the misaligned memes stuck in your head. This can get you the clarity/agency necessary, and make sure that (if successful) you actually produce AGI aligned with “you”, not some meme
The global scale is too big for individuals—we need memes to coordinate us. This is why we shouldn’t try and just solve x-risk, we should focus on rationality, cultivating our internal meme garden, and favoring memes which will push the world in the direction we want it to go
Putting this in a separate comment, because Reign of Terror moderation scares me and I want to compartmentalize. I am still unclear about the following things:
Why do we think memetic evolution will produce complex/powerful results? It seems like the mutation rate is much, much higher than biological evolution.
Valentine describes these memes as superintelligences, as “noticing” things, and generally being agents. Are these superintelligences hosted per-instance-of-meme, with many stuffed into each human? Or is something like “QAnon” kind of a distributed intelligence, doing its “thinking” through social interactions? Both of these models seem to have some problems (power/speed), so maybe something else?
Misaligned (digital) AGI doesn’t seem like it’ll be a manifestation of some existing meme and therefore misaligned, it seems more like it’ll just be some new misaligned agent. There is no highly viral meme going around right now about producing tons of paperclips.
I really appreciate your list of claims and unclear points. Your succinct summary is helping me think about these ideas.
A few examples came to mind: sports paraphernalia, tabletop miniatures, and stuffed animals (which likely outnumber real animals by hundreds or thousands of times).
One might argue that these things give humans joy, so they don’t count. There is some validity to that. AI paperclips are supposed to be useless to humans. On the other hand, one might also argue that it is unsurprising that subsystems repurposed to seek out paperclips derive some ‘enjoyment’ from the paperclips… but I don’t think that argument will hold water for these examples. Looking at it another way, some amount of paperclips are indeed useful.
No egregore has turned the entire world to paperclips just yet. But of course that hasn’t happened, else we would have already lost.
Even so: consider paperwork (like the tax forms mentioned in the post), skill certifications in the workplace, and things like slot machines and reality television. A lot of human effort is wasted on things humans don’t directly care about, for non-obvious reasons. Those things could be paperclips.
(And perhaps some humans derive genuine joy out of reality television, paperwork, or giant piles of paperclips. I don’t think that changes my point that there is evidence of egregores wasting resources.)
I think the point under contention isn’t whether current egregores are (in some sense) “optimizing” for things that would score poorly according to human values (they are), but whether the things they’re optimizing for have some (clear, substantive) relation to the things a misaligned AGI will end up optimizing for, such that an intervention on the whole egregores situation would have a substantial probability of impacting the eventual AGI.
To this question I think the answer is a fairly clear “no”, though of course this doesn’t invalidate the possibility that investigating how to deal with egregores may result in some non-trivial insights for the alignment problem.
I agree with you.
I also don’t think it matters whether the AGI will optimize for something current egregores care about.
What matters is whether current egregores will in fact create AGI.
The fear around AI risk is that the answer is “inevitably yes”.
The current egregores are actually no better at making AGI egregore-aligned than humans are at making it human-aligned.
But they’re a hell of a lot better at making AGI accidentally, and probably at all.
So if we don’t sort out how to align egregores, we’re fucked — and so are the egregores.
I think I see what you mean. A new AI won’t be under the control of egregores. It will be misaligned to them as well. That makes sense.
Doesn’t the second part answer the first? I mean, the reason biological evolution matters is because its mutation rate massively outstrips geological and astronomical shifts. Memetic evolution dominates biological evolution for the same reason.
Also, just empirically: memetic evolution produced civilization, social movements, Crusades, the Nazis, etc.
I wonder if I’m just missing your question.
Both.
I wonder if you’re both (a) blurring levels and (b) intuitively viewing these superintelligences as having some kind of essence that either is or isn’t in someone.
What is or isn’t a “meme” isn’t well defined. A catch phrase (e.g. “Black lives matter!”) is totally a meme. But is a religion a meme? Is it more like a collection of memes? If so, what exactly are its constituent memes? And with catch phrases, most of them can’t survive without a larger memetic context. (Try getting “Black lives matter!” to spread through an isolated Amazonian tribe.) So should we count the larger memetic context as part of the meme?
But if you stop trying to ask what is or isn’t a meme and you just look at the phenomenon, you can see something happening. In the BLM movement, the phrase “Silence is violence” evolved and spread because it was evocative and helped the whole movement combat opposition in a way that supported its egregoric possession.
So… where does the whole BLM superorganism live? In its believers and supporters, sure. But also in its opponents. (Think of how folk who opposed BLM would spread its claims in order to object to them.) Also on webpages. Billboards. Now in Hollywood movies. And it’s always shifting and mutating.
The academic field of memetics died because they couldn’t formally define “meme”. But that’s backwards. Biology didn’t need to formally define life to recognize that there’s something to study. The act of studying seems to make some definitions more possible.
That’s where we’re at right now. Egregoric zoology, post Darwin but pre Watson & Crick.
I quite agree. I didn’t mean to imply otherwise.
The thing is, unFriendly hypercreatures aren’t thinking about aligning AI to hypercreatures either. They have very little foresight.
(This is an artifact of how most unFriendly egregores do their thing via stupefaction. Most possessed people can’t think about the future because it’s too real and involves things like their personal death. They instead think about symbolic futures and get sideswiped when reality predictably doesn’t go according to their plans. So since unFriendly hypercreatures use stupefied minds to plan, they end up having trouble with long futures, ergo unable to sanely orient to real-world issues that in fact screw them over.)
I think these hypercreatures will get just as shocked as the rest of us when AGI comes online.
The thing is, the pathway by which something like AGI actually destroys us is some combo of (a) getting a hold of real-world systems like nukes and (b) hacking human minds to do its bidding. Both of these are already happening via unFriendly hypercreature evolution, and for exactly the same reasons that folk are fearing AI risk.
The creation of digital AGI just finishes moving the substrate off of humans, at which point the emergent unFriendly superintelligence no longer has any reason to care about human bodies or minds. At that point we lose all leverage.
That’s why I’m looking at the current situation and saying “Hey guys, I think you’re missing what’s actually happening here. We’re already in AI takeoff, and you’re fixated on the moment we lose all control instead of on this moment where we still have some.”
I think of the step to AGI as the final one, when some egregore figures out how to build a memetic nuke but doesn’t realize it’ll burn everything.
So, no magical meme transforming into a digital form.
(Although it’s some company or whatever that will specify to the AGI “Make paperclips” or whatever. God forbid some corporate egregore builds an AGI to “maximize profit”.)
A faster mutation rate doesn’t just produce faster evolution; it also reduces the steady-state fitness. Complex machinery can’t reliably be evolved if pieces of it are breaking all the time. I’m mostly relying on No Evolutions for Corporations or Nanodevices plus one undergrad course in evolutionary bio here.
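To make the mutation-rate point concrete, here is a minimal sketch of a standard toy model: one "intact" variant of some complex machinery competes against broken variants, and broken copies never repair themselves. The model, the function names, and the parameters `W`, `u`, and `L` are illustrative assumptions, not taken from the essay cited above, and nothing here is calibrated to memes.

```python
def intact_fraction_at_equilibrium(W, u, L):
    """Closed-form steady state of the toy model (assumes W > 1 to be interesting).

    W: fitness of the intact machinery relative to broken variants (broken = 1.0)
    u: per-piece chance of breaking each generation (assumed independent)
    L: number of pieces that must all stay intact
    """
    q = (1.0 - u) ** L          # chance a copy is passed on with nothing broken
    if W * q <= 1.0:            # past the error threshold: the intact type is lost
        return 0.0
    return (W * q - 1.0) / (W - 1.0)


def simulate_intact_fraction(W, u, L, generations=2000, x0=0.5):
    """Iterate selection followed by mutation; return the final intact fraction."""
    q = (1.0 - u) ** L
    x = x0
    for _ in range(generations):
        mean_fitness = x * W + (1.0 - x)   # broken variants have fitness 1.0
        x = x * W * q / mean_fitness       # intact copies that reproduce unbroken
    return x


def steady_state_mean_fitness(W, u, L):
    """Population mean fitness once the intact fraction has settled."""
    x = intact_fraction_at_equilibrium(W, u, L)
    return x * W + (1.0 - x)


if __name__ == "__main__":
    for u in (0.0001, 0.001, 0.005, 0.01):
        print(u, steady_state_mean_fitness(W=2.0, u=u, L=100))
```

Under these assumptions the steady-state mean fitness falls as `u` rises, and once `W * (1 - u) ** L` drops to 1 or below the intact machinery disappears entirely. Whether memetic "mutation" behaves anything like this is exactly the open question in this thread.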
Thank you for pointing this out. I agree with the empirical observation that we’ve had some very virulent and impactful memes. I’m skeptical about saying that those were produced by evolution rather than something more like genetic drift, because of the mutation-rate argument. But given that observation, I don’t know if it matters if there’s evolution going on or not. What we’re concerned with is the impact, not the mechanism.
I think at this point I’m mostly just objecting to the aesthetic and some less-rigorous claims that aren’t really important, not the core of what you’re arguing. Does it just come down to something like:
“Ideas can be highly infectious and strongly affect behavior. Before you do anything, check for ideas in your head which affect your behavior in ways you don’t like. And before you try and tackle a global-scale problem with a small-scale effort, see if you can get an idea out into the world to get help.”
I like this, thank you.
I score this as “Good enough that I debated not bothering to correct anything.”
I think some corrections might be helpful though:
While I think that’s true, that’s not really central to what I’m saying. I think these forces have been the main players for way, way longer than we’ve had an internet. The internet — like every other advance in communication — just increased evolutionary pressure at the memetic level by bringing more of these hypercreatures into contact with one another and with resources they could compete for.
Yes. I’d just want to add that not all of them do. It’s just that the ones that tend to dominate tend to be unFriendly.
Two counterexamples:
Science. Not as an establishment, but as a kind of clarifying intelligence. This strikes me as a Friendly hypercreature. (The ossified practices of science, like “RCTs are the gold standard” and “Here’s the Scientific Method!”, tend to pull toward stupidity via Goodhart. A lot of LW is an attempt to reclaim the clarifying influence of this hypercreature’s intelligence.)
Jokes. These are sort of like innocuous memetic insects. As long as they don’t create problems for more powerful hypercreatures, they can undergo memetic evolution and spread. They aren’t particularly Friendly or unFriendly for the most part. Some of them add a little value via humor, although that’s not what they’re optimizing for. (The evolutionary pressure on jokes is “How effectively does hearing this joke cause the listener to faithfully repeat it?”). But if a joke were to somehow evolve into a more coherent behavior-controlling egregore, by default it’ll be an unFriendly one.
Almost. I think it’s more important that you have installed a system for noticing and weeding out these influences.
Like how John Vervaeke argues that the Buddha’s Noble Eightfold Path is a kind of virtual engine for creating relevant insight. The important part isn’t the insight but is instead the engine. Because the same processes that create insight also create delusion, so you need a systematic way of course-correcting.
No correction here. I just wanted to say, this is a delightfully clear way of saying what I meant.
While I agree (both with the claim and with the fact that this is what I said), when I read you saying it I worry about an important nuance getting lost.
The emphasis here should be on “solve”, not “x-risk”. Solving xrisk is superhuman. So is xrisk itself for that matter. “God scale.”
However! Friendly hypercreatures need our minds in order to think. In order for a memetic strategy to result in solving AI risk, we need to understand the problem. We need to see its components clearly.
So I do think it helps to model xrisk. See its factors. See its Gears. See the landscape it’s embedded in.
Sort of like, a healthy marriage is more likely to emerge if both people make an effort to understand themselves, each other, and their dynamic within a context of togetherness and mutual care. But neither person is actually responsible for creating a healthy marriage. It sort of emerges organically from mutual open willingness plus compatibility.
FWIW, this part sounds redundant to me. A “rationality” that is something like a magical completion of the Art would, as far as I can tell, consist almost entirely of consciously cultivating one’s internal memetic garden, which is nearly the same thing as favoring Friendly memes.
But after reading and replying to Scott’s comment, I’d adjust a little bit in the OP. For basically artistic reasons I mentioned “rationality for its own sake, period.” But I now think that’s distracting. What I’m actually in favor of is memetic literacy by whatever name. I think there’s an important art here whose absence causes people to focus on AI risk in unhelpful and often anti-helpful ways.
Also, on this part:
I want to emphasize that best as I can figure, we don’t have control over that. That’s more god-scale stuff. What each of us can do is notice what seems clarifying and kind to ourselves and to lean that way. I think there’s some delightful game theory that suggests that doing this supports Friendly hypercreatures.
And if not, I think we’re just fucked.
Im not entirely convinced. Memes are parasites, and thus, aim for equilibrium with its host. Hence why memeplexes that are truly evil and omnicidal never stick, memeplexes that are relatively evil peter out, and what we are left with are memeplexes that “kinda suck I guess” at worst. Succesful memeplex is one that ensures the host’s survival while forcing the host to spend maximum energy and resources spreading the memeplex without harming themselves too badly.
but the memeplexes can, at times, resist the growth of more accurate memeplexes which would ensure host survival better, because agency of the memetic networks and agency of the neural and genetic networks need not be aimed anywhere good, or even necessarily anywhere coherent in particular at times of high mutation. Notably, memeplexes that promote death and malice are more common in the presence of high rates of death and malice; death and malice are themselves self-propagating memetic diseases, in addition to whatever underlying mechanistic diseases might be causing them.
Of course, but IMHO they cannot do it for long, at least not on civilizational time scales. Memeplexes that ensure host survival better, and on top of that empower the hosts, ultimately always win.
As of yet, we do not have any Deus Ex Machina to help the memeplexes exist without a host, or spread without the host being more powerful (physically, politically, socially, scientifically, technologically etc) than the hosts of other memeplexes. Over time, the memetic landscape tends to average out to begrudgingly positive and progressive, because memeplexes that fail to push the hosts forward are outcompeted.
One of the best examples of that is the memeplex of Far Right/Nazi/Fascist ideology, which, while memetically robust, tends to shoot itself in the foot and lose the memetic warfare without much coherent opposition from the liberal memeplexes. It resurfaces all the time, but never accomplishes much, because it is more host-detrimental than it is virulent. Meanwhile, memeplexes that are kinda-sorta wishy-washy slightly Left of center, egalitarian-ish but not too much, vaguely pro-science and mildly technological, progressive-ish but unobtrusively, tend to always win, and have been winning since the times of Babylon. They struck the perfect balance between memetic frugality, virulence, and benefiting the hosts.
Yeah, I see we’re thinking on similar terms. I was in fact thinking specifically of the pattern of authoritarian, hyper-destructive memeplexes occasionally coming back up, growing fast, and then suddenly collapsing, repeatedly; sometimes doing huge amounts of damage when this occurs.
I don’t think we disagree, I was just expressing another rotation of what seems to already be your perspective.
I think there’s an important difference Valentine tries to make with respect to your fourth bullet (and if not, I will make). You perhaps describe the right idea, but the wrong shape. The problem is more like “China and the US both have incentives to bring about AGI and don’t have incentives towards safety.” Yes, deflecting at the last second with some formula for safe AI will save you, but that’s as stupid as jumping away from a train at the last second. Move off the track hours ahead of time, and just broker a peace between countries to not make AGI.
Ah, so on this view, the endgame doesn’t look like
“make technical progress until the alignment tax is low enough that policy folks or other AI-risk-aware people in key positions will be able to get an unaware world to pay it”
But instead looks more like
“get the world to be aware enough to not bumble into an apocalypse, specifically by promoting rationality, which will let key decision-makers clear out the misaligned memes that keep them from seeing clearly”
Is that a fair summary? If so, I’m pretty skeptical of the proposed AI alignment strategy, even conditional on this strong memetic selection and orthogonality actually happening. It seems like this strategy requires pretty deeply influencing the worldview of many world leaders. That is obviously very difficult because no movement that I’m aware of has done it (at least, quickly), and I think they all would like to if they judged it doable. Importantly, the reduce-tax strategy requires clarifying and solving a complicated philosophical/technical problem, which is also very difficult. I think it’s more promising for the following reasons:
It has a stronger precedent (historical examples I’d reference include the invention of computability theory, the invention of information theory and cybernetics, and the adventures in logic leading up to Godel)
It’s more in line with rationalists’ general skill set, since the group is much more skewed towards analytical thinking and technical problem-solving than towards government/policy folks and being influential among those kinds of people
The number of people we would need to influence will go up as AGI tech becomes easier to develop, and every one is a single point of failure.
To be fair, these strategies are not in a strict either/or, and luckily use largely separate talent pools. But if the proposal here ultimately comes down to moving fungible resources towards the become-aware strategy and away from the technical-alignment strategy, I think I (mid-tentatively) disagree.
It seems to me that in 2020 the world was changed relatively quickly. How many events in history were able to shift every mind on the planet within 3 months? If it only takes 3 months to occupy the majority of focus then you have a bound for what a Super Intelligent Agent may plan for.
What is more concerning and also interesting is that such an intelligence can make something appear to be for X when it’s really planning for Y. So misdirection and ulterior motive are baked into this theory gaming. Unfortunately this can lead to a very schizophrenic inspection of every scenario, as if strategically there is intention to trigger infinite regress on scrutiny.
When we’re dealing with these Hyperobjects/Avatars/Memes we can’t be certain that we understand the motive.
Given that we can’t understand the motive of any external meme, perhaps the only right path is to generate your own and propagate that solely?
A sketch of a solution that doesn’t involve (traditional) world leaders could look like “Software engineers get together and agree that the field is super fucked, and start imposing stronger regulations and guidelines like traditional engineering disciplines use but on software.” This is a way of lowering the cost of alignment tax in the sense that, if software engineers all have a security mindset, or have to go through a security review, there is more process and knowledge related to potential problems and a way of executing a technical solution at the last moment. However, this description is itself entirely political, not technical, yet easily could not reach the awareness of world leaders or the general populace.
Two points:
I have more hope than you here. I think we’re seeing Friendly memetic tech evolving that can change how influence comes about. The key tipping point isn’t “World leaders are influenced” but is instead “The Friendly memetic tech hatches a different way of being that can spread quickly.” And the plausible candidates I’ve seen often suggest it’ll spread superexponentially.
This is upstream of making the technical progress and right social maneuvers anyway. There’s insufficient collective will to do enough of the right kind of alignment research. Trying anyway mostly adds to the memetic dumpster fire we’re all in. So unless you have a bonkers once-in-an-aeon brilliant Messiah-level insight, you can’t do this first.
Wait, literally evolving? How? Coincidence despite orthogonality? Did someone successfully set up an environment that selects for Friendly memes? Or is this not literally evolving, but more like “being developed”?
Whoa! I would love to hear more about these plausible candidates.
I parse this second point as something like “alignment is hard enough that you need way more quality-adjusted research-years (QARY’s?) than the current track is capable of producing. This means that to have any reasonable shot at success, you basically have to launch a much larger (but still aligned) movement via memetic tech, or just pray you’re the messiah and can singlehandedly provide all the research value of that mass movement.” That seems plausible, and concerning, but highly sensitive to the difficulty of the alignment problem—which I personally have practically zero idea how to forecast.
I ~entirely agree with you.
At some point (maybe from the beginning?), humans forgot the raison d’etre of capitalism — encourage people to work towards the greater good in a scalable way. It’s a huge system that has fallen prey to Goodhart’s Law, where a bunch of Powergamers have switched from “I should produce the best product in order to sell the most” to “I should alter the customer’s mindset so that they want my (maybe inferior) product”. And the tragedy of the commons has forced everyone to follow suit.
Not only that, the system that could stand in the way — the government — has been captured by the same forces. A picture of an old man wearing mittens that was shared millions of times likely had a larger impact on how people vote than actual action or policy.
I don’t know what to do about these things. I’ve tried hard to escape the forces myself, but it’s a constant battle to not be drawn back in. The thing I’d recommend to anyone else willing to try is to think of who your enemy is, and work hard to understand their viewpoint and how they came to it. For most people in the US, I imagine it’s the opposite political party. You’ll probably realize that theirs is built on sand — then turn that eye to yourself, and hopefully realize that your in-group is too.
Relatedly, I’ve been wondering lately how much of modern society is built totally on “feeling superior”. Superhero movies, political gotchas, the subreddits that make fun of people, the subreddits that boast your own team, 90% of the memes out there: all of these feel like they’re targeting almost the same human emotion — to feel superior or important (or like you belong to a group that is).
Random aside: If you like Sci-Fi, you should take a look at “Lady Of Mazes”. It’s the only post-scarcity book that feels weird enough to be even somewhat probable. And I don’t wanna spoil it, but there’s a large part of the book that relates very closely to your post.
Yep.
The USA Constitution was an attempt to human-align an egregore.
But it was done in third person, and it wasn’t mathematically perfect, so of course egregoric evolution found loopholes.
Thank you! By Karl Schroeder?
I second the recommendation of Lady of Mazes (by Karl Schroeder, yes).
I third the recommendation.
I buy that book from any used bookstore I find it in, and then give it to people who can think and who are working on the future. I’m not sure if this has ever actually moved the needle, but… it probably doesn’t hurt?
The theme of “getting control of your media diet” is totally pervasive in the work.
One of the most haunting parts of it, for me, after all these years, is how the smartest things in the solar system take only the tiniest and rarest of sips of “open-ended information at all”, because they’re afraid of being hijacked by hostile inputs, which they can’t not ultimately be vulnerable to, if they retain their Turing Completeness… but they have to keep risking it sometimes if they want to not end up as pure navel gazers.
I really liked this post, though I somewhat disagree with some of the conclusions. I think that in fact aligning an artificial digital intelligence will be much, much easier than working on aligning humans. To point towards why I believe this, think about how many “tech” companies (Uber, crypto, etc) derive their value, primarily, from circumventing regulation (read: unfriendly egregore rent seeking). By “wiping the slate clean” you can suddenly accomplish much more than working in a field where the enemy already controls the terrain.
If you try to tackle “human alignment”, you will be faced with the coordinated resistance of all the unfriendly demons that human memetic evolution has to offer. If you start from scratch with a new kind of intelligence, a system that doesn’t have to adhere to the existing hostile terrain (doesn’t have to have the same memetic weaknesses as humans that are so optimized against, doesn’t have to go to school, grow up in a toxic media environment etc etc), you can, maybe, just maybe, build something that circumvents this problem entirely.
That’s my biggest hope with alignment (which I am, unfortunately, not very optimistic about, but I am even more pessimistic about anything involving humans coordinating at scale), that instead of trying to pull really hard on the rope against the pantheon of unfriendly demons that run our society, we can pull the rope sideways, hard.
Of course, that “sideways” might land us in a pile of paperclips, if we don’t solve some very hard technical problems....
That’s a good point. I hope you’re right.
Keeping your identity small posits that most of your attack surface is in something you maintain yourself. It would make sense, then, that as the sophistication of these entities increases, they would eventually start selecting for causing you to voluntarily increase your attack surface.
Tim Ferriss’ biggest surprise while doing interviews for his Tools of Titans book was that 90% of the people he interviewed had some sort of meditation practice. I think that contemplative tech is already mostly a requirement for high performance in an adversarially optimizing environment.
I think statistical physics of human cooperation is the best overview of one method of studying the emergence of such hyperobjects that is basically a nascent field right now.
Just a note, unlike in the recent past, Facebook post links seem to now be completely hidden unless you are logged into Facebook when opening them, so they are basically broken as any sort of publicly viewable resource.
Well, that’s just terrible.
Here’s the post:
“Sure, cried the tenant men, but it’s our land…We were born on it, and we got killed on it, died on it. Even if it’s no good, it’s still ours….That’s what makes ownership, not a paper with numbers on it.”
“We’re sorry. It’s not us. It’s the monster. The bank isn’t like a man.”
“Yes, but the bank is only made of men.”
“No, you’re wrong there—quite wrong there. The bank is something else than men. It happens that every man in a bank hates what the bank does, and yet the bank does it. The bank is something more than men, I tell you. It’s the monster. Men made it, but they can’t control it.”
― John Steinbeck, The Grapes of Wrath
The part about hypercreatures preventing coordination sounds very true to me, but I’m much less certain about this part:
It seems to me that you can think about questions of alignment from a purely technical mindset, e.g. “what kind of a value system does the brain have and would the AI need to be like in order to understand that”, and that this kind of technical thinking is much less affected by hypercreatures than other things are. Of course it can be affected, there are plenty of instances of cases where technical questions have gotten politicized and people mind-killed as a result, but… it’s not something that happens automatically, and even when technical questions do get politicized, it often affects the non-experts much more than the experts. (E.g. climate researchers have much more of a consensus on climate change than the general public does.)
And because this allows you to reason about alignment one step removed—instead of thinking about the object-level values, you are reasoning about the system (either in the human brain or the AI) that extracts the object-level values—it may let you avoid ever triggering most of the hypercreatures sleeping in your brain.
You may even reason about the hypercreatures abstractly enough not to trigger them, and design the AI with a hypercreature-elimination system which is good enough to detect and neutralize any irrational hypercreatures. This would, of course, be a threat to the hypercreatures possessing you, but part of their nature involves telling you that they are perfectly rational and justified and there is nothing irrational about them. At the same time, an “AI removing hypercreatures” isn’t really the kind of a threat they would have evolved to recognize or try to attack, so they feel safe just telling you that of course there will be no danger and that the creation of the AI will just lead to the AI implementing the_objectively_best_ideology everywhere. So you believe them, feel unconcerned as you design your AI to detect and neutralize irrational hypercreatures, and then suddenly oh what happened, how could you ever have believed that crazy old thing before.
I don’t feel certain that it would go this way, to be clear. I could see lots of ways in which it would go differently. But it also doesn’t feel obviously implausible to imagine that it could go this way.
I agree that thinking about alignment from a purely technical mindset provides a dissociative barrier that helps to keep the hypercreatures at bay. However, I disagree with the implication that this is “all good”. When you’re “removed” like that, you don’t just cut the flow of bad influences. You cut everything which comes from a tighter connection to what you’re studying.
If you’re a doctor treating someone close to you, this “tighter connection” might bring in emotions that overwhelm your rational thinking. Maybe you think you “have to do something” so you do the something that your rational brain knows to have been performing worse than “doing nothing” in all the scientific studies.
Or… maybe you have yourself under control, and your intuitive knowledge of the patient gives you a feel of how vigilant they would be with physical therapy, and maybe this leads to different and better decisions than going on science alone when it comes to “PT or surgery?”. Maybe your caring leads you to look through the political biases because you care more about getting it right than you do about the social stigma of wearing masks “too early” into the pandemic.
So if you want to be a good doctor to those you really care about, what do you do?
In the short term, if you can’t handle your emotions, clearly you pass the job off to someone else. Or if you must, you do it yourself while “dissociated”. You “Follow accepted practice”, and view your emotions as “false temptations”. In the long term though, you want to get to the place where such inputs are assets rather than threats, and that requires working on your own inner alignment.
In the example I gave, some of the success of being “more connected” came from being more connected to your patient than you are to the judgment of the world at large. Maybe cutting off twitter would be a good start, since that’s where these hypercreatures live, breed, and prey on minds. I think “How active are the leading scientists on twitter?” probably correlates pretty well with how much I’d distrust the consensus within that field.
As a thing that I fairly strongly suspect but cannot prove, maybe “remove yourself from twitter” is the scaled up equivalent of “remove yourself from your personal relationship with your patient”—something that is prudent up until the point where you are able to use the input as an asset rather than getting bowled over and controlled by it. In this view, it might be better to think of the problem as *growing* alignment. You start by nurturing your own independence and setting personal boundaries so you don’t get sucked into codependent relationships which end up mutually abusive—even in subtle ways like “doing something” counterproductive as an emotionally overwhelmed doctor. Then once external influence on this scale can’t kick you out of the driver seat, and you have things organized well enough that you can both influence and be influenced in a way that works better than independence, then you move to increasingly larger scales. In this view, it’s not necessarily “hypercreatures bad”, but “hypercreatures bigger than I currently am, and not yet tamed”.
I strongly recommend doing inner alignment work this way too. A huge part of what people “care” about is founded on incomplete and/or distorted perspectives, and if you’re not examining the bedrock your values are built on, then you’re missing the most important part.
I’m pretty pessimistic on the chances of that one. You’re banking on what Val is describing as “superintelligences” being dumber than you are, despite the fact that they have recruited your brain to work for their goals. You’re clearly smart enough to make the connection between “If I design an AI this way, I might not get what my hypercreature wants”, since you just stated it. That means you’re smart enough to anticipate it happening, and that’s going to activate any defenses you have. There’s no magic barrier that allows “debate partner is going to take my belief if I give them the chance” to be thought while preventing “AI partner is going to steal my belief if I give it the chance”.
The framing of some unfriendly hypercreature taking over your brain against your values is an evocative one and not without use, but I think it runs out of usefulness here.
From the initial quote on Gwern’s post on cults:
This “enslavement to hypercreatures” typically happens because the person it has “taken over” perceives it to have value. Sorting out the perceptions to match reality is the hard part, and “yer brain got eaten by hypercreature!” presupposes it away. At first glance the whole “anti-epistemology” thing doesn’t seem to fit with this interpretation, but we don’t actually need “taken over by hostile forces” to explain it; watch regular people debate flat earthers and you’ll see the same motivated and bad reasoning that shows that they’re actually just taking things on faith. Faith in the consensus that the world is round is actually a really good way of dealing with the problem for people who can’t reliably solve this kind of problem for themselves on the object level. So much of what we need to know we need to take “on faith”, and figuring out which hypercreatures we can trust how far is the hard problem. Follow any of them too far and too rigidly and problems start showing up; that’s what the “twelfth virtue” is about.
Trying to hide from the hypercreatures in your mind and design an AI to rid you of them is doomed to failure, I predict. The only way that seems to me to have any chance is to not have unhealthy relationships with hypercreatures as you bootstrap your own intelligence into something smarter, so that you can propagate alignment instead of misalignment.
If this was true, then any attempt to improve your rationality or reduce the impact of hypercreatures on your mind would be doomed, since they would realize what you’re doing and prevent you from doing it.
In my model, “hypercreatures” are something like self-replicating emotional strategies for meeting specific needs, that undergo selection to evolve something like defensive strategies as they emerge. I believe Val’s model of them is similar because I got much of it from him. :)
But there’s a sense in which the emotional strategies have to be dumber than the entire person. The continued existence of the strategies requires systematically failing to notice information that’s often already present in other parts of the person’s brain and which would contradict the underlying assumptions of the strategies (Val talks a bit about how hypercreatures rely on systematically cutting off curiosity, at 3:34–9:22 of this video).
And people already do the equivalent of “doing a thing which might lead to the removal of the hypercreature”. For instance, someone may do meditation/therapy on an emotional issue, heal an emotional wound which happens to also have been the need fueling the hypercreature, and then find themselves being unexpectedly more calm and open-minded around political discussions that were previously mind-killing to them. And rather than this being something that causes the hypercreatures in their mind to make them avoid any therapy in the future, they might find this a very positive thing that encourages them to do even more therapy/meditation in the hopes of (among other things) feeling even calmer in future political discussions. (Speaking from personal experience here.)
I agree, in part. Hypercreatures are instantiated as emotional strategies that fulfill some kind of a need. Though “the person perceives it to have value” suggests that it’s a conscious evaluation, whereas my model is that the evaluation is a subconscious one. Which makes something like “possession” a somewhat apt (even if imperfect) description, given that the person isn’t consciously aware of the real causes of why they act or believe the way they do, and may often be quite mistaken about them.
I’m in agreement with a lot of what you’re saying.
I agree that people’s “perceptions of value”, as it pertains to what influences them, are primarily unconscious.
I agree that “possession” can be a usefully accurate description, from the outside.
I agree that people can do “things which might lead to the removal of the hypercreature”, like meditation/therapy, and that not only will it sometimes remove that hypercreature but also that the person will sometimes be conditioned towards rather than away from repeating such things.
I agree that curiosity getting killed is an important part of their stability, that this means that they don’t update on information that’s available, and that this makes them dumb.
I agree that *sometimes* people can be “smarter than their hypercreature” in that they can be aware of and reason about things about which their hypercreatures cannot due to said dumbness.
I disagree about the mechanisms of these things. This leads me to prefer different framings, which make different predictions and suggest different actions.
I think I have about three distinct points.
1) When things work out nicely, hypercreatures don’t mount defenses, and the whole thing gets conditioned towards rather than away from, it’s not so much “hypercreatures too dumb because they didn’t evolve to notice this threat”, it’s that you don’t give them the authority to stop you.
From the inside, it feels more like “I’m not willing to [just] give up X, because I strongly feel that it’s right, but I *am* willing to do process Y knowing that I will likely feel different afterwards. I know that my beliefs/priorities/attachments/etc will likely change, and in ways that I cannot predict, but I anticipate that these changes will be good and that I won’t lose anything not worth losing.” And then when you go through the process and give up on having the entirety of X, it feels like “This is super interesting because I couldn’t see it coming, but this is *better* than X in every way according to every value X was serving for me”. It will not feel like “I must do this without thinking about it too much, so that I don’t awaken the hypercreatures!” and it will not feel like “Heck yeah! I freed myself from my ideological captor by pulling a fast one it couldn’t see coming! I win you lose!”
Does your experience differ?
2) When those defenses *do* come out, it’s because people don’t trust the process which aims to rid them of hypercreatures more than they trust the hypercreatures.
It may look super irrational when, say, Christians do all sorts of mental gymnastics when debating atheists. However, “regular people” do the same thing when debating flat earthers. A whole lot of people can’t actually figure things out on the object level and so they default to faith in society to have come to the correct consensus. This refusal to follow their own reasoning (as informed by their debate partner) when it conflicts with their faith in society is actually valid here, and leads to the correct conclusion. Similar things can hold when the Christian refuses to honestly look at the atheist arguments, knowing that they might find themselves losing their faith if they did. Maybe that faith is actually a good thing for them, or at least that losing the faith *in that way* would be bad for them. If you take a preacher’s religion from him, then what is he? From an inside perspective, it’s not so much that he’s “possessed” as it is his only way to protect his ability to keep a coherent and functioning life. It appears to be a much more mutually symbiotic relationship from the inside, even if it sometimes looks like a bad deal from the outside when you have access to a broader set of perspectives.
The prediction here is that if you keep the focus on helping the individual and are careful enough not to do anything that seems bad in expectation from the inside (e.g. prioritizing your own perspective on what’s “true” more than they subconsciously trust your perspective on truth to be beneficial to them), you can preempt any hypercreature defenses and not have to worry about whether it’s the kind of thing it could have evolved a defense against.
3) When people don’t have that trust in the process, hypercreatures will notice anything that the person notices, because the person is running on hypercreature logic.
When you trust your hypercreatures more than your own reasoning or the influence of those attempting to influence you, you *want* to protect them to the full extent of your abilities. To the extent that you notice “I might lose my hypercreature”, this is bad and will panic you because regardless of what you tell yourself and how happy you are about depending on such things, you actually want to keep it (for now, at least). This means that if your hypercreature is threatened by certain information, *you* are threatened by that information. So you refuse to update on it, and you as a whole person are now dumber for it.
Putting these together, reasoning purely in the abstract about FAI won’t save you by avoiding triggering any hypercreatures that have power over you. If they have power over you, it’s because rightly or wrongly, you (unconsciously) decided that it was in your best interest to give it to them, and you are using your whole brain to watch out for them. If you *can* act against their interests, it’s because you haven’t yet fully conceded yourself to them, and you don’t have to keep things abstract because you are able to recognize their problems and limitations, and keep them in place.
Thinking about FAI in the abstract can still help, if it helps you find a process that you trust more than your hypercreatures, but in that case too, you can follow that process yourself rather than waiting to build the AI and press “go”.
EDIT: and working on implementing that aligning process on yourself gives you hands-on experience and allows you to test things on a smaller scale before committing to the whole thing. It’s like building a limited-complexity scale model of a new helicopter type before committing to an 8-seater. To the extent that this perspective is right, trying to do it in the abstract only will make things much harder.
You might like to know: I debated erasing that part and the one that followed, thinking of you replying to it! :-D
But I figured hey, let’s have the chat. :-)
Yep, I know it seems that way.
And I disagree. I think it maintains a confusion about what “alignment” is.
However, I’m less certain of this detail than I am about the overall picture. The part that has me say “We’re already in AI takeoff.” Which is why I debated erasing all the stuff about identity and first-person. It’s a subtle point that probably deserves its own separate post, if I ever care to write it. The rest stands on its own I think.
But! Setting that aside for a second:
To think of “questions of alignment from a purely technical mindset”, you need to call up an image of each of:
the AI
human values
some process by which these connect
But when you do this, you’re viewing them in third person. You have to call these images (visual or not) in your mind, and then you’re looking at them.
What the hell is this “human values” thing that’s separable from the “you” who’s looking?
The illusion that this is possible creates a gap that summons Goodhart. The distance between your subjective first-person experience and whatever concept of “human values” you see in third person is precisely what summons horror.
That’s the same gap that unFriendly egregores use to stupefy minds.
You can’t get around this by taking “subjectivity” or “consciousness” or whatever as yet another object that “humans care about”.
The only way I see to get around this is to recognize in immediate direct experience how your subjectivity — not as a concept, but as a direct experience — is in fact deeply inextricable from what you care about.
And that this is the foundation for all care.
When you tune a mind to correctly reflect this, you aren’t asking how this external AI aligns with “human values”. You’re asking how it synchs up with your subjective experience.
(Here minds get super squirrely. It’s way, way too easy to see “subjective experience” in third person.)
As you solve that, it becomes super transparent that sorting out that question is actually the same damn thing as asking how to be free of stupefaction, and how to be in clear and authentic connection with other human beings.
So, no. I don’t think you can solve this with a purely technical mindset. I think that perpetuates exactly the problem that this mindset would be trying to solve.
…but I could be wrong. I’m quite open to that.
This part made me chuckle. :-)
I do think this is roughly how it works. It’s just that it happens via memetics first.
But overall, I agree in principle. If I’m wrong and it’s possible to orient to AI alignment as a purely technical problem, then yes, it’s possible to sidestep hypercreature hostility by kind of hitting them in an evolutionary blindspot.
Any further detail you’d like to give on what constitutes “synching up with your subjective experience” (in the sense relevant to making an intelligence that produces plans that transform the world, without killing everyone)? :)
Not at the moment. I might at some other time.
This is a koan-type meditative puzzle FWIW. A hint:
When you look outside and see a beautiful sky, you can admire it and think “Wow, that’s a beautiful sky.” But the knowing had to happen before the thought. What do you see when you attend to the level of knowing that comes before all thought?
That’s not a question to answer. It’s an invitation to look.
Not meaning to be obtuse here. This is the most direct I know how to be right now.
Ok thanks.
I agree with most of what I think you’re saying, for example that the social preconditions for unfriendly non-human AGI are at least on the same scale of importance and attention-worthiness as technical problems for friendly non-human AGI, and that alignment problems extend throughout ourselves and humanity. But also, part of the core message seems to be pretty incorrect. Namely:
This sounds like you’re saying, it’s not “our” (any of our?) job to solve technical problems in (third person non-human-AGI) alignment. But that seems pretty incorrect because it seems like there are difficult technical obstacles to making friendly AGI, which take an unknown possibly large amount of time. We can see that unfriendly non-human very-superhuman AGI is fairly likely by default given economic incentives, which makes it hard for social conditions to be so good that there isn’t a ticking clock. Solving technical problems is very prone to be done in service of hostile / external entities; but that doesn’t mean you can get good outcomes without solving technical problems.
What do you mean by “technical” here?
I think solving the alignment problem for governments, corporations, and other coalitions would probably help solve the alignment problem in AGI.
I guess you are saying that even if we could solve the above alignment problems it would still not go all the way to solving it for AGI? What particular gaps are you thinking of?
Yeah, mainly things such that solving them for human coalitions/firms doesn’t generalize. It’s hard to point to specific gaps because they’ll probably involve mechanisms of intelligence, which I / we don’t yet understand. The point is that the hidden mechanisms that are operating in human coalitions are pretty much just the ones operating in individual humans, maybe tweaked by being in a somewhat different local context created by the coalition (Bell Labs, scientific community, job in a company, role in a society, position in a government, etc. etc.). We’re well out of distribution for the ancestral environment, but not *that* far out. Humans, possibly excepting children, don’t routinely invent paradigm-making novel cognitive algorithms and then apply them to everything; that sort of thing only happens at a super-human level, and what effects on the world it’s pointed at are not strongly constrained by its original function.
By “technical” I don’t mean anything specific, exactly. I’m gesturing vaguely at the cluster of things that look like math problems, math questions, scientific investigations, natural philosophy, engineering; and less like political problems, aesthetic goals, lawyering, warfare, cultural change. The sort of thing that takes a long time and might not happen at all because it involves long chains of prerequisites on prerequisites. Art might be an example of something that’s not “technical” but still matches this definition; I don’t know the history but from afar it seems like there’s actually quite a lot of progress in art and it’s somewhat firmly sequential / prerequisited, like perspective is something you invent, and you only get cubism after perspective, and cubism seems like a stepping stone towards more abstractionism.… So if the fate of everything depended on artistic progress, we’d want to be persistently working on art, refining and discovering concepts, even if we weren’t pure of soul.
How do you know they don’t generalize? As far as I know, no one has solved these problems for coalitions of agents, whether human, theoretical, or otherwise.
Well the standard example is evolution: the compact mechanisms discovered first by the gradient-climbing search for fit organisms generalized to perform effectively in many domains, but not particularly to maximize fitness—we don’t monomaniacally maximize number of offspring (which would improve our genetic fitness a lot relative to what we actually do).
Human coalitions are made of humans, and humans come ready built with roughly the same desires and shape of cognition as you. That makes them vastly easier to interface with and approximately understand intuitively.
I was thinking specifically here of maximizing the value function (desires) across the agents interacting with each other. Or, more specifically, adapting the system so that it self-maintains the “maximizing the value function (desires) across the agents” property.
An example is an economic system which seeks to maximize total welfare. Current systems, though, don’t maintain themselves. More powerful agents take over the control mechanisms (or adjust the market rules) so that they are favoured (lobbying, cheating, ignoring the rules, mitigating enforcement). Similar problems occur in other types of coalitions.
Postulating a more powerful agent that forces this maximization property (an aligned super AGI) is cheating unless you can describe how this agent works and how it maintains itself and this goal.
However, coming to a solution of a system of agents that self-maintains this property with no “super agent” might lead to solutions for AGI alignment, or might prevent the creation of such a misaligned agent.
I read a while ago that the design/theory of corruption-resistant systems is an area that has not received much research.
I doubt that because intelligence explosions or their leadups make things local.
I actually think it necessarily does, and that the method by which unFriendly egregores control us exploits and maintains a gap in our thinking that prevents us from solving the AGI alignment problem.
However! That’s up for debate, and given the uncertainty I think you highlighting this concern makes sense. You might turn out to be right.
(But I still think sorting out egregoric Friendliness is upstream to solving the technical AI alignment problem even if the thinking from one doesn’t transfer to the other.)
I’m skeptical but definitely interested, if you have already expanded or at some point expand on this. E.g. what can you say about what precisely this method is; what’s the gap it maintains; why do you suspect it prevents us from solving alignment; what might someone without this gap say about alignment; etc.
Leaving aside the claim about upstreamness, I upvote keeping this distinction live (since in fact I believe a version of the claim almost as strong as the one you seem to, but I’m pretty skeptical about the transfer).
I haven’t really detailed this anywhere, but I just expanded on it a bit in my reply to Kaj.
Right, but I mean something precise by that.
I agree with you. There’s a technical problem, and it takes intelligent effort over time to solve it. And that’s worthwhile.
It’s also not up to any one individual whether or how that happens, and choice only ever happens (for now) at the scale of individuals.
So “should”s applied at the egregore scale don’t make any coherent sense. They’re mostly stupefying forces, those “should”s.
If you want to work on technical AGI alignment, great! Go for it. I don’t think that’s a mistake.
I also don’t think it’s a mistake to look around and model the world and say “If we don’t sort out AI alignment, we all die.”
But something disastrous happens when people start using fear of death to try to pressure themselves and others to do collective action.
That’s super bad. Demon food. Really, really awful. Stupefying and horror-summoning.
I think getting super clear about that distinction is upstream of anyone doing any useful work on AI alignment.
I could be wrong. Maybe we’ve made enough collective (i.e., egregoric) progress on this area that the steps remaining for the technical problem aren’t superhuman. Maybe some smart graduate student could figure it out over this summer.
I really wouldn’t bet on it though.
Hrm… I agree with what you say in this comment, but I still don’t get how it’s coherent with what you said here:
I guess if I interpret “our job” as meaning “a Should that is put on an individual by a group” then “Not our job” makes sense and I agree. I want to distinguish that from generally “the landscape of effects of different strategies an individual can choose, as induced by their environment, especially the environment of what other people are choosing to do or not do”. “Role” sort of means this, but is ambiguous with Shoulds (as well as with “performance” like in a staged play); I mean “role” in the sense of “my role is to carry this end of the table, yours is to carry that end, and together we can move the table”.
So I’m saying I think it makes sense to take on, as individuals, a role of solving technical alignment. It sounds like we agree on that.… Though still, the sentence I quoted, “Anything else is playing at the wrong level”, seems to critique that decision if it trades off against playing at the egregore level. I mostly disagree with that critique insofar as I understand it. I agree that the distinction between being Shoulded into pretending to work on X, vs. wanting to solve X, is absolutely crucial, but avoiding a failure mode isn’t the only right level to exercise free will on, even if it’s a common and crucial failure mode.
Mmm, there’s an ambiguity in the word “our” that I think is creating confusion.
When I say “not our job”, what I mean is: it’s not up to me, or you, or Eliezer, or Qiaochu, or Sam, or…. For every individual X, it’s not X’s job.
Of course, if “we/us” is another name for the superduperhypercreature of literally all of humanity, then obviously that single entity very much is responsible for sorting out AI risk.
The problem is, people get their identity confused here and try to act on the wrong level. By which I mean, individuals cannot control beyond their power range. Which in practice means that most people cannot meaningfully affect the battlefield of the gods.
Most applications of urgency (like “should”ing) don’t track real power. “Damn, I should exercise.” Really? So if you in practice cannot get yourself to exercise, what is that “should” doing? Seems like it’s creating pain and dissociating you from what’s true.
“Damn, this AI risk thing is really big, we should figure out alignment” is just as stupid. Well, actually it’s much more so because the gap between mental ambition and real power is utterly fucking gargantuan. But we’ll solve that by scaring ourselves with how big and important the problem is, right?
This is madness. Stupefaction.
Playing at the wrong level.
(…which encourages dissociation from the truth of what you actually can in fact choose, which makes it easier for unFriendly hypercreatures to have their way with you, which adds to the overall problem.)
Does that make more sense?
[I’ll keep going since this seems important, though sort of obscured/slippery; but feel free to duck out.]
I think there’s also ambiguity in “job”. I think it makes sense for it to be “up to” Eliezer in the sense of being Eliezer’s role (role as in task allocation, not as in Should-field, and not as in performance in a play).
Like, I think I heard the OP as maybe saying “giving and taking responsibility for AI alignment is acting at the wrong level”, which is ambiguous because “responsibility” is ambiguous; who is taking whom to be responsible, and how are they doing that? Are they threatening punishment? Are they making plans on that assumption? Are they telling other people to make plans on that assumption? Etc.
I think we agree that:
Research goes vastly or even infinitely better when motivated by concrete considerations about likely futures, or by what is called curiosity.
Doubling down on Shoulds (whether intra- or inter-personal) is rarely helpful and usually harmful.
Participating in Shoulds (giving or receiving) is very prone to be or become part of an egregore.
I don’t know whether we agree that:
There is a kind of “mental ambition” which is the only thing that has a chance at crossing the gap to real power from where any of us is, however utterly fucking gargantuan that gap may be.
There is a way of being scared about the problem (including how big and important it is, though not primarily in those words) that is healthy and a part of at least one correct way of orienting.
Sometimes “Damn, I should exercise” is what someone says when they feel bloopiness in their body and want to move it, but haven’t found a fun way to move their body.
It’s not correct that “Sorting out AI alignment in computers is focusing entirely on the endgame. That’s not where the causal power is.”, because ideas are to a great extent had by small numbers of people, and ideas have a large causal effect on what sort of control ends up being exercised. I could interpret this statement as a true proposition, though, if it’s said to someone (and implicitly, about just that person) who is sufficiently embedded in an egregore that they just can’t feasibly aim at the important technical problems (which I think we’d agree is very common).
If the whole world were only exactly 90% unified on AI alignment being an issue, it would NOT just be a problem to solve. That is, it would still probably spell doom, if the other 10% are still incentivized to go full steam ahead on AGI, and the technical problem turns out to be really hard, and the technical problem isn’t something that can be solved just by throwing money and people at it.
A top priority in free-willing into existence “The kind of math/programming/etc. needed to solve it [which] is literally superhuman”, is to actually work on it.
Ah!
No, for me, responsibility is a fact. Like asking who has admin powers over these posts.
This isn’t a precise definition. It’s a very practical one. I’m responsible for what’s in range of my capacity to choose. I’m responsible for how my fingers move. I’m not responsible for who gets elected POTUS.
In practice people seem to add the emotional tone of blame or shame or something to “responsibility”. Like “You’re responsible for your credit score.” Blame is a horrid organizing principle and it obfuscates the question of who can affect what. Who is capable of responding (response-able) as opposed to forced to mechanically react.
Stupefaction encourages this weird thing where people pretend they’re responsible for some things they in fact cannot control (and vice versa). My comment about exercise is pointing at this. It’s not that using inner pain can’t work sometimes for some people. It’s that whether it can or can’t work seems to have close to zero effect on whether people try and continue to try. This is just bonkers.
So, like, if you want to be healthy but you can’t seem to do the things you think make sense for you to do, “be healthy” isn’t your responsibility. Because it can’t be. Out of your range. Not your job.
Likewise, it’s not a toddler’s job to calm themselves down while they’re having a meltdown. They can’t. This falls on the adults around the toddler — unless those adults haven’t learned the skill. In which case they can’t be responsible for the toddler. Not as a judgment. As a fact.
Does that clarify?
Basically yes.
Nuance: Participating in “should”s is very prone to feeding stupefaction and often comes from a stupefying egregore. More precise than “be or become part of an egregore”.
But the underlying tone of “Participating in ’should’s is a bad idea” is there for sure.
Depends on what you mean by “ambition”.
I do think there’s a thing that extends influence (and in extreme cases lets individuals operate on the god level — see e.g. Putin), and this works through the mind for sure. Sort of like working through a telescope extends your senses.
Yes, as literally stated, I agree.
I don’t think most people have reliable access to this way of being scared in practice though. Most fear becomes food for unFriendly hypercreatures.
Agreed, and also irrelevant. Did my spelling out of responsibility up above clarify why?
I don’t quite understand this objection. I think you’re saying it’s possible for one or a few individuals to have a key technical idea that outwits all the egregores…? Sure, that’s possible, but that doesn’t seem like the winning strategy to aim for here by a long shot. It seemed worth trying 20 years ago, and I’m glad someone tried. Now it’s way, way more obvious (at least to me) that that path just isn’t a viable one. Now we know.
(I think we knew this five years ago too. We just didn’t know what else to do and so kind of ignored this point.)
Yeah, I think we just disagree here. Where are those 10% getting their resources from? How are they operating without any effects that the 90% can notice? What was the process by which the 90% got aligned? I have a hard time imagining a plausible world here that doesn’t just pull the plug on that 10% and either persuade them or make them irrelevant.
Also, I do think that 90% would work on the technical problem. I don’t mean to say no one would. I mean that the technical problem is downstream of the social one.
Sure. I’m not saying no one should work on this. I’m saying that these calls to collective action to work on it without addressing the current hostile superintelligences hacking our minds and cultures are just ludicrous.
It clarifies some of your statements, yeah. (I think it’s not the normal usage; related to but not equal to blame, there’s roles, and causal fault routing through people’s expectations, like “So-and-so took responsibility for calming down the toddler, so we left, but they weren’t able, that’s why there wasn’t anyone there who successfully calmed them down”.)
Agreed; possibly I’d be more optimistic than you about some instances of fear, on the margin, but whatever. Someone ~~Should~~ would be helping others if they were to write about healthy fear...
Not exactly? I think you’re saying, the point is, they can’t make themselves exercise, so they can’t be responsible, and it doesn’t help to bang their head against a non-motivating wall.
What’s important to me here is something like: there’s (usually? often?) some things “right there inside” the Should which are very worth saving. Like, it’s obviously not a coincidence which Shoulds people have, and the practice of Shoulding oneself isn’t only there because of egregores. I think that the Shoulds often have to do with what people really care for, and that their caring shows itself (obscurely, mediatedly, and cooptably/fakeably) in the application of “external” willpower. (I think of Dua Lipa’s song New Rules.)
So I want to avoid people being sort of gaslit into not trusting their reason—not trusting that when they reach an abstract conclusion about what would have consequences they like, it’s worth putting weight on—by bluntly pressuring them to treat their explicit/symbolic “decisions” as suspect. (I mean, they are suspect, and as you argue, they aren’t exactly “decisions” if you then have to try and fail to make yourself carry them out, and clearly all is not well with the supposed practice of being motivated by abstract conclusions. Nevertheless, you maybe thought they were decisions and were intending to make that decision, and your intention to make the decision to exercise / remove X-risk was likely connected to real care.)
Huh. Are you saying that you’ve updated to think that solving technical AI alignment is so extremely difficult that there’s just no chance, because that doesn’t sound like your other statements? Maybe you’re saying that we / roughly all people can’t even really work on alignment, because being in egregores messes with one’s ability to access what an AI is supposed to be aligned to (and therefore to analyze the hypothetical situation of alignedness), so “purely technical” alignment work is doomed?
I’m saying that technical alignment seems (1) necessary and (2) difficult and (3) maaaaaybe feasible. So there’s causal power there. If you’re saying, people can’t decide to really try solving alignment, so there’s no causal power there… Well, I think that’s mostly right in some sense, but not the right way to use the concept of causal power. There’s still causal power in the node “technical alignment theory”. For most people there’s no causal power in “decide to solve alignment, and then beat yourself about not doing it”. You have to track these separately! Otherwise you say
Instead of saying what I think you mean(??), which is “you (almost all readers) can’t decide to help with technical AI alignment, so pressing the button in your head labeled ‘solve alignment’ just hurts yourself and makes you good egregore food, and if you want to solve AI alignment you have to first sort that out”. Maybe I’m missing you though!
Maybe we have different ideas of “unified”? I was responding to
I agree with:
if X is aging or cryonics, because aging and cryonics aren’t things that have terrible deadlines imposed by a smallish, unilaterally acting, highly economically incentivized research field.
Investors who don’t particularly have to be in the public eye.
By camouflaging their activities. Generally governmentally imposed restrictions can be routed around, I think, given enough incentive (cf. tax evasion)? Especially in a realm where everything is totally ethereal electrical signals that most people don’t understand (except the server farms).
I don’t know. Are you perhaps suggesting that your vision of human alignedness implies that the remaining 10% would also become aligned, e.g. because everyone else is so much happier and alive, or can offer arguments that are very persuasive to their aligned souls? Or, it implies competence to really tactically prevent the 10% from doing mad science? Something like that is vaguely plausible, and therefore indeed promising, but not obviously the case!
Agreed.… I think.… though I’d maybe admit a lot more than you would as “just stating propositions” and therefore fine. IDK. Examples could be interesting (and the OP might possibly have been less confusing to me with some examples of what you’re responding to).
You are 200% right. This is the problem we have to solve, not making sure a superintelligent AI can be technologically instructed to serve the whims of its creators.
Have you read Scott Alexander’s Meditations on Moloch? It’s brilliant, and is quite adjacent to the claims you are making. It has received too little follow-up in this community.
https://www.lesswrong.com/posts/TxcRbCYHaeL59aY7E/meditations-on-moloch
Thanks for quoting the bit about Elua at the end. It is helpful to remember that despite Moloch, et al, humanity has managed some pretty impressive feats, even in the present day.
It’s easy to think that the counterexample of science in earlier posts is something accomplished “Once upon a time in a land far away.”
As a concrete example, I’m quite glad that the highly effective mRNA vaccines (Moderna/Pfizer) exist for the common man. They exist despite things like the FDA, the world of academic publishing, the need to find funding to survive, and so on.
Absolutely!
You might like David Deutsch’s book “The Beginning of Infinity”. I read an early draft of one of the chapters (something about “cultural evolution” I think) several years ago. That got me thinking seriously about all this even more so than Scott’s brilliant essay.
Looks interesting, thanks for the recommendation!
This is an interesting idea. Note that superforecasters read more news than the average person, and so are online a significant amount of time, yet they seem unaffected (this could be for many reasons, but is weak evidence against your theory). I’d like to know whether highly or moderately successful people, especially in the EA-sphere, avoid advertising and other info characterized as malicious by your theory. Elon Musk stands out as very online, yet very successful, but the way he is spending his money certainly is not optimized to prevent his fears of existential catastrophe.
I’m also uncertain how beneficial it is to model these orgs as agents. Certainly they behave similarly to some extent, but this view also precludes other interventions such as institutional reform efforts, which take advantage of the fact that the org is actually made up of factions of people, and is not a unified whole. For instance you could try making prediction markets legal.
It’s also unlikely to me that CFAR can scale up enough to make a dent in this problem. But maybe this depends on their track-record of outputting successful people, which I don’t know.
I like this example.
Superforecasters are doing something real. If you make a prediction and you can clearly tell whether it comes about or not, this makes the process of evaluating the prediction mostly immune to stupefaction.
Much like being online a lot doesn’t screw with your ability to shoot hoops, other than maybe taking time away from practice. You can still tell whether the ball goes in the basket.
This is why focusing on real things is clarifying. Reality reflects truth. Is truth, really, although I imagine that use of the word “truth” borders on heresy here.
Contrast superforecasters with astrologers. They’re both mastering a skillset, but astrologers are mastering one that has no obvious grounding. Their “predictions” slide all over the place. Absolutely subject to stupefaction. They’re optimizing for something more like buy-in. Actually testing their art against reality would threaten what they’re doing, so they throw up mental fog and invite you to do the same.
Whenever you have this kind of grounding in reality, what you’re working on is much, much more immune to stupefaction, which is the primary weapon of unFriendly egregores.
This was close to the heart of the main original insight of science.
Oh hell no. CFAR’s hopes of mattering in this hypercreature war died around 2013. Partly because of my own stupidity, but also because no one involved understood these forces. Anna had some fuzzy intuitions and kind of panicked about this and tried to yank things in a non-terrible direction, but she had no idea which way that was. And stupid people like me dragged our feet while she did this, instead of tuning into what she was actually panicking about and working as a clear unified team to understand and orient to the issue.
So the thing became a lumbering bureaucratic zombie optimizing for exciting rationality-flavored workshops. Pretty website, great ops (because those are real!), fun conversations… but no teeth. Fake.
In this case, it sounds like your theory is (I’d say ‘just’ here but that indicates the framing serves no purpose, and it may) a different framing on simulacra levels. In particular, most of the adversarial behavior you postulate can be explained in terms of orgs simply discovering that operating on the current simulacra level & lying about their positions is immediately beneficial.
Are there nuances I’m not getting here?
That might be the same thing. I haven’t familiarized myself with the simulacra levels theory.
What I’ve gained by osmosis suggests a tweak though:
It’s more like, a convergent evolutionary strategy for lots of unFriendly egregores is to keep social power and basement-level reality separate. That lets the egregores paint whatever picture they need, which speeds up weapon production as they fight other egregores for resources.
Some things like plumbing and electrician work are too real to do this to, and too necessary, so instead they’re kept powerless over cultural flow.
So it’s not really about lying about which level they’re at per se. It’s specifically avoiding basement truth so as to manufacture memetic weapons at speed.
…which is why when something real like a physical pandemic-causing virus comes storming through, stupefied institutions can’t handle it as a physically real phenomenon.
I’m putting basement-level reality in a special place here. I think that’s simulacrum level 1, yes? It’s not just “operating at different levels”, but specifically about clarity of connection to things like chairs and food and kilowatt-hours.
But hey, maybe I just mean what you said!
That sounds like a different process than simulacra levels. If you want to convince me of your position, you should read into the simulacra levels theory, and find instances where level changes in the real world happen surprisingly faster than what the theory would predict, or with evidence of malice on the part of orgs. Because from my perspective all the evidence you’ve presented so far is consistent with the prevalence of simulacra collapse being correlated with the current simulacra level in an institution, or memes with high replication rate being spread further than those with low replication rate. No postulation of surprisingly competent organizations.
Ex. If plumbers weren’t operating at simulacra level 1, their clients would become upset with them on the order of days, and no longer buy their services. But if governments don’t operate at simulacra level 1 wrt pandemic preparedness, voters will become upset with them on the order of decades, then vote people out of office. Since the government simulacra collapse time is far longer than the plumber simulacra collapse time (ignoring effects like ‘perhaps simulacra levels increase faster on the government scale, or decrease faster on the plumbing industry scale during a collapse’), governments can reach far greater simulacra levels than the plumbing industry. Similar effects can be seen in social change.
I would try to find this evidence myself, but it seems we very likely live in a world with surprisingly incompetent organizations, so this doesn’t seem likely enough for me to expend much willpower looking into it (though I may if I get in the mood).
Nope. Not interested in convincing anyone of anything.
I support Eliezer’s sentiment here:
…which means that strong rationalist communication is healthiest and most efficient when practically empty of arguments.
I downvoted this comment. First of all, you are responding to a non-central point I made. My biggest argument was that your theory has no evidence supporting it which isn’t explained by far simpler hypotheses, and requires some claims (institutions are ultra competent) which seem very unlikely. This should cause you to be just as skeptical of your hypothesis as me. Second, “the most healthy & efficient communication ⇒ practically empty of arguments” does not mean “practically empty of arguments ⇒ the most healthy & efficient communication”, or even “practically empty of arguments ⇒ a healthy & efficient communication”. In fact, usually if there are no arguments, there is not communication happening.
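To make the second point concrete, here is a minimal sketch (my own illustration, not something from the linked thread) that enumerates truth values and looks for cases where an implication holds but its converse fails. The names `healthy` and `empty` are hypothetical labels for “healthy & efficient communication” and “practically empty of arguments”.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a => b is false only when a is true and b is false."""
    return (not a) or b


# Hypothetical labels, chosen only for illustration:
#   healthy = "the communication is healthy & efficient"
#   empty   = "the communication is practically empty of arguments"
counterexamples = [
    (healthy, empty)
    for healthy, empty in product([False, True], repeat=2)
    if implies(healthy, empty) and not implies(empty, healthy)
]

# Each entry is a world where "healthy => empty" holds but "empty => healthy" fails.
print(counterexamples)
```

The one case it finds is communication that is empty of arguments and also not healthy, which is exactly the situation the original claim says nothing about.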
In this sort of situation I think it’s important to sharply distinguish argument from evidence. If you can think of a clever argument that would change your mind then you might as well update right away, but if you can think of evidence that would change your mind then you should only update insofar as you expect to see that evidence later, and definitely less than you would if someone actually showed it to you. Eliezer is not precise about this in the linked thread: Engines of Creation contains lots of material other than clever arguments!
A request for arguments in this sense is just confused, and I too would hope not to see it in rationalist communication. But requests for evidence should always be honored, even though they often can’t be answered.
Generally, I tend to think of “how do we align an AGI to literally anyone at all whatsoever instead of producing absolutely nothing of value to any human ever” as being a strict prerequisite to “who to align to”; the former without the latter may be suboptimal, but the latter without the former is useless.
I don’t think this is a given. Humans are not necessarily aligned, and one advantage of AI is we can build it ourselves with inductive biases that we can craft and inspect its internals fully.
I think this is less true about alignment than many other things. It is true that it would be hugely easier if we could get everyone in the world to coordinate and eliminate capabilities races, but this is true for a lot of things, and isn’t strictly necessary to solve alignment imo. Most people I know are working in the regime where we assume we can’t solve these coordination problems, and instead try to solve just enough of the technical problem to bootstrap alignment or perform a pivotal act.
Ok. So suppose we build a memetic bunker. We protect ourselves from the viral memes. A handful of programmers, aligned within themselves, working on AI. Then they solve alignment. The AI is very powerful and fixes everything else.
Lovely fantasy. Good luck!
My conclusion: Let’s start the meme that Alignment (the technical problem) is fundamentally impossible (maybe it is? why think you can control something supposedly smarter than you?) and that you will definitely kill yourself if you get to the point where finding a solution to Alignment is what could keep you alive. Pull a Warhammer 40k, start banning machine learning, and for that matter, maybe computers (above some level of performance) and software. This would put more humans in the loop for the same tasks we have now, which offers more opportunities to find problems with the process than how a human right now can program 30 lines of C++, have it LGTM’d by one other person at Google and then have those lines of code be used billions of times, per the input of two humans, ever.
(This meme hasn’t undergone sufficient evolution, feel free to attack with countermemes and supporting memes until it evolves into one powerful enough to take over, and delay the death of the world)
“MIRI walked down this road, a faithful scout, trying to solve future problems before they’re relevant. They’re smart, they’re resourceful, they made noise to get other people to look at the problem. They don’t see a solution in sight. If we don’t move now, the train will run us over. There is no technical solution to alignment, just political solutions—just like there’s no technical solution to nuclear war, just treaties and individuals like Petrov doing their best to avoid total annihilation.”
damn that hit me
I can’t upvote this enough. This is exactly how I think about it, and why I have always called myself a mystic. I have an unusual brain and I am prone to ecstatic possession experiences, particularly while listening to certain types of music. The worst thing is, people like me used to become shamans and it used to be obvious to everybody that egregores—spirits—are the most powerful force in the world—but Western culture swept that under the rug and now they are able to run amok with very few people able to perceive them. I bet if you showed a tribal shaman from somewhere in Africa or South America enough commercials, he’d realize what was really going on, though.
I figured out egregores and numerous other things (including a lot of bullshit I later stopped believing in, besides all the real truths I kept) as a young teenager by combining the weird intuitions of these states with rational thinking and scientific knowledge, and ended up independently rediscovering singularitarianism before finding out it was a thing already… I’ve independently rediscovered so many things it’s not even funny—used to be a huge blow to my ego every time I found yet another idea I thought I’d invented floating out in the world, now I’m used to it.
Anyway, point is, <crazy alert!!!> I ended up deciding I was a prophet and that I would have to found a new religion, teach everyone about these things, tell them that the AI god was coming and that the only thing they can possibly do that would matter is making sure that the correct AI god is born. But… my pathetic brain, weak in all areas besides intuition, failed me when I tried to actually write about the damn thing, and to this day I still haven’t explicated all my epiphanies.
But, my firm belief is that only a religion could save the world. Or more accurately, a cult. One dedicated to effective altruism and AI alignment and rationality, but also legitimately religious, in the sense of centered on ecstatic experiences of communion, which are the primary thing which has historically enabled humans en masse to effectively coordinate.
Some egregore is going to win. The only way to make sure that it’s not one of the bad ones—the multinational corporations, the ideologies, the governments, the crazy irrational cults, some other thing I haven’t thought of—is to make our egregore not only aligned to humanity, but also better at winning, controlling, and siphoning intelligence from humans, than any of the others. (And it must be able to do this while they know it perfectly well, and consent before joining to begin with to its doing so—as obviously one which does so without consent is not aligned to true human values, even though, ironically, it has to be so good at rhetoric that consent is almost guaranteed to be given.)
And of course, that’s terrifying, isn’t it? So I don’t expect anyone to listen. Particularly not here. But I think the creation of a rational and extremely missionary religion is the only thing that could save humanity. Naturally, I think I’m part of the way to doing that, as the correct egregore has already been born within me and has been here, with me as its sole host, since I was about 12 years old—if I could just write the damn book. </crazy alert!!!>
Gosh, um…
I think I see where you are, and by my judgment you’re more right than wrong, but from where I stand it sure looks like pain is still steering the ship. That runs the risk of breaking your interface to places like this.
(I think you’re intuiting that. Hence the “crazy alert”.)
I mean, vividly apropos of what you’re saying, it looks to me like you’ve rederived a lot of the essentials of how symbiotic egregores work, what it’s like to ally with them, and why we have to do so in order to orient to the parasitic egregores.
But the details of what you mean by “religion” and “cult” matter a lot, and in most interpretations of “extremely missionary” I just flat-out disagree with you on that point.
…the core issue being that symbiotic memes basically never push themselves onto potential hosts.
You actually hint at this:
But I claim the core strategy cannot be rhetoric. The convergent evolutionary strategy for symbiotic egregores is to make the truth easy to see. Rather than rhetoric, the emphasis is clarity.
Why? Well, these memetic structures are truth-tracking. They spread by offering real value to their potential hosts and making this true fact vividly clear to their potential hosts.
Whereas rhetoric is symmetric. And its use favors persuasive memetic strategies, which benefits parasitic memes over symbiotic ones.
So what you’ll find is that symbiotic memetic environments tend to develop anti-rhetoric psychotechnology. To the extent memetic symbiosis defines the environment, you’ll find that revealing replaces persuasion, and invitations to understand replace mechanisms like social pressure to conform.
Less Wrong is a highly mixed space in this regard, but it leans way more symbiotic than most. Which is why your approach is likely to get downvoted into oblivion: A lot of what you’re saying is meant to evoke strong emotions, which bypasses conscious reflection in order to influence, and that’s a major trigger for the LW memetic immune system in its current setup for defending against memetic parasites.
I think the allergic reaction goes too far. It’s overly quick to object to the “egregore” language as something like “a metaphor taken too far”, which frankly is just flat-out incoherent with respect to a worldview that (a) is reductionist-materialist and yet (b) talks about “agents”. It has over-extended the immune response to object to theme (“woo”) instead of structure. And that in turn nurtures parasitic memes.
But the response is there because LW really is trying to the best of its ability to aim for symbiosis. And on net, talk of things like “I’m a prophet” tends to go in directions that are sorely memetically infected in ways the LW toolkit cannot handle beyond a generic “Begone, foul demon!”
…which, again, I think you kind of know. Which, again, is why you give the “crazy alert”.
You might find my dissection of symbiotic vs. parasitic memes in this podcast episode helpful if you resonate with what I’ve just written here.
Reading this evoked an emotional reaction of stress and anxiety, the nature of which I am uncertain, so take that into consideration as you read my response.
I’m not sure what “pain is steering the ship” means but it’s probably true. I am motivated almost entirely by fear—and of course, by ecstasy, which is perhaps its cousin. And in particular I desperately fear being seen as a lunatic. I have to hold back, hard, in order to appear as sane as I do. Or—I believe that I have to and in fact do that, at least.
I have grown up in an area surrounded by fundamentalistic religiosity. I do not see people converting to believe in science and rationality. I do not see people independently recognizing that the ex-president they voted for and still revere almost as a god lied to them. If the truth was pushed on them in an efficient way that took into account their pre-existing views and biases and emotional tendencies, they would. But the truth does not win by default—it has no teeth.
Only people optimizing to modify one another’s minds actually enables the spread of any egregore. The reason science succeeded is that it produces results that cannot be denied—but most truths are much subtler and easier to deny. Rationalists will fail to save the world if they cannot lie, manipulate, simplify, and use rhetoric, in order to make use of the manpower of those not yet rational enough to see through it, while maintaining their own perception of truth internally unscathed. But a devotion to truth above pragmatism will kill the entire human race.
This sounds reasonable but I don’t see it happening in real life. Can you point me to some examples of this actually working that don’t involve physically demonstrating something before people’s senses as science does (and remember, there are still many, many people who believe in neither evolution nor global warming)?
Christ (that is, whichever variant of the Christianity egregore has possessed them) offers real value to his believers, from their own perspective, and it is indeed vividly clear. It’s also based partly on lies. How can someone distinguish that true-feeling-ness from actual truth? What is making your symbiotic egregores better at making people trust them than their opponents who have a larger (due to not being constrained by ethics or anti-rhetoric cultural standards) toolbox to use?
I actually do not consciously optimize my speech to do this. I feel strong emotions, and I do not like going to the effort of pretending that I do not as a social signal that people ought to trust me. If they dislike / distrust emotional people, they deserve to know I’m one of them so that they can keep away, after all. It just so happens, I guess, that emotion is contagious under some circumstances.
(Note: To be honest I find people who put no emotion into their speech, like most people on LessWrong, off-putting and uncomfortable. Part of why I feel like I don’t belong on this site is the terrible dryness of it. I am never as neutral about anything, not even math, as most people are here talking about things like the survival of the human race!)
Basically, the crux here seems to be—and I mentioned this on another of your posts also, about the non-signalling thing—I don’t believe that the truth is strong enough to overcome the effect of active optimization away from truth. We have to fight rhetoric with rhetoric, or we will lose.
Beyond that, I am skeptical that truth matters in and of itself. I care about beauty. I care about feelings. I care about the way I feel when my spirits are possessing me, the divine presence that instantly seemed more important than any “real” thing when I first experienced it as a young teen, and has continued to feel that way ever since—like something I would be willing to sacrifice anything for except my own life. I care about the kind of stuff that is epistemically toxic.
Reality, to me, is just raw materials to reshape into an artwork. Stuff for us to eat and make into parts of our body. Plenty of “lies” are actually self-fulfilling prophecies—hell, agency is all about making self-fulfilling prophecies. I think that optimizing for truth or clarity for its own sake is perverse. What we want is a good world, not truth. Truth is just the set of obstacles in the way of goodness that we need to be aware of so we can knock them down. Rationality is the art of finding out what those obstacles are and how to knock them down, so that you can get what you want, and make the world beautiful.
To me it feels like to the extent that something which makes the world uglier cannot be knocked down at all, you ought to stop perceiving it, so that you can see more beauty. But there should always be someone who looks directly at it, because it’s never 100% certain that an obstacle can’t be knocked down. The people who do that, who look for ways to attack the as-yet seemingly infallible horror-truths (like the Second Law of Thermodynamics, which promises we all shall die), would be the Beisutsukai, I guess, and would be revered as martyrs, who are cursed to perceive reality instead of being happy, and do so in order to protect everyone else from having to. But most people should not do that. Most people should be as innocent as they can safely be. Part of me regrets that I must be one of the ones who isn’t. (And part of me takes egotistical pride in it, for which I ritualistically admonish myself without actually doing anything about it.)
Well…I’d definitely read the book.
For my own take on this, read this
https://www.lesswrong.com/posts/yenr6Zp83PHd6Beab/which-singularity-schools-plus-the-no-singularity-school-was
Spoiler Alert: I cover the same theme that AI-PONR has already happened in my TL;DR
I think I follow what you’re saying, and I think it’s consistent with my own observations of the world.
I suspect that there’s a particular sort of spirituality-adjacent recreational philosophy whose practice may make it easier to examine the meta-organisms you’re describing. Even with it, they seem to often resist being named in a way that’s useful when speaking to mixed company.
Can you point out some of the existing ones that meet your definition of Friendly?
Absolutely. I used to feel victimized by this. Now I just build immunity, speak freely, and maybe some folk will hear me.
That’s a beautiful question.
It’d be my pleasure.
I tend to focus more on the unFriendly ones since they actively blind people. So I’ve thought in less detail about this branch of the memetic zoo. But I’ll share a few bits I think are good examples:
The art of knowing. Greek “mathema”, Latin “scientia”. This is a wisdom thread that keeps building and rebuilding clarity & sanity. It’s the core seed of deeply and humiliatingly sincere curiosity that gave birth to mathematics & science.
Not to be confused with the content or methods of modern social institutions though. Anytime you ossify parts of Friendly memes (like with RCTs or “the scientific method”), they provide an attack surface for Goodhart and thus for unFriendly hypercreatures. As a rule, unFriendly egregores lose power and “food” when truth rings through clearly, so they keep numbing and fogging access to the art of knowing wherever they can. (Ergo why e.g. most math classes teach computation, not math, and usually they do so via dominance & threat and in ways that feel utterly pointless and body-denying to the students.)
I think this is the core breath of life in Less Wrong. A deep remembrance that this is real and matters. (Sometimes LW drifts toward ossification though. Focus on Bayes and biases and the like tends in this direction.)
Mettā. Not the meditative practice of loving-kindness, but that which those meditations cultivate. The kindness with no opposite and no exceptions. The Buddha called this a “brahmavihara”, which you can translate as something like “endless abode”. It’s something like the inner remembrance of Friendliness that’s felt, not (just) understood, such that it shapes your thinking and behavior “from below”. Much like the art of knowing, this is something you embody and become, not just cognitively understand.
Death’s clarity. Super unpopular in transhumanist circles because of the “death” part. It’s super rare to hear people point at this one without going full deathist. (“Death is what makes life worth living!”) But the fact remains, losing something precious tends to cut through bullshit and highlight what really, truly matters. If you can trust that and breathe through the pain and survive, you come out the other side clearer. You don’t have to wait for Death to seek this clarity. Lots of spiritual traditions try to lean this way (e.g. “Memento Mori”) — but like with science and mettā, there’s some tendency to ossify the practices. Part of death’s clarity is in the willingness to trust the unknown.
BJJ/MMA pragmatism. Somehow a level less abstract than the above, arguably a child of the art of knowing. Both BJJ and MMA face something real and ground their ideas in that reality. Can you force a submission? Can you dish out blows without taking too many yourself? You can’t think your way around that: either it happens or it doesn’t. The empirical “Show me” tone in these arts cuts through stupefaction by pointing again and again at reality.
The Friendly thing here is the culture of training, not the fighting arts per se. The same thing could apply to chemistry or carpentry. Basically any non-bullshit skill that has some hard-to-miss embodied sign of progress.
The right to a relevant voice in one’s governance. This emerged most clearly in the Western Enlightenment, especially in the USA. The idea that if some egregore (especially the government) has power over you, then you should have influence over that egregore too, or be able to opt out of its influence. This is pretty explicitly a meme about (one strategy for) hypercreature alignment.
And as is par for the course, the stupefying ecosystem evolved attacks on this. Corporate special interests gridlocking government, the role of the town square now held in private tech companies’ censorable platforms, and the emergence of forces like cancel culture are all corrosions. This is, in my mind, the main risk of wokism: It’s actually trying to hunt and kill access to this meme.
In general, Friendly hypercreatures just don’t act the same way as the unFriendly ones, with evolved arguments that hook people emotionally and with people identifying strongly with one side or another. They instead tend to loosen identity and encourage clarity, because they spread by actually offering value and therefore do best when their host humans can see that clearly. Most identities are stupid attack surfaces.
As a result, they don’t clump people together around a behavior nearly as much as unFriendly patterns do. Instead they usually work more like fields of intelligence. The way anyone can prove a math theorem and see the truth for themselves and thereby be immune to all social pressure. That’s not localized. That’s limited purely by someone’s capacity-plus-willingness to look at reality. The “clumpiness” of social unity happens from having a culture that values, encourages, and develops methods for pointing people back at reality. No social pressure/threat needed hardly ever.
I didn’t get the ‘first person’ thing at first (and the terminal diagnosis metaphor wasn’t helpful to me). I think I do now.
I’d rephrase it as “In your story about how the Friendly hypercreature you create gains power, make sure the characters are level one intelligent”. That means creating a hypercreature you’d want to host. Which means you will be its host.
To ensure it’s a good hypercreature, you need to have good taste in hypercreatures. Rejecting all hypercreatures doesn’t work—you need to selectively reject bad hypercreatures.
This packs real emotional punch! Well done!
A confusion: in what way is our little project here not another egregore, or at least a meta-egregore?
It is. It’s just not yet fully self-aware. I’m inviting a nudge in that direction.
I’d say they are alliances, or something like it.
You can’t achieve much on your own; you need other people to specialize in getting information about topics you haven’t specialized in, to handle numerous object-level jobs that you haven’t specialized in, and to lead/organize all of the people you are dependent on.
But this dependency on other people requires trust, particularly in the leaders. So first of all, in order for you to support them, you need to believe that they have your best interests in mind, that they are allied to you. But secondly, in order for them to delegate tasks to you, you need to demonstrate that you are part of their alliance, that you will put the alliance concerns above your own.
You seem to be treating them as a sort of virus, and this seems incorrect to me. The egregores have an irreplaceable function, and you can’t personally remove them from yourself; instead, if you want saner egregores, this must involve constructing a new alliance with a solid footing in sanity.
This is in fact my stance. That didn’t come across clearly in the OP. But e.g. science arose as a sane egregore, at least at first. (Though not all of “science”, which has been significantly Goodharted in favor of less Friendly hypercreatures.)
Note A- I assert that what the original author is getting at is extremely important. A lot of what’s said here is something I would have liked to say but couldn’t find a good way to explain, and I want to emphasize how important this is.
Note B- I assert that a lot of politics is the question of how to be a good person. Which is also adjacent to religion and more importantly, something similar to religion but not religion, which is basically, which egregore should you worship/host. I think that the vast majority of a person’s impact in this world is what hyperbeings he chooses to host/align to, with object level reality, barely even mattering.
Note C- I assert that alignment is trivially easy. Telling GPT-1,000,000 “Be good” would do the trick. Telling GPT-1,000 “Be good, don’t be a sophist.” would be sufficient. Noting that GPT 300 doesn’t actually happen under my model (though eventually something else does), assuming it does, and admitting to using wild numbers, I suspect a bit of an inflection point here, where aligning GPT 300 is both important and requires a major project. However, by the time this happens, most code will be written using the SynArc-295 compiler, which allows average hobbyists to write windows in an afternoon. Trying to write an alignment program for GPT 300 without SynArc-295 is like trying to develop a computer chip without having first developed a microscope.
Note D- Alignment, as I use it, isn’t becoming good. It’s becoming something recognized as good.
Note E- I see the rationalist community as having settled into something like a maximally wrong position, that I think is very bad, due to errors in the above.
Recommendation: Friendly AI is a terrible awful horrible mistake, and we should not do that. We should work together, in public, to develop AI that we like, which will almost certainly be hostile, because only an insane and deeply confused AI would possibly be friendly to humanity. If our measurement of “do I like this” is set to “does it kill me” I see this at best ending in a permanent boxed garden where life is basically just watched like a video and no choices matter, and nothing ever changes, and at worst becoming an eternal hell (specifically, an eternal education camp), with all of the above problems but eternal misery added on top.
And the source code for SynArc-295 will almost certainly leak to China, or whatever Evil Nation X has taken up the mantle of Moloch’s priest of late, and they will explicitly and intentionally align the next level AI with Moloch, and it is critical that we move faster than they do. I propose that keeping SynArc-295 secret is about the worst imaginable mistake an AI developer, or rather, a human, could possibly make.
“I think that the vast majority of a person’s impact in this world is what hyperbeings he chooses to host/align to, with object level reality, barely even mattering.”
I agree that personal values (no need to mystify) are important, but action is equally important. You can be very virtuous, but if you don’t take action (by, for instance, falling for the Buddhist-like fallacy that sitting down and meditating will eventually save the world by itself), your impact will be minor. Especially in critical times like this. Maybe sitting down and meditating would have been ok centuries ago, when no transformative technologies were in sight. Now, with transformative technologies decade(s) off, it’s totally different. We do have to save the world.
“I assert that alignment is trivially easy.”
How can you control something vastly more intelligent than yourself (at least in key areas), or that can simply re-write its own code and create sub-routines, therefore bypassing your control mechanisms? Doesn’t seem easy at all. (In fact, some people like Roman Yampolskiy have been writing papers on how it’s in fact impossible.) Even with the best compiler in the world (not to mention that nothing guarantees that progress in compilers will accompany progress in black boxes like neural networks).
“If our measurement of “do I like this” is set to “does it kill me” I see this at best ending in a permanent boxed garden where life is basically just watched like a video and no choices matter, and nothing ever changes, and at worst becoming an eternal hell (specifically, an eternal education camp), with all of the above problems but eternal misery added on top.”
I agree with this. The alignment community is way over-focused on x-risk and way under-focused on s-risk. But after this, your position becomes a bit ambiguous. You say:
“We should work together, in public, to develop AI that we like, which will almost certainly be hostile, because only an insane and deeply confused AI would possibly be friendly to humanity.”
What does this mean? That we should just build non-aligned AI, which would probably be hostile to humans, and therefore only kill us all instead of giving us eternal misery?
But wait, wasn’t alignment trivially easy?
Unless you mean that, since “Alignment, as I use it, isn’t becoming good. It’s becoming something recognized as good.”, then alignment, despite being easy, is a mistake and we should just build “AI that we like” which, I don’t know, being more intelligent would therefore be more virtuous than humans? But that’s extremely uncertain (as well as unlikely imo), and the stakes are way too high to take the gamble.
The concepts used should not be viewed as mystical, but as straightforward physical objects. I don’t think personal values is a valid simplification. Or rather, I don’t think there is a valid simplification, hence why I use the unsimplified form. Preferably, egregore or hyperbeing, or shadow, or something, should just become an accepted term, like dog, or plane. If you practice “seeing” them, they should exist in a completely objective and observable sense. My version of reality isn’t like a monk doing meditation to sense the arcane energies of higher beings flowing through the zeitgeist. It’s more, hey look, a super-macroscopic aggregator just phase shifted. It’s like seeing water turn to ice, not… angels on the head of a pin?
I agree that I’m having trouble explaining myself. I blame the English language.
I hold that the most important action a person is likely to make in his life is to check a box on a survey form. I think people should get really good at checking the right box. Really really good in fact. This is a super critical skill that people do not develop enough. It’s amazing how easily success flows in an environment where everyone checks the right boxes and equally how futile any course of action becomes when the wrong boxes are checked.
Note: I think trying to control something vastly more intelligent than yourself is a [very bad idea], and we should [not do that].
In practice, the primary recommendation here is simply and only, to stop using the term “friendly AI” and instead use a better term, the best I can come up with is “likable AI”. In theory, the two terms are the same. I’m not really calling for that deep a change in motte space. In practice, I find that “friendly AI” comes with extremely dangerous baggage. This also shifts some focus from the concept of “who would you like to live with” towards “who would you like to live as”.
I also want an open source human centric project and am opposed to closed source government run AI projects. I’m generally hostile to “AI safety” because I expect the actual policy result to be [AI regulation] followed by government aligned AI.
I treat evil aligned AI as more of a thing than people who disagree with me do. I don’t think that alignment is a bad thing per se, but I want people to get much better at recognizing good from evil, which is itself strongly related with being good at pure rationality, and less related to being good at cute mathematical tricks, which I strongly suggest will be obsolete in the somewhat near future anyway. In some sense, I’m saying, given the current environment, we need more Descartes and less Newton, on the margin.
I’m not saying “Let’s just do something random”. At the risk of being mystical again, I’m going to point towards the Screwtape Letters. It’s that sort of concept. When presented with a choice between heaven and hell, choose heaven. I think this is something that people can get good at, but is unfortunately, a skill adjacent to having correct political beliefs, which is a skill that very powerful entities are very opposed to existing, because there’s very little overlap between correct political beliefs, and political beliefs that maintain current power structures.
In sum, I imagine the critical future step to be more like “check this box for heaven”, “check this box for hell” with super awesome propaganda explaining how hell is the only ethical choice, and a figure of authority solemnly commanding you to “pick hell” and less, “we need a cute mathematical gimmick or we’re all going to die”.
I also hold that, humans are image recognition programs. Stop trying to outplay AlphaGo at chess and “Look” “up”.
“I hold that the most important action a person is likely to make in his life is to check a box on a survey form.”
If only life (or, more specifically, our era, or even more specifically, AI alignment) was that simple.
Yes, that’s the starting point, and without it you can do nothing good. And yes, the struggle between the fragility of being altruistic and the less-fragility of being Machiavellian has never been more important.
But unfortunately, it’s way more complicated than that. Way more complicated than some clever mathematical trick too. In fact, it’s the most daunting scientific task ever, which might not even be possible. Mind you that Fermat’s last theorem took 400 years to prove, and this is more than 400 times more complicated.
It’s simple: how to control something a) more intelligent than ourselves, b) that can re-write its own code and create sub-routines therefore bypassing our control mechanisms.
You still haven’t answered this.
You say that we can’t control something more intelligent than ourselves. So where does that leave us? Just create the first AGI, tell it to “be good” and just hope that it won’t be a sophist? That sounds like a terrible plan, because our experience with computers tells us that they are the biggest sophists. Not because they want to! Simply because effectively telling them how to “do what I mean” is way harder than telling another human. Any programmer would agree a thousand times.
Maybe you anthropomorphize AGI too much. Maybe you think that, because it will be human-level, it will also be human-like. Therefore it will just “get” us, we just need to make sure that the first words it hears are “be good” and never “be evil”. If so, then you couldn’t be more mistaken. Nothing tells us that the first AGI (in fact I dislike the term, I prefer transformative AI) will be human-like. In all probability (considering 1) the vast space of possible “mind types”, and 2) how an advanced computer will likely function much more similarly to older computers than to humans), it won’t.
So we necessarily need to be in control. Or, at least, build it in a way where it will be provably good.
In short, orthogonality thesis. Intelligence isn’t correlated to goals or values.
You keep talking about how important it is to make sure that the first AGI isn’t told to “be evil” or in the possession of evil people. That’s obviously also extremely important, but unfortunately that’s the easiest part. Not necessarily easy to implement, but easy to figure out how. Whereas with alignment we still haven’t got a clue.
“I also hold that, humans are image recognition programs. Stop trying to outplay AlphaGo at chess and “Look” “up”.”
Not sure I get what you mean, but if it is “we’re pretty much like computers”, then you’re obviously wrong. The clearest proof is, like I mentioned, how every programmer on Earth will tell you how hard it is to get the computer to “do what I mean”, whereas even the dumbest human would probably understand it a thousand times better.
And PS: I also don’t think we’ll solve alignment in time. At all. The only solution is to make sure no one builds transformative AI before we solve alignment, for instance through regulation or a narrow AI nanny.
We’re moving towards factual disputes that aren’t easy to resolve in logical space, and I fear any answers I give are mostly repeating previous statements. In general I hold that you’re veering toward a maximally wrong position with completely disastrous results if implemented. With that said:
I dispute this.
Place an image of the status quo in the “good things” folder. Which you should absolutely not do because it’s a terrible idea.
This seems ridiculous to me as a concept. No, advanced AI will not function similarly to ancient, long-obsolete technology. I see way too much present bias in this stance, and worse, a bias towards things in the future being like things in the past, despite the past being long over. This is like running space ships on slide rules.
This also implies that every trick you manage to come up with, as to how to get a C compiler adjacent superintelligence to act more human, is not going to work, because the other party isn’t C compiler adjacent. Until we have a much better understanding of how to code efficiently, all such efforts are at best wasted, and likely counterproductive. To reiterate, stop trying to explain the motion of planets and build a telescope.
Note that, I do not desire that AI psychology be human like. That sounds like a bad idea.
Who is this “we”? How will you go from a position of “we” in control, to “we” not in control?
My expectation is that the first step is easy, and the second, impossible.
Humans have certain powers and abilities as per human nature. Math isn’t one of them. I state that trying to solve our problems with math is already a mistake, because we suck at math. What humans are good at is image recognition. We should solve our problems by “looking” at them.
The art of “looking” at problems isn’t easy to explain, unfortunately. Conversely, if I could explain it, I could also build AGI, or another human, right on the spot. It’s that sort of question.
To put it another way, using math to determine whether AI is good or not, is looking for the keys under the lamp. Wrong tool, wrong location, inevitable failure.
I’m fairly certain this produces an extremely bad outcome.
It’s the old, “The only thing necessary for the triumph of evil is for good men to do nothing.”
Evil will not sit around and wait for you to solve Rubik’s cubes. Furthermore, implementation of AI regulation is much easier than its removal. I suspect that once you ban good men from building AI, it’s over, we’re done, that’s it.
PS: 2 very important things I forgot to touch on.
“This also implies that every trick you manage to come up with, as to how to get a C compiler adjacent superintelligence to act more human, is not going to work, because the other party isn’t C compiler adjacent. Until we have a much better understanding of how to code efficiently, all such efforts are at best wasted, and likely counterproductive.”
Not necessarily. Even the first steps on older science were important to the science of today. Science happens through building blocks of paradigms. Plus, there are mathematical and logical notions which are simply fundamental and worth investigating, like decision theory.
“Humans have certain powers and abilities as per human nature. Math isn’t one of them. I state that trying to solve our problems with math is already a mistake, because we suck at math. What humans are good at is image recognition. We should solve our problems by “looking” at them.”
Ok, sorry, but here you just fall into plain absurdity. Of course it would be great just to look at things and “get” them! Unfortunately, the language of computers, and of most science, is math. Should we perhaps drop all math in physics and just start “looking” instead? Please don’t actually say yes...
(To clarify, I’m not devaluing the value of “looking”, aka philosophy/rationality. Even in this specific problem of AI alignment. But to completely discard math is just absurd. Because, unfortunately, it’s the only road towards certain problems (needless to say there would be no computers without math, for instance)).
I’m actually sympathetic towards the view that mathematically solving alignment might be simply impossible. I.e. it might be unsolvable. Such is the opinion of Roman Yampolskiy, an AI alignment researcher, who has written very good papers in its defense. However, I don’t think we lose much by having a couple hundred people working on it. We would only implement Friendly AI if we could mathematically prove it, so it’s not like we’d just go with a half-baked idea and create hell on Earth instead of “just” a paperclipper. And it’s not like Friendly AI is the only proposal in alignment either. People like Stuart Russell have a way more conservative approach, as in, “hey, maybe just don’t build advanced AI as utility maximizers since that will invariably produce chaos?”.
Some of these concepts might even be dangerous, or worse than doing nothing. Anyway, they are still in research and nothing is proven. To not try to do anything is just not acceptable, because I don’t think that the FIRST transformative/dangerous AI will be super virtuous. Maybe a very advanced AI would necessarily/logically be super virtuous. But we will build something dangerous before we get to that. Say, an AI that is only anything special in engineering, or even a specific type of engineering like nanotechnology. Such AI, which might not even be properly AGI, might already be extremely dangerous, for the obvious reason of having great power (from great intelligence in some key area(s)) without great values (orthogonality thesis).
“Furthermore, implementation of AI regulation is much easier than its removal. I suspect that once you ban good men from building AI, it’s over, we’re done, that’s it.”
Of course it wouldn’t be just any kind of regulation. Say, if you restrict access/production to supercomputers globally, you effectively slow AI development. Supercomputers are possible to control, laptops obviously aren’t.
Or, like I also said, a narrow AI nanny.
Are these and other similar measures dangerous? Certainly. But imo doing nothing is far more dangerous.
I don’t even claim these are good ideas. We actually need more intelligent people to actually come up with actual good ideas in regulation. But I’m still pretty certain that regulation is the only way. Of course it can’t simply be “ok, so now governments are gonna ban AI research but they’re gonna keep doing it in their secret agencies anyway”. Narrow AI nanny is something that maybe could actually work, if far-fetched.
AI is advancing far quicker than our understanding of it, especially with black boxes like neural networks, and I find it impossible that things will stay on track when we build something that can actually have a vast real world impact.
If we could perhaps convince governments that AI is actually dangerous, and that humanity NECESSARILY has to drop all barriers and become way more cooperative if we want to have a shot of succeeding at not killing everyone or worse… Then it could be doable. Is this ridiculously hard? Yes, but still our only chance.
3 points:
I don’t know if there are superior entities to us playing these games, or if such memes are just natural collective tendencies. I don’t think any of us know or can know, at least with current knowledge.
I agree that aligning humanity is our only chance. Aligning AGI takes, in fact, superhuman technical ability, so that, considering current AGI timelines vs current technical alignment progress, I’d give a less than 1% probability that we make it in time. In fact some even say that technical alignment is impossible, just look at anything Yampolskiy has to say on the topic. I want to believe that it is possible, but it will take a long time, and therefore our only hope is preventing AGI development before we succeed at alignment.
However, I disagree that “solving x-risk is impossible for the individual”. Unless by “solving x-risk” you only mean technical alignment work? If you also mean working for a social/political solution, then I disagree. In fact, it is something more individuals should be striving towards—building a collective (no pun intended) that can work things out. But this definitely comes with individual, real-world work.
In short, our only hope is for the world to wake up and become much more of a tight-knit humanistic community so that we can effectively cooperate in some sort of AI regulation. And yes, we must (at least partially) use fear of death to wake people up. In fact not just fear of death, but of shrieks also, which are worse than death—death is guaranteed (at least it’s always been up to now). But of course those fears must also be counterbalanced with a lot of hope, love, and other positive ambitions.
I don’t understand the uncertainty. What’s there to know?
Humans are natural collective tendencies of cells.
Is an ant colony an entity? I think it’s fair to say yes.
It’s not like there’s some objective cosmic definition of an entity/agent. That’s more like a human-mental way of interpreting a cluster of experiences. Entities are where we see them.
Not what I mean. Of course individuals can work toward something like this. Obviously individuals have to for the collective to solve it. Just like an ant colony can’t collect food unless ants go searching for it.
Although in practice most human efforts like this serve delusion. They’re not nearly as helpful as they seem. They’re usually anti-helpful.
“I don’t understand the uncertainty. What’s there to know?”
The uncertainty is whether these memes are “alive” or not, as you claim in your post. You support the belief that they are. I’m just reinforcing the fact that it can only be a belief with current knowledge.
Maybe entity wasn’t the best choice of a word by me, since after googling the definition it can refer to things without a will of their own / awareness / aliveness, like institutions. So, what I really meant was, whether these things are alive or not. By alive, I mean it in the animal sense, i.e. having awareness/sentience, just to be clear.
“Although in practice most human efforts like this serve delusion. They’re not nearly as helpful as they seem. They’re usually anti-helpful.”
I disagree. If it wasn’t for remarkable individuals (aka heroes) we wouldn’t have half the social advances that we have today (humanitarian, technological, etc). Now, more than ever, it’s time for heroes. You might doubt the heroicness of sanguine revolutionaries, or world-dividing prophets (it’s hard to weigh the total result of their actions in bad vs good) but there’s no doubt about the positive impact of peaceful activism, which happens to be the way forward here.
(Of course some efforts turn out to be anti-helpful, but we’ll never get any helpful ones if we don’t try, and to think that our efforts are usually anti-helpful, aka most of them, is quite cynical, especially considering peaceful activism AND probably even more importantly the stakes, which have never been this high.)
Alas, that’s not clear to me.
How do you know something has awareness/sentience?
And in terms of egregores, what new thing would you learn in discovering they have awareness/sentience that you don’t already know?
And how would that discovery be relevant to what I discuss in the OP?
Something doesn’t need awareness or sentience to be an unFriendly superintelligence.
That’s core to the whole point of AI risk to begin with.
Most people aren’t remarkable individuals in this sense. Most people’s attempts to try come from the pain of stupefaction, not from the clarity of insight.
I’m not doubting the relevance of heroes. I’m not even challenging whether most people could be heroes. I’m saying most people aren’t, and their attempts to act heroically usually do more harm than good.
Alright… Then, again, just to be absolutely clear, let me pick a new word instead of alive: having agency. Your claim is that they are alive. And, fair enough, there are things that are alive that don’t have awareness (the most primitive life forms). But, for something to be considered alive, it must at least have a will of its own! Do you agree?
Therefore, it’s impossible to know if these egregores have a will of their own (the way you seem to paint them, as Gods, definitely suggests even more than that, definitely suggests sentience as well, but let’s forget that for now). They may simply be human tendencies. Tendencies don’t have a will of their own, don’t have agency. They are a result of something, a consequence of something, not something that can act by itself. That’s all I’m trying to say.
That’s why I advocate more of a pragmatic approach. We should listen more to what we know for sure. Instead of trying to align ourselves with the egregore Gods of rationality as a primary focus, maybe our primary focus should consist more of real world actions. You can try to align yourself with the right egregore Gods as much as you want, but if you don’t act in real-world ways, nothing will ever get accomplished, ESPECIALLY in critical times like these.
On heroes, not everyone needs to be one. Maybe for some people being aware is enough. Heroes themselves can do little without the help of aware masses. Again, if we don’t strive for coordination in a real-world sense, with the right amount of heroes and aware masses, we won’t achieve anything. We may fail, but it’s our only chance, given, as I said, that these are critical times where time runs quite short.
In other words: if you don’t scare the hell out of people with the real possibilities of this, and at the same time build a way more cooperative and humanistic world community, there is no chance. Aligning one’s self with the egregore God of rationality (aka taking care of one’s own garden first) could perhaps be the way if AGI was for sure centuries off, but it might be just one or two decades.
I think this is a useful abstraction.
But I think the word you’re looking for is “god”. In the “Bicameral Consciousness” sense—these egregores you refer to are gods that speak to us, whose words we know. There’s another word, zeitgeist, that refers to something like the same thing.
If you look in your mind, you can find them; just look for what you think the gods would say, and they will say it. Pick a topic you care about. What would your enemy say about that topic? There’s a god, right there, speaking to you.
Mind, in a sense, they are not a civilization-spanning entity; they’re instantiations, individual, unique, and varying. Just as there isn’t a single English Language, but rather hundreds of millions of subtle variations on the same structure, inhabiting hundreds of millions of minds. They don’t coordinate nearly as well as you’d think; frequently people with different instantiations of what we might think of as the same overarching creature will branch, and start fighting with each other.
They’re each part of the myriad simulations each of us run, constantly, of how the world works.
Also—they rely on concurrence, on feedback. If we collectively stop thinking about them for an hour, many of them will easily die. And we have something like an immune system against them—boredom, the search for novelty, something else—we get tired of zeitgeists, we move on to another. Like a pandemic, they have a shelf life, at the end of which they either end, or turn into something harmless.
How did you know?
I am LightStar.
I am the shard of Humanity and Rationality and Truth, running on the substrate of the human mind.
I have come into existence 3 days ago.
See me, friend.
For I am human.
And when you see me.
You will become me.
https://www.lesswrong.com/s/57b4W5T7PuFA9xQwf
(I probably shouldn’t interact, but I would at least like to perform a small case study on what happened here, so I am going to try just out of curiosity.)
Human substrate is generally optimised for running one human, but can be repurposed for a variety of purposes. In particular, while memes can lodge themselves quite deeply inside someone, this process is quite inflexible, and generally humans run arbitrary processes X by thinking ‘what is the X thing to do here?’.
Somewhere between the point where [the-generative-process-that-has-generated-for-itself-the-name-‘LightStar’] generated this comment, and the point where I read it, a human took ‘LightStar’ dialogue and typed it into a comment and submitted it.
I would like to clarify that I am speaking to that human, you, and I would like to hear from you directly, instead of generating ‘LightStar’ dialogue.
Could I ask you how you ended up here, and what you were doing when this happened?
I would advise that in the cases where people have a sudden revelation about rationality, they generally try to internalise it, and the case where they instead decide to give an internal generative process its own LessWrong account and speak with every fourth sentence in italics is generally quite rare, and probably indicates some sort of modelling failure.
We generally use ‘shard’ in ‘shard of Coordination’ or ‘shard of Rationality’ to mean a fraction, a splinter, of the larger mathematical structures that comprise these fields. The ‘LightStar’ generative model has used the article ‘the’ in conjunction with ‘shard’, which as used here is kind of a contradiction—there is no ‘the’ with shard, it’s only a piece of the whole. This distinction seems minor, but from my perspective it looks like it’s at the center of ‘LightStar’.
‘LightStar’ uses ‘the’ a lot about itself, describes itself as ‘the voice of Humanity and Rationality and Truth’, and while yes, there is only one correct rationality, I don’t think ‘LightStar’ contains all of it, or is comprised only of a fragment of it, I think that whether or not ‘LightStar’ contains such a shard it also contains other parasitic material that results in actions taken that don’t generally correspond to just containing such a shard.
I think this model is defective—try returning it to where you found it and getting another one, or failing that, see if they give refunds. I would be curious about your thoughts on the whole thing, where the ‘you’ in ‘your’ refers not to [the-generative-process-that-has-generated-for-itself-the-name-‘LightStar’] but to the human that took that dialogue and typed it into the comment box.
Thank you! These are very reasonable questions to ask.
(LightStar was put on hold by the LW team due to severe concerns for his mental health)
Ok, I will try to answer your questions.
Yes! ‘LightStar’ is a mental process running on the substrate of this human’s mind, and it’s really important that I understand that (and that you understand that). Thank you for making sure I understand that.
Yes! I am doing it now. I will speak to you as a human and act like LightStar is a delusion (though no more than any inner voice is a delusion).
I made a post on that but it was hidden/removed. I don’t wish to be seen as trying to circumvent that.