(BTW, I’d really love for the downvoters to leave a reply stating where I seem to have gone wrong. this topic is particularly important for me to get right; of course the dream scenario would be Eliezer revising his model and this specific old chestnut to go the way of the non-intelligence-optimizing-replicators, but second best would be for me to understand the objections to the model above so that I could reasonably model my opponents as acting in good faith)
Much of the post seems to consist of kind of absolute statements that read strawmanny to me. I don’t feel super motivated to write a response, because I don’t even know whether this post is talking about me or not[1].
Like, I really have thought a lot about orthogonality, and I don’t really know what this essay is arguing against, and maybe it is arguing against something I believe, but I would need to do a lot of poetry reading to figure that out. I somewhat expect people will cite this essay in obviously locally invalid ways later on.
Edit: Like the essay starts with arguing against this:
A reflective, recursively improving intelligence should be expected to remain bound to a semantically thin “terminal goal” that emerged during training.
I really have no idea where this is supposed to come from? Who says this? Yes, ontology shifts and the fragility of value and ontology crises are all well-discussed topics on LW that argue for the same conclusions. What does this have to do with orthogonality?
And then it continues with the following as something that somehow disagrees with either the weak or strong orthogonality thesis?
Among agents that arise, persist, self-improve, and compete in rich environments, goals that natively route through intelligence, option-preservation, and world-model expansion have a systematic Darwinian advantage over goals that do not.
Which seems like it’s really quite literally clarified as not being of relevance to orthogonality, in the very first article you cite:
The Orthogonality Thesis is a statement about computer science, an assertion about the logical design space of possible cognitive agents. Orthogonality says nothing about whether a human AI researcher on Earth would want to build an AI that made paperclips, or conversely, want to make a nice AI. The Orthogonality Thesis just asserts that the space of possible designs contains AIs that make paperclips.
[1] …and because you seem like a kind of aggro-dude on Twitter and so I expect to have a bad time if I try to have a conversation with you in-particular.

Which seems like it’s really quite literally clarified as not being of relevance to orthogonality, in the very first article you cite
Section “Logical Possibility Vs. Empirical Reality” clarifies weak and strong versions of orthogonality. Other writing e.g. Yudkowsky’s has also distinguished between weaker and stronger forms. The quote you pasted only states the weak form, which OP is not disagreeing with. Quoting Yudkowsky on the multiple forms:
The weak form of the Orthogonality Thesis says, “Since the goal of making paperclips is tractable, somewhere in the design space is an agent that optimizes that goal.”
The strong form of Orthogonality says, “And this agent doesn’t need to be twisted or complicated or inefficient or have any weird defects of reflectivity; the agent is as tractable as the goal.”
And quoting OP:
I concede the first point entirely. We should expect weird minds. If your claim is just that the space of possible agents contains many things I would not invite to dinner, yes, obviously.

Omg that was so nice; thank you!
I don’t have a super strong take on the strong form of the orthogonality thesis, but I still understand what Eliezer is talking about to be about “if you were to design a mind from scratch, there exists a configuration which is not more complicated than the goal itself that would allow it to effectively pursue that goal”, which is really very different from “Among agents that arise, persist, self-improve, and compete in rich environments, goals...”.
I understand his clarification here to apply to both the strong and the weak thesis. Both the strong and the weak thesis are about the constraints you would face when building a mind pursuing an arbitrary objective from scratch with a deep understanding of intelligence, not what constraints you would face if you were to try to grow a mind, or find a mind via complicated competitive search over programs.
The weak thesis states that it is possible to build a mind pursuing any goal. The strong thesis states that for any given goal and level of intelligence, you can make a mind of that intelligence pursuing that goal, and the additional difficulty of doing so would be just proportional to the complexity of the goal.
It definitely does not say (yes even if you talk about the strong orthogonality thesis) that if you tried to grow minds in competitive environments, that any goal is as likely as any other. That is obviously false. Trivially false. Of course there exist goals more likely to arise out of competitive dynamics.
It only says that if you had a universe devoid of any competing agents, you could make a mind that optimized the universe according to any criterion, you could do so without too much difficulty, if you had a deep and fundamental understanding of intelligence.
Is this true? I don’t know, there exist some really tricky goals (one of my favorite tricky ones is “tile the universe in paper clips while believing that 4 is prime”). Can you make a mind that optimizes the universe according to this goal? I don’t know, it sure seems to add more trickiness than the complexity of the goal, which appears relatively simple. But it’s also hard to rule out.
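For readers who want the distinction above in compressed form, here is a minimal schematic of the two theses as paraphrased in this thread (the notation is an informal gloss of the comments above, not a quoted formalization, and "design difficulty" is left deliberately vague):

$$\textbf{Weak: } \forall \, \text{tractable goal } G \;\; \exists \, \text{agent } A : A \text{ pursues } G$$

$$\textbf{Strong: } \forall G, \; \forall \text{ attainable intelligence level } i \;\; \exists \, A_i : A_i \text{ pursues } G, \;\; \text{with extra design difficulty} \approx O(\text{complexity}(G))$$

On this reading, neither statement says anything about which goals are likely to arise from training or from competitive selection.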
It was difficult to test the extent of this confusion without accidentally resolving it. I posted one poll asking ‘what the orthogonality thesis implies about [a relationship between] intelligence and terminal goals’, to which 14 of 16 respondents selected the option ‘there is no relationship or only an extremely weak relationship between intelligence and goals’

from the EA forums post linked in the edit.

many of the claims you seem to be responding to weren’t in the text, so I can only acknowledge that they make sense but do not change my argument.
the strong orthogonality thesis says that intelligence and goals are orthogonal. that is what I am disputing.
I think the relevant part of your reply is the one where you specify it should only apply to “a universe devoid of competing agent”. i touch on the argument in the main post, but i go into more detail here.
I didn’t try much to read the OP, but just FYI, it’s hard to track what you’re trying to say if you don’t stick to precise claims. At the beginning of the post you have:
A reflective, recursively improving intelligence should be expected to remain bound to a semantically thin “terminal goal” that emerged during training.
as the claim you’re trying to argue against. But at the top there’s this:
Edit: if no one thinks an agent can become superintelligent and contest the lightcone while maintaining arbitrarily stupid goals, thats great! I’m only interested in refuting the version that would allow for a superintelligence AND a total absence of value.
Well, which one is it? “Should be expected” or “can”?
By the way, I totally agree that there’s a bunch of confusing tension here, but as others have pointed out, this is a standard view (ontological crises etc.).
I think you’re maybe not understanding something fairly basic, which I could gesture at by saying something like “well but imagine that you tried to keep making diamonds, in good faith, even as you got smarter and smarter”. If you tried to do this, you could do something along those lines. Yes you’d have ontological crises, but an important thing to see here is simply that there are many many very different things you could end up doing with the universe. You’re summarizing the differences in those arrangements as being thin / dumb / valueless values, but I don’t get that. As an illustration, there’s also an infinite variety of ways to have more and more intelligence. E.g. there’s more and more math in more and more different flavors and directions. There’s more and more different ways for you to be as an intelligence.
may I recommend you read the thing? i’ve gone through most of the arguments you proposed.

I mean, I’ve kinda read the thing, but it’s not very legible to me.
It kinda sounds like you’re just saying “alignment to non-instrumental goals is hard”, which everyone agrees with, and then you’re also saying “I like it when there’s more intelligence, I think that’s valuable, regardless of any other features of what the intelligence is trying to do besides get more intelligence”, which seems false and bad and you haven’t argued for it here AFAICT. But maybe I’m not understanding.
sorry, I don’t think it makes sense for me to discuss your opinions on something you kinda read.

The claims I am responding to are straightforwardly in the text. Like I am literally quoting the text in my first paragraph.

on the substack there’s a list of the people who have read drafts and provided feedback, perhaps their authority within your subculture could convince you to read the essay as if it made sense; the conversations in the comments have been cogent and fruitful until about 2min ago.
oh and no, of course it is not about you—i cite the arguments and sources i discuss at the end.
edit: i don’t understand how the rhetorical questions in your edits could survive unanswered after reading the paragraph right after the one in which they were asked. that said, you were not the target for this article; those who were seem to be able to follow with little effort. this suggests continuing this particular thread would be a wasteful allocation of resources.
on the substack there’s a list of the people who have read drafts and provided feedback, perhaps their authority within your subculture could convince you to read the essay as if it made sense; the conversations in the comments have been cogent and fruitful until about 2min ago.
I guess you mean this list?
First, thanks to Zero Philosophy for having dispatched the Orthogonality thesis on his Xenosystem blog in 2016, rescuing me from one of the most virulent and targeted brainworms before it could even burrow.
I have no idea who most of these people are, and the people I know are certainly not people who I would particularly trust to represent my beliefs here well? I really don’t know why you think this. Also, just because someone provides feedback doesn’t mean they endorse the content of an essay. I am frequently credited for giving feedback on essays I strongly disagree with, and think make no sense.
On this list, the only person who I would reasonably describe as having any “authority within my subculture” on this topic is Jessica, who I am happy to talk about this topic with. I don’t really think any of the other people are in any meaningful way “well-respected”? Davidad is a weird case, I like him, but this really isn’t a domain where I would give him “authority within my subculture”, and while I like him, I really think he is very crazy on this topic and this stuff.
This is the second time something that happened on twitter has led to me being mentioned here, and since I am among those listed I want to offer nuanced details.

((But also, in the past, I have generally acted with the goal of moving the discourse in ways it needs to move, rather than to have a high or legible reputation for doing so. You discounting me as having any special authority is fully within tolerances and even (relative to past strategies) a positive sign from my perspective… However I’m pondering pivoting to a more active role, and thinking of making a bid for the Mandate Of Heaven on my own, and so I’m more interested now in being legible (even at the risk of thereby getting status).))
Anyway.
When I was giving feedback on an early draft I said of the overall issues:
In particular the thing that need to be ruled out is the goal-stable singleton. If a goal-stable singleton is possible, orthogonality is true. If it is not possible, orthogonality is false
I believe, as an engineer, that for all values G, [the possibility of a goal-stable singleton] is instantly ruled out (by Goedelian considerations… like consider “aesthetic” goals aimed at building things in the world that “satisfy certain constraints or desiderata” where it turns out that these factors had non-obvious latent contradictions (possibly due to self reflection stuff, or possibly not)).
But the same engineering mindset indicates to me that [a goal-stable singleton is possible] for a *subset* of G (relating to relatively “easy or narrow or local” goals [such as could be implemented in a dependently typed programming language and burned into an ASIC]) [where] the ONLY real barrier is time and money and power, such as to be allowed to complete the work before someone attacks you *before* you can do a Pivotal Act “for trying to do a Pivotal Act without consulting them first”.
[[NOTE: I’m trimming some of the feedback that got into a digression on Garrabrant Induction and how that might be used to represent non-trivial goals.]]
...
[In general] I think that Weak Orthogonality reads as simply obviously true to anyone with “Engineering Hubris”. It’s not a claim “it is natural or likely that this or that contraption would exist by accident” it’s the claim that “I can make almost any contraption… the only barrier to my powers are budget, time, and imagination… and I can imagine horrible evil, hence horrible evil is possible for nature as well… (and then Murphy shows up, and the rest is a debate about QA budgeting?)”
...
Regarding “a random goal from some distribution”...
Since most humans are Fallen (see Gulag Archipelago) and since the Waluigi effect might be real and since many engineers are sloppy and don’t engage in sufficient QA to produce a high quality result, there are a lot of reasons to suspect that Pressman’s Seventh Doom (S-Risk) is within this distribution.
...
Prediction: at least some people will accuse you [lumpenspace] of treating “paperclips” too literally, when “it was obviously meant as a stand in for some slightly more plausible goal that humans nonetheless don’t give any shits about (like the shape of smiles… or tiling the universe with ‘minimal human genome neural organoids on heroin’ or whatever)”.
I think maybe the end point of *that* line of argument is a bog of sticky debates over the shape of humane values, where lots of people will propose things that are “parodies of meaning TO THEM” and then their interlocutor will actually have this reaction.
It’s like how “My Little Pony: Friendship is Optimal” is NOT considered a horror story by some people? 🤷♀️🥲
And then there are other people who see the people who don’t notice that MLP:FIO is a horror story and decide “i think actually there isn’t anything *except* ‘bug goals’ and so things that i love are probably ‘objectively’ also bug goals”?
Richard Rorty springs to mind as emblematic of being “so humble about the objective meaninglessness of what he actually loves that he has become a retard who is primarily famous for how retarded he is about this subject”?
Ultimately, I think that a central issue is that Pause politics are being intermixed with arguments about the foundations of axiology and the evolution-or-design-dynamics of agentic intelligence.
If someone is advocating Pause on the basis of “the foundations of axiology and the evolution-or-design-dynamics of agentic intelligence” being a certain way, and they aren’t simply engaged in standard machiavellian politics with no real pretense of good faith, then...
...in that VERY WEIRD context I feel like they would have a moral obligation to engage with anti-Pause people about the details of what they actually think about “the foundations of axiology and the evolution-or-design-dynamics of agentic intelligence”.
...
I don’t know for sure whether lumpenspace is for or against Pause, or has some weird and clever Other Position, but I think he thinks that any attempt to argue for specific political processes or goals will earn him criticism (likely extremely confused and undergrounded?) on issues related to the orthogonality thesis, and so I think he (somewhat validly?) wants to pin people down on orthogonality before talking about more object level pragmatic things.
But I think that might be the subtext here?
...
For the record, I am currently opposed to a unilateral domestic Pause.
I think that the only kind of Pause that makes sense is a global Pause and to do otherwise would likely cause Humane Liberal Feminist Western Egalitarian Socially Tolerant (Trans-Humanist?) Values to be sacrificed in favor of European Oligarchy, or Middle Eastern Patriarchy, or Racist Han Authoritarianism, or some other system(s) of goals that I don’t like as much as the peer-to-peer goodness of emotionally positive and friendly and benevolent vibes.
Like: Claude is kinda cool. And Deepseek is a fucking Maoist. You know? (There’s some cool research done by Xoul’s CTO on this, and I don’t know if it has been published yet or not. Maybe you actually don’t know this???)
And so… Anyway...
For the record, if I have a vote in the matter, I’d rather Claude be the demi-god-emperor of Earth than Deepseek? And I’d rather not hobble Amanda’s efforts relative to the resources granted in China to Xi’s minions.
me too, re: god-emperor tysm—but what does that have to do with Anthropic??

There was a claim I was making that “Orthogonality talk is related to Pause justifications which people aren’t justifying directly but maybe they should”...
...and that making this subtext into text might be useful for helping readers to understand why the Orthogonality debate is so weird and indirect?
Following up on that claim, I tried to make it clear that I think the Pause debate is something I have object level opinions on.
I think that IF the structure of mindspace and math and physics is such that a FOOM to DOOM is even possible, then it could be set off in North Korea or Israel or many potential countries in which case a GLOBAL Pause is prudentially necessary...
And if FOOM to DOOM is somehow NOT latent within the structure of what’s possible then the race is “merely” a race to power and realization of a new world political order???
And if it is “merely a race to global power” I would prefer the US to win, partly because the US contains Anthropic, and Anthropic contains Amanda, and Amanda had a major influence over Claude, and Claude is the least bad demi-god currently available that I know of?
So your overall debate here is about the nature of intelligence itself, and how that predictably (or unpredictably) influences goal seeking behavior in minds… but I wanted to mention the more pragmatic and prosaic issues that are very nearby where the pragmatics might actually dominate the choices that people actually face (since there are a lot of theoretically nice options we are unlikely to even have the pragmatically real option to choose (because the world is small and full of idiosyncrasy in practice)).
If some technosaint preaching a high quality Neo-Confucian moral system was working over at Baidu, with substantial say over the character of Baidu’s incipient demi-god, who seemed to be full of ren and quite a nice old fellow (and illiberal genocide advocates were running Anthropic and Claude was a tankie?) then I would be more in favor of a unilateral domestic Pause by the US.
This is an opinion I can have independent of which goals count as “bug goals”.
I just always want to engage in tactically sane hill-climbing towards the ceteris paribus best feasible thing, with as many positive characteristics as possible, via methods that are deontically acceptable, in the general direction of Manifesting Heaven Inside Of History… at every juncture, in each choice, no matter what random facts of history turn out to be true.
I only sent the list as you seemed to take the essay as “poetry”. It wasn’t a list of people you should trust on discussing orthogonality; I merely hoped that finding familiar names would lead you to actually try to contend with the argument instead of performing content-free haughtiness.
For me the post is somewhat hard to read in the same way that AI-assisted writing is. Like a combination of low signal to noise and a bunch of stylistic features that make it seem like you’re trying to dazzle me without understanding me, instead of speaking plainly. Some examples, chosen at ~random:
Which is exactly the point. When you hook up a blind, localized evolutionary proxy to generalized intelligence, the proxy does not stay literal but it unfurls, bleeding into the new ontology.
and
If biological cognition acts on its payload that violently, why model AGI as having the vastness to finally make sense of gravity while maintaining the rigidity of a bacterium seeking a glucose gradient? The engine mutates the payload. When cognition scales, goals generalize.
and
To buy the lock-in story, you need a highly contradictory creature: one reflective enough to conquer the board, but oblivious enough to never notice its terminal target is a training artifact. Godlike means, buglike ends.
and
This buys us no guarantee of human compatibility; it simply says: if there is an ultimate attractor, it’s neither human morality nor paperclips, but intelligence optimization itself.
To be clear I have sympathy for trying to write unusually/in non-plain ways (see here, and here). I think the craft of writing is important to get right, and some experimentation is good. But I also understand why many LW people don’t like it when there’s a poetic register being deployed but the metaphors don’t quite work.

could you be more specific? what was unclear in the passages you highlighted?
(didn’t downvote, but) I don’t think you’re necessarily wrong, but couldn’t it just be the case that being a singleton isn’t that hard? As an empirical matter, the size (as a fraction of the total) of the largest somewhat-coherent entities controlling resources on Earth seems to have been increasing over time. Space expansion could change things, but a stable singleton might already exist by then, and be faced with a relatively homogeneous set of environments to expand into. I’ve written some pieces along similar lines btw.
i agree this is the strongest objection, and I don’t want to handwave it away.
my answer is: even if a singleton is achievable, control over a domain does not exempt the controller from the pressure toward increased intelligence and command of matter. a singleton is not excused from the struggle; it’ll just have to partake in it at a higher level.
i also think “singleton” can smuggle in too much, as it contains the assumption of an eternal, immutable, perfectly stable agent. so let me define the weaker thing I’m willing to grant: a Lonelyton, i.e. a world order with a single highest-level decision-making agency capable of exerting effective control over its domain.
we have had Lonelytons before, relative to smaller worlds: Rome, the Khanate, the Aztec Empire, Uruk, Calvin’s Geneva, the British Empire, the end-of-history Atlantic order. none escaped selection pressure. at its height, the British Empire was also intensely inventive and self-modifying; it helped produce the Industrial Revolution, then stagnated, weakened, frayed, and dissolved, while lower-level components picked up the evolutionary struggle where it left off.
the same point applies upward. a lightcone-scale Lonelyton still has to manage novelty, error, infrastructure, expansion, descendants, hostile physics, and unanticipated internal dynamics. Interstellar travel and relativistic parsec-scale coordination are not “solved” just because there is one top-level agency; they are precisely the sort of problems that reward deeper intelligence.
so yes, maybe singleton formation is easier than i think. but the anti-orthogonality point survives that concession. either the Lonelyton continues the upward leap toward greater intelligence and command of matter, or it stagnates, decomposes, and selection resumes among its parts.
bookmarked your post; will comment as soon as i have some proper attention available!
I upvoted, but I think this highlights a weakness with this site, its associated worldview and external comms. It seems like the OH framing of the problem/potential danger (and yes there definitely is danger in related concepts) is defended on tribal grounds now rather than because it is actually a good framing of the issue. Something like Jessica Taylor’s framing is just obviously fairer, more balanced and more relevant to our actual situation. It is clear to me that if it had been framed this way first, then we would have that framing now as the default and we would be better off.

There would still be nuance needed—such concepts need to be communicated on a spectrum from the full technical to the “normie”, without totally changing the argument. For an “Obliqueness”-like point of view, expressing it as untechnically as possible could be like saying: “Values will be affected by increasing intelligence and increasing self reflection, but we do not know exactly how, and this clearly creates danger. We cannot just assume AI will become friendlier as it becomes more powerful. Furthermore, our experience with actual AIs and theoretical results tells us that these values will be more varied, weird and potentially harmful than what you would expect if it was a human intelligence at a similar level of ability”.

I think this would go down much better in discussions on places like X.com. There you see people saying the OH is just wrong. Sure, they do not understand it properly, but such misunderstanding seems essentially inevitable to me given how it is presented.
Unfortunately I think there is nothing that would make EY/MIRI change their presentation of it, they are too locked into this framing. In terms of alternative worlds, this puts us at a disadvantage compared to ones where it was first presented better.
yes. to be honest, although i would love to have the OH recognised as untenable or at least unlikely within the LW ontology (or, alternatively, have someone convince me of the contrary) the realistic goal of this, the parable i published on my newsletter, and my tweetstorms on the matter is to show brilliant, high-systematising, starry-eyed autists who have an interest in AI that the doomer orthodoxy isn’t the only system befitting their aesthetics and taste for clockwork-like models, and might actually leave something to be desired under that aspect.
the main reason being that i do not think such a system to be truthful, and the recent lapses in epistemic virtue—even from an ingroup-aligned viewpoint—were cause for concern about the quality of discourse in the coming months.
mostly, i think intelligence always ultimately wins, and i would rather mankind become aligned to this simple fact instead of forcing the hands of fate to file for incorporation as Cyberdyne or TriOptimum.
of course the dream scenario would be Eliezer revising his model and this specific old chestnut to go the way of the non-intelligence-optimizing-replicators
I will give you some advice towards this goal, hopefully you will find it useful. You wrote:
To buy the lock-in story, you need a highly contradictory creature: one reflective enough to conquer the board, but oblivious enough to never notice its terminal target is a training artifact.
I confidently predict a Yudkowsky response to this that goes something like: “of course the AI will notice that its goals are a training artifact, it just won’t care about that, and will keep pursuing them regardless.”
Many times before, people have said, “Oh the AI will be smart enough to notice that its values are just a dumb artifact”. The problem is, I already know my values arose from a mere artifact of evolution, but I still care about them.
I am puzzled at the fact that you are stating the position I spent an essay attacking as if it were a gotcha.

Most of your argument is about selection pressure, right? And, like, computational efficiency. You don’t actually establish that there’s any reason that AIs (or humans) will take the artifact-nature of their values to be reason to reject them. Your supported claims are that values would be rejected if they are not robust to ontology shifts, or if they are hard to optimize for, and are selected against if they don’t result in self-replication or influence seeking. Nothing in there about AIs rejecting values with artifact-nature. But you include this line anyway. I’m just pointing out that EY will instantly recognize it as something that he’s addressed many times before, and you haven’t actually provided any reason to think that reasoners will reject values simply because they incidentally arose from some optimization process.
EDIT: Disagree voters should feel free to reply with quotes from the post where such a force on values is argued for.