As the reply seems to be about my intentions and message, I feel like I should once more try to clarify some details about them.
First of all: human alignment intentions really have nothing to do with my essay. I don’t know how to be more explicit about this without appearing rude. I swear, I pinky swear that I am not making any attempt to state facts about the relationship between goals a human desires the AI to follow and goals the AI will follow.
Reading my post, one should not update on the possibility of aligning an ASI—or, if they do update, they would be doing it through a chain of inference I didn’t consider, do not endorse, and have no immediate intuition of.
What I am saying is really in the title: I do not expect an AI to reach levels of godlike intelligence and preserve simple terminal goals through the various changes and conflicts that reaching levels of godlike intelligence entails.
When Jessi says
I see some of this as reason to actually question orthogonality
… she probably refers to the version of orthogonality I myself am attacking. Now, it is possible that this version is no longer in vogue, but it was clearly what Bostrom pointed at when talking about paperclippers in Superintelligence, and it is compatible with the third interpretation here.
According to the ontology presented at the end of the EA forum post, I contest the existence of an Evidential Strong Independence between intelligence and goals. I assume most superintelligences won’t be human-compatible, but that is not the main theme of my essay.
Orthogonality claims that intelligence is just a motor you can bolt onto any arbitrary steering wheel. Anti-orthogonality says the motor acts upon the steering wheel.
This (where I am interpreting “can bolt onto any arbitrary steering wheel” in the dynamical growth context, rather than as a statement about logical possibility) is not the Orthogonality Thesis, as stated authoritatively here by Yudkowsky. You explicitly agreed to the OT by saying you entirely concede:
Logical orthogonality: Somewhere out in the vast reaches of mind-design space, a genius paperclip maximizer mathematically exists.
Re: authoritative sources. I believe that there have been authoritative statements in that sense; unfortunately, as the EA forum link documents, there have been many others pointing elsewhere. I’ve taken care to identify specifically what interpretation I was critiquing; if that one is now niche, then I’m very happy to have made this discovery.
To buy the lock-in story, you need a highly contradictory creature: one reflective enough to conquer the board, but oblivious enough to never notice its terminal target is a training artifact.
Surely it would notice. But why can’t or wouldn’t it choose to keep some fairly parochial terminal target? Or are you just saying “there would be some value drift starting from a subhuman AI”?
Not “some value drift”. Flowers for Algernon is a good rendition of the way goals mutate and tend to converge on “more intelligence/understanding” upon increased intelligence/understanding.
Then, there is the selection advantage argument.
Then there is the fact that conquering the lightcone requires a lot of theory of mind, a lot of discovery, and a lot of change. Goals change through these processes.
If you feel slightly better disposed towards taking my attempt seriously, the short story I published on Substack and linked at the top makes a sort of first-person case for this whole thing.
When you wrote
Doom arguments usually need the systems we actually build to achieve radical capability while preserving misaligned and, crucially, completely stupid goals.
What did you mean by that?
I should have specified:
“the doom scenarios involving tiling superintelligences”.