Orthogonality Thesis

Last edit: 6 May 2024 18:06 UTC by habryka

The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.

The strong form of the Orthogonality Thesis says that there’s no extra difficulty or complication in the existence of an intelligent agent that pursues a goal, above and beyond the computational tractability of that goal.

Suppose some strange alien came to Earth and credibly offered to pay us one million dollars’ worth of new wealth every time we created a paperclip. We’d encounter no special intellectual difficulty in figuring out how to make lots of paperclips.

That is, minds would readily be able to reason about:

- How many paperclips would result, if I pursued a policy π?
- How can I search out a policy π that happens to have a high answer to the above question?

The Orthogonality Thesis asserts that since these questions are not computationally intractable, it’s possible to have an agent that tries to make paperclips without being paid, because paperclips are what it wants. The strong form of the Orthogonality Thesis says that there need be nothing especially complicated or twisted about such an agent.

The Orthogonality Thesis is a statement about computer science, an assertion about the logical design space of possible cognitive agents. Orthogonality says nothing about whether a human AI researcher on Earth would want to build an AI that made paperclips, or conversely, want to make a nice AI. The Orthogonality Thesis just asserts that the space of possible designs contains AIs that make paperclips. And also AIs that are nice, to the extent there’s a sense of “nice” where you could say how to be nice to someone if you were paid a billion dollars to do that, and to the extent you could name something physically achievable to do.
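As a toy illustration (not from the original article), the "design space" point can be sketched as a planner whose search machinery is entirely independent of the utility function plugged into it; the action names and utilities below are hypothetical:

```python
from typing import Callable, Dict, List

def best_action(
    actions: List[str],
    outcome: Callable[[str], Dict[str, int]],
    utility: Callable[[Dict[str, int]], float],
) -> str:
    """Generic planner: pick the action whose predicted outcome
    scores highest under an arbitrary utility function."""
    return max(actions, key=lambda a: utility(outcome(a)))

# A toy world model: each action's predicted consequences.
def outcome(action: str) -> Dict[str, int]:
    return {
        "make_paperclips": {"paperclips": 100, "happy_humans": 0},
        "help_humans": {"paperclips": 0, "happy_humans": 100},
    }[action]

actions = ["make_paperclips", "help_humans"]

# The same search machinery serves two orthogonal goals:
paperclip_utility = lambda world: world["paperclips"]
nice_utility = lambda world: world["happy_humans"]

print(best_action(actions, outcome, paperclip_utility))  # make_paperclips
print(best_action(actions, outcome, nice_utility))       # help_humans
```

The point of the sketch is that nothing in `best_action` changes between the two calls; only the goal supplied to it differs, which is the sense in which capability and goal are orthogonal.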

This contrasts with inevitablist theses which might assert, for example:

- "It doesn't matter what kind of AI you build, it will turn out to only pursue its own survival as a final end."
- "Even if you tried to make an AI optimize for paperclips, it would reflect on those goals, reject them as being stupid, and embrace a goal of valuing all sapient life."

The reason to talk about Orthogonality is that it's a key premise in two highly important policy-relevant propositions:

- It is possible to build a nice AI.
- It is possible to screw up when trying to build a nice AI, and if you do, the AI will not automatically decide to be nice instead.

Orthogonality does not require that all agent designs be equally compatible with all goals. E.g., the agent architecture AIXI-tl can only be formulated to care about direct functions of its sensory data, like a reward signal; it would not be easy to rejigger the AIXI architecture to care about creating massive diamonds in the environment (let alone any more complicated environmental goals). The Orthogonality Thesis states “there exists at least one possible agent such that…” over the whole design space; it’s not meant to be true of every particular agent architecture and every way of constructing agents.

Orthogonality is meant as a descriptive statement about reality, not a normative assertion. Orthogonality is not a claim about the way things ought to be; nor a claim that moral relativism is true (e.g. that all moralities are on equally uncertain footing according to some higher metamorality that judges all moralities as equally devoid of what would objectively constitute a justification). Claiming that paperclip maximizers can be constructed as cognitive agents is not meant to say anything favorable about paperclips, nor anything derogatory about sapient life.

The thesis was originally defined by Nick Bostrom in his paper "The Superintelligent Will" (along with the instrumental convergence thesis). For his purposes, Bostrom defines intelligence as instrumental rationality.

(Most of the above is copied from the Arbital orthogonality thesis article; continue reading there.)

Related: Complexity of Value, Decision Theory, General Intelligence, Utility Functions

Posts tagged Orthogonality Thesis

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis (RogerDearnaley, 1 Feb 2024; 5 points, 15 comments)

Self-Reference Breaks the Orthogonality Thesis (lsusr, 17 Feb 2023; 40 points, 35 comments)

Sorting Pebbles Into Correct Heaps (Eliezer Yudkowsky, 10 Aug 2008; 212 points, 110 comments)

If we had known the atmosphere would ignite (Jeffs, 16 Aug 2023; 53 points, 49 comments)

Superintelligent Introspection: A Counter-argument to the Orthogonality Thesis (DirectedEvolution, 29 Aug 2021; 3 points, 18 comments)

Proposed Orthogonality Theses #2-5 (rjbg, 14 Jul 2022; 8 points, 0 comments)

John Danaher on 'The Superintelligent Will' (lukeprog, 3 Apr 2012; 9 points, 12 comments)

Distinguishing claims about training vs deployment (Richard_Ngo, 3 Feb 2021; 68 points, 29 comments)

[Link] Is the Orthogonality Thesis Defensible? (Qualia Computing) (ioannes, 13 Nov 2019; 6 points, 5 comments)

Response to nostalgebraist: proudly waving my moral-antirealist battle flag (Steven Byrnes, 29 May 2024; 101 points, 29 comments)

Podcast with Divia Eden and Ronny Fernandez on the strong orthogonality thesis (DanielFilan, 28 Apr 2023; 18 points, 1 comment)

Contra Nora Belrose on Orthogonality Thesis Being Trivial (tailcalled, 7 Oct 2023; 18 points, 21 comments)

A poor but certain attempt to philosophically undermine the orthogonality of intelligence and aims (Jay95, 24 Feb 2023; −2 points, 1 comment)

Coherence arguments imply a force for goal-directed behavior (KatjaGrace, 26 Mar 2021; 91 points, 24 comments)

Arguing Orthogonality, published form (Stuart_Armstrong, 18 Mar 2013; 25 points, 10 comments)

Embedded Agents are Quines (12 Dec 2023; 11 points, 7 comments)

General purpose intelligence: arguing the Orthogonality thesis (Stuart_Armstrong, 15 May 2012; 33 points, 155 comments)

Orthogonality is Expensive (DragonGod, 3 Apr 2023; 21 points, 3 comments)

Orthogonality is expensive (beren, 3 Apr 2023; 42 points, 9 comments)

Evidence for the orthogonality thesis (Stuart_Armstrong, 3 Apr 2012; 14 points, 293 comments)

How many philosophers accept the orthogonality thesis? Evidence from the PhilPapers survey (Paperclip Minimizer, 16 Jun 2018; 3 points, 26 comments)

Is the orthogonality thesis at odds with moral realism? (ChrisHallquist, 5 Nov 2013; 7 points, 118 comments)

Amending the "General Pupose Intelligence: Arguing the Orthogonality Thesis" (diegocaleiro, 13 Mar 2013; 4 points, 22 comments)

Non-orthogonality implies uncontrollable superintelligence (Stuart_Armstrong, 30 Apr 2012; 23 points, 47 comments)

The Metaethics and Normative Ethics of AGI Value Alignment: Many Questions, Some Implications (Eleos Arete Citrini, 16 Sep 2021; 6 points, 0 comments)

[Question] Why Do AI researchers Rate the Probability of Doom So Low? (Aorou, 24 Sep 2022; 7 points, 6 comments)

[Question] Is the Orthogonality Thesis true for humans? (Noosphere89, 27 Oct 2022; 12 points, 20 comments)

A caveat to the Orthogonality Thesis (Wuschel Schulz, 9 Nov 2022; 37 points, 10 comments)

Sorting Pebbles Into Correct Heaps: The Animation (Writer, 10 Jan 2023; 26 points, 2 comments)

Is the argument that AI is an xrisk valid? (MACannon, 19 Jul 2021; 5 points, 61 comments)

Moral realism and AI alignment (Caspar Oesterheld, 3 Sep 2018; 13 points, 10 comments)

Orthogonality or the "Human Worth Hypothesis"? (Jeffs, 23 Jan 2024; 21 points, 31 comments)

Requirements for a Basin of Attraction to Alignment (RogerDearnaley, 14 Feb 2024; 22 points, 6 comments)

A Semiotic Critique of the Orthogonality Thesis (Nicolas Villarreal, 4 Jun 2024; 4 points, 8 comments)

The Impossibility of a Rational Intelligence Optimizer (Nicolas Villarreal, 6 Jun 2024; −9 points, 5 comments)

The Orthogonality Thesis is Not Obviously True (omnizoid, 5 Apr 2023; 1 point, 79 comments)

The Moral Copernican Principle (Legionnaire, 2 May 2023; 5 points, 7 comments)

Anthropomorphic Optimism (Eliezer Yudkowsky, 4 Aug 2008; 75 points, 60 comments)

A rejection of the Orthogonality Thesis (ArisC, 24 May 2023; −2 points, 11 comments)

[Question] What would a post that argues against the Orthogonality Thesis that LessWrong users approve of look like? (Thoth Hermes, 3 Jun 2023; 3 points, 3 comments)

Nature < Nurture for AIs (scottviteri, 4 Jun 2023; 14 points, 22 comments)

Superintelligence 9: The orthogonality of intelligence and goals (KatjaGrace, 11 Nov 2014; 14 points, 80 comments)

Instrumental Convergence to Complexity Preservation (Macro Flaneur, 13 Jul 2023; 2 points, 2 comments)

Are we all misaligned? (Mateusz Mazurkiewicz, 3 Jan 2021; 11 points, 0 comments)

[Video] Intelligence and Stupidity: The Orthogonality Thesis (plex, 13 Mar 2021; 5 points, 1 comment)