Orthogonality Thesis

Last edit: 6 May 2024 18:06 UTC by habryka

The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.

The strong form of the Orthogonality Thesis says that there’s no extra difficulty or complication in the existence of an intelligent agent that pursues a goal, above and beyond the computational tractability of that goal.

Suppose some strange alien came to Earth and credibly offered to pay us one million dollars’ worth of new wealth every time we created a paperclip. We’d encounter no special intellectual difficulty in figuring out how to make lots of paperclips.

That is, minds would readily be able to reason about:

How many paperclips would result if I pursued a policy π₀?
How can I search out a policy π that happens to have a high answer to the above question?

The Orthogonality Thesis asserts that since these questions are not computationally intractable, it’s possible to have an agent that tries to make paperclips without being paid, because paperclips are what it wants. The strong form of the Orthogonality Thesis says that there need be nothing especially complicated or twisted about such an agent.
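The two questions above can be illustrated with a toy sketch. It is not from the original article: the world model, the action names, and the functions `simulate` and `best_policy` are all hypothetical, chosen only to show that the search procedure never needs to know which goal it is serving. The goal enters solely through the `utility` argument.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Policy = Tuple[str, ...]   # a fixed sequence of actions (hypothetical toy representation)
Outcome = Dict[str, int]   # counts of objects in the toy world

ACTIONS = ("make_paperclip", "make_staple", "idle")


def simulate(policy: Policy) -> Outcome:
    """Question 1: predict what would result if this policy were pursued."""
    outcome = {"paperclips": 0, "staples": 0}
    for action in policy:
        if action == "make_paperclip":
            outcome["paperclips"] += 1
        elif action == "make_staple":
            outcome["staples"] += 1
    return outcome


def best_policy(utility: Callable[[Outcome], float], horizon: int = 3) -> Policy:
    """Question 2: search for a policy whose predicted outcome scores highly.

    The search is exhaustive and goal-agnostic: it only calls `utility`.
    """
    candidates = product(ACTIONS, repeat=horizon)
    return max(candidates, key=lambda p: utility(simulate(p)))


def count_paperclips(outcome: Outcome) -> float:
    return outcome["paperclips"]


def count_staples(outcome: Outcome) -> float:
    return outcome["staples"]


# Same search code, different goals.
paperclip_plan = best_policy(count_paperclips)
staple_plan = best_policy(count_staples)
```

Swapping `count_paperclips` for `count_staples` changes the plan found but leaves `simulate` and `best_policy` untouched, which is the sense in which the prediction and search steps carry no goal-specific difficulty in this toy setting.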

The Orthogonality Thesis is a statement about computer science, an assertion about the logical design space of possible cognitive agents. Orthogonality says nothing about whether a human AI researcher on Earth would want to build an AI that made paperclips, or conversely, want to make a nice AI. The Orthogonality Thesis just asserts that the space of possible designs contains AIs that make paperclips. It also contains AIs that are nice, to the extent there’s a sense of “nice” where you could say how to be nice to someone if you were paid a billion dollars to do that, and to the extent you could name something physically achievable to do.

This contrasts with inevitablist theses, which might assert, for example:

“It doesn’t matter what kind of AI you build, it will turn out to only pursue its own survival as a final end.”
“Even if you tried to make an AI optimize for paperclips, it would reflect on those goals, reject them as being stupid, and embrace a goal of valuing all sapient life.”

The reason to talk about Orthogonality is that it’s a key premise in two highly important policy-relevant propositions:

It is possible to build a nice AI.
It is possible to screw up when trying to build a nice AI, and if you do, the AI will not automatically decide to be nice instead.

Orthogonality does not require that all agent designs be equally compatible with all goals. E.g., the agent architecture AIXI-tl can only be formulated to care about direct functions of its sensory data, like a reward signal; it would not be easy to rejigger the AIXI architecture to care about creating massive diamonds in the environment (let alone any more complicated environmental goals). The Orthogonality Thesis states “there exists at least one possible agent such that…” over the whole design space; it’s not meant to be true of every particular agent architecture and every way of constructing agents.
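The quantifier structure described above can be written out informally. This is only a sketch under assumed notation, none of which appears in the original article: $\mathcal{A}$ is the space of possible agent designs, $\mathcal{G}$ the set of computationally tractable goals, $c$ a capability level, and $F$ a particular architecture family such as AIXI-tl.

```latex
% What the thesis asserts: existence over the whole design space.
\forall g \in \mathcal{G},\ \forall c,\ \exists a \in \mathcal{A} :\
  \mathrm{pursues}(a, g) \,\wedge\, \mathrm{capability}(a) \ge c

% What the thesis does NOT assert: the same existence claim
% restricted to every architecture family F.
\forall F \subseteq \mathcal{A},\ \forall g \in \mathcal{G},\ \forall c,\
  \exists a \in F :\ \mathrm{pursues}(a, g) \,\wedge\, \mathrm{capability}(a) \ge c
```

The AIXI-tl example is a counterexample to the second statement only; it leaves the first untouched.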

Orthogonality is meant as a descriptive statement about reality, not a normative assertion. Orthogonality is not a claim about the way things ought to be; nor a claim that moral relativism is true (e.g. that all moralities are on equally uncertain footing according to some higher metamorality that judges all moralities as equally devoid of what would objectively constitute a justification). Claiming that paperclip maximizers can be constructed as cognitive agents is not meant to say anything favorable about paperclips, nor anything derogatory about sapient life.

The thesis was originally defined by Nick Bostrom in the paper “Superintelligent Will” (along with the instrumental convergence thesis). For his purposes, Bostrom defines intelligence to be instrumental rationality.

(Most of the above is copied from the Arbital orthogonality thesis article; continue reading there.)

External links

Definition of the orthogonality thesis from Bostrom’s Superintelligent Will
Critique of the thesis by John Danaher
Superintelligent Will paper by Nick Bostrom

If we had known the atmosphere would ignite

Jeffs

16 Aug 2023 20:28 UTC

54 points

49 comments · 2 min read · LW link

Sorting Pebbles Into Correct Heaps

Eliezer Yudkowsky

10 Aug 2008 1:00 UTC

213 points

110 comments · 4 min read · LW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley

1 Feb 2024 21:15 UTC

13 points

15 comments · 13 min read · LW link

Self-Reference Breaks the Orthogonality Thesis

lsusr

17 Feb 2023 4:11 UTC

40 points

35 comments · 2 min read · LW link

Proposed Orthogonality Theses #2-5

rjbg

14 Jul 2022 22:59 UTC

8 points

0 comments · 2 min read · LW link

Superintelligent Introspection: A Counter-argument to the Orthogonality Thesis

DirectedEvolution

29 Aug 2021 4:53 UTC

3 points

18 comments · 4 min read · LW link

Embedded Agents are Quines

lsusr and DaemonicSigil

12 Dec 2023 4:57 UTC

11 points

7 comments · 8 min read · LW link

Response to nostalgebraist: proudly waving my moral-antirealist battle flag

Steven Byrnes

29 May 2024 16:48 UTC

101 points

29 comments · 11 min read · LW link

General purpose intelligence: arguing the Orthogonality thesis

Stuart_Armstrong

15 May 2012 10:23 UTC

33 points

155 comments · 18 min read · LW link

Evidence for the orthogonality thesis

Stuart_Armstrong

3 Apr 2012 10:58 UTC

14 points

293 comments · 1 min read · LW link

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley

6 Jul 2024 1:23 UTC

52 points

38 comments · 24 min read · LW link

Orthogonality is Expensive

DragonGod

3 Apr 2023 0:43 UTC

21 points

3 comments · 1 min read · LW link

(www.beren.io)

Orthogonality is expensive

beren

3 Apr 2023 10:20 UTC

42 points

9 comments · 3 min read · LW link

Podcast with Divia Eden and Ronny Fernandez on the strong orthogonality thesis

DanielFilan

28 Apr 2023 1:30 UTC

18 points

1 comment · 1 min read · LW link

(youtu.be)

Contra Nora Belrose on Orthogonality Thesis Being Trivial

tailcalled

7 Oct 2023 11:47 UTC

18 points

21 comments · 1 min read · LW link

Coherence arguments imply a force for goal-directed behavior

KatjaGrace

26 Mar 2021 16:10 UTC

91 points

24 comments · 11 min read · LW link · 1 review

(aiimpacts.org)

John Danaher on ‘The Superintelligent Will’

lukeprog

3 Apr 2012 3:08 UTC

9 points

12 comments · 1 min read · LW link

Distinguishing claims about training vs deployment

Richard_Ngo

3 Feb 2021 11:30 UTC

68 points

29 comments · 9 min read · LW link

[Link] Is the Orthogonality Thesis Defensible? (Qualia Computing)

ioannes

13 Nov 2019 3:59 UTC

6 points

5 comments · 1 min read · LW link

A poor but certain attempt to philosophically undermine the orthogonality of intelligence and aims

Jay95

24 Feb 2023 3:03 UTC

−2 points

1 comment · 1 min read · LW link

Arguing Orthogonality, published form

Stuart_Armstrong

18 Mar 2013 16:19 UTC

25 points

10 comments · 23 min read · LW link

Anthropomorphic Optimism

Eliezer Yudkowsky

4 Aug 2008 20:17 UTC

77 points

60 comments · 5 min read · LW link

A rejection of the Orthogonality Thesis

ArisC

24 May 2023 16:37 UTC

−2 points

11 comments · 2 min read · LW link

(medium.com)

[Question] What would a post that argues against the Orthogonality Thesis that LessWrong users approve of look like?

Thoth Hermes

3 Jun 2023 21:21 UTC

3 points

3 comments · 1 min read · LW link

Nature < Nurture for AIs

scottviteri

4 Jun 2023 20:38 UTC

14 points

22 comments · 7 min read · LW link

Superintelligence 9: The orthogonality of intelligence and goals

KatjaGrace

11 Nov 2014 2:00 UTC

14 points

80 comments · 7 min read · LW link

Instrumental Convergence to Complexity Preservation

Macro Flaneur

13 Jul 2023 17:40 UTC

2 points

2 comments · 3 min read · LW link

A Semiotic Critique of the Orthogonality Thesis

Nicolas Villarreal

4 Jun 2024 18:52 UTC

4 points

8 comments · 15 min read · LW link

Are we all misaligned?

Mateusz Mazurkiewicz

3 Jan 2021 2:42 UTC

11 points

0 comments · 5 min read · LW link

[Video] Intelligence and Stupidity: The Orthogonality Thesis

plex

13 Mar 2021 0:32 UTC

5 points

1 comment · 1 min read · LW link

(www.youtube.com)

[Question] Is the Orthogonality Thesis true for humans?

Noosphere89

27 Oct 2022 14:41 UTC

12 points

20 comments · 1 min read · LW link

Is the argument that AI is an xrisk valid?

MACannon

19 Jul 2021 13:20 UTC

5 points

61 comments · 1 min read · LW link

(onlinelibrary.wiley.com)

How many philosophers accept the orthogonality thesis ? Evidence from the PhilPapers survey

Paperclip Minimizer

16 Jun 2018 12:11 UTC

3 points

26 comments · 3 min read · LW link

Is the orthogonality thesis at odds with moral realism?

ChrisHallquist

5 Nov 2013 20:47 UTC

7 points

118 comments · 1 min read · LW link

Amending the “General Pupose Intelligence: Arguing the Orthogonality Thesis”

diegocaleiro

13 Mar 2013 23:21 UTC

4 points

22 comments · 2 min read · LW link

Non-orthogonality implies uncontrollable superintelligence

Stuart_Armstrong

30 Apr 2012 13:53 UTC

23 points

47 comments · 1 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley

14 Feb 2024 7:10 UTC

38 points

6 comments · 31 min read · LW link

A caveat to the Orthogonality Thesis

Wuschel Schulz

9 Nov 2022 15:06 UTC

38 points

10 comments · 2 min read · LW link

The Impossibility of a Rational Intelligence Optimizer

Nicolas Villarreal

6 Jun 2024 16:14 UTC

−9 points

5 comments · 14 min read · LW link

Sorting Pebbles Into Correct Heaps: The Animation

Writer

10 Jan 2023 15:58 UTC

26 points

2 comments · 1 min read · LW link

(youtu.be)

Moral realism and AI alignment

Caspar Oesterheld

3 Sep 2018 18:46 UTC

13 points

10 comments · 1 min read · LW link

(casparoesterheld.com)

The Metaethics and Normative Ethics of AGI Value Alignment: Many Questions, Some Implications

Eleos Arete Citrini

16 Sep 2021 16:13 UTC

6 points

0 comments · 8 min read · LW link

Orthogonality or the “Human Worth Hypothesis”?

Jeffs

23 Jan 2024 0:57 UTC

21 points

31 comments · 3 min read · LW link

The Orthogonality Thesis is Not Obviously True

omnizoid

5 Apr 2023 21:06 UTC

1 point

79 comments · 9 min read · LW link

[Question] Why Do AI researchers Rate the Probability of Doom So Low?

Aorou

24 Sep 2022 2:33 UTC

7 points

6 comments · 3 min read · LW link

The Moral Copernican Principle

Legionnaire

2 May 2023 3:25 UTC

5 points

7 comments · 2 min read · LW link

Ruby 1 Oct 2020 22:26 UTC
2 points
From the old discussion page:
Talk:Orthogonality thesis
Quality Concerns
- There is no reference for Stuart Armstrong’s requirements. I guess they come from here.
- I know roughly what the orthogonality thesis is. But if I’d only read the wiki page, it wouldn’t make sense to me. – »Well, we don’t share the opinion of some people that the goals of increasingly intelligent agents converge. So we put up a thesis which claims that intelligence and goals vary freely. Suppose there’s one goal system that fulfils Armstrong’s requirements. This would refute the orthogonality thesis, even if most intelligences converged on one or two other goal systems.« I don’t mean to say that the orthogonality thesis itself doesn’t make sense. I mean that the wiki page doesn’t provide enough information to enable people to understand that it makes sense.
Instead of repairing the above shortcomings, I propose referring people to the corresponding Arbital page. --Rmoehn (talk) 14:47, 31 May 2016 (AEST)
