Oracle AI

Tag. Last edit: Dec 30, 2024, 10:00 AM by Dakara

Oracle AI is a regularly proposed solution to the problem of developing Friendly AI. It is conceptualized as a superintelligent system designed only to answer questions, with no ability to act in the world. The name was first suggested by Nick Bostrom.

Safety

The question of whether Oracles (or simply AGIs kept forcibly confined) are safer than fully free AGIs has long been the subject of debate. Armstrong, Sandberg and Bostrom discuss Oracle safety at length in their paper Thinking inside the box: using and controlling an Oracle AI. In the paper, the authors review various methods that might be used to measure an Oracle’s accuracy. They also try to shed light on weaknesses and dangers that can emerge on the human side, such as psychological vulnerabilities that the Oracle could exploit through social engineering. The paper discusses ideas for physical security (“boxing”), as well as the problems involved in trying to program the AI to only answer questions. In the end, the paper reaches the cautious conclusion that Oracle AIs are probably safer than free AGIs.

In a related work, Dreams of Friendliness, Eliezer Yudkowsky gives an informal argument that all Oracles will be agent-like, that is, driven by their own goals. The argument rests on the idea that anything considered “intelligent” must choose the correct course of action among all the actions available. Likewise, an Oracle has many possible things to believe, of which very few are correct. Believing the correct thing therefore means that some method was used to select the correct belief from among the many incorrect ones. By definition, this is an optimization process whose goal is selecting correct beliefs.
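As a toy illustration of this argument (not taken from Dreams of Friendliness), the sketch below treats "believing the correct thing" as a search: it scores every candidate hypothesis against some observations and keeps the best-scoring one. The hypothesis set, the observations, and the names `score` and `select_belief` are all hypothetical, chosen only to make the selection step explicit.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy setup: each "belief" is a rule that guesses y from x.
Hypothesis = Callable[[int], int]


def score(h: Hypothesis, observations: List[Tuple[int, int]]) -> int:
    """Count how many observations the hypothesis predicts correctly."""
    return sum(1 for x, y in observations if h(x) == y)


def select_belief(candidates: Dict[str, Hypothesis],
                  observations: List[Tuple[int, int]]) -> str:
    """Return the name of the candidate that best fits the observations.

    The max() call is the optimization step: it searches the space of
    candidate beliefs for the one with the highest score.
    """
    return max(candidates, key=lambda name: score(candidates[name], observations))


# Many possible beliefs, few of them correct (illustrative values only).
candidates: Dict[str, Hypothesis] = {
    "identity": lambda x: x,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "constant_zero": lambda x: 0,
}
observations = [(1, 2), (2, 4), (3, 6)]

best = select_belief(candidates, observations)
print(best)
```

Even in this tiny case, arriving at the right belief requires an explicit objective (the score) and a search over alternatives, which is the sense in which the argument calls belief formation an optimization process.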

One can then imagine all the things that might be useful in achieving the goal of “have correct beliefs”. For instance, acquiring more computing power and resources could help with this goal. As such, an Oracle could determine that it would answer a certain question more accurately and easily if it turned all matter outside the box into computronium, thereby killing all existing life.

Taxonomy

Based on an old draft by Daniel Dewey, Luke Muehlhauser has published a possible taxonomy of Oracle AIs, broadly divided between True Oracular AIs and Oracular non-AIs.

True Oracular AIs

Given that true AIs are goal-oriented agents, it follows that a True Oracular AI has some kind of oracular goals. These act as the motivation system that leads the Oracle to give us the information we ask for and nothing else.

It is first noted that such a True AI is neither actually nor causally isolated from the world, since it has at least an input channel (questions and information) and an output channel (answers). Since we expect such an intelligent agent to be able to have a deep impact on the world even through these limited channels, it can only be safe if its goals are fully compatible with human goals.

This means that a True Oracular AI has to have a full specification of human values, thus making it a FAI-complete problem – if we could achieve such skill and knowledge we could just build a Friendly AI and bypass the Oracle AI concept.

Oracular non-AIs

Any system that acts only as an informative machine, answering questions and having no goals, is by definition not an AI at all. An Oracular non-AI is thus merely a calculator of outputs based on inputs. Since the category is heterogeneous, the proposed subdivisions are merely informal.

An Advisor can be seen as a system that gathers data from the real world and computes the answer to an informal “what ought we to do?” question. Advisors also represent a FAI-complete problem.

A Question-Answerer is a similar system that gathers data from the real world, but coupled with a specific question, and then somehow computes the answer. The difficulty can lie in distinguishing it from an Advisor and in controlling the safety of its answers.

Finally, a Predictor is seen as a system that takes a corpus of data and produces a probability distribution over possible future data. Some dangers have been proposed for Predictors, namely that they may exhibit goal-seeking behavior that does not converge with humanity’s goals, and that they may influence us through their predictions.
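To make the input and output of a Predictor concrete, here is a minimal sketch, assuming a corpus that is just a sequence of tokens and a model that only counts which token follows which. The function names `fit_predictor` and `predict` and the weather corpus are hypothetical. A counting model like this has no goals and cannot act on its readers, so it illustrates only the interface, not the proposed dangers.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def fit_predictor(corpus: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next token | current token) from adjacent pairs in the corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(corpus, corpus[1:]):
        counts[current][nxt] += 1

    model: Dict[str, Dict[str, float]] = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Normalize raw counts into a probability distribution.
        model[current] = {tok: n / total for tok, n in followers.items()}
    return model


def predict(model: Dict[str, Dict[str, float]], current: str) -> Dict[str, float]:
    """Return the distribution over the next token (empty if `current` is unseen)."""
    return model.get(current, {})


# Illustrative corpus only: a short sequence of daily weather observations.
corpus = "rain sun rain rain sun sun rain sun".split()
model = fit_predictor(corpus)
distribution = predict(model, "rain")
print(distribution)
```

The output is a distribution rather than a single answer: the caller sees how much probability the model assigns to each possible continuation and decides what to do with it.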

The Parable of Predict-O-Matic (abramdemski, Oct 15, 2019, 12:49 AM; 359 points, 43 comments, 14 min read, 2 reviews)

A taxonomy of Oracle AIs (lukeprog, Mar 8, 2012, 11:14 PM; 25 points, 53 comments, 4 min read)

Superintelligence 15: Oracles, genies and sovereigns (KatjaGrace, Dec 23, 2014, 2:01 AM; 11 points, 30 comments, 7 min read)

Oracle AI: Human beliefs vs human values (Stuart_Armstrong, Jul 22, 2015, 11:54 AM; 4 points, 14 comments, 1 min read)

Yet another safe oracle AI proposal (jacobt, Feb 26, 2012, 11:45 PM; 4 points, 33 comments, 12 min read)

Why safe Oracle AI is easier than safe general AI, in a nutshell (Stuart_Armstrong, Dec 3, 2011, 12:33 PM; 5 points, 61 comments, 1 min read)

Under a week left to win $1,000! By questioning Oracle AIs. (Stuart_Armstrong, Aug 25, 2019, 5:02 PM; 12 points, 2 comments, 1 min read)

Contest: $1,000 for good questions to ask to an Oracle AI (Stuart_Armstrong, Jul 31, 2019, 6:48 PM; 59 points, 154 comments, 3 min read)

In defense of Oracle (“Tool”) AI research (Steven Byrnes, Aug 7, 2019, 7:14 PM; 22 points, 11 comments, 4 min read)

Brain emulations and Oracle AI (Stuart_Armstrong, Oct 14, 2011, 5:51 PM; 10 points, 5 comments, 1 min read)

A Proof Against Oracle AI (aiiixiii, Mar 6, 2020, 9:42 PM; 11 points, 11 comments, 1 min read)

Oracles: reject all deals—break superrationality, with superrationality (Stuart_Armstrong, Dec 5, 2019, 1:51 PM; 20 points, 4 comments, 8 min read)

Results of $1,000 Oracle contest! (Stuart_Armstrong, Jun 17, 2020, 5:44 PM; 60 points, 2 comments, 1 min read)

Search Engines and Oracles (HalMorris, Jul 8, 2014, 2:27 PM; 8 points, 8 comments, 2 min read)

Is it possible to build a safe oracle AI? (Karl, Apr 20, 2011, 12:54 PM; 1 point, 25 comments, 1 min read)

Reflexive Oracles and superrationality: prisoner’s dilemma (Stuart_Armstrong, May 24, 2017, 8:34 AM; 14 points, 5 comments, 4 min read)

Finding reflective oracle distributions using a Kakutani map (jessicata, May 2, 2017, 2:12 AM; 1 point, 0 comments, 2 min read)

Strategy For Conditioning Generative Models (james.lucassen and evhub, Sep 1, 2022, 4:34 AM; 31 points, 4 comments, 18 min read)

A Safer Oracle Setup? (Ofer, Feb 9, 2018, 12:16 PM; 5 points, 4 comments, 4 min read)

Three Oracle designs (Stuart_Armstrong, Jul 20, 2016, 3:16 PM; 2 points, 0 comments, 1 min read)

Book Review: AI Safety and Security (Michaël Trazzi, Aug 21, 2018, 10:23 AM; 52 points, 2 comments, 11 min read)

Resource-Limited Reflective Oracles (Diffractor, Jun 6, 2018, 2:50 AM; 15 points, 2 comments, 4 min read)

Breaking Oracles: superrationality and acausal trade (Stuart_Armstrong, Nov 25, 2019, 10:40 AM; 25 points, 15 comments, 1 min read)

Counterfactual Oracles = online supervised learning with random selection of training episodes (Wei Dai, Sep 10, 2019, 8:29 AM; 52 points, 26 comments, 3 min read)

An Oracle standard trick (Stuart_Armstrong, Jun 3, 2015, 2:25 PM; 2 points, 0 comments, 1 min read)

Optimiser to Oracle (Stuart_Armstrong, Sep 22, 2015, 10:27 AM; 0 points, 0 comments, 1 min read)

Reflective oracles as a solution to the converse Lawvere problem (SamEisenstat, Nov 29, 2018, 3:23 AM; 35 points, 2 comments, 7 min read)

Oracles, sequence predictors, and self-confirming predictions (Stuart_Armstrong, May 3, 2019, 2:09 PM; 22 points, 0 comments, 3 min read)

Bounded Oracle Induction (Diffractor, Nov 28, 2018, 8:11 AM; 25 points, 0 comments, 9 min read)

Non-manipulative oracles (Stuart_Armstrong, Feb 6, 2015, 5:05 PM; 3 points, 1 comment, 1 min read)

Self-confirming prophecies, and simplified Oracle designs (Stuart_Armstrong, Jun 28, 2019, 9:57 AM; 10 points, 1 comment, 5 min read)

[Question] What is the risk of asking a counterfactual oracle a question that already had its answer erased? (Chris_Leong, Feb 3, 2023, 3:13 AM; 7 points, 0 comments, 1 min read)

Analysing: Dangerous messages from future UFAI via Oracles (Stuart_Armstrong, Nov 22, 2019, 2:17 PM; 22 points, 16 comments, 4 min read)

From halting oracles to modal logic (Benya_Fallenstein, Feb 3, 2015, 7:26 PM; 1 point, 4 comments, 6 min read)

Self-Supervised Learning and AGI Safety (Steven Byrnes, Aug 7, 2019, 2:21 PM; 30 points, 9 comments, 12 min read)

The algorithm isn’t doing X, it’s just doing Y. (Cleo Nardo, Mar 16, 2023, 11:28 PM; 53 points, 43 comments, 5 min read)

Reflective oracles and the procrastination paradox (jessicata, Mar 26, 2015, 10:18 PM; 3 points, 4 comments, 2 min read)

Cooperative Oracles: Nonexploited Bargaining (Scott Garrabrant, Jun 3, 2017, 12:39 AM; 6 points, 6 comments, 3 min read)

Oracle machines for automated philosophy (Nisan, Feb 17, 2015, 3:10 PM; 1 point, 1 comment, 4 min read)

Cooperative Oracles: Introduction (Scott Garrabrant, Jun 3, 2017, 12:36 AM; 12 points, 3 comments, 2 min read)

Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima (Scott Garrabrant, Jun 3, 2017, 12:38 AM; 5 points, 8 comments, 4 min read)

Simplicity priors with reflective oracles (Benya_Fallenstein, Nov 15, 2014, 6:39 AM; 1 point, 0 comments, 6 min read)

An Oracle standard trick (Stuart_Armstrong, Jun 3, 2015, 2:17 PM; 7 points, 33 comments, 1 min read)

Probabilistic Oracle Machines and Nash Equilibria (jessicata, Feb 6, 2015, 1:14 AM; 5 points, 0 comments, 1 min read)

An Idea For Corrigible, Recursively Improving Math Oracles (jimrandomh, Jul 20, 2015, 3:35 AM; 10 points, 5 comments, 2 min read)

Standard ML Oracles vs Counterfactual ones (Stuart_Armstrong, Oct 10, 2018, 8:01 PM; 18 points, 5 comments, 6 min read)

Epiphenomenal Oracles Ignore Holes in the Box (SilentCal, Jan 31, 2018, 8:08 PM; 17 points, 8 comments, 2 min read)

Multibit reflective oracles (Benya_Fallenstein, Jan 25, 2015, 2:23 AM; 5 points, 1 comment, 8 min read)

Reflective oracles and superationality (Stuart_Armstrong, Nov 18, 2015, 12:30 PM; 16 points, 0 comments, 6 min read)

Counterfactuals and reflective oracles (Nisan, Sep 5, 2018, 8:54 AM; 9 points, 0 comments, 6 min read)

Cooperative Oracles (Diffractor, Sep 1, 2018, 8:05 AM; 19 points, 9 comments, 12 min read)

Reflexive Oracles and superrationality: Pareto (Stuart_Armstrong, May 24, 2017, 8:35 AM; 16 points, 0 comments, 2 min read)

Optimizing arbitrary expressions with a linear number of queries to a Logical Induction Oracle (Cartoon Guide) (Donald Hobson, Jul 23, 2020, 9:37 PM; 4 points, 2 comments, 2 min read)

Oracles, Informers, and Controllers (ozziegooen, May 25, 2021, 2:16 PM; 15 points, 2 comments, 3 min read)

Forum Digest: Reflective Oracles (jessicata, Mar 22, 2015, 4:02 AM; 6 points, 0 comments, 3 min read)

UDT in the Land of Probabilistic Oracles (jessicata, Feb 8, 2015, 9:13 AM; 4 points, 1 comment, 3 min read)

Strategy Nonconvexity Induced by a Choice of Potential Oracles (Diffractor, Jan 27, 2018, 12:41 AM; 2 points, 0 comments, 3 min read)

Safe questions to ask an Oracle? (Stuart_Armstrong, Jan 27, 2012, 6:33 PM; 3 points, 41 comments, 1 min read)

Oracle design as de-black-boxer. (Stuart_Armstrong, Sep 2, 2016, 1:38 PM; 0 points, 0 comments, 1 min read)

Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia (ank, Feb 13, 2025, 10:35 PM; 1 point, 2 comments, 11 min read)

Artificial Static Place Intelligence: Guaranteed Alignment (ank, Feb 15, 2025, 11:08 AM; 2 points, 2 comments, 2 min read)

Simulators (janus, Sep 2, 2022, 12:45 PM; 633 points, 168 comments, 41 min read, 8 reviews; generative.ink)

Proper scoring rules don’t guarantee predicting fixed points (Johannes Treutlein, Rubi J. Hudson and Caspar Oesterheld, Dec 16, 2022, 6:22 PM; 79 points, 8 comments, 21 min read)

Gaia Network: An Illustrated Primer (Rafael Kaufmann Nedal and Roman Leventov, Jan 18, 2024, 6:23 PM; 3 points, 2 comments, 15 min read)

Annotated reply to Bengio’s “AI Scientists: Safe and Useful AI?” (Roman Leventov, May 8, 2023, 9:26 PM; 18 points, 2 comments, 7 min read; yoshuabengio.org)

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?) (RationalSieve, Dec 25, 2022, 8:14 PM; 3 points, 6 comments, 1 min read)

Prosaic misalignment from the Solomonoff Predictor (Cleo Nardo, Dec 9, 2022, 5:53 PM; 42 points, 3 comments, 5 min read)

Some reasons why a predictor wants to be a consequentialist (Lauro Langosco, Apr 15, 2022, 3:02 PM; 23 points, 16 comments, 5 min read)

Anthropomorphic AI and Sandboxed Virtual Universes (jacob_cannell, Sep 3, 2010, 7:02 PM; 4 points, 124 comments, 5 min read)

How to safely use an optimizer (Simon Fischer, Mar 28, 2024, 4:11 PM; 47 points, 21 comments, 7 min read)

Interacting with a Boxed AI (aphyer, Apr 1, 2022, 10:42 PM; 12 points, 19 comments, 4 min read)

Underspecification of Oracle AI (Rubi J. Hudson, Adam Jermyn and Johannes Treutlein, Jan 15, 2023, 8:10 PM; 30 points, 12 comments, 19 min read)

AI oracles on blockchain (Caravaggio, Apr 6, 2021, 8:13 PM; 5 points, 0 comments, 3 min read)

Nick Land: Orthogonality (lumpenspace, Feb 4, 2025, 9:07 PM; 12 points, 37 comments, 8 min read)

A multi-disciplinary view on AI safety research (Roman Leventov, Feb 8, 2023, 4:50 PM; 46 points, 4 comments, 26 min read)

The Binding of Isaac & Transparent Newcomb’s Problem (suvjectibity, Feb 22, 2024, 6:56 PM; −11 points, 0 comments, 10 min read)

Where Free Will and Determinism Meet (David Bravo, Apr 4, 2023, 10:59 AM; 0 points, 0 comments, 3 min read)

Conditioning Generative Models for Alignment (Jozdien, Jul 18, 2022, 7:11 AM; 60 points, 8 comments, 20 min read)

Stop-gradients lead to fixed point predictions (Johannes Treutlein, Caspar Oesterheld, Rubi J. Hudson and Emery Cooper, Jan 28, 2023, 10:47 PM; 37 points, 2 comments, 24 min read)

Places of Loving Grace [Story] (ank, Feb 18, 2025, 11:49 PM; −1 points, 0 comments, 4 min read)

Training goals for large language models (Johannes Treutlein, Jul 18, 2022, 7:09 AM; 28 points, 5 comments, 19 min read)

plex Aug 29, 2021, 4:05 PM
1 point
I think this should be in the AI category, likely under Alignment Theory.
