Oracle AI

Tag. Last edit: 25 May 2022 14:52 UTC by Joel Burget

An Oracle AI is a frequently proposed solution to the problem of developing Friendly AI. It is conceptualized as a superintelligent system designed only to answer questions, with no ability to act in the world. The name was first suggested by Nick Bostrom.

The question of whether Oracles, or AGIs that are simply kept forcibly confined, are safer than fully free AGIs has long been debated. Armstrong, Sandberg and Bostrom discuss Oracle safety at length in their paper “Thinking inside the box: using and controlling an Oracle AI”. In it, the authors review various methods that might be used to measure an Oracle’s accuracy. They also try to shed light on weaknesses and dangers that can emerge on the human side, such as psychological vulnerabilities that the Oracle could exploit through social engineering. The paper discusses ideas for physical security (“boxing”), as well as the problems involved in trying to program the AI to only answer questions. In the end, it cautiously concludes that Oracle AIs are probably safer than free AGIs.

In a related work, Dreams of Friendliness, Eliezer Yudkowsky gives an informal argument that all Oracles will be agent-like, that is, driven by their own goals. The argument rests on the idea that anything considered “intelligent” must choose the correct course of action among all the actions available. An Oracle has many possible things to believe, very few of which are correct. Believing the correct thing therefore means that some method was used to select the correct belief from the many incorrect ones. By definition, this is an optimization process whose goal is to select correct beliefs.

One can then imagine all the things that might be useful in achieving the goal of “have correct beliefs”. For instance, acquiring more computing power and resources could serve this goal. An Oracle could thus determine that it might answer a certain question more accurately and easily if it turned all matter outside the box into computronium, thereby killing all existing life.
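The first step of this argument, that arriving at a correct belief amounts to selecting among candidates by some criterion, can be illustrated with a toy sketch. Everything here is a hypothetical illustration rather than anything from Dreams of Friendliness: the candidate "beliefs" are three small functions, and the selection criterion is simply how many observations each one reproduces.

```python
def select_belief(hypotheses, observations):
    """Pick the hypothesis that agrees with the most observations.

    hypotheses: dict mapping a name to a function x -> predicted y
    observations: list of (x, y) pairs
    """
    def score(h):
        # Number of observations the hypothesis gets exactly right.
        return sum(1 for x, y in observations if h(x) == y)

    # The search over candidates is an (extremely small) optimization process.
    return max(hypotheses, key=lambda name: score(hypotheses[name]))


# Hypothetical candidate beliefs about how y depends on x.
hypotheses = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "add_two": lambda x: x + 2,
}
observations = [(1, 1), (2, 4), (3, 9)]

best = select_belief(hypotheses, observations)
print(best)
```

The sketch only shows that belief selection can be framed as maximizing a score; it says nothing about whether such a process would go on to seek resources.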


Based on an old draft by Daniel Dewey, Luke Muehlhauser has published a possible taxonomy of Oracle AIs, broadly divided between True Oracular AIs and Oracular non-AIs.

True Oracular AIs

Given that true AIs are goal-oriented agents, it follows that a True Oracular AI has some kind of oracular goals. These act as the motivation system for the Oracle to give us the information we ask for and nothing else.

It is first noted that such a True AI is not actually causally isolated from the world, as it has at least an input channel (questions and information) and an output channel (answers). Since we expect such an intelligent agent to be able to have a deep impact on the world even through these limited channels, it can only be safe if its goals are fully compatible with human goals.

This means that a True Oracular AI has to have a full specification of human values, which makes it a FAI-complete problem: if we had the skill and knowledge to achieve this, we could just build a Friendly AI and bypass the Oracle AI concept.

Oracular non-AIs

Any system that acts only as an informative machine, answering questions and having no goals, is by definition not an AI at all. That means that an Oracular non-AI is merely a calculator of outputs based on inputs. Since the category is itself heterogeneous, the proposed subdivisions are merely informal.

An Advisor can be seen as a system that gathers data from the real world and computes the answer to an informal “what ought we to do?” question. Advisors also represent a FAI-complete problem.

A Question-Answerer is a similar system that gathers data from the real world, but coupled with a question. It then somehow computes the answer. The difficulty can lie in distinguishing it from an Advisor and in controlling the safety of its answers.

Finally, a Predictor is seen as a system that takes a corpus of data and produces a probability distribution over possible future data. Some dangers have been proposed for Predictors, namely that they may exhibit goal-seeking behavior that does not converge with humanity’s goals, and that they may be able to influence us through their predictions.
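To make the Predictor interface concrete, here is a minimal sketch under strong simplifying assumptions: the "corpus" is a short list of tokens and the "future data" is just the next token, estimated from bigram counts. The function names `fit_predictor` and `predict` are hypothetical and do not come from the taxonomy.

```python
from collections import Counter, defaultdict


def fit_predictor(corpus):
    """Count which token follows each token in the corpus."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts


def predict(counts, last_token):
    """Return a probability distribution over the next token.

    Returns an empty dict if last_token was never seen with a successor.
    """
    followers = counts.get(last_token)
    if not followers:
        return {}
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


# Toy corpus (an assumption for illustration only).
corpus = "rain sun rain rain sun rain".split()
model = fit_predictor(corpus)
dist = predict(model, "rain")
print(dist)
```

A system like this only maps data to a distribution and has no channel for acting on the world; the dangers mentioned above concern far more capable predictors whose outputs can affect the data they are predicting.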

Further reading & References

A taxonomy of Oracle AIs

lukeprog · 8 Mar 2012 23:14 UTC
25 points
53 comments · 4 min read · LW link

The Parable of Predict-O-Matic

abramdemski · 15 Oct 2019 0:49 UTC
328 points
41 comments · 14 min read · LW link · 2 reviews

Why safe Oracle AI is easier than safe general AI, in a nutshell

Stuart_Armstrong · 3 Dec 2011 12:33 UTC
5 points
61 comments · 1 min read · LW link

A Proof Against Oracle AI

aiiixiii · 6 Mar 2020 21:42 UTC
11 points
11 comments · 1 min read · LW link

Brain emulations and Oracle AI

Stuart_Armstrong · 14 Oct 2011 17:51 UTC
10 points
5 comments · 1 min read · LW link

Yet another safe oracle AI proposal

jacobt · 26 Feb 2012 23:45 UTC
4 points
33 comments · 12 min read · LW link

Contest: $1,000 for good questions to ask to an Oracle AI

Stuart_Armstrong · 31 Jul 2019 18:48 UTC
59 points
154 comments · 3 min read · LW link

Under a week left to win $1,000! By questioning Oracle AIs.

Stuart_Armstrong · 25 Aug 2019 17:02 UTC
12 points
2 comments · 1 min read · LW link

In defense of Oracle (“Tool”) AI research

Steven Byrnes · 7 Aug 2019 19:14 UTC
22 points
11 comments · 4 min read · LW link

Oracles: reject all deals—break superrationality, with superrationality

Stuart_Armstrong · 5 Dec 2019 13:51 UTC
20 points
4 comments · 8 min read · LW link

Oracle AI: Human beliefs vs human values

Stuart_Armstrong · 22 Jul 2015 11:54 UTC
4 points
14 comments · 1 min read · LW link

Superintelligence 15: Oracles, genies and sovereigns

KatjaGrace · 23 Dec 2014 2:01 UTC
11 points
30 comments · 7 min read · LW link

Results of $1,000 Oracle contest!

Stuart_Armstrong · 17 Jun 2020 17:44 UTC
60 points
2 comments · 1 min read · LW link

Reflective oracles and superationality

Stuart_Armstrong · 18 Nov 2015 12:30 UTC
16 points
0 comments · 6 min read · LW link

An Oracle standard trick

Stuart_Armstrong · 3 Jun 2015 14:17 UTC
7 points
33 comments · 1 min read · LW link

Cooperative Oracles: Nonexploited Bargaining

Scott Garrabrant · 3 Jun 2017 0:39 UTC
6 points
6 comments · 3 min read · LW link

Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima

Scott Garrabrant · 3 Jun 2017 0:38 UTC
5 points
8 comments · 4 min read · LW link

Cooperative Oracles: Introduction

Scott Garrabrant · 3 Jun 2017 0:36 UTC
12 points
3 comments · 2 min read · LW link

Probabilistic Oracle Machines and Nash Equilibria

jessicata · 6 Feb 2015 1:14 UTC
5 points
0 comments · 1 min read · LW link

Reflective oracles and the procrastination paradox

jessicata · 26 Mar 2015 22:18 UTC
3 points
4 comments · 2 min read · LW link

Three Oracle designs

Stuart_Armstrong · 20 Jul 2016 15:16 UTC
2 points
0 comments · 1 min read · LW link

An Oracle standard trick

Stuart_Armstrong · 3 Jun 2015 14:25 UTC
2 points
0 comments · 1 min read · LW link

Standard ML Oracles vs Counterfactual ones

Stuart_Armstrong · 10 Oct 2018 20:01 UTC
18 points
5 comments · 6 min read · LW link

A Safer Oracle Setup?

Ofer · 9 Feb 2018 12:16 UTC
5 points
4 comments · 4 min read · LW link

Multibit reflective oracles

Benya_Fallenstein · 25 Jan 2015 2:23 UTC
5 points
0 comments · 8 min read · LW link

Non-manipulative oracles

Stuart_Armstrong · 6 Feb 2015 17:05 UTC
3 points
1 comment · 1 min read · LW link

Finding reflective oracle distributions using a Kakutani map

jessicata · 2 May 2017 2:12 UTC
1 point
0 comments · 2 min read · LW link

From halting oracles to modal logic

Benya_Fallenstein · 3 Feb 2015 19:26 UTC
1 point
4 comments · 6 min read · LW link

Optimiser to Oracle

Stuart_Armstrong · 22 Sep 2015 10:27 UTC
0 points
0 comments · 1 min read · LW link

Counterfactuals and reflective oracles

Nisan · 5 Sep 2018 8:54 UTC
9 points
0 comments · 6 min read · LW link

Search Engines and Oracles

HalMorris · 8 Jul 2014 14:27 UTC
8 points
8 comments · 2 min read · LW link

Forum Digest: Reflective Oracles

jessicata · 22 Mar 2015 4:02 UTC
6 points
0 comments · 3 min read · LW link

Resource-Limited Reflective Oracles

Diffractor · 6 Jun 2018 2:50 UTC
5 points
1 comment · 4 min read · LW link

Simplicity priors with reflective oracles

Benya_Fallenstein · 15 Nov 2014 6:39 UTC
1 point
0 comments · 6 min read · LW link

Self-confirming prophecies, and simplified Oracle designs

Stuart_Armstrong · 28 Jun 2019 9:57 UTC
10 points
1 comment · 5 min read · LW link

Safe questions to ask an Oracle?

Stuart_Armstrong · 27 Jan 2012 18:33 UTC
3 points
41 comments · 1 min read · LW link

UDT in the Land of Probabilistic Oracles

jessicata · 8 Feb 2015 9:13 UTC
4 points
1 comment · 3 min read · LW link

Analysing: Dangerous messages from future UFAI via Oracles

Stuart_Armstrong · 22 Nov 2019 14:17 UTC
22 points
16 comments · 4 min read · LW link

An Idea For Corrigible, Recursively Improving Math Oracles

jimrandomh · 20 Jul 2015 3:35 UTC
7 points
5 comments · 2 min read · LW link

Is it possible to build a safe oracle AI?

Karl · 20 Apr 2011 12:54 UTC
1 point
25 comments · 1 min read · LW link

Strategy Nonconvexity Induced by a Choice of Potential Oracles

Diffractor · 27 Jan 2018 0:41 UTC
2 points
0 comments · 3 min read · LW link

Optimizing arbitrary expressions with a linear number of queries to a Logical Induction Oracle (Cartoon Guide)

Donald Hobson · 23 Jul 2020 21:37 UTC
4 points
2 comments · 2 min read · LW link

Oracles, Informers, and Controllers

ozziegooen · 25 May 2021 14:16 UTC
15 points
2 comments · 3 min read · LW link

Self-Supervised Learning and AGI Safety

Steven Byrnes · 7 Aug 2019 14:21 UTC
29 points
9 comments · 12 min read · LW link

Bounded Oracle Induction

Diffractor · 28 Nov 2018 8:11 UTC
25 points
0 comments · 9 min read · LW link

Book Review: AI Safety and Security

Michaël Trazzi · 21 Aug 2018 10:23 UTC
51 points
2 comments · 11 min read · LW link

Oracle design as de-black-boxer.

Stuart_Armstrong · 2 Sep 2016 13:38 UTC
0 points
0 comments · 1 min read · LW link

Oracle machines for automated philosophy

Nisan · 17 Feb 2015 15:10 UTC
1 point
1 comment · 4 min read · LW link

Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei Dai · 10 Sep 2019 8:29 UTC
48 points
26 comments · 3 min read · LW link

Epiphenomenal Oracles Ignore Holes in the Box

SilentCal · 31 Jan 2018 20:08 UTC
15 points
8 comments · 2 min read · LW link

Oracles, sequence predictors, and self-confirming predictions

Stuart_Armstrong · 3 May 2019 14:09 UTC
22 points
0 comments · 3 min read · LW link

Breaking Oracles: superrationality and acausal trade

Stuart_Armstrong · 25 Nov 2019 10:40 UTC
25 points
15 comments · 1 min read · LW link

Cooperative Oracles

Diffractor · 1 Sep 2018 8:05 UTC
18 points
9 comments · 12 min read · LW link

Reflective oracles as a solution to the converse Lawvere problem

SamEisenstat · 29 Nov 2018 3:23 UTC
31 points
0 comments · 7 min read · LW link

Reflexive Oracles and superrationality: Pareto

Stuart_Armstrong · 24 May 2017 8:35 UTC
14 points
0 comments · 2 min read · LW link

Reflexive Oracles and superrationality: prisoner’s dilemma

Stuart_Armstrong · 24 May 2017 8:34 UTC
14 points
5 comments · 4 min read · LW link

Strategy For Conditioning Generative Models

1 Sep 2022 4:34 UTC
31 points
4 comments · 18 min read · LW link

[Question] What is the risk of asking a counterfactual oracle a question that already had its answer erased?

Chris_Leong · 3 Feb 2023 3:13 UTC
7 points
0 comments · 1 min read · LW link

The algorithm isn’t doing X, it’s just doing Y.

Cleo Nardo · 16 Mar 2023 23:28 UTC
53 points
43 comments · 5 min read · LW link

Conditioning Generative Models for Alignment

Jozdien · 18 Jul 2022 7:11 UTC
56 points
8 comments · 20 min read · LW link

Training goals for large language models

Johannes Treutlein · 18 Jul 2022 7:09 UTC
28 points
5 comments · 19 min read · LW link

Where Free Will and Determinism Meet

David Bravo · 4 Apr 2023 10:59 UTC
0 points
0 comments · 3 min read · LW link


janus · 2 Sep 2022 12:45 UTC
611 points
116 comments · 41 min read · LW link

Prosaic misalignment from the Solomonoff Predictor

Cleo Nardo · 9 Dec 2022 17:53 UTC
40 points
2 comments · 5 min read · LW link

Proper scoring rules don’t guarantee predicting fixed points

16 Dec 2022 18:22 UTC
62 points
8 comments · 21 min read · LW link

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?)

RationalSieve · 25 Dec 2022 20:14 UTC
3 points
6 comments · 1 min read · LW link

Stop-gradients lead to fixed point predictions

28 Jan 2023 22:47 UTC
36 points
2 comments · 24 min read · LW link

Underspecification of Oracle AI

15 Jan 2023 20:10 UTC
30 points
12 comments · 19 min read · LW link

AI oracles on blockchain

Caravaggio · 6 Apr 2021 20:13 UTC
5 points
0 comments · 3 min read · LW link

A multi-disciplinary view on AI safety research

Roman Leventov · 8 Feb 2023 16:50 UTC
42 points
4 comments · 26 min read · LW link

Annotated reply to Bengio’s “AI Scientists: Safe and Useful AI?”

Roman Leventov · 8 May 2023 21:26 UTC
18 points
1 comment · 7 min read · LW link

Some reasons why a predictor wants to be a consequentialist

Lauro Langosco · 15 Apr 2022 15:02 UTC
23 points
16 comments · 5 min read · LW link

Interacting with a Boxed AI

aphyer · 1 Apr 2022 22:42 UTC
12 points
19 comments · 4 min read · LW link

Anthropomorphic AI and Sandboxed Virtual Universes

jacob_cannell · 3 Sep 2010 19:02 UTC
4 points
124 comments · 5 min read · LW link