Contest: $1,000 for good questions to ask to an Oracle AI

Edit: the contest is now closed; I will start assessing the entries.

The contest

I’m offering $1,000 for good questions to ask of AI Oracles. Good questions are those that are safe and useful: questions that allow us to get information out of the Oracle without increasing risk.

To enter, put your suggestion in the comments below. The contest ends at the end[1] of the 31st of August, 2019.

Oracles

A perennial suggestion for a safe AI design is the Oracle AI: an AI confined to a sandbox of some sort, that interacts with the world only by answering questions.

This is, of course, not safe in general; an Oracle AI can influence the world through the contents of its answers, allowing it to potentially escape the sandbox.

Two of the safest designs seem to be the counterfactual Oracle and the low bandwidth Oracle. These are detailed here, here, and here, but in short:

  • A counterfactual Oracle is one whose objective function (or reward, or loss function) is only non-trivial in worlds where its answer is not seen by humans. Hence it has no motivation to manipulate humans through its answer.

  • A low bandwidth Oracle is one that must select its answers from a relatively small list. Though its answer is a self-confirming prediction, the negative effects and potential for manipulation are restricted, because there are only a few possible answers available.

Note that both of these Oracles are designed to be episodic (they are run for single episodes, get their rewards by the end of that episode, aren’t asked further questions before the episode ends, and are only motivated to perform best on that one episode), to avoid incentives for longer-term manipulation.
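To make the episodic structure concrete, here is a minimal Python sketch of a single counterfactual-Oracle episode. The `oracle.predict` and `oracle.receive_loss` interface, the `measure_outcome` callback, and the erasure probability are all hypothetical stand-ins, not part of any specific design above:

```python
import random

ERASURE_PROBABILITY = 0.01  # illustrative value: chance the answer is never read


def run_counterfactual_episode(oracle, question, measure_outcome):
    """One episode of a counterfactual Oracle (hypothetical interface).

    `oracle.predict` and `oracle.receive_loss` are placeholder methods;
    `measure_outcome()` returns the true value, measured automatically
    at the end of the episode.
    """
    answer = oracle.predict(question)

    if random.random() < ERASURE_PROBABILITY:
        # Erasure world: no human ever sees the answer.  The Oracle's
        # loss is its prediction error, computed automatically.
        oracle.receive_loss((answer - measure_outcome()) ** 2)
    else:
        # Normal world: humans read the answer, but the loss is constant,
        # so the Oracle gains nothing by manipulating us through it.
        print(answer)
        oracle.receive_loss(0.0)
    # The episode ends here: no further questions, and the Oracle is only
    # motivated by its performance on this single episode.
```

The key point the sketch illustrates is that the Oracle’s loss only varies in the erasure worlds, and is fixed before the episode ends.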

Getting useful answers

The counterfactual and low bandwidth Oracles are safer than unrestricted Oracles, but this safety comes at a price. The price is that we can no longer “ask” the Oracle any question we feel like, and we certainly can’t have long discussions to clarify terms and so on. For the counterfactual Oracle, the answer might not even mean anything real to us: it’s about another world, one that we don’t inhabit.

Despite this, it’s possible to get a surprising amount of good work out of these designs. To give one example, suppose we want to fund one of a million projects on AI safety, but are unsure which one would perform best. We can’t directly ask either Oracle, but there are indirect ways of getting advice:

  • We could ask the low bandwidth Oracle which team A we should fund; we then choose a team B at random, and reward the Oracle if, at the end of a year, we judge A to have performed better than B.

  • The counterfactual Oracle can answer a similar question, indirectly. We commit that, if we don’t see its answer, we will select team A and team B at random and fund them for a year, and compare their performance at the end of the year. We then ask which team A[2] it expects to most consistently outperform any team B.

Both these answers get around some of the restrictions by deferring to the judgement of our future or counterfactual selves, averaged across many randomised universes.
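As an illustration, here is a minimal Python sketch of the low bandwidth version of this protocol. The `oracle.choose_index` and `oracle.receive_reward` methods, the `fund` stub, and the `judged_better_after_a_year` callback are all hypothetical placeholders for the Oracle’s single restricted answer and for our own future judgement:

```python
import random


def fund(team):
    """Placeholder: actually allocate a year's funding to `team`."""


def low_bandwidth_funding_episode(oracle, teams, judged_better_after_a_year):
    """One episode of the low bandwidth funding protocol sketched above.

    `teams` is the fixed list of candidate projects the Oracle may pick
    from; `judged_better_after_a_year(a, b)` stands in for our future
    judgement that team `a` outperformed team `b`.
    """
    # The Oracle's entire output is one index into a predefined list.
    team_a = teams[oracle.choose_index(num_options=len(teams))]

    # We pick the comparison team at random ourselves.
    team_b = random.choice([t for t in teams if t is not team_a])

    fund(team_a)
    fund(team_b)

    # End of the episode, a year later: the Oracle is rewarded only if
    # our future selves judge A to have outperformed B.
    reward = 1.0 if judged_better_after_a_year(team_a, team_b) else 0.0
    oracle.receive_reward(reward)
```

The counterfactual version would look similar, except that funding and comparison only happen in the worlds where the Oracle’s answer is erased before anyone reads it.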

But can we do better? Can we do more?

Your better questions

This is the purpose of this contest: for you to propose ways of using either Oracle design to get the most safe-but-useful work.

So I’m offering $1,000 for interesting new questions we can ask of these Oracles. Of this:

  • $350 for the best question to ask a counterfactual Oracle.

  • $350 for the best question to ask a low bandwidth Oracle.

  • $300 to be distributed as I see fit among the non-winning entries; I’ll be mainly looking for innovative and interesting ideas that don’t quite work.

Exceptional rewards go to those who open up a whole new category of useful questions.

Questions and criteria

Put your suggested questions in the comments below. Because of the illusion of transparency, it is better to explain more rather than less (within reason).

Comments that are submissions must be in their own separate comment threads, start with “Submission”, and you must specify which Oracle design you are submitting for. You may submit as many as you want; I will still delete them if I judge them to be spam. Anyone can comment on any submission. I may choose to ask for clarifications on your design; you may also choose to edit the submission to add clarifications (label these as edits).

It may be useful for you to include details of the physical setup, what the Oracle is trying to maximise/minimise/predict, and (in the counterfactual Oracle setup) what the counterfactual behaviour of the humans using the Oracle is assumed to be. Explanations as to how your design is safe or useful could be helpful, unless it’s obvious. Some short examples can be found here.

EDIT after seeing some of the answers: decide on the length of each episode, and on how the outcome is calculated. The Oracle is run only once per episode (and other Oracles can’t generally be used on the same problem; if you want to run multiple Oracles, you have to justify why this would work), and has to get its objective/loss/reward by the end of that episode, which therefore has to be estimated in some way at that point.


  1. A note on timezones: as long as it’s still the 31st of August, anywhere in the world, your submission will be counted. ↩︎

  2. These kinds of conditional questions can be answered by a counterfactual Oracle; see the paper here for more details. ↩︎