avturchin comments on Contest: $1,000 for good questions to ask to an Oracle AI

avturchin 2 Jul 2019 6:48 UTC
6 points
0
Suggestion for low bandwidth OAI:
General principle: Use the Oracle AI where the true answer is easily checkable and profitable, and no human person will ever read the answer, so there is no informational hazard that untrue answer will have some dangerous information in it.
Example: There are many bitcoin valets’ passwords for which are forgotten by the owners. OAI could guess the passwords, and owners will pay a share of money from the valet to get the rest. Moreover, nobody will read the password, as it will be copy-pasted automatically from OAI into the valet. The money could be used for AI safety research.
- Stuart_Armstrong 3 Jul 2019 11:52 UTC
  4 points
  0
  Parent
  This is against the technical definition of low bandwidth (small space of possible solutions), but somewhat in the spirit (low bandwidth for solutions humans get to see). I’ll accept it as a submission.
- Liam Donovan 2 Jul 2019 20:08 UTC
  3 points
  0
  Parent
  How would this be low-bandwidth? If we’re able to give the oracle a list of passwords to guess from, can’t we just check them all?
  - avturchin 3 Jul 2019 1:15 UTC
    1 point
    0
    Parent
    We don’t give it a list of passwords. We give it only a a valet address, and it generates the password. Here the output is low-bandwidth, but the Oracle may be either guessing passwords using some quantum cryptography tools, and in that case the input is also relatively low-bandwidth – or, and that case it needs to simulate all human history, by guessing psychological processes in the valet owner’s mind. But in the second case it needs to have access to all the data of internet, which is risky.
- avturchin 3 Jul 2019 1:23 UTC
  1 point
  0
  Parent
  Another variant for checkable and profitable Orcale: use the AI to solve math problems, publish the solutions and earn money on it. In that case the input may be very short, like “prove Goldbach conjecture”, but output could be arbitrary long and may be dangerous, if there are some encoded messages. To escape it, we need another Oracle, which only checks proofs on some formal language.
  Here is assumed that if the proof is true and is in a formal language, there is no hidden messages in the text.
  Another way to check proofs is to run two slightly different non-communicating Oracles and compare outputs.
  - Stuart_Armstrong 3 Jul 2019 11:54 UTC
    2 points
    0
    Parent
    
    Here is assumed that if the proof is true and is in a formal language, there is no hidden messages in the text.
    
    That is never something safe to assume. I can write formally correct proofs that contain hidden messages quite easily—add extra lemmas and extra steps. Unless we’re very smart, it would be hard for us to detect which steps are unnecessary and which are needed, especially if it rewrites the main proof thread somewhat.
    
    Another way to check proofs is to run two slightly different non-communicating Oracles and compare outputs.
    
    I’ll accept that as a part of a submission if a) you develop it more, in a formal way, and b) you repost it as a top level comment.