Here are some submission examples (note I’m not saying they are good examples, just well formatted; the Edits in the third submission are deliberate):
“Submission. For the counterfactual Oracle, ask the Oracle what Google’s stock price will be next month (counterfactually, if we didn’t see the Oracle’s answer). In that case, the loss function is computed as ||predicted price - actual price after erasure||^2. If we don’t see the answer, the programmers are assumed not to ask the question again for a month, neither to this Oracle nor to any other. This will demonstrate the true value of Google, and can ultimately be used to remove noise from the stock market.”
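To make that loss concrete, here is a minimal Python sketch. It assumes, as in the usual counterfactual Oracle setup, that erasure is a random event with some small probability (`erasure_prob` is a hypothetical parameter), and that the Oracle is only scored on erasure episodes. The function names are illustrative, not part of any existing system.

```python
import random
from typing import Optional


def sample_erasure(erasure_prob: float, rng: random.Random) -> bool:
    """Decide whether this episode's answer is erased (never read by humans).

    erasure_prob is a hypothetical, illustrative parameter.
    """
    return rng.random() < erasure_prob


def counterfactual_loss(predicted_price: float, actual_price: float,
                        erased: bool) -> Optional[float]:
    """Squared error ||predicted - actual||^2, defined only on erasure episodes.

    If the answer was seen (no erasure), no loss is returned: the Oracle is
    never scored on outcomes its answer could have influenced.
    """
    if not erased:
        return None
    return (predicted_price - actual_price) ** 2


# Example episode; the prices are made-up numbers, not real data.
rng = random.Random(0)
erased = sample_erasure(erasure_prob=0.01, rng=rng)
loss = counterfactual_loss(predicted_price=1250.0, actual_price=1200.0, erased=erased)
```

Returning `None` rather than `0.0` on non-erasure episodes keeps "no training signal" distinct from "perfect prediction".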
“Submission: low-bandwidth oracle. Give it a list of a thousand companies, and ask which one will increase most in value, in percentage terms, over the week. At the end of the month, rank the companies by percentage increase. The loss function is the rank of the company the oracle selected. The programmers will try to invest in the selected company, but will do so discreetly. This will help to gather resources for AI safety research.”
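A minimal sketch of the rank-based loss, assuming percentage change is measured from the start price to the end price and that rank 1 means the largest increase (so lower loss is better). Tie-breaking by list order is an arbitrary choice made here, not part of the submission.

```python
def percentage_change(start_price: float, end_price: float) -> float:
    """Percentage increase in value over the evaluation period."""
    return 100.0 * (end_price - start_price) / start_price


def rank_loss(selected: str, pct_change: dict[str, float]) -> int:
    """Loss = 1-based rank of the selected company (1 = biggest increase).

    sorted() is stable even with reverse=True, so ties keep the order in
    which companies appear in pct_change (an arbitrary choice for this sketch).
    """
    ranking = sorted(pct_change, key=pct_change.__getitem__, reverse=True)
    return ranking.index(selected) + 1


# Three hypothetical companies instead of a thousand.
changes = {
    "A": percentage_change(100.0, 105.0),  # 5.0
    "B": percentage_change(50.0, 56.0),    # 12.0
    "C": percentage_change(20.0, 19.0),    # -5.0
}
print(rank_loss("B", changes))  # 1
```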
“Submission. Ask the low-bandwidth Oracle which of my friends will surprise me most this fortnight. It chooses from a list of friends; I’ll decide which one surprises me most. The loss function is 1 if it chooses the wrong friend and 0 if it chooses the right one. This will help me figure out myself and my social circle, and better focus on AI safety. The risk is low because none of my friends are particularly important, positively or negatively, to the world. EDIT: To be clear, I also want to use this to figure out what the word “surprise” means to me, and what the AI predicts it will mean to me. EDIT 2: People have pointed out that it might be dangerous to have the AI construct my own meanings for categories, but it only has three bits or so of optimisation power (I don’t have that many friends :-(, so it’s mainly me thinking this through, not the AI manipulating me.”
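The “three bits or so” figure follows from the size of the answer space: picking one friend out of N can convey at most log2(N) bits. A minimal sketch of that arithmetic and of the 0/1 loss, assuming a hypothetical list of eight friends (the names below are made up):

```python
import math


def zero_one_loss(chosen_friend: str, most_surprising_friend: str) -> int:
    """1 if the Oracle picked the wrong friend, 0 if it picked the right one."""
    return 0 if chosen_friend == most_surprising_friend else 1


def answer_bits(num_friends: int) -> float:
    """Upper bound on information in one answer: choosing 1 of N options is log2(N) bits."""
    return math.log2(num_friends)


# Hypothetical list of eight friends: one answer carries at most 3 bits.
friends = ["Ann", "Bo", "Cy", "Di", "Ed", "Fay", "Gus", "Hal"]
print(answer_bits(len(friends)))  # 3.0
```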