I vaguely remember seeing a partial solution to the stop-button problem on LessWrong years ago that used internal betting markets, but I have not been able to find it since. Does anyone else know what I’m talking about?
“Internal betting markets” may be a reference to the Logical Induction paper? I'm unsure whether it ties strongly to the stop-button problem or corrigibility.