Again, you seem to be under the impression I am pushing the MIRI party line. I’m not. I’m not paid money by MIRI, though it would totally be cool if I were, since then I’d get to do cool stuff a lot of the time.
Observe all the discussions regarding “Oracle AI” which absolutely doesn’t need to work like a maximiser of something real.
Your argument has been made before, and was basically correct. The problem with Oracle AI is that we can intuitively imagine a “man in a box” who functions as a safe Oracle (or an unsafe one, hence the dispute), but nobody has actually proposed a formalized algorithm for an Oracle yet. If someone proposes an algorithm and proves that their algorithm can “talk” (that is: it can convey bytes onto an output stream), can learn about the world given input data in a very general way, but has no optimization criteria of its own… then I’ll believe them, and so should you. And that would be awesome, actually, because a safe Oracle would be a great tool for asking questions like, “So actually, how do I build an active-environment Ethical AI?”
At which point you’d be able to build an Ethical AI, and that would be the end of that.
No reflection is necessary for a compiler-like tool to improve itself.
With respect: yes, some kind of specialized reflection logic is necessary. Ordinary programs tend to run on first-order logic. Specialized logic programs and automated theorem provers run on higher-order logics in which some proofs/programs (those are identical according to the Curry-Howard Isomorphism) are incomputable (i.e., the prover will loop forever). Which ones are incomputable? Well, self-reflective ones, and any others that require reasoning about the reasoning of a Turing-complete computer.
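A minimal sketch of the obstruction, in Python rather than in any formal logic (the `halts` decider here is purely hypothetical, not anything anyone has built): if you assume a total procedure that decides halting for arbitrary code, the usual diagonal construction produces a program it must be wrong about, and reasoning about your own reasoning runs straight into the same trap.

```python
# Sketch of the classic diagonalization argument (not MIRI's formalism).
# Assume, hypothetically, that halts(f) is a total procedure returning True
# iff calling f() eventually halts. The function below defeats any such decider.

def make_contrarian(halts):
    def contrarian():
        if halts(contrarian):
            while True:   # decider says "halts" -> loop forever
                pass
        return None       # decider says "loops" -> halt immediately
    return contrarian

# Whatever halts() answers about contrarian, the answer is wrong. A prover that
# must reason about its own (Turing-complete) reasoning faces the same problem,
# which is why naive self-reflection is incomputable.
```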
So you could either design your AI to have an internal logic that isn’t even Turing-complete (in which case, it’ll obviously get ground to dust by Turing-complete “enemies”), or you can find some way to let it reason self-reflectively.
The current MIRI approach to this issue is probabilistic: prove that one can bound the probability of a self-reflective proposition to within 1.0 - epsilon, for an arbitrarily small epsilon. That would be your “acceptable risk level”. This would let you do things like, say, design AGIs that can improve themselves in super-Gödelian/Turing-complete ways (i.e., they can prove the safety of self-improvements that involve logics of a higher order than first-order) while only having their existing goals or beliefs “go wrong” once in a quadrillion gajillion years or whatever.
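For concreteness, here is my paraphrase (an assumption on my part, not a quotation) of the kind of reflection schema meant: a coherent probability assignment P over sentences that can describe its own values to within any open interval, so exact self-knowledge is never asserted and the Gödelian trap is dodged by an epsilon.

```latex
% Paraphrased reflection schema (my rendering of the result being alluded to):
% for every sentence \varphi and all rationals a < b,
\forall \varphi,\ \forall a,b \in \mathbb{Q}:\qquad
  a < \mathbb{P}(\varphi) < b
  \;\Longrightarrow\;
  \mathbb{P}\!\left( a < \mathbb{P}(\ulcorner \varphi \urcorner) < b \right) = 1 .
% The system only ever claims its own probabilities lie in open intervals,
% i.e. it knows itself up to an arbitrarily small epsilon, never exactly.
```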
You are correct that if a self-rewrite’s optimality can be proven within first-order logic, then of course any old agent can do it. But a great many problems in fields like compilers, static analysis, theorem proving, programming language semantics, etc. are actually more complex than first-order logic can handle. (This is basically what my actual professional training is in, at a half-decent level, so I know this.)
Without both theorem-proving and higher-order logics, you would basically be stuck doing something like trying to write some speedups for your own code, and then realizing you can’t actually trust the GNU C Compiler to recompile you faithfully. Since there have been backdoors in C compilers before, this would be a legitimate worry for an AI to have.
There are verified compilers, but oh shit, those require logic above first order in order to understand the verification! I mean, you do want to verify that the New You really is you, don’t you? You don’t want to just sit back and trust that your self-rewrite succeeded, and that it didn’t make you believe things are more likely to happen when they’ve never happened before, right?
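To make the worry concrete, here is a toy sketch (my own framing, with placeholder names) of the weaker thing you are left with if you cannot do the verification: testing the rewritten version against the old one on sampled inputs, which only ever catches the divergences you happen to probe.

```python
# Toy sketch: without real verification, the best the agent can do is compare
# the old and new builds on a finite battery of probes. This catches only the
# divergence it samples, which is exactly why higher-order proofs are wanted.
from typing import Any, Callable, Iterable

def behaviourally_equivalent(old: Callable[[Any], Any],
                             new: Callable[[Any], Any],
                             probes: Iterable[Any]) -> bool:
    """True iff `new` agrees with `old` on every probe input."""
    return all(old(x) == new(x) for x in probes)

# Usage: only adopt the rewrite if the (weak) check passes.
# if behaviourally_equivalent(current_version, rewritten_version, probe_inputs):
#     current_version = rewritten_version
```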
Brief reply—thanks for the interesting conversation, but I am probably going to be busier over the next few days (basically I had been doing contract work where I have to wait on stuff, which makes me spend time online).
re: oracle
The failure modes of something that’s not quite right (the “time-wiring” we discussed; heh, it definitely needs a good name) don’t have to be as bad as ‘kills everyone’.
Dismissal of the possibility of an oracle has gone as far as arguments that something which amounts to literally an approximate argmax would kill everyone, because it would convert the universe to computronium in order to be a better argmax. That is clearly silly. I presume this is not at all what you’re speaking about.
I’m not entirely sure what your idea of an oracle is supposed to do, though. Metaphorically speaking—provide me with a tea recipe if I ask “how to make tea”?
So, for a given string Q you need to output a string A such that some answer-fitness function f(Q,A) is maximized. I don’t see why it has to involve some tea-seeking utility function over expected futures. Granted, we don’t know what a good f looks like, but we don’t know how to define tea as a function over the gluons and quarks either. edit: and at least we could learn a lot of the properties of f from snooped conversations between humans.
I think the issue here is that agency is an ontologically basic thing in humans, and so there’s a very strong tendency to try to “reduce” anything that is kind of, sort of intelligent to an agent. Or, in your words, a man in a box.
I see the “oracle” as a component of a composite intelligence, which needs to communicate with another component of said intelligence via a pre-existing protocol.
re: reflection, what I meant is that a piece of advanced optimization software—implementing higher-order logic, or doing a huge amount of empirical-ish testing—can be run with its own source as input, instead of “understanding” the correspondence between some real-world object and itself and doing instrumental self-improvement. Sorry if I was not clear. The “improver” works on a piece of code in an abstract fashion, caring not whether that piece is itself or anything else.
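A minimal sketch of that framing, with made-up names: the improver is an ordinary source-to-source transformer, and “self-improvement” is just the special case of handing it its own file.

```python
# Sketch of the point above (names are placeholders). The improver is an
# ordinary source-to-source optimizer; nothing in it models "itself".
import sys

def improve(source_code: str) -> str:
    """Return an optimized, behaviour-preserving version of the input.
    Stub standing in for the 'advanced optimization software' described above."""
    return source_code  # identity transform as a placeholder

if __name__ == "__main__":
    # Feeding the improver its own source is just another call; the function
    # neither knows nor cares that the input happens to be the program running it.
    with open(sys.argv[0], "r") as f:
        new_version = improve(f.read())
```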
I’m not entirely sure what your idea of an oracle is supposed to do, though. Metaphorically speaking—provide me with a tea recipe if I ask “how to make tea”?
Bingo. Without doing anything else other than answering your question.
So, for a given string Q you need to output a string A such that some answer-fitness function f(Q,A) is maximized. I don’t see why it has to involve some tea-seeking utility function over expected futures. Granted, we don’t know what a good f looks like, but we don’t know how to define tea as a function over the gluons and quarks either. edit: and at least we could learn a lot of the properties of f from snooped conversations between humans.
Yes, that model is a good model. There would be some notion of “answer fitness for the question”, which the agent learns from and tries to maximize. This would basically be a reinforcement learner with text-only output. “Wireheading” would be a form of overfitting, and the question would then reduce to: can a not-so-super intelligence still win the AI Box Game even while giving its creepy mind-control signals in the form of tea recipes?
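Spelling the shared picture out in code (a toy sketch; the candidate set and the learned f are placeholders, not a real design):

```python
# Toy rendering of the f(Q, A) picture. In anything real, f would be learned
# (e.g. from human conversations, as suggested above) and answers would be
# generated rather than enumerated; both are placeholders here.
from typing import Callable, Iterable

def oracle_answer(question: str,
                  candidate_answers: Iterable[str],
                  f: Callable[[str, str], float]) -> str:
    """Return the answer A that maximizes the learned answer-fitness f(Q, A)."""
    return max(candidate_answers, key=lambda a: f(question, a))
```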
Bingo. Without doing anything else other than answering your question.
I think the important criterion is the lack of extensive optimization of what it says for the sake of actually creating tea or achieving some other real-world goal. The reason I can’t really worry about all that is that I don’t think a “lack of extensive search” is hard to ensure in actual engineered solutions (built on limited hardware), even if it is very unwieldy to express in simple formalisms that specify an iteration over all possible answers. The optimization needed to make the general principle work on limited hardware requires culling the search.
There’s no formalization of Siri that’s substantially simpler than the actual implementation, either. I don’t think ease of making a simple formal model corresponds at all with likelihood of actual construction, especially when formal models do grossly brute-force things (making their actual implementation require a lot of effort and be predicated on precisely the ability to formalize restricted solutions and restricted ontologies).
If we can allow non-natural-language communication: you can express goals such as “find a cure for cancer” as functions over a fixed, limited model of the world, and apply the proposed actions inside the model (where you can watch how they work).
Let’s suppose that in step 1 we learn a model of the world, say, in a Solomonoff Induction-ish way, in practice with controls over what sort of precision we need and where, because our computer’s computational power is usually a microscopic fraction of what it’s trying to predict. In step 2, we find an input to the model that puts the model into the desired state. We don’t have a real-world manipulator linked up to the model, and we don’t update the model. Instead we have a visualizer (which can be set up even for an opaque model by requiring it to learn to predict the view from an arbitrarily movable camera).
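A rough sketch of that two-step scheme, with every name hypothetical (the model, visualizer, and desirability score all stand in for whatever learned components are actually meant):

```python
# Step 1 (assumed done elsewhere): fit a predictive world model from data.
# Step 2 (below): search for an input that drives the *model* into the desired
# state. Nothing is wired to real-world actuators, the model is not updated
# during the search, and a separate visualizer renders the predicted outcome
# for a human to inspect. All names are placeholders.
from typing import Any, Callable, Iterable, Tuple

def plan_inside_model(model: Callable[[Any, Any], Any],      # (state, input) -> next state
                      visualize: Callable[[Any], Any],        # state -> rendered camera view
                      desirability: Callable[[Any], float],   # state -> how desirable it looks
                      initial_state: Any,
                      candidate_inputs: Iterable[Any]) -> Tuple[Any, Any]:
    best = max(candidate_inputs,
               key=lambda u: desirability(model(initial_state, u)))
    # Return the chosen input together with the model's predicted view of the
    # result; a human decides what, if anything, to do with it in the real world.
    return best, visualize(model(initial_state, best))
```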