I feel that is correct. I feel that is proved, etc.
Regardless of the answer, it will ultimately involve our minds expressing a preference. We cannot escape our psychology. If our minds are deterministic computational machines within a universe without any objective value, all our goals are merely elaborate ways to make us feel content with our choices and a possibly inconsistent set of mental motivations. Attempting to model our psychology seems like the most efficient way to solve this problem. Is the idea that there is some other kind of answer? How could it be shown to be legitimate?
I suspect that the desire for another answer is preventing practical progress in creating any meaningful solution. There are many problems and goals that would be relatively uncontroversial for an AI system to attempt to address. The outcome of the work need only be better than what we currently have to be useful. We don’t have to solve all problems before addressing some of them; indeed, without attempting to address some of them, I doubt we will make significant progress on the rest.
If our minds are deterministic computational machines within a universe without any objective value, all our goals are merely elaborate ways to make us feel content with our choices and a possibly inconsistent set of mental motivations. Attempting to model our psychology seems like the most efficient way to solve this problem.
Which problem? You need to define which action the AI should choose in whatever problem it’s solving, including problems that are not humanly comprehensible. This is naturally done in terms of actual humans with all their psychology (as the only available source of sufficiently detailed data about what we want), but it’s not at all clear how you’d want to use (interpret) that human data.
“Attempting to model psychology” doesn’t answer any questions. Assume you have a proof-theoretic oracle and a million functioning uploads living in a virtual world, however structured, so that you can run any number of experiments involving them, restart these experiments, infer the properties of whole infinite collections of such experiments, and so on. You still won’t know how to even approach creating an FAI.
If there is an answer to the problem of creating an FAI, it will result from a number of discussions and ideas that lead a set of people to agree that a particular course of action is a good one. By modelling psychology, it will be possible to determine all the ways this can be done. The question then is: why choose one over any of the others? As soon as one is chosen, it will work and everyone will go along with it. How could we rate each one? (They would all be convincing by definition.) Is it meaningful to compare them? Is the idea that there is some transcendent answer that is correct or important that doesn’t boil down to what is convincing to people?
Understanding the actual abstract reasons for agents’ decisions (such as decisions about agreeing with a given argument) seems to me a promising idea, and I’m trying to make progress on it. Agents’ decisions don’t need to be correct or well-defined on most inputs for the reasons behind their more well-defined behaviors to lead the way to figuring out what to do in other situations, or what should be done where the agents err. Note that if you postulate an algorithm that uses humans as its elements, you still have the problems of failure modes, of regret for bad design decisions, and of the capability to answer humanly incomprehensible questions, and these problems need to be solved before you start the thing up.
Interesting. If I understand correctly, the idea is to find a theoretically correct basis for deciding on a course of action given existing knowledge, then to make this calculation efficient, and then to direct it towards a formally defined objective.
This is as distinct from a system which, potentially suboptimally, attempts solutions and tries to learn improved strategies, i.e., one in which the theoretical basis for decision making is ultimately discovered by the agent over time (e.g., as we have done with the development of probability theory). I think the perspective I’m advocating is to produce a system that is more like an advanced altruistic human (with many evolutionary motivations removed) than a provably correct machine. Ideally, such a system could itself propose convincing solutions to the FAI problem as a result of an increasingly sophisticated understanding of human reasoning and motivations.
I realise there is a fear that such a system could develop convincing yet manipulative solutions. However, the output need only be more trustworthy than a human’s response to be legitimate (for example, if, based on an analysis of its reasoning algorithm, it appears to lack the Machiavellian capability that humans have).
Or, put another way: can a robot Vladimir (or Eliezer, etc.) be made that solves the problem faster than their human counterparts do? And is there any reason to think this process is less safe (particularly when AI developments will continue regardless)?
Interesting. If I understand correctly, the idea is to find a theoretically correct basis for deciding on a course of action given existing knowledge, then to make this calculation efficient, and then to direct it towards a formally defined objective.
Yes, but there is only one top-level objective, to do the right thing, so one doesn’t need to define an objective separately from the goal system itself (and improving one’s state of knowledge is just another thing one can do to accomplish the goal, so again it is not a separate issue).
FAI really stands for a method of efficient production of goodness, as we would want it produced, and there are many landmines on this path. In particular, humanity in its current form doesn’t seem able to retain its optimization goal in the long run, and the same applies to most obvious hacks that lack explicit notions of preference, such as upload societies. It’s not just a question of speed, but also of the ability to retain the original goal after quadrillions of incompletely understood self-modifications.
Ok, so how about this workaround.
The current approach is to have a number of human intelligences continue to explore this problem until they enter a mental state C (for “convinced they have the answer to FAI”). The next stage is to implement it.
We have no route to knowledge other than our internal sense of being convinced, i.e., no oracle to tell us whether we are right or not.
So what if we formally define what this mental state C consists of and then construct a GAI which provably pursues only the objective of creating this state? The advantage is that we would then have a means of judging our progress, because we would have formally defined, measurable criteria for success. (In fact, formally defining C is a valuable goal regardless of the use of AI, but doing so also makes it possible to use AI techniques to solve the problem.)