Just to be clear, you are proposing that mere friendliness is insufficient, and we also want optimality with respect to getting as much of the cosmos as we can? This seems contained in friendliness, but OK. You are not proposing that optimally taking over the universe is sufficient for friendliness, right?
white-box metaphilosophy
I’ve been thinking a lot about this, and I also think this is the most likely to work. On general principle, understanding the problem and solving it indirectly is more promising than trying to solve it directly without knowing what it is. If we take the directly normative or black-box approach without knowing what the problem is, how do we even know that it is solved?
I would amend, though, that the nature of metaphilosophy alone is not going to be enough; there will have to be a certain level of black-boxiness, in that certain questions (e.g. what is good) are only answerable with respect to human brains. I’m unsure whether this falls under what you mean by black-box approaches, though.
I just want to make clear that white-box is not about coming up with some simple theory of how everything can be derived from pure reason; it is more like a relatively simple theory of how the structure of the human brain, human culture, physics, logic, and so on relate to the answers to the philosophical questions.
More generally, there is a spectrum of how meta you go on what the problem is:
Directly going about life is the lowest level.
Thinking about what you want and being a bit strategic.
Realizing that you want a powerful agent acting in your interest, and building an AI to solve that problem (your normative AI).
Explicitly modelling the reasoning you did to come up with parts of your AI, except doing it better with fewer constraints (your black-box metaphilosophy).
Explicitly asking “what general problems are we solving when doing philosophy, and how would we do that in general?”, and building that process (your white-box metaphilosophy).
Something even further?
This spectrum generalizes to other problems; for example, I use it a lot in my engineering work. “What problem are we solving?” is an extremely valuable question to answer, IMO.
This relates to the expert-at/expert-on dichotomy, to discussing problems before proposing solutions, and so on.
Anyway, I think it is a mostly smooth spectrum of increasing understanding of what exactly the problem is, rather than a trichotomy, and we want to position ourselves optimally on it.
For going to the grocery store, going full meta is probably useless, but for FAI, I agree that we should go as far as we can towards a full understanding of “what problem are we solving, and how would we get a solution, in principle?”, and then get the AI to do the actual solving, because the details are likely beyond human ability.