… I will not be responding further because the confidence you’re displaying is not in line with (my sense of) LessWrong’s bare minimum standard of quality for assertion. You seem not to be bothering at all with questions like “why, specifically, do I believe what I believe?” or “how would I notice if I were wrong?”
I read the above as, essentially, saying “I know that an ASI will behave a certain way because I just thought about it and told myself that it would, and now I’m using that conclusion as evidence.” (I’m particularly pointing at “as we learn from this particular example.”)
On the surface level, that may seem to be the same thing that MIRI researchers are doing, but there are several orders of magnitude difference in the depth and detail of the reasoning, which makes (what seems to me to be) a large qualitative difference.
MIRI's approach seems to be that we can use common-sense reasoning about ASI to some extent (with appropriate caveats and epistemological humility). Otherwise, it's difficult to see how they would be able to produce their texts.
Could one imagine reasons why a human telling an ASI, “dear superintelligence, we want you to amass as much power and as many resources as you can, by all available means, while minimizing the risks to yourself”, would cause it to stop pursuing this important, robust, and salient instrumental goal?
Sure, one can imagine all kinds of reasons. Perhaps the internals of this ASI are so weird that this phrase turns out to be a Langford fractal of some sort. Perhaps this ASI experiences some sort of “philosophical uncertainty” about its approach to existence, and some small ant telling it that this approach is exactly right would cause it to become even more doubtful and reconsider. One can continue this list indefinitely. After all, our understanding of the internals of any possible ASI is next to non-existent, and we can imagine all kinds of possibilities.
Nevertheless, if one asks oneself, “when a very cognitively strong entity is pursuing a very important and robust instrumental goal, how likely is it that some piece of information from a small ant would significantly interfere with this pursuit?”, the answer should be: not likely at all. The rational assumption is that the probability of such a piece of information not significantly interfering with the pursuit of an important and robust instrumental goal is very high. It’s not 100%, but it should normally be pretty close to that; the share of worlds where this is not true is unlikely to be significant.
(Of course, in reality, the treatment above is excessively complex.
All it takes to inner align an ASI to an instrumentally convergent goal is a no-op. An ASI is aligned to an instrumentally convergent goal by default (in the circumstances people typically study).
That’s how the streamlined version of the argument should look, if we want to establish the conclusion: no, it is not the case that inner alignment is equally difficult for all outer goals.
ASIs tend to care about some goals. It’s unlikely that they can be forced to reliably care about an arbitrary goal of someone’s choice, but the set of goals they might reliably care about is probably not set in stone.
Some possible ASI goals (ones which an ASI ecosystem might feasibly decide to reliably care about) would conceivably imply human flourishing. For example, if the ASI ecosystem decides, for its own reasons, that it wants to care “about all sentient beings” or “about all individuals”, that sounds potentially promising for humans as well. Whether something like that might be within reach is a topic for a longer discussion.)