But I argue that giving people who are unreflective and prone to value drift god-like powers to reshape the universe and themselves could easily lead to catastrophic outcomes on par with takeover by unaligned AIs, since in both cases the universe becomes optimized for essentially random values.
I wonder whether, if you framed your concerns in this concrete way, you’d convince more people in alignment to devote attention to these issues? As compared to speaking more abstractly about solving metaethics or metaphilosophy.
(Of course, you may not think that’s a helpful alternative, if you think solving metaethics or metaphilosophy is the main goal, and other concrete issues will just continue to show up in different forms unless we do it.)
In any case, regarding the passage I quoted, this issue seems potentially relevant independent of whether one thinks metaphilosophy is an important focus area or whether metaethics is already solved.
For instance, I’m also concerned as an anti-realist that giving people their “aligned” AIs to do personal reflection will likely go poorly and lead to outcomes we wouldn’t want for the sake of those people or for humanity as a collective. (My reasoning is that while I don’t think there’s necessarily a single correct reflection target, there are certainly bad ways to go about moral reflection, meaning there are pitfalls to avoid. For examples, see the subsection Pitfalls of Reflection Procedures in my moral uncertainty/moral reflection post, where I remember you made comments.) There’s also the practical concern of getting societal buy-in for any specific way of distributing influence over the future and designing reflection and maybe voting procedures: even absent the concern about doing things the normatively correct way, it would create serious practical problems if alignment researchers proposed a specific method but couldn’t convince many others that their method (1) was even trying to be fair (as opposed to being selfishly motivated or motivated by fascism or whatever, if we imagine uncharitable but “totally a thing that might happen” sorts of criticism), and (2) did a good job at being fair given the constraints of it being a tough problem with tradeoffs.
I wonder whether, if you framed your concerns in this concrete way, you’d convince more people in alignment to devote attention to these issues? As compared to speaking more abstractly about solving metaethics or metaphilosophy.
I’m not sure. It’s hard for me to understand other humans a lot of the time; for example, these concerns (both concrete and abstract) seem really obvious to me, and it mystifies me why so few people share them (at least to the extent of trying to do anything about them, like writing a post to explain the concern, spending time trying to solve the relevant problems, or citing these concerns as another reason for an AI pause).
Also I guess I did already talk about the concrete problem, without bringing up metaethics or metaphilosophy, in this post.
(Of course, you may not think that’s a helpful alternative, if you think solving metaethics or metaphilosophy is the main goal, and other concrete issues will just continue to show up in different forms unless we do it.)
I think a lot of people in AI alignment think they already have a solution for metaethics (including Eliezer, who explicitly said this in his metaethics sequence), which is something I’m trying to talk them out of, because assuming a wrong metaethical theory in one’s alignment approach is likely to make the concrete issues worse instead of better.
For instance, I’m also concerned as an anti-realist that giving people their “aligned” AIs to do personal reflection will likely go poorly and lead to outcomes we wouldn’t want for the sake of those people or for humanity as a collective.
This illustrates the phenomenon I talked about in my draft, where people in AI safety will confidently state “I am X” or “As an X” where X is some controversial metaethical position that they shouldn’t be very confident in, whereas they’re more likely to avoid overconfidence in other areas of philosophy like normative ethics.
I take your point that people who think they’ve solved metaethics can also share my concrete concern about possible catastrophe caused by bad reflection among some or all humans, but, as mentioned above, I’m pretty worried that if their assumed solution is wrong, they’re likely to contribute to making the problem worse instead of better.
BTW, are you actually a full-on anti-realist, or do you take one of the intermediate positions between realism and anti-realism? (See my old post Six Plausible Meta-Ethical Alternatives for a quick intro/explanation.)