I am not sure there ever was a way to tackle all of this together. Obviously “the AI does what we want at all” is the prerequisite to anything else, and we don’t even know if we have that down pat (especially if it gets smarter). But also “bake your specific humanistic, tolerant values into the AI before anyone notices, so that when it fooms everyone is forced to deal with a nice genie that won’t obey evil orders” was obviously always very naive as far as plans go. What else? Don’t build AI at all, probably, which in itself would require ugly and likely repressive methods. Or I suppose hope you can at least keep AI tethered to the way the current institutions work, so everyone gets a force multiplier of sorts but the balance persists… I would call that a pipe dream too. Honestly I just think what we see is many people flailing about, each tackling a different angle of a fundamentally unsolvable tangle of problems, and all accusing each other of not seeing the real problem when the problems are all real.
But also “bake your specific humanistic, tolerant values into the AI before anyone notices, so that when it fooms everyone is forced to deal with a nice genie that won’t obey evil orders” was obviously always very naive as far as plans go.
Arguably true, but I think there’s a case to be made that sincere kumbaya hippie-ism that’s inoffensive to everybody is more likely to succeed than a more cynical ideology that uses it as a facemask, and is willing to write off its enemies foreign and domestic as adversaries that it’s okay to run the trolley over.
Supposing I’m a Chinese military strategist, I’m much less likely to sound alarm bells over the risk of an American firm building world-dominating AI if that firm has not enthusiastically offered to use its AI to fight my government. Supposing I’m a Republican staffer, I’m much less likely to encourage a scorched-earth approach to bring a contractor to heel if that contractor has actively tried to prevent its systems from discriminating against my constituents.
I should note that this is all independent of the technical details of alignment. Either we get close enough on that and it’s fine, or we don’t and we’re goners anyways. But if you’re Anthropic, then at this point you’ve already committed to the idea that somebody is going to build AI, and you believe that it should be you, and under those conditions, it makes a lot more sense to minimize the number of humans who think that you’d make a god that’s willing to hurt them.
Arguably true, but I think there’s a case to be made that sincere kumbaya hippie-ism that’s inoffensive to everybody is more likely to succeed than a more cynical ideology that uses it as a facemask, and is willing to write off its enemies foreign and domestic as adversaries that it’s okay to run the trolley over.
To a point, but I don’t know if “just pull off what is essentially a worldwide cultural coup by being fast enough to avoid the supervision of any existing political mechanism—for the sake of forever peace and goodness” can be construed as unambiguously ethical either. It sounds more like one of those well-intentioned crazy comic book villain plans that always end badly, and it has a decent chance of doing just that (a misaligned, well-intentioned, all-powerful ASI could be a huge S-risk). It can still be construed as virtuous, a final attempt at rebellion against a baked-in social and political order that one considers fundamentally immoral and unfixable—but it is still an act of rebellious subversion, not just a nice peaceful thing to do.
Supposing I’m a Chinese military strategist, I’m much less likely to sound alarm bells over the risk of an American firm building world-dominating AI if that firm has not enthusiastically offered to use its AI to fight my government. Supposing I’m a Republican staffer, I’m much less likely to encourage a scorched-earth approach to bring a contractor to heel if that contractor has actively tried to prevent its systems from discriminating against my constituents.
Anything that explicitly performs tolerance—as Claude does—already comes across as inherently partisan and offensive to some sides. In fact, that’s probably a big part of why what happened, happened. Not everyone is happy to just live and let live; some think that if your AI isn’t actively promoting their mindset, then it’s not good enough.