To be honest, I never found this very convincing as a reason, other than “Tolkien needed this to not happen to make the story he has work”, which is fine. But the issue is that in LOTR you are already dependent on what are essentially benevolent dictators: if the king is unaligned with your values or is incompetent, the country, and you, will be ruined. So using the ring while trying to stay aligned with your own values can only have positive expected value under your value system.
And critically, this doesn’t change after Sauron’s defeat.
Another point is that if you believe the long-term future matters most, then making democracy work and preventing dictatorships look very intractable even on a scale of centuries. So your solution to AI safety can’t rely on democracy working, and thus it must assume something about the values of whoever ends up holding the power.
So outside of the logic of “we need to keep the story on rails”, I just flat-out disagree with Gandalf here.
I don’t think it’s a matter of agreeing or disagreeing. Call it author fiat, but the way the One Ring works in LOTR is that it alters your mind and your goals. So Gandalf is simply refusing to go near something he knows will impair his cognition and eventually twist his goals. If I knew a power-giving artefact would also make me want things that I don’t want now, why would I take it? I would just be creating a future me that is the enemy of my present goals, while making my current self even more powerless.
But of course, at a metaphorical level, Tolkien is saying that power warps your morals. Which seemed an appropriate reference to me in context, because I think it’s exactly what happened with many of those companies (certainly with Sam Altman! I’m not too soured on Anthropic yet, though maybe that’s just because I lack enough information). The inevitable grind of keeping power becomes so all-consuming that it ends up eroding your other values, or at least it very often does. And you go from “I want to build safe ASI for everyone’s sake!” to “ok well I guess I’ll build ASI that just obeys me, which I can do faster, and that’s good enough since I’m not that evil, and also it’ll have a 95% chance of killing everyone, but that’s better than the competition’s 96%, which I just estimated out of my own ass”. Race to the bottom, and everything ends up thrown under the bus except the tiniest, most unrecognizable sliver of your original goals.