one thing I’d highlight: there’s a point where you conflate two claims that feel very different to me:
“It seems plausible that the best thing to do if you really take AI x-risk seriously is to just stop working on AI at all.”
And that’s what I’ve been trying to say this whole time, whenever anyone asks me about my career. That I don’t want to try to have a big impact, if I can’t be certain that that impact will be positive rather than negative for the world — and I can’t be certain.
I think that there are a bunch of “AI safety” people such that the world would be better off if they stopped working on AI at all. But that doesn’t mean that they (or anyone else) should be aiming to have certainty of positive impact—that’s a very high bar.
Instead, one way I think about it is that there’s a skill of avoiding self-deception (and being virtuous more generally), and the more you cultivate this skill, the more you’re able to have a robustly positive impact even when you’re not certain.
Any pointers to further reading on cultivating self-deception-avoidance to robust-ify positive impact? At a glance, Distributed vs centralized agents doesn’t seem to be about this.
My post on pessimization talks about a bunch of the mechanisms by which you might have negative impact.
I have some posts in the works on virtue ethics, but for now probably the most relevant thing I’ve written is this sequence on replacing fear. My sense is that a lot of self-deception is caused by fear-based motivations.
Trying to avoid self-deception seems like an important piece of it (although it seems non-trivial, e.g. it’s easy to self-deceive about one’s own level of self-deception). But for high-variance, high-impact stuff it separately seems especially important to try to take actions which are good over as many worlds as possible. Consequentialism doesn’t necessarily do this, since single factors can dominate the calculus. That causes optimizer’s curse problems, but more generally, in highly uncertain domains probability estimates are just really often wrong. And especially when such a misstep can cause massive harm, I think it’s also worth trying to compensate for the uncertainty in the direction of being more robust to those errors.
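For readers unfamiliar with the optimizer's curse mentioned above, here is a rough toy simulation. It is only a sketch under made-up assumptions (Gaussian true values, Gaussian estimation noise, and the hypothetical function name `simulate_optimizers_curse`), not anything from the original discussion. It picks whichever option has the highest noisy estimate and compares the estimated value of that pick with its true value.

```python
import random
import statistics


def simulate_optimizers_curse(n_options=10, noise_sd=1.0, n_trials=5000, seed=0):
    """Toy model: choose the option with the highest noisy estimate.

    All parameters are illustrative assumptions, not empirical values.
    Returns (mean estimated value of the chosen option,
             mean true value of the chosen option).
    """
    rng = random.Random(seed)
    estimated_of_choice = []
    true_of_choice = []
    for _ in range(n_trials):
        # Each option has a true value we never observe directly.
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        # We only see the true value plus independent estimation noise.
        estimates = [v + rng.gauss(0.0, noise_sd) for v in true_values]
        # Naive expected-value maximization: take the best-looking option.
        best = max(range(n_options), key=lambda i: estimates[i])
        estimated_of_choice.append(estimates[best])
        true_of_choice.append(true_values[best])
    return statistics.mean(estimated_of_choice), statistics.mean(true_of_choice)


if __name__ == "__main__":
    est, real = simulate_optimizers_curse()
    print(f"mean estimate of chosen option: {est:.2f}")
    print(f"mean true value of chosen option: {real:.2f}")
```

In this toy setup the chosen option looks systematically better than it really is, because selecting on the maximum also selects for favorable noise. The gap grows with the noise level and the number of options, which is one way to see why a single dominant but badly estimated factor can mislead a naive calculation.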
good post, thanks.