I liked this post.
I think there is a different post that feels missing from the discourse, that ties together “what goodness is, with gears-level models all the way up and down.” (Which is, like, sort of a massive project. But, it’d be nice to gesture at enough of the details to get the structure across.)
Some people have reacted to this sort of statement with “so, you’re saying if it were practical to stop AI with terrorism, it would be worth it?”. In one of the Twitter threads, Eliezer said “no I didn’t say that” and linked to Ends Don’t Justify Means (Among Humans).
Some other AI safety people said “Yes, it is evil to try to murder Sam Altman for the same reasons it’s usually evil. But, to the people contemplating terrorism, that isn’t very persuasive. But, yes, for the record, it is evil and wrong.”
I feel a bit confused and dissatisfied with the situation.
I’m a persnickety rationalist. I think “goodness” and “evil” are underspecified and possibly-confused categories that I don’t have a complete understanding of.
Nonetheless, I am aware that at least part of what gives some people the heebie-jeebies, when they see long arguments like “terrorism wouldn’t work” instead of loudly, simply stating “terrorism is wrong!”, is… it’s obvious the person saying the long complex argument is going “off-script.” They are stepping outside the simple-seeming moral frameworks people are familiar with.
Some rationalists go out of their way to promote virtue ethics or deontology.
But, normal people don’t say words like “virtue ethics” or “deontology.”
People who say weird words you don’t understand… man, those people could do anything. You’d have to read all their words to understand them, and then probably still wouldn’t follow the arguments, and then still wouldn’t be sure they weren’t a Clever Arguer who was trying to pull one over on you.
Saying “our notion of goodness is underspecified and maybe-confused and maybe a conflationary alliance” is the sort of thing a clever arguer says, which increases the odds that person is going to do something surprising you don’t like later.
The thing I think it (approximately) means, to say “It’s still evil and wrong, to do terrorism”, is “It’s no accident that we have the conception of Goodness we apply in most mundane situations. There’s a structural reason it all still applies in the extreme situations (i.e. ‘People still need to be able to trust each other and society still needs to function at the end of the world. That’s one of the times you most want people to trust each other!’).”
But, (from my current epistemic state), I don’t actually feel that confident that that’s true. (I think Eliezer has thought explicitly enough about this sort of thing that it feels plausible to me he would have a justified true belief that it’s robustly true.)
A post I’d appreciate someday would be one that tries to sketch out the end-2-end broad strokes here, including which bits you can robustly argue for, vs “look this is either still complicated, or still fuzzy/unknown”. Ideally, one where the explanation hangs together at first glance to a layman, while pointing to ‘this sort of math says these things about cooperation/communication’, to convey that there’s a deeper structure.