Alignment by default is a minority opinion. Surveying the wide range of even truly informed opinions, it seems clear to me that we collectively don’t know how hard alignment is.
But that doesn’t mean technical alignment is the only thing worth caring about, even if you’re a utilitarian. Societal issues surrounding AI could be crucial for success, and support for people doing work on AI safety is crucial even on a model in which AI is the most important topic. There’s also public outreach and lobbying work to be done.
And of course everyone needs to prioritize their own emotional health so they can keep working on anything effectively.
Alignment by default is a minority opinion. Surveying the wide range of even truly informed opinions, it seems clear to me that we collectively don’t know how hard alignment is.
Totally. I think it’s “arguable” in the sense of inside-views, not outside-views, if that makes sense? Like: it could be someone’s personal vibe that alignment-by-default is >99%. Should they have that as their all-things-considered view? Seems wrong to me; we should be considerably more uncertain here.
But okay, then: we should have some spread of bets across different possible worlds, and put a solid chunk of probability on alignment by default. Even if it’s a minority probability, this could matter a lot for what you actually try to do!
For example: I think worlds with short timelines, hard takeoff, and no alignment-by-default are pretty doomed. It’s easy to focus on those worlds and feel drawn to plans that are pretty costly and are incongruent with virtue and being-good-collaborators. e.g. “we should have One Winning AGI Project that’s Safe and Smart Enough to Get Things Right”, the theory of victory that brought you OpenAI.
My intuition is that worlds with at least one of those variables flipped tend to convergently favor solutions that are more virtuous / collaborative and are more likely to fail gracefully.
(I’m tired and not maximally articulate rn, but could try to say more if that feels useful.)
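To make the “spread of bets” point concrete, here is a toy expected-value sketch. All of the probabilities, payoffs, and strategy names below are made-up placeholders (not anyone’s actual estimates); it only illustrates how a minority probability on alignment-by-default, combined with graceful failure in non-doomed worlds, can dominate the expected value of a strategy.

```python
# Toy sketch with made-up numbers, purely illustrative.
# Worlds vary along: timelines (short/long), takeoff (hard/soft),
# and whether alignment-by-default holds.

worlds = {
    # (timelines, takeoff, alignment_by_default): probability (placeholder)
    ("short", "hard", False): 0.10,
    ("short", "hard", True):  0.10,
    ("short", "soft", False): 0.15,
    ("short", "soft", True):  0.15,
    ("long",  "hard", False): 0.10,
    ("long",  "hard", True):  0.10,
    ("long",  "soft", False): 0.15,
    ("long",  "soft", True):  0.15,
}

def value(strategy, world):
    """How well a (hypothetical) strategy does in a given world, 0 = doom, 1 = fine.
    Numbers stand in for intuitions, not estimates."""
    timelines, takeoff, align_by_default = world
    worst_case = (timelines == "short" and takeoff == "hard" and not align_by_default)
    if strategy == "one_winning_project":
        # Helps a little in the worst worlds, but costly and fragile elsewhere.
        return 0.15 if worst_case else 0.4
    if strategy == "collaborative":
        # Roughly fails in the worst worlds, degrades gracefully in the rest.
        return 0.05 if worst_case else 0.7

def expected_value(strategy):
    return sum(p * value(strategy, w) for w, p in worlds.items())

for s in ("one_winning_project", "collaborative"):
    print(s, round(expected_value(s), 3))
# one_winning_project 0.375
# collaborative 0.635
```

With these placeholder numbers the collaborative strategy wins in expectation even though it does worse in the short-timelines / hard-takeoff / no-alignment-by-default world, which is the shape of the intuition above.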
If alignment by default is not the majority opinion, then what is (pardon my ignorance as someone who mostly interacts with the alignment community via LessWrong)? Is it 1) that we are all ~doomed, 2) that alignment is hard but we have a decent shot at solving it, or 3) something else entirely?
I get the feeling that people were a lot more pessimistic about our chances of survival in 2023 than in 2024 or 2025 (in other words, pessimism seems to be going down somewhat), but I could be completely wrong about this.