You’re allowed to care about things besides AI safety
I worry that a lot of AI safety / x-risk people have imbibed a vibe of urgency, impossibility, and overwhelming-importance to solving alignment in particular; that this vibe distorts thinking; that the social sphere around AI x-risk makes it harder for people to update.
Yesterday I talked to an AI safety researcher who said he’s pretty sure alignment will be solved by default. But whenever he talks to people about this, they just say “surely you don’t think it’s >99% likely? shouldn’t you just keep working for the sake of that 1% chance?”
Obviously there’s something real here: 1% of huge is huge. But equally—people should notice and engage when their top priority just got arguably 100x less important! And people should be socially-allowed to step back from pushing the boulder.
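(A back-of-the-envelope version of that arithmetic, with entirely made-up numbers: if the expected value of marginal alignment work scales roughly with the probability that alignment doesn’t get solved by default, then moving from treating misalignment as a near-certainty to a ~1% credence shrinks that expected value by roughly two orders of magnitude, while leaving it enormous in absolute terms.)

```latex
% Sketch only: f and V are hypothetical placeholders, and the probabilities are
% illustrative, not anyone's actual estimates.
% f = fraction of the remaining risk your work averts, V = value of a good future.
\[
  \mathrm{EV}(\text{alignment work}) \;\approx\; f \cdot P(\text{no alignment by default}) \cdot V,
  \qquad
  \frac{f \cdot 1.0 \cdot V}{f \cdot 0.01 \cdot V} = 100 .
\]
```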
The idea that safety is the only thing that matters is pretty load-bearing for many people in this community, and that seems bad for epistemics and for well-being.
I’ve noticed similar feelings in myself—I think part of it is being stuck in the 2014 or even 2020 vibe of “jesus christ, society needs to wake up! AGI is coming, maybe very soon, and safety is a huge deal.” Now—okay, society-at-large still mostly doesn’t care, but—relevant bits of society (AI companies, experts, policymakers) are aware and many care a lot.
And if safety isn’t the only-overwhelming-priority, if it’s a tens of percents thing and not a 1-epsilon thing, we ought to care about the issues that persist when safety is solved—things like “how the hell does society actually wield this stuff responsibly”, “how do we keep it secure”, etc. And issues that frankly should have always been on the table, like “how do we avoid moral atrocities like torturing sentient AIs at scale”.
And on a personal & social level, we ought to care about investments that help us grapple with the situation—including supporting people as they step back from engaging directly with the problem, and try to figure out what else they could or should be doing.
Alignment by default is a minority opinion. Surveying the wide range of even truly informed opinions, it seems clear to me that we collectively don’t know how hard alignment is.
But that doesn’t mean technical alignment is the only thing worth caring about, even if you’re a utilitarian. Societal issues surrounding AI could be crucial for success, and support for people doing work on AI safety is crucial even on a model in which AI is the most important topic. There’s also public outreach and lobbying work to be done.
And of course everyone needs to prioritize their own emotional health so they can keep working on anything effectively.
Totally. I think it’s “arguable” in the sense of inside-views, not outside-views, if that makes sense? Like: it could be someone’s personal vibe that alignment-by-default is >99%. Should they have that as their all-things-considered view? Seems wrong to me, we should be considerably more uncertain here.
But okay, then: we should have some spread of bets across different possible worlds, and put a solid chunk of probability on alignment by default. Even if it’s a minority probability, this could matter a lot for what you actually try to do!
For example: I think worlds with short timelines, hard takeoff, and no alignment-by-default are pretty doomed. It’s easy to focus on those worlds and feel drawn to plans that are pretty costly and are incongruent with virtue and being-good-collaborators. e.g. “we should have One Winning AGI Project that’s Safe and Smart Enough to Get Things Right”, the theory of victory that brought you OpenAI.
My intuition is that worlds with at least one of those variables flipped tend to convergently favor solutions that are more virtuous / collaborative and are more likely to fail gracefully.
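(To make the “at least one of those variables flipped” point concrete, here’s a throwaway enumeration of the eight combinations. The variable names and the “doomed corner” label just restate the framing above; nothing here is a model of anything.)

```python
# Toy enumeration of the scenario space sketched above; purely illustrative.
from itertools import product

VARIABLES = ("short timelines", "hard takeoff", "no alignment-by-default")

for world in product([True, False], repeat=len(VARIABLES)):
    if all(world):
        # The one corner where all three hold at once -- the "pretty doomed" worlds.
        verdict = "pretty doomed; tempting to reach for drastic, uncollaborative plans"
    else:
        flipped = [name for name, holds in zip(VARIABLES, world) if not holds]
        verdict = f"flipped: {', '.join(flipped)} -> collaborative, fail-gracefully plans look better"
    print(dict(zip(VARIABLES, world)), "->", verdict)
```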
(I’m tired and not maximally articulate rn, but could try to say more if that feels useful.)
If alignment by default is not the majority opinion, then what is (pardon my ignorance as someone who mostly interacts with the alignment community via LessWrong)? Is it 1) that we are all ~doomed, or 2) that alignment is hard but we have a decent shot at solving it, or 3) something else entirely?
I get the feeling that people were a lot more pessimistic about our chances of survival in 2023 than in 2024 or 2025 (in other words, pessimism seems to be going down somewhat), but I could be completely wrong about this.