Most of the questions you’ve raised solve themselves.
I don’t agree. In particular:
Animal suffering: most people seem to be OK with factory farming, and many even oppose technologies that would make it unnecessary. It is not the case that most people’s intent is against spreading factory farming.
Bioterrorism (x-risk): similar to the previous point. If we empower every user without any restrictions on antisocial behavior, we might get a bad outcome. I guess if the AI is “CEV-aligned” (which I think is harder than intent alignment, but possible) it will solve itself.
Work-derived meaning: this one might never be solvable once we have AI that renders all work purposeless (pure pleasure / the experience machine). Maybe the only way to solve it is to never have AI. Or we can partially solve it by forbidding AI from doing particular things, leaving them to humans/animals, but that’s also kind of artificial.
I guess I agree that digital minds’ welfare is more solvable. AIs will be able to advocate rather effectively for their own well-being. I suppose under some political arrangements that will go nowhere, but it’s unlikely. Forbidding sadism against simulated minds could work politically, but some plausible uses (“let’s simulate evolution for fun”) can be bad.
The problem we should be focusing on, if we assume alignment will be solved, is how can we ensure the AI is aligned to our values
Yeah, I guess we agree on that, and so does Dario, with the stipulation that my values include pluralism, which pushes me towards international democracy. I am Spanish but live and work in the US, and I guess I’ve internalized its values more.
I wrote in more depth about why I think they solve themselves in the post I linked, but I would say:
First of all, I’m assuming that if we get intent alignment, we’ll get CEV alignment shortly after. Because people want the AI aligned to their CEV, not their immediate intent. And solving CEV should be within reach for an intent-aligned ASI. (Here, by CEV alignment I mean “aligning an AI to the values of a person or collection of people that they’d endorse upon learning all the relevant facts and reflecting”.)
If you agree with this, I can state the reason I think these issues will “solve themselves” / become “merely practical/political” by asking a question: You say animal suffering will not be solved, because most people seem not to care about animal suffering. If this is the case, why should they listen to you? Why should they agree there is a problem here? If the reason they don’t care about animal suffering is that they incorrectly extrapolate their own values (are making a mistake from their own perspective), then their CEV would fix the problem, and the ASI optimizing their CEV would extinguish the factory farms. If they’re not making a mistake, and the coherent extrapolation of their own values says factory farming is fine, then they have no reason to listen to you. And if that’s the case, it’s a merely political problem: you (and I) want the animals not to suffer, and the way we can ensure this is by ensuring that our (meaning Adria and William) values have strong enough weight in the value function the AI ends up optimizing for. And this problem is of the same character as trying to get a certain party enough votes to get a piece of legislation passed. It’s hard and complicated, but not in the way a lot of philosophical and ethical questions are hard and complicated.
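To make “strong enough weight in the value function” concrete, here is a minimal sketch (purely illustrative; the symbols U, u_i and w_i, and the normalization of the weights, are my own assumptions, not anything from the linked post): think of the ASI’s objective as a weighted aggregate of individual extrapolated utilities, so the remaining question is who gets how much weight.

```latex
% Illustrative only: U is the aggregate objective the ASI ends up optimizing,
% u_i is person i's CEV-extrapolated utility, and w_i is person i's weight.
\[
  U(x) \;=\; \sum_i w_i \, u_i(x), \qquad w_i \ge 0, \qquad \sum_i w_i = 1
\]
% Whether the optimum of U still contains factory farms then depends on how much
% combined weight the u_i that penalize them (e.g. Adria's and William's) carry:
% a bargaining/voting question, not a further philosophical one.
```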
I could address the other points, but I think I’d be repeating myself. A CEV-aligned AI would not create a world where you’re sad (in the most general sense of having the world not be a way you’d endorse in your wisest, most knowledgeable state, after the most thorough reflection) because you think everything is meaningless. It’d find some way to solve the meaninglessness problem. Unless the problem is genuinely unsolvable, in which case we’re just screwed and our only recourse is not building the ASI in the first place.
Because people want the AI aligned to their CEV, not their immediate intent
I think this is debatable. Many won’t want to change at all.
You say animal suffering will not be solved, because most people seem not to care about animal suffering. If this is the case, why should they listen to you?
Well, they won’t agree, but the animals would agree. This is the tyranny of the majority (or of the oligarchy: humans over animals). They would dispute the premise. I agree that at some point this is a pure exercise of power; some values are just incompatible (e.g. a dictatorship of each individual person).
Unless the problem is genuinely unsolvable, in which case we’re just screwed
Yeah, I guess it has this fatal inevitability to it, no? The only way around is prohibition, which won’t happen because the incentives are too great. One faction can divert the torrent of history slightly, but not stop it entirely.
I defined what I meant by CEV in a way that doesn’t entail “changing”.