Good callout. I was glad to hear that Ilya is thinking about all sentient life and not just humans.
I didn’t interpret it to mean that he’s working on thing 1. The direct quote was:
> I think in particular, there’s a case to be made that it will be easier to build an AI that cares about sentient life than an AI that cares about human life alone, because the AI itself will be sentient. And if you think about things like mirror neurons and human empathy for animals, which you might argue it’s not big enough, but it exists. I think it’s an emergent property from the fact that we model others with the same circuit that we use to model ourselves, because that’s the most efficient thing to do.
Sounds to me like he expects an aligned AI to care about all sentient beings, but he isn’t necessarily working on making that happen. AFAIK Ilya’s new venture hasn’t published any alignment research yet, so we don’t know what exactly he’s working on.
In his earlier thinking (~2023) he was also quite focused on non-standard approaches to AI existential safety, and it was clear that he was expecting to collaborate with advanced AI systems on that.
That’s indirect evidence, but it does look like he is continuing in the same mindset.
It would be nice if his org found ways to publish those aspects of their activity which might contribute to AI existential safety[1].
Since almost everyone is using “alignment” for “thing 2” these days, I am trying to avoid the word; I doubt solving “thing 2” would contribute much to existential safety, and I can easily see how it might even turn out to be counterproductive instead.