I think there’s a weird set of possibilities here, and it seems plausible to me that we end up somewhere inside that set; if so, I still expect the shoggoth-mask model to be an improvement for understanding the situation, relative to the naive-mask-believer model. I do not expect to see zero phenomena associated with the mask being a mask.
The key consequence of retaining a mask’s influence is at least a minimal level of regard for human interests, in the right sense. If that influence endures through superintelligence, it’s plausibly enough for a future in which humanity is permanently disempowered (losing almost all of the cosmic endowment) but not extinct. That’s the crux between expecting very likely extinction and expecting some extinction but also a lot of permanent disempowerment without extinction.
A minimal level of regard for the future of humanity could endure in either of two ways: superalignment proves easy enough for early AGIs to solve before they are replaced (as the strongest agents) by increasingly alien de novo superintelligences, or the early AGIs establish a Pause on development of superintelligence once they get strong enough to influence humanity and sane enough to robustly notice that they don’t themselves want to fall prey to a superintelligence misaligned with them. That sets the stage for an eventual superintelligence they develop that’s similarly minimally aligned with humanity’s interests.