habryka comments on How might we safely pass the buck to AI?

habryka 20 Feb 2025 3:29 UTC
LW: 14 AF: 6
Seems good!
FWIW, at least in my mind this is in some sense approximately the only and central core of the alignment problem, and so having it left unaddressed feels confusing. It feels a bit like making a post about how to make a nuclear reactor where you happen to not say anything about how to prevent the uranium from going critical, but you did spend a lot of words on how to make the cooling towers, the color of the bikeshed next door, and how to translate the hot steam into energy.
Like, it’s fine, and I think it’s not crazy to think there are other hard parts, but it felt quite confusing to me.
- joshc 20 Feb 2025 3:42 UTC
  LW: 11 AF: 3
  I’m sympathetic to this reaction.
  
  I just don’t actually think many people agree that it’s the core of the problem, so I figured it was worth establishing this first (and I think there are some other supplementary approaches, like automated control and incentives, that are worth throwing into the mix) before digging into the ‘how do we avoid alignment faking’ question.
- Stephen McAleese 22 Feb 2025 11:58 UTC
  2 points
  I agree that this seems like a core alignment problem. The problem you are describing seems like a rephrasing of the ELK (Eliciting Latent Knowledge) problem.