Assumption 1: Most of us are not saints.
Assumption 2: AI safety is a public good.[1]
[..simple standard incentives..]
Implication: The AI safety researcher, eventually finding himself too unlikely to individually be pivotal on either side, may quite ‘rationally’[2] switch to ‘standard’ AI work.[3]
So: a rather simple explanation seems to suffice to make sense of the big-picture pattern you describe.
That doesn’t mean the inner tension you point out isn’t interesting. But I don’t think very deep psychological factors are needed to explain the general ‘AI safety becomes AI instead’ tendency, which I had the impression the post was meant to suggest.
[1] Or: unaligned/unloving/whatever AGI is a public bad.
[2] I mean: individually ‘rational’ once we factor in another trait. Assumption 1b: the unfathomable scale of potential aggregate disutility from AI gone wrong bottoms out into a constrained ‘negative’ individual utility, i.e. the emotional value non-saint Joe places on it. So a 0.1 permille probability of saving the universe may, individually rationally, be dominated by mundane stuff like keeping a still somewhat cool and well-paying job. (A minimal expected-value sketch of this trade-off follows after the footnotes.)
[3] The switch may be psychologically even easier if the employer started out as genuinely well-intentioned and still retains a somewhat ambiguous air.
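To make footnote [2] concrete, here is a minimal sketch using symbols of my own (none of them appear in the post): let $p$ be the probability that Joe’s safety work is individually pivotal, $\bar V$ the emotionally capped value non-saint Joe places on averting AI doom, and $u_{\text{job}}$ the utility of the ordinary, well-paying AI job. The switch is individually ‘rational’ whenever

$$p \cdot \bar V \;<\; u_{\text{job}}, \qquad \text{e.g. } p = 10^{-4} \;\Rightarrow\; \text{switch iff } \bar V < 10^{4}\, u_{\text{job}}.$$

Because $\bar V$ is bounded by emotional constraints rather than scaling with the astronomical stakes, that inequality is easy to satisfy.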