I don’t understand how working on “AI control” here is any worse than working on AI alignment (I’m assuming you don’t feel the same about alignment since you don’t mention it).
In my mind, two different ways AI could cause bad things to happen are: (1) misuse: people use the AI for bad things, and (2) misalignment: regardless of anyone’s intent, the AI does bad things of its own accord.
Both seem bad. Alignment research and control are both ways to address misalignment problems; I don’t see how they differ for the purposes of your argument (though maybe I’m failing to understand it).
Addressing misalignment slightly increases people’s ability to misuse AI, but I think the effect is fairly small and outweighed by the benefit of decreasing the odds that a misaligned AI takes catastrophic actions.
It’s not. Alignment is de facto capabilities work (the principal-agent problem makes aligned employees more economically valuable), and unless we have a surefire way to ensure that the AI is aligned to some “universal,” or even cultural, values, it’ll be aligned by default to Altman, Amodei, et al.
We don’t know of an alignment target that everyone can agree on, so solving alignment pretty much guarantees misuse by at least some people’s lights.
I mean “not solving alignment” pretty much guarantees misuse by everyone’s lights? (In both cases conditional on building ASI)
It pretty much guarantees extinction, but people can have different opinions on how bad that is relative to disempowerment, S-risks, etc.