Of course very few “doomers” think that current LLMs behave in ways that we parse as “nice” because they have a “hidden, instrumental motive for being nice” (in the sense that I expect you meant). Current LLMs likely aren’t coherent & self-aware enough to have such hidden, instrumental motives at all.
I agree with you about LLMs!
If MIRI-adjacent pessimists agree with that, then they should stop saying things like this, which—if you don’t think LLMs have instrumental motives—is the actual opposite of good communication:
@Pradyumna: “I’m struggling to understand why LLMs are existential risks. So let’s say you did have a highly capable large language model. How could RLHF + scalable oversight fail in the training that could lead to every single person on this earth dying?”
@ESYudkowsky: “Suppose you captured an extremely intelligent alien species that thought 1000 times faster than you, locked their whole civilization in a spatial box, and dropped bombs on them from the sky whenever their output didn’t match a desired target—as your own intelligence tried to measure that.
What could they do to you, if when the ‘training’ phase was done, you tried using them the same way as current LLMs—eg, connecting them directly to the Internet?”
(To the reader, lest you be concerned by this: the process of RLHF bears no resemblance to that scenario.)
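For concreteness, the core of RLHF fine-tuning is closer to a reward-weighted policy update tethered to the pretrained model by a KL penalty than to anything in the alien-civilization analogy. Here is a toy sketch of that objective; the four canned responses, the reward values, and all hyperparameters are illustrative assumptions, not any lab’s actual implementation:

```python
import numpy as np

# Toy sketch of the RLHF fine-tuning objective:
# maximize  E_p[reward] - beta * KL(p || p_ref),
# i.e. follow the reward model while staying close to the pretrained policy.

n_responses = 4                          # pretend the model emits one of 4 canned responses
logits = np.zeros(n_responses)           # trainable policy parameters
ref_logits = logits.copy()               # frozen pretrained "reference" policy
reward = np.array([0.1, 0.2, 1.0, 0.3])  # reward model's score for each response (assumed)
beta = 0.1                               # KL-penalty strength
lr = 0.5                                 # learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(200):
    p = softmax(logits)
    ref = softmax(ref_logits)
    # Per-response "adjusted reward": raw reward minus the KL-penalty term.
    adjusted = reward - beta * (np.log(p) - np.log(ref) + 1.0)
    # Exact gradient of the objective w.r.t. the logits for a categorical
    # policy (no sampling needed in this toy version).
    grad = p * (adjusted - p @ adjusted)
    logits += lr * grad                  # gradient *ascent* on the objective

p = softmax(logits)
print(p)  # probability mass shifts toward the highest-reward response
```

At optimum this pressure pushes the policy toward `p_i ∝ ref_i · exp(reward_i / beta)`: a soft reweighting of the pretrained model’s behavior, not a siege.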