Why do you think that the number of people who could make a convincing case to you is so low?
Because I ran the experiment and very few people passed. (The extrapolation from that to an estimate for the world is guesswork.)
Where do they normally mess up?
There are a lot of different arguments people give, which I dislike for different reasons, but one somewhat common theme was that their argument was not robust to “it seems like InstructGPT is basically doing what its users want when it is capable of it, so why not expect scaled-up InstructGPT to just continue doing what its users want?”
(And when I explicitly said something like that, they didn’t have a great response.)
Yeah… I suppose you could go through Evan Hubinger’s arguments in “How likely is deceptive alignment?”, but you’d probably have some further pushback which would be hard to answer.