For one particular example, you can randomly double your training data, or the size of the model, and training will usually work just fine. A rocket would explode if you tried to double the size of its fuel tanks.
The analogy was about the alignment problem, not the capabilities problem.
A rocket won’t get to the moon if you randomly double one of the variables used to navigate, such as the thrust applied in maneuvers or the angle of attack. (Well, not unless you’ve built in good error correction, redundancy, etc.)
The point here is that there are enough results like this in ML that I’m more skeptical of the security mindset being accurate here, and ML/AI alignment is a strange enough domain that we shouldn’t port over intuitions from other fields, just as you shouldn’t port intuitions from the macroscopic scale over to quantum mechanics.
For a specific example relevant to alignment, I talked about SGD’s corrective properties in a section of the post.
Another good example is the fact that AIs are generally modular: you can swap out parts without breaking the AI. That shouldn’t be possible under a security mindset, which would predict that the AI either spits out nonsense or breaks its security, neither of which has happened.
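As a toy sketch of that modularity claim (my own illustration in numpy, not code from the post, and a deliberately tiny stand-in for a real network): replace one layer of a small MLP with freshly initialized weights, and the network still produces well-formed, finite output rather than crashing or emitting garbage shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Two-layer MLP with a ReLU hidden layer."""
    h = np.maximum(0, x @ w1)
    return h @ w2

x = rng.normal(size=(4, 8))        # batch of 4 inputs
w1 = rng.normal(size=(8, 16))      # stand-in for a "trained" first layer
w2 = rng.normal(size=(16, 3))      # stand-in for a "trained" second layer

y_before = mlp(x, w1, w2)

# Module swap: replace the first layer wholesale with new random weights.
w1_new = rng.normal(size=(8, 16))
y_after = mlp(x, w1_new, w2)

# The network degrades gracefully: same output shape, no NaNs or crashes.
assert y_after.shape == y_before.shape
assert np.isfinite(y_after).all()
```

Of course, accuracy on any task would drop after such a swap; the point is only that the system keeps functioning structurally, rather than failing catastrophically the way a security-mindset picture would predict.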