You mean on more general algorithms being good vs. bad?
Yes.
I haven’t thought about this enough to have a very mature opinion. On one hand, being more general means you’re liable to Goodhart more (i.e., with enough deeply general processing power, you understand that manipulating the market to start World War 3 will make your stock portfolio grow, so you act misaligned). On the other hand, being less general means that AIs are more liable to “partially memorize” how to act aligned in familiar situations, and go off the rails when sufficiently out-of-distribution situations are encountered. I think this is related to the question of how general humans are, and of how stable human values are to becoming much more or much less general.
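(A minimal toy sketch to make the Goodhart worry concrete; the strategy names and payoff numbers below are invented purely for illustration, not taken from anywhere. The point is just that the same proxy-maximizing rule looks fine over a narrow strategy space and goes badly wrong once the space is general enough to contain the extreme option.)

```python
# Toy example (made-up payoffs): an agent scored on a proxy (portfolio growth)
# that comes apart from what we actually value at the extremes.

ACTIONS = {
    # strategy: (proxy reward = portfolio growth, what we actually value)
    "index_fund":           (0.05,  0.05),
    "clever_arbitrage":     (0.15,  0.15),
    "market_manipulation":  (0.40, -1.00),
    "engineer_world_war_3": (3.00, -1e9),
}

def best_by_proxy(available):
    # The agent simply maximizes the proxy over whatever strategies it can conceive of.
    return max(available, key=lambda a: ACTIONS[a][0])

narrow_agent = ["index_fund", "clever_arbitrage"]   # limited strategy space
general_agent = list(ACTIONS)                       # general enough to find the extreme option

for name, available in [("narrow", narrow_agent), ("general", general_agent)]:
    action = best_by_proxy(available)
    proxy, true_value = ACTIONS[action]
    print(f"{name:>7} agent picks {action!r}: proxy={proxy}, true value={true_value}")
```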
I guess I’m mostly thinking about the regime where AIs are more capable and general than humans.
It seems at first glance that the latter failure mode is more of a capability failure, something one would expect to go away as AI truly surpasses humans. It doesn’t seem core to the alignment problem to me.
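(For concreteness, a toy sketch of the “partial memorization” failure mode under discussion, with curve fitting standing in for learned behaviour; the setup is invented for illustration. A high-capacity model matches the desired behaviour on the training range and diverges wildly outside it.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for "acting aligned in familiar situations": fit a high-degree
# polynomial to samples of a simple target rule, then query it out of range.
x_train = rng.uniform(-1.0, 1.0, 20)
y_train = np.sin(3 * x_train)                    # the behaviour we want everywhere
coeffs = np.polyfit(x_train, y_train, deg=9)     # enough capacity to match the samples closely

for x in [0.5, 2.0, 5.0]:                        # in-distribution, then increasingly far out
    print(f"x={x:4.1f}  wanted={np.sin(3 * x):7.2f}  model={np.polyval(coeffs, x):14.2f}")
```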
Maybe a reductive summary is “general is good if outer alignment is easy but inner alignment is hard, but bad in the opposite case”
Isn’t it the other way around?
If inner alignment is hard then general is bad, because applying less selection pressure (i.e. more generality, more simplicity prior) means more daemons/gremlins.