The free energy talk probably confuses more than it elucidates. I'm not talking about random diffusion per se, but about the connection between uniform sampling, simplicity, and the simplicity-accuracy tradeoff.
I've tried to explain more carefully where my thinking currently stands in my reply to Lucius.
Also, a caveat: shortforms are half-baked-by-design.
Yep, I've recently been posting shortforms (as per your recommendation), and I'm totally with you on the “half-baked-by-design” concept (if Cheeseboard can do it, it must be a good idea, right? :)
I still don't agree that free energy is core here. I think the relevant question, which can be formulated without free energy, is whether various “simplicity/generality” priors push towards or away from human values (and you can then specialize to questions of effective dimension/LLC, deep vs. shallow networks, ICL vs. weight learning, generalized OOD generalization measurements, and so on to operationalize the inductive prior better). I don't think there's a consensus on whether generality is “good” or “bad”; I know Paul Christiano and ARC have gone both ways on this at various points.
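For concreteness, here is a minimal sketch of one of those operationalizations: measuring an OOD generalization gap as a crude proxy for how general a learned solution is. This is just an illustrative sketch; `model`, `id_loader`, `ood_loader`, and `loss_fn` are hypothetical placeholders, not anything from the discussion above.

```python
import torch

@torch.no_grad()
def avg_loss(model, loader, loss_fn):
    """Average loss of `model` over a DataLoader."""
    model.eval()
    total, n = 0.0, 0
    for x, y in loader:
        total += loss_fn(model(x), y).item() * len(y)
        n += len(y)
    return total / n

def ood_generalization_gap(model, id_loader, ood_loader, loss_fn):
    """Loss on shifted (OOD) data minus loss on in-distribution data.

    A smaller gap is (very roughly) evidence that the model learned a more
    general solution rather than memorizing in-distribution specifics.
    """
    return avg_loss(model, ood_loader, loss_fn) - avg_loss(model, id_loader, loss_fn)

# Hypothetical usage (all objects are placeholders):
# gap = ood_generalization_gap(model, id_loader, ood_loader,
#                              torch.nn.functional.cross_entropy)
```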
I think simplicity/generality priors effectively have zero effect on whether the model is pushed towards or away from human values, and are IMO kind of orthogonal to alignment-relevant questions.
I’d be curious how you would describe the core problem of alignment.
I'd split it into inner alignment, i.e. how we manage to instill any goal/value at all, ideally one that is at least somewhat stable, and outer alignment, i.e. selecting a goal that is resistant to Goodharting.
Let's focus on inner alignment. By “instill” you presumably mean train. Which values get trained is ultimately a learning problem, one which in many cases (as long as one can approximately formulate a Boltzmann distribution) comes down to a simplicity-accuracy tradeoff.
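To spell out the tradeoff I have in mind, here is a rough sketch in standard SLT-style notation (introduced here purely for illustration): if learning approximately samples from a Boltzmann/tempered posterior over parameters, then the free energy of a region around a solution trades off its training loss (accuracy) against a complexity term governed by the local learning coefficient (simplicity), up to lower-order terms.

```latex
% Tempered Boltzmann posterior over parameters w, with n samples,
% inverse temperature \beta, prior \varphi, and empirical loss L_n:
p(w \mid D_n) \;\propto\; \exp\!\bigl(-n\beta L_n(w)\bigr)\,\varphi(w)

% Asymptotic free energy of a neighborhood W_\alpha of a solution w_\alpha,
% with local learning coefficient \lambda_\alpha (lower-order terms dropped):
F_n(W_\alpha) \;\approx\; \underbrace{n\beta L_n(w_\alpha)}_{\text{accuracy}}
  \;+\; \underbrace{\lambda_\alpha \log n}_{\text{simplicity}}
```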
Could you give some examples of what you are thinking of here?
You mean on more general algorithms being good vs. bad?
Yes.
I haven't thought about this enough to have a very mature opinion. On one hand, being more general means you're liable to Goodhart more (e.g., with enough deeply general processing power, you understand that manipulating the market to start World War 3 will make your stock portfolio grow, so you act misaligned). On the other hand, being less general means that AIs are more liable to “partially memorize” how to act aligned in familiar situations, and to go off the rails when sufficiently out-of-distribution situations are encountered. I think this is related to the question of how general humans are, and how stable human values are to being made much more or much less general.
I guess I'm mostly thinking about the regime where AIs are more capable and general than humans.
At first glance, the latter failure mode seems more like a capability failure, something one would expect to go away as AI truly surpasses humans. It doesn't seem core to the alignment problem to me.
Maybe a reductive summary is “general is good if outer alignment is easy but inner alignment is hard, and bad in the opposite case”.
Isn't it the other way around?
If inner alignment is hard, then general is bad, because applying less selection pressure (i.e. more generality, more simplicity prior) means more daemons/gremlins.