One thing that consistently infuriates me is the extent to which the AI-safety community has invented its own terminology/ontology that is largely at odds with DL/ML. For example, I had to dig deep to discover that ‘inner alignment’ mostly maps to ‘generalization error’
Nobody likes jargon (well, nobody worth listening to likes jargon) but there’s a reason that healthy fields have jargon, and it’s because precise communication of ideas within a field is important. “Inner alignment” indeed has some relationship to “generalization error” but they’re not exactly the same thing, and we can communicate better by using both terms where appropriate.
If your complaint is lack of good pedagogical materials, fair enough. Good pedagogy often exists, but it’s sometimes scattered about. Plus Rob Miles, I guess.
and that ‘consequentialist agent’ mostly maps to ‘model-based RL agent’.
“Consequentialist” is a common English word, defined in the dictionary as “choosing actions based on their anticipated consequences” or something. Then the interesting question is “to what extent do different AI algorithms give rise to consequentialist behaviors”? I don’t think it’s binary, I think it’s a continuum. Some algorithms are exceptionally good at estimating the consequences of actions, even OOD, and use those consequences as the exclusive selection criterion; those would be maximally consequentialist. Some algorithms like GPT-3 are not consequentialist at all.
I think I’d disagree with “model-based RL = consequentialist”. For example, a model-free RL agent, with a long time horizon, acting in-distribution, does lots of things that look foresighted and strategic, and it does those things because of their likely eventual consequences (as indirectly inferred from past experience). (What is a Q-value if not “anticipated consequences”?) So it seems to me that we should call model-free RL agents “consequentialist” too.
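To make that concrete: here’s a minimal tabular Q-learning sketch (the state/action sizes, learning rate, and `td_update` helper are all assumptions for illustration). The TD target folds the estimated value of everything downstream into a single number, so the Q-value is literally a summary of anticipated consequences, with no model and no explicit plan anywhere.

```python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # gamma near 1: future consequences weigh heavily

def td_update(s, a, r, s_next):
    # The bootstrapped target folds the estimated value of everything that
    # happens *after* s_next into a single number, so Q[s, a] ends up
    # summarizing the long-run anticipated consequences of taking a in s.
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```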
I would say that model-based RL agents do “explicit planning” (whereas model-free ones usually don’t). I don’t think “agent that does explicit planning” is exactly the same as “consequentialist agent”. But they’re not totally unrelated either. Explicit planning can make an agent more consequentialist, by helping it estimate consequences better, in a wider variety of circumstances.
(I could be wrong on any of these, this is just my current impression of how people use these terms.)
So I said consequentialist mostly maps to model-based RL because “choosing actions based on their anticipated consequences” is just a literal plain-English description of how model-based RL works, with model-based predictive planning being an implementation of “anticipating consequences”.
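A minimal sketch of that plain-English description, assuming hypothetical `model`, `reward_fn`, and `value_fn` stand-ins for a learned dynamics model, reward estimate, and state-value estimate:

```python
def choose_action(state, actions, model, reward_fn, value_fn, gamma=0.99):
    # Pick the action whose model-predicted consequence looks best:
    # "anticipating consequences" is an explicit step of the computation.
    def anticipated_return(a):
        next_state = model(state, a)  # explicitly anticipate the consequence
        return reward_fn(state, a) + gamma * value_fn(next_state)
    return max(actions, key=anticipated_return)
```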
It’s more complicated for model-free RL, in part because, with enough diverse training data and regularization, various forms of consequentialist/planning systems could potentially develop as viable low-complexity solutions.
But effective consequentialist planning requires significant compute and recursion depth, which puts it outside the scope of many simpler model-free systems (I’m thinking primarily of the earlier DeepMind Atari agents). Instead, those agents often seem to develop a collection of clever heuristics that work well in most situations, without the ability to explicitly evaluate the long-term consequences of specific actions in novel situations: thus, more deontological.
Hmm, I would say that DQN “chooses actions based on their anticipated consequences” in that the Q-function incorporates an estimate of anticipated consequences. (Especially with a low discount rate.)
I’m happy to say that model-based RL might be generically better at anticipating consequences (especially in novel circumstances) than model-free RL. Neither is perfect though.
DQN has an implicit plan encoded in the Q-function—i.e., in state S1 action A1 seems good, and that brings us to state S2 where action A2 seems good, etc. … all that stuff together is (IMO) an implicit plan, and such a plan can involve short-term sacrifices for longer-term benefit.
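A sketch of that reading (reusing the tabular `Q` from above, plus an assumed `step` transition function used only to unroll the trajectory): greedily following argmax-Q traces out a coherent multi-step trajectory, even though no step of it was ever represented in advance.

```python
def unroll_implicit_plan(state, Q, step, horizon=5):
    # Follow the greedy policy forward to reveal the "plan" encoded in Q.
    trajectory = []
    for _ in range(horizon):
        a = int(Q[state].argmax())  # each step only asks "what looks best now?"
        trajectory.append(a)
        state = step(state, a)
    return trajectory  # the "plan" is only visible after the fact
```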
Whereas model-based RL with tree search (for example) has an explicit plan: at timestep T, it has an explicit representation of what it’s planning to do at timesteps T+1, T+2, ….
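By contrast, here’s a minimal depth-limited tree search sketch (again with assumed `model`, `reward_fn`, and `actions`); note that it returns an explicit list of intended future actions, which is exactly what the model-free version never constructs.

```python
def tree_search(state, actions, model, reward_fn, depth, gamma=0.99):
    # Exhaustively expand the action tree to a fixed depth and return both the
    # best value and the explicit action sequence that achieves it.
    if depth == 0:
        return 0.0, []
    best_value, best_plan = float("-inf"), []
    for a in actions:
        next_state = model(state, a)
        future_value, future_plan = tree_search(
            next_state, actions, model, reward_fn, depth - 1, gamma)
        value = reward_fn(state, a) + gamma * future_value
        if value > best_value:
            best_value, best_plan = value, [a] + future_plan
    return best_value, best_plan  # best_plan: explicit actions for T+1, T+2, ...
```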
Humans are able to make explicit plans too, although it doesn’t look like one-timestep-at-a-time planning.
Sure, you can consider the TD-style unrolling in model-free RL a sort of implicit planning, but it’s not really consequentialist in most situations, as it can’t dynamically explore new relevant expansions of the state tree the way planning can. Alternatively, you could consider planning as a dynamic few-shot extension for quickly learning/updating the decision function.
Human planning is sometimes explicit and timestep-by-timestep (when playing certain board games, for example), when that is what efficient planning demands; but in the more general case, human planning uses more complex approximations that jump more freely across spatio-temporal approximation hierarchies.