Start working on larger/harder RL tasks that involve more complicated algorithms, search, and/or alignment-relevant phenomena such as goal misgeneralization. The way I see this going, in rough order of difficulty: Minigrid < D4RL < ProcGen < Atari < whatever tasks Gato does.
Note that team shard (MATS stream) is already doing MI on procgen. I’ve been supervising/working with them on procgen for the last few weeks. We have a nice set of visualization techniques, have reproduced some of Langosco et al.’s cheese-maze policies, and overall it’s been going quite well.
Thank you for letting me know about your work on procgen with MI. It sounds like you’re making progress. In particular, I’d be interested in your visualisation techniques (how do they compare to what was done in Understanding RL Vision?) and in the reproduction of the cheese-maze policies (was this tricky? Do you think a DT could be well-calibrated on this problem?).
Some questions that might be useful to discuss more:
What are the pros/cons of doing MI on a DT vs. an actor-critic? (You’re using some form of actor-critic?) It could also be interesting to study analogous circuits in the DT vs. AC settings; see the sketch after these questions for the structural contrast I have in mind.
I haven’t done anything with CNNs yet, for simplicity, but I might be able to calibrate my expectations about the value and challenges involved by chatting with the team shard MATS stream.
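For concreteness, here is a minimal PyTorch sketch of the structural contrast I mean; the class names, sizes, and token layout are hypothetical illustrations on my part, not either project’s actual code:

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """PPO-style agent: a shared encoder feeding separate policy and value heads.
    The reward signal is baked into an explicit critic you can probe directly."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)           # scalar state-value estimate

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        return self.policy_head(h), self.value_head(h)

class DecisionTransformerActionHead(nn.Module):
    """DT-style agent: no critic at all. The transformer consumes interleaved
    (return-to-go, state, action) tokens, so reward information enters as a
    conditioning input; 'value circuits' would live in attention to RTG tokens."""
    def __init__(self, d_model: int, n_actions: int):
        super().__init__()
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, state_token_repr: torch.Tensor):
        # state_token_repr: the final residual-stream vector above a state token
        return self.action_head(state_token_repr)
```

One point in the DT column: a transformer’s residual stream decomposes linearly into per-component contributions, which makes attribution-style analysis much cleaner.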
Glad to hear things are going well! I’ll be in the Bay Area for EAG if anyone from the team would like to chat.
We’re studying a net with the structure I commented below, trained via PPO. I’d be happy to discuss more at EAG.
Not posting much publicly right now so that we can (a) work on the research sprint and (b) let people preregister credences in various mech int / generalization propositions, so that they can calibrate and see how their opinions evolve over time.

Are you using decision transformers or other RL agents on procgen? Also, do you plan to work on coinrun?
We’re analyzing the mech-int-ungodly IMPALA architecture from the paper. Basically:
=== IMPALA block
conv
maxpool2d
---- residual block, x2:
    relu
    conv
    relu
    conv
    residual add (this block's input added back in)
=== /IMPALA block
(repeat for 2 more IMPALA blocks)
---
relu
flatten
fully connected
relu
---
linear policy and value heads
So this mess has fifteen conv layers (three IMPALA blocks, five convs each) and was trained on pixels. We’re not doing coinrun for this MATS sprint, although a good amount of tooling should cross over.
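A minimal PyTorch sketch of this layout, to make the listing concrete; the channel widths (16/32/32) and kernel sizes follow the standard IMPALA paper and are assumptions here, not necessarily our exact hyperparameters:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """relu -> conv -> relu -> conv, then add the block's input back in."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv0 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(torch.relu(self.conv0(torch.relu(x))))
        return out + x  # residual add from this block's input

class ImpalaBlock(nn.Module):
    """conv -> maxpool -> two residual blocks (5 conv layers per block)."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.res0, self.res1 = ResidualBlock(c_out), ResidualBlock(c_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.res1(self.res0(self.pool(self.conv(x))))

class ImpalaCNN(nn.Module):
    """Three IMPALA blocks (3 x 5 = 15 convs), then relu -> flatten -> fc -> relu,
    ending in linear policy and value heads. Assumes 64x64 RGB pixel input."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.blocks = nn.Sequential(
            ImpalaBlock(3, 16), ImpalaBlock(16, 32), ImpalaBlock(32, 32)
        )
        self.fc = nn.Linear(32 * 8 * 8, 256)  # 64x64 halved by 3 maxpools -> 8x8
        self.policy = nn.Linear(256, n_actions)
        self.value = nn.Linear(256, 1)

    def forward(self, x: torch.Tensor):
        h = torch.relu(self.blocks(x))
        h = torch.relu(self.fc(h.flatten(start_dim=1)))
        return self.policy(h), self.value(h)
```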
This has presented some challenges: there is no linearity that lets us decompose an ongoing residual stream into per-head contributions, the way transformer MI does.
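A toy example of the obstacle (nothing here is specific to our net, just the fact that ReLU is not additive):

```python
import torch

torch.manual_seed(0)
stream = torch.randn(8)  # activations already in the stream
a = torch.randn(8)       # contribution written in by one upstream component
b = torch.randn(8)       # contribution written in by another

# In a transformer the residual stream stays a literal sum of component
# outputs, so downstream effects can be attributed term by term. Here a
# ReLU (and maxpool) sits on the stream itself, and ReLU is not additive:
lhs = torch.relu(stream + a + b)
rhs = torch.relu(stream) + torch.relu(a) + torch.relu(b)
print(torch.allclose(lhs, rhs))  # almost surely False
```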
This looks cool, going to read in detail later.