Joseph Bloom comments on Decision Transformer Interpretability

Joseph Bloom 7 Feb 2023 21:07 UTC
LW: 2 AF: 2
Thank you for letting me know about your work on procgen with MI. It sounds like you’re making progress. I’d be particularly interested in your visualisation techniques (how do they compare to what was done in Understanding RL Vision?) and the reproduction of the cheese-maze policies (is this tricky? Do you think a DT could be well-calibrated on this problem?).

Some questions that might be useful to discuss more:
- What are the pros/cons of doing DT vs actor-critic MI? (You’re using Actor-Critic of some form?). It could also be interesting to study analogous circuits in the DT vs AC scenarios.
- I haven’t done anything with CNNs yet, for simplicity, but I might be able to calibrate my expectations of the value and challenges involved by chatting with the Team Shard MATS stream.
Glad to hear the work is going well! I’ll be in the Bay Area for EAG if anyone from the team would like to chat.
- TurnTrout 14 Feb 2023 0:14 UTC
  LW: 4 AF: 4
  We’re studying a net with the structure I commented below, trained via PPO. I’d be happy to discuss more at EAG.
  We’re not posting much publicly right now so that we can (a) work on the research sprint and (b) let people preregister credences in various mechint / generalization propositions, so that they can calibrate and see how their opinions evolve over time.