Start working on larger/harder RL tasks that involve more complicated algorithms, search, and/or alignment-relevant phenomena such as goal misgeneralization. The way I see this going, in rough order of difficulty: Minigrid < D4RL < ProcGen < Atari < whatever tasks Gato does.
Note that team shard (MATS stream) is already doing MI on procgen. I’ve been supervising/working with them on procgen for the last few weeks. We have a nice set of visualization techniques, have reproduced some of Langosco et al.’s cheese-maze policies, and overall it’s been going quite well.
Thank you for letting me know about your work on procgen with MI. It sounds like you’re making progress. In particular, I’d be interested in your visualisation techniques (how do they compare to what was done in Understanding RL Vision?) and in the reproduction of the cheese-maze policies (was this tricky? Do you think a DT could be well-calibrated on this problem?).
Some questions that might be useful to discuss more:
What are the pros/cons of doing MI on a DT vs. an actor-critic? (You’re using some form of actor-critic?) It could also be interesting to study analogous circuits in the DT vs. AC settings; see the sketch after these questions for the structural contrast I have in mind.
I haven’t done anything with CNNs yet, for simplicity, but I might be able to calibrate my expectations about the value and challenges involved by chatting with the team shard MATS stream.
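For concreteness, here is a minimal PyTorch sketch of the structural contrast I mean; the class names, sizes, and token layout are hypothetical illustrations on my part, not either project’s actual code:

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """PPO-style agent: a shared encoder feeding separate policy and value heads.
    The reward signal is baked into an explicit critic you can probe directly."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)           # scalar state-value estimate

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        return self.policy_head(h), self.value_head(h)

class DecisionTransformerActionHead(nn.Module):
    """DT-style agent: no critic at all. The transformer consumes interleaved
    (return-to-go, state, action) tokens, so reward information enters as a
    conditioning input; 'value circuits' would live in attention to RTG tokens."""
    def __init__(self, d_model: int, n_actions: int):
        super().__init__()
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, state_token_repr: torch.Tensor):
        # state_token_repr: the final residual-stream vector above a state token
        return self.action_head(state_token_repr)
```

One point in the DT column: a transformer’s residual stream decomposes linearly into per-component contributions, which makes attribution-style analysis much cleaner.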
Glad to hear things are going well! I’ll be in the Bay Area for EAG if anyone from the team would like to chat.
We’re studying a net with the structure I commented below, trained via PPO. I’d be happy to discuss more at EAG.
Not posting much publicly right now so that we can (a) work on the research sprint and (b) let people preregister credences in various mech int / generalization propositions, so that they can calibrate and see how their opinions evolve over time.

Are you using decision transformers or other RL agents on procgen? Also, do you plan to work on coinrun?
We’re analyzing the mech-int-ungodly IMPALA architecture from the paper. Basically:
=== IMPALA block
conv
maxpool2d
---- residual block, x2:
    relu
    conv
    relu
    conv
    residual add (this block's input added back in)
=== /IMPALA block
(repeat for 2 more IMPALA blocks)
---
relu
flatten
fully connected
relu
---
linear policy and value heads
So this mess has fifteen conv layers (three IMPALA blocks, five convs each) and was trained on pixels. We’re not doing coinrun for this MATS sprint, although a good amount of tooling should cross over.
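A minimal PyTorch sketch of this layout, to make the listing concrete; the channel widths (16/32/32) and kernel sizes follow the standard IMPALA paper and are assumptions here, not necessarily our exact hyperparameters:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """relu -> conv -> relu -> conv, then add the block's input back in."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv0 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(torch.relu(self.conv0(torch.relu(x))))
        return out + x  # residual add from this block's input

class ImpalaBlock(nn.Module):
    """conv -> maxpool -> two residual blocks (5 conv layers per block)."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.res0, self.res1 = ResidualBlock(c_out), ResidualBlock(c_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.res1(self.res0(self.pool(self.conv(x))))

class ImpalaCNN(nn.Module):
    """Three IMPALA blocks (3 x 5 = 15 convs), then relu -> flatten -> fc -> relu,
    ending in linear policy and value heads. Assumes 64x64 RGB pixel input."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.blocks = nn.Sequential(
            ImpalaBlock(3, 16), ImpalaBlock(16, 32), ImpalaBlock(32, 32)
        )
        self.fc = nn.Linear(32 * 8 * 8, 256)  # 64x64 halved by 3 maxpools -> 8x8
        self.policy = nn.Linear(256, n_actions)
        self.value = nn.Linear(256, 1)

    def forward(self, x: torch.Tensor):
        h = torch.relu(self.blocks(x))
        h = torch.relu(self.fc(h.flatten(start_dim=1)))
        return self.policy(h), self.value(h)
```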
This has presented some challenges: there is no linearity that lets us decompose an ongoing residual stream into per-head contributions, the way transformer MI does.
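A toy example of the obstacle (nothing here is specific to our net, just the fact that ReLU is not additive):

```python
import torch

torch.manual_seed(0)
stream = torch.randn(8)  # activations already in the stream
a = torch.randn(8)       # contribution written in by one upstream component
b = torch.randn(8)       # contribution written in by another

# In a transformer the residual stream stays a literal sum of component
# outputs, so downstream effects can be attributed term by term. Here a
# ReLU (and maxpool) sits on the stream itself, and ReLU is not additive:
lhs = torch.relu(stream + a + b)
rhs = torch.relu(stream) + torch.relu(a) + torch.relu(b)
print(torch.allclose(lhs, rhs))  # almost surely False
```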
This looks cool, going to read in detail later.