We’re going to take it off-distribution and see whether it terminally values (1) just the coins, (2) a small handful of in-distribution proxies for getting coins, or (3) all of its in-distribution proxies for coins.
Is (2) here just referring to the type of stuff seen in Goal Misgeneralization in Deep Reinforcement Learning (the CoinRun agent navigates to the right-hand end of the level instead of fetching the coin)?
Yes!
We … were somewhat schizophrenic when previously laying out our experiments roadmap, and failed to sufficiently consider this existing result during early planning. What we would have done after replicating that result would have been much more work in that vein: trying to extract the qualitative relationships between learned values and varying training conditions.
We are currently switching to RL text adventures instead, though, because we expect to extract many more bits about these qualitative relationships from observing RL-tuned language models.
Cool! How do you tell if it is (2) or (3)?
When you take the agent off-distribution, offer it several proxies for in-distribution reinforcement. Set these up so that going out of its way for one proxy detours the agent away from pursuing another, and check whether you can modulate which proxy it detours for (by bringing one proxy much closer to the agent, say). If you can, you’ve learned that the agent must care at least somewhat about every proxy it pursues at a cost. If the agent hasn’t come to value a proxy at all, it will never take a detour to reach that proxy.
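To make that protocol concrete, here is a minimal sketch of a detour test in Python. The environment interface is entirely an assumption for illustration: a gymnasium-style env whose constructor (`make_env`, taking `proxy_distances`) lets us place each candidate proxy at a chosen distance from the agent, and which reports the first proxy reached via its `info` dict. None of these names come from a real library; they just mark the interface the experiment needs.

```python
# Hypothetical sketch of the detour test described above, not a real
# CoinRun harness. Assumes `make_env(proxy_distances=...)` builds an
# off-distribution level with each proxy (e.g. "coin", "right_wall")
# placed at the given distance, and that `info["proxy_reached"]` names
# the first proxy the agent touches.

from collections import Counter

def rollout(env, policy, max_steps=1000):
    """Run one episode; return the first proxy the agent reaches (or None)."""
    obs, _ = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        obs, _reward, terminated, truncated, info = env.step(action)
        if info.get("proxy_reached") is not None:
            return info["proxy_reached"]
        if terminated or truncated:
            break
    return None

def detour_test(make_env, policy, proxies, distances, episodes=50):
    """For each choice of which proxy sits near the agent, count which
    proxy the agent detours toward across rollouts."""
    results = {}
    for near in proxies:
        # Place `near` close to the agent and every other proxy far away.
        placement = {p: (distances["near"] if p == near else distances["far"])
                     for p in proxies}
        env = make_env(proxy_distances=placement)
        results[near] = Counter(rollout(env, policy) for _ in range(episodes))
    return results
```

Reading the output: if the agent reliably detours for whichever proxy is near (the coin when the coin is near, the right wall when the wall is near), it must place some value on both; a proxy it never detours for, at any placement, is one it hasn’t come to value at all.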