Vivek Hebbar comments on Understanding and controlling a maze-solving policy network

Vivek Hebbar 11 Mar 2023 23:48 UTC
LW: 4 AF: 2
−2
AF
Any idea why “cheese Euclidean distance to top-right corner” is so important? It’s surprising to me because the convolutional layers should apply the same filter everywhere.
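The premise that a convolutional layer applies the same filter everywhere can be checked on a toy example. This is a minimal sketch, assuming a plain "valid" convolution with no padding; `conv2d_valid`, the 8x8 grid, and the kernel values are hypothetical illustrations, not the policy network's actual layers. Real networks often add padding and non-convolutional layers, which this sketch does not model.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over the image with no padding (cross-correlation)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 3x3 filter; any fixed weights would do.
kernel = np.arange(9, dtype=float).reshape(3, 3)

# Two toy "mazes" that differ only in where a single feature (the cheese) sits.
maze_a = np.zeros((8, 8))
maze_a[2, 2] = 1.0
maze_b = np.zeros((8, 8))
maze_b[4, 5] = 1.0  # same feature, moved 2 rows down and 3 columns right

out_a = conv2d_valid(maze_a, kernel)
out_b = conv2d_valid(maze_b, kernel)

# The response pattern is identical, only translated by the same (2, 3) shift.
shifted_a = np.roll(out_a, shift=(2, 3), axis=(0, 1))
same_up_to_shift = np.array_equal(shifted_a, out_b)
```

In this setting the filter's response carries no information about absolute position, only about local content, which is the intuition behind the question.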
- Vaniver 12 Mar 2023 21:44 UTC
  LW: 3 AF: 1
  0
  AF Parent
  My naive guess is that the other relationships are nonlinear, and this is the best way to approximate those relationships out of just linear relationships of the variables the regressor had access to.
  - TurnTrout 13 Mar 2023 15:23 UTC
    LW: 2 AF: 2
    0
    AF Parent
    Hm, what do you mean by “other relationships”? Is your guess that “cheese Euclidean distance to top-right” is a statistical artifact, or something else?
    If so—I’m quite confident that relationship isn’t an artifact (although I don’t strongly believe that the network is literally modulating its decisions on the basis of this exact formalization). For example, see footnote 4. I’d also be happy to generate additional vector field visualizations in support of this claim.
    - Vaniver 13 Mar 2023 17:51 UTC
      LW: 2 AF: 1
      0
      AF Parent
      Is the dataset you used for the regression available? Might be easier to generate the graphs that I’m thinking of than describe them.
      [EDIT: I was confused when I wrote the earlier comment, I thought Vivek was talking about the decision square distance to the top 5x5 corner, which I do think my naive guess is plausible for; I don’t have the same guess about cheese Euclidean distance to top right corner.]
      - TurnTrout 21 Mar 2023 3:00 UTC
        LW: 2 AF: 2
        0
        AF Parent
        Here’s a colab notebook (it takes a while to load the data, be warned). We’ll have a post out later.
      - TurnTrout 15 Mar 2023 3:44 UTC
        LW: 2 AF: 2
        0
        AF Parent
        Yeah, we’ll put up additional notebooks/resources/datasets soon.
        Monte M 15 Mar 2023 18:45 UTC
        10 points
        0
        Parent
        Thanks for the good thoughts and questions on this! We’re taking a closer look at the behavioral statistics modeling, and here are some heatmaps that visualize the “cheese Euclidean distance to top-right corner” metric’s relationship with the chance of successful cheese-finding.
        These plots show the frequency of cheese-finding over 10k random mazes (sampled from the “maze has a decision square” distribution) vs the x/y offset from the top-right corner to the cheese location. The raw data is shown, plus a version binned into 5x5 patches to get more samples in each bin. The bin counts are also plotted for reference. (The unequal sampling is expected, as all maze sizes can have small cheese-corner offsets, but only large mazes can have large offsets. The smallest 5x5 bin by count has 35 data points.)
        We can see a pretty clear relationship between cheese-corner offset and probability of finding the cheese, with the expected perfect performance in the top-right 5x5 patch that was the only allowed cheese location during the training of this particular agent. But the relationship is non-linear, and of course doesn’t provide direct evidence of causality.
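The binning described above can be sketched roughly as follows. This is an illustrative sketch only, not the code behind the plots: `success_rate_by_offset` and its argument names are hypothetical, and the demo data is synthetic (a made-up success probability that decays with Euclidean distance), not the 10k-maze dataset.

```python
import numpy as np

def success_rate_by_offset(dx, dy, found, patch=5):
    """Bin cheese-finding outcomes by (dx, dy) offset from the top-right corner.

    dx, dy: non-negative integer offsets of the cheese from the corner.
    found:  1/True if the agent reached the cheese in that maze, else 0/False.
    Returns (rate, counts), both indexed [dy // patch, dx // patch];
    rate is NaN for bins that contain no samples.
    """
    dx = np.asarray(dx)
    dy = np.asarray(dy)
    found = np.asarray(found, dtype=float)
    nx = dx.max() // patch + 1
    ny = dy.max() // patch + 1
    counts = np.zeros((ny, nx), dtype=int)
    hits = np.zeros((ny, nx), dtype=float)
    # np.add.at accumulates correctly when several samples share a bin.
    np.add.at(counts, (dy // patch, dx // patch), 1)
    np.add.at(hits, (dy // patch, dx // patch), found)
    with np.errstate(invalid="ignore", divide="ignore"):
        rate = np.where(counts > 0, hits / counts, np.nan)
    return rate, counts

# Synthetic stand-in data (assumption): success decays with Euclidean distance.
rng = np.random.default_rng(0)
demo_dx = rng.integers(0, 20, size=10_000)
demo_dy = rng.integers(0, 20, size=10_000)
demo_p = np.exp(-np.hypot(demo_dx, demo_dy) / 10.0)
demo_found = rng.random(10_000) < demo_p
demo_rate, demo_counts = success_rate_by_offset(demo_dx, demo_dy, demo_found)
```

Returning the counts alongside the rates makes it easy to see which bins are thinly sampled before reading much into their frequencies.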
- TurnTrout 13 Mar 2023 15:28 UTC
  LW: 2 AF: 2
  0
  AF Parent
  I’m also lightly surprised by the strength of the relationship, but not because of the convolutional layers. It seems like if “convolutional layers apply the same filter everywhere” makes me surprised by the cheese-distance influence, it should also make me surprised by “the mouse behaves differently in a dead-end versus a long corridor” or “the mouse tends to go to the top-right.”
  (I have some sense of “maybe I’m not grappling with Vivek’s reasons for being surprised”, so feel free to tell me if so!)