I think Dan’s point is a good one: the weights don’t change, and the activations are reset between runs, so the same input (including the rng seed) always produces the same output.
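This determinism is easy to demonstrate with a toy stand-in (a seeded PRNG "model" here, not an actual network; the names are illustrative, not from any real framework):

```python
import random

def model(prompt: str, seed: int) -> str:
    # Stand-in for a frozen network: the "weights" (this function body)
    # never change, and all state is rebuilt from scratch on each call.
    rng = random.Random(seed)  # the rng seed is part of the input
    return "".join(chr(97 + rng.randrange(26)) for _ in prompt)

# Same (prompt, seed) pair always yields the same output.
assert model("hello", seed=0) == model("hello", seed=0)
```

Nothing carries over between calls, which is exactly the limit on learning being discussed.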
I agree with you that the weights and activations encode knowledge, but Dan’s point is still a limit on learning.
I think there are two options for where learning may be happening under these conditions:
1. During the forward pass. Even though the function always produces the same output for a given input, the computation of that output involves some learning.
2. Using the environment as memory. Think of the neural network function as a choose-your-own-adventure book that includes responses to many possible situations, depending on which prompt is selected next by the environment (which itself depends on the last output from the function). Learning occurs in the selection of which paths are actually traversed.
These can occur together. E.g., the “same character” that was invoked by prompt 1 may be invoked again by prompt 2, but it now has more knowledge (some of which was latent in the weights, some of which came in directly via prompt 2; all of which was triggered by prompt 2).
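The choose-your-own-adventure framing can be sketched with frozen lookup tables standing in for the fixed function and the environment (all names and tables here are hypothetical; a real network’s prompt-to-response map is not this discrete):

```python
def model(prompt: str) -> str:
    # Frozen "network": every response to every possible prompt already
    # exists in the table; nothing here is ever updated.
    responses = {"start": "ask", "clue": "solve", "dead_end": "backtrack"}
    return responses.get(prompt, "idle")

def environment(last_output: str, book: dict) -> str:
    # The environment chooses the next prompt based on the model's last
    # output -- this is the "page turning" in the adventure book.
    return book.get(last_output, "start")

book = {"ask": "clue", "solve": "dead_end"}

# The tables never change, yet the trajectory accumulates state:
# which branch of the book we are on depends on everything so far.
prompt, trace = "start", []
for _ in range(4):
    out = model(prompt)
    trace.append(out)
    prompt = environment(out, book)

# trace == ["ask", "solve", "backtrack", "ask"]
```

The “learning” lives entirely in which path through the fixed tables is traversed, not in the tables themselves.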