I feel like there’s gotta be several different “time constants” for valence/viscera prediction, and I’m not totally clear on when you’re talking about which.
1. A time constant in a learning algorithm that’s similar to temporal credit assignment. E.g. if I get a ground truth signal (by tasting salt, say), maybe credit assignment says that everything I did/felt in the last half-second (for a “time constant” of 0.5s, I guess) gets treated as predictive of salt, with some decay and potentially some extra massaging such that the most immediately proximal thoughts are associated most strongly.
2. Very similar to #1, a time constant in a learning algorithm that’s similar to TD learning. E.g. if I predict salt, everything I did in the last half-second gets treated as predictive of salt. (There’s a toy sketch of the #1-vs-#2 distinction after this list.)
3. A learning speed of #1 or #2 in practice. E.g. if there’s a button that releases a lever, and pulling the lever dispenses salt water into my mouth, maybe my salt predictor pretty quickly learns to predict salt when I’m thinking about the lever, and only more slowly learns to predict salt when I’m thinking about the button. The rate at which this predictive power propagates back through time could differ between environments, and between predicting salt vs. valence.
4. A time constant in the eventual product of a learning algorithm such as #1 or #2. E.g. maybe thinking about the button from the previous example never becomes as predictive of salt as thinking about the lever, and if you introduce a time delay between the button and the lever, the predictive power of the button gets weaker as the delay increases. (The second sketch after this list covers #3 and #4.)
5. A sort of “time constant” that’s really more of a measure of generalization power / ability to assign predictions to complicated thoughts. E.g. if my salt predictor is small and can’t do complicated computation, maybe it can only ever learn to fire when I’m really obviously about to get salt, while the larger valence predictor can quickly learn to fire on multi-step plans to do good things. This might just be a different way of looking at a major cause of #4.
6. A time constant for predictions as a function of the world model’s own internal time stamps. E.g. if I imagine going to the store and buying some salty crackers in an hour, maybe I salivate less than if I imagine going to the store and buying some salty crackers in 2 minutes, because my thought-assessors are actually using my model’s internal representation of time to help predict how much salt to expect. (See the last sketch below.)
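
For what it’s worth, here’s how I’d cash out the #1-vs-#2 distinction in code. This is a minimal sketch, assuming a linear salt predictor over “thought” feature vectors with an exponentially decaying eligibility trace; the class name, the 0.5s time constant, and every other constant here are my own illustrative assumptions, not anything from the post.

```python
import numpy as np

# Toy linear "salt predictor" over thought features, to separate
# #1 (credit assignment against the ground-truth taste signal only)
# from #2 (TD-style bootstrapping, where predicted salt also trains
# earlier thoughts). All numbers are made up for illustration.

DT = 0.1                     # simulation step, seconds
TAU = 0.5                    # hypothesized credit-assignment time constant, seconds
LAMBDA = np.exp(-DT / TAU)   # per-step decay of the eligibility trace
GAMMA = 0.99                 # per-step discount (only matters when bootstrapping)
ALPHA = 0.1                  # learning rate

class SaltPredictor:
    def __init__(self, n_features, bootstrap):
        self.w = np.zeros(n_features)       # predictor weights
        self.trace = np.zeros(n_features)   # recently active thought features
        self.bootstrap = bootstrap          # False -> variant #1, True -> variant #2

    def predict(self, features):
        return float(self.w @ features)

    def update(self, features, next_features, salt_signal):
        # Everything "thought" in roughly the last TAU seconds stays
        # eligible for credit, with exponential decay.
        self.trace = GAMMA * LAMBDA * self.trace + features
        if self.bootstrap:
            # #2: the predictor's own next-step output is part of the target,
            # so merely *predicting* salt trains the thoughts that preceded it.
            target = salt_signal + GAMMA * self.predict(next_features)
        else:
            # #1: only the ground-truth taste signal trains earlier thoughts.
            target = salt_signal
        error = target - self.predict(features)
        self.w += ALPHA * error * self.trace

# Usage sketch: three one-hot "thoughts" in a row, salt arrives on the last one.
pred = SaltPredictor(n_features=3, bootstrap=True)
thoughts = [np.eye(3)[i] for i in range(3)]
for t, thought in enumerate(thoughts):
    nxt = thoughts[t + 1] if t + 1 < len(thoughts) else np.zeros(3)
    pred.update(thought, nxt, salt_signal=1.0 if t == 2 else 0.0)
```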
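
And here’s a toy version of the button/lever example, covering #3 and #4 together: tabular TD(0) on a small chain of states, with optional made-up “waiting” states between the button and the lever. The discount factor, learning rate, and episode count are arbitrary; the point is just that the lever’s prediction converges within a few episodes while the button’s lags behind (#3), and that with discounting the button’s asymptotic prediction shrinks as the delay grows (#4).

```python
import numpy as np

def run_chain(delay_steps, gamma=0.9, alpha=0.1, episodes=200):
    """Tabular TD(0) on: button -> [delay_steps waiting states] -> lever -> salt."""
    n_states = 2 + delay_steps          # index 0 = button, index -1 = lever
    v = np.zeros(n_states)              # learned salt predictions per state
    for _ in range(episodes):
        for s in range(n_states):
            if s == n_states - 1:       # pulling the lever -> salt, episode ends
                reward, v_next = 1.0, 0.0
            else:                       # one step closer to the lever, no salt yet
                reward, v_next = 0.0, v[s + 1]
            v[s] += alpha * (reward + gamma * v_next - v[s])
    return v

for delay in (0, 5):
    v = run_chain(delay)
    print(f"delay={delay}: button prediction={v[0]:.3f}, lever prediction={v[-1]:.3f}")

# The lever's prediction heads toward 1.0 almost immediately; the button's
# only rises after that (roughly one state per episode), and its asymptote
# is gamma ** (delay + 1), i.e. weaker the longer the delay.
```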
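
Finally, a trivial sketch of #6, where the prediction depends on the world model’s own represented time-until-salt rather than on any learning dynamics. The exponential form and the ten-minute horizon are arbitrary assumptions.

```python
import math

def predicted_salt(expected_salt, minutes_until, horizon_minutes=10.0):
    """Salt prediction read off the model's internal time stamp for the event."""
    return expected_salt * math.exp(-minutes_until / horizon_minutes)

print(predicted_salt(1.0, 2))    # crackers in 2 minutes -> ~0.82, plenty of salivation
print(predicted_salt(1.0, 60))   # crackers in an hour   -> ~0.002, barely any
```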