Projects I’d do if only I were faster at coding
Take the derivative of one of the output logits with respect to the input embeddings, and also the derivative of the output logits with respect to the input tokenization.
Perform SVD; see which individual inputs have the greatest effect on the output (the sparse part of the decomposition) and which overall vibes have the greatest effect (the singular vectors of the low-rank part).
Do this combination for literally everything in the network and see if anything interesting pops out.
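A minimal sketch of the logit-gradient step, assuming GPT-2 via Hugging Face `transformers` (the model choice, prompt, and target token are placeholders I picked for illustration, not anything specified above):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")

ids = tok("The cat sat on the", return_tensors="pt").input_ids
# Work with the embeddings directly so we can differentiate with respect to them.
embeds = model.transformer.wte(ids).detach().requires_grad_(True)  # (1, seq, d_model)

logits = model(inputs_embeds=embeds).logits            # (1, seq, vocab)
target_logit = logits[0, -1, tok.encode(" mat")[0]]    # one output logit at the last position

# d(logit)/d(embeddings): a (seq, d_model) matrix.
grad = torch.autograd.grad(target_logit, embeds)[0][0]

U, S, Vt = torch.linalg.svd(grad, full_matrices=False)
print(S[:5])                 # how fast the spectrum falls off
print(grad.norm(dim=-1))     # per-token influence magnitudes
```

The per-token gradient norms give the “individual inputs” reading; the leading right singular vectors (rows of Vt) are embedding-space directions, the “overall vibes” reading.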
I want to know how we can tell ahead of time what aspects of the environment are controlling an agent’s decision-making.
In an RL agent, we can imagine taking the derivative of its decision with respect to its environment input, and also with respect to the activations at each layer.
For each layer’s weight matrix, do SVD; the right singular vectors with large singular values indicate the aspects of the previous layer’s activations that most influence the decision.
How can we string this together with the left singular vectors, which end up going through ReLU?
The reason we’d want to string these together is so that we can hopefully put everything in terms of the network’s original input, tying the singular vectors to a known ontology.
See if there are any differences first; it may be that the ReLU doesn’t actually do anything important here.
See what corrections we’d have to implement between the singular vectors in order to make them equal.
How different are they, and in what way? If you zeroed random rows of the U matrix (which, I think, corresponds to zeroing random entries of the left singular vectors, the way a ReLU would), does this make the singular vectors line up more?
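A rough sketch of that check, using toy random weight matrices as stand-ins; the particular overlap measure and the 50% masking rate are my own choices, not anything specified above:

```python
import torch

torch.manual_seed(0)
d_in, d_hidden, d_out, k = 64, 128, 64, 10

W1 = torch.randn(d_hidden, d_in)    # first layer:  x -> relu(W1 @ x)
W2 = torch.randn(d_out, d_hidden)   # second layer: h -> W2 @ h

U1, _, _ = torch.linalg.svd(W1, full_matrices=False)    # columns: left sing. vectors of layer 1
_, _, V2t = torch.linalg.svd(W2, full_matrices=False)   # rows of V2t: right sing. vectors of layer 2
V2 = V2t.T                                              # columns: right sing. vectors of layer 2

def overlap(A, B, k):
    # Sum of squared inner products between the top-k columns of A and B, scaled by 1/k.
    # Roughly 1 when the two subspaces coincide (for orthonormal columns), ~k/d_hidden when random.
    return (A[:, :k].T @ B[:, :k]).pow(2).sum().item() / k

baseline = overlap(U1, V2, k)

# Zero random rows of U1 -- i.e. random hidden coordinates, a crude stand-in for a ReLU
# killing those units -- and see whether the subspaces line up more or less.
mask = (torch.rand(d_hidden) > 0.5).float().unsqueeze(1)
masked = overlap(U1 * mask, V2, k)

print(f"overlap unmasked: {baseline:.3f}, with random rows zeroed: {masked:.3f}")
```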
What happens when you train a network, then remove all the ReLUs (or other nonlinear stuff)?
If it’s still an OK approximation, what happens if you just interpret the new network’s output in terms of the input singular vectors?
If it’s not an OK approximation, how many ReLUs do you need in order to bring it back to baseline? Which ones cause the greatest marginal loss increase when taken away? Which ones the least?
Which inputs are affected the most by the ReLU being taken away? Which inputs are affected the least?
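A sketch of the mechanics for these ReLU-removal experiments, assuming some small feed-forward `model` and a `val_loader` that exist elsewhere (both are placeholders):

```python
import copy
import torch
import torch.nn as nn

def without_relus(model: nn.Module) -> nn.Module:
    """Return a copy of `model` with every ReLU replaced by the identity."""
    clone = copy.deepcopy(model)
    for module in clone.modules():
        for name, child in list(module.named_children()):
            if isinstance(child, nn.ReLU):
                setattr(module, name, nn.Identity())
    return clone

@torch.no_grad()
def avg_loss(model, loader, loss_fn=nn.CrossEntropyLoss()):
    model.eval()
    total, n = 0.0, 0
    for x, y in loader:
        total += loss_fn(model(x), y).item() * len(y)
        n += len(y)
    return total / n

# Baseline vs. fully linearized; restoring single ReLUs one at a time and re-measuring
# gives the marginal-loss ranking asked about above.
# print(avg_loss(model, val_loader), avg_loss(without_relus(model), val_loader))
```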
Information-theoretic analysis of GPT-2. What would that tell us?
Can we trace what information is being thrown away when?
Does this correlate well with the number of large singular values?
In deviations from that correlation, are we able to locate non-linear influences?
Does this end up being related to shards? Shards as the things which determine relevance (and thus the spreading) of information through the rest of the network?
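One crude way to operationalize the “what information is thrown away, and does it track the singular value spectrum” questions above, assuming GPT-2 via `transformers`; the effective-rank proxy is my own stand-in for a real information-theoretic quantity:

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

model = GPT2Model.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")

ids = tok("The quick brown fox jumps over the lazy dog. " * 8, return_tensors="pt").input_ids

with torch.no_grad():
    hidden = model(ids, output_hidden_states=True).hidden_states  # embeddings + one entry per block

for i, h in enumerate(hidden):
    acts = h[0] - h[0].mean(dim=0)                  # (seq, d_model), centered over positions
    s = torch.linalg.svdvals(acts)
    p = s / s.sum()
    eff_rank = torch.exp(-(p * torch.log(p + 1e-12)).sum()).item()  # entropy-based effective rank
    n_large = int((s > 0.01 * s[0]).sum())
    print(f"layer {i:2d}: effective rank {eff_rank:6.1f}, singular values > 1% of max: {n_large}")
```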
What happens if you cut off sufficiently small singular values? How many singular vectors do you actually need to describe the operation of GPT-2?
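A sketch of the truncation experiment, again assuming GPT-2 via `transformers`; which matrices to truncate and the rank to keep are illustrative choices:

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def truncate_(weight: torch.Tensor, keep: int) -> None:
    """Replace `weight` in place with its best rank-`keep` approximation."""
    U, S, Vt = torch.linalg.svd(weight, full_matrices=False)
    weight.copy_((U[:, :keep] * S[:keep]) @ Vt[:keep])

with torch.no_grad():
    for block in model.transformer.h:
        truncate_(block.mlp.c_fc.weight, keep=256)    # (768, 3072) Conv1D weight
        truncate_(block.mlp.c_proj.weight, keep=256)  # (3072, 768) Conv1D weight

# Re-run your usual eval (e.g. perplexity on held-out text) and sweep `keep` to see
# how many singular vectors the model actually needs.
```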
Take a maze-solving RL agent trained to competence, then start penalizing it for getting to the cheese. What’s the new behavior? Does it still navigate to the upper right but then, once there, make sure to do nothing? Or does it do something else? Shard theory seems to say it would still navigate to the upper right.
If it *does* navigate to the upper right but then does nothing, what changed in its weights? Parts that stayed the same (or changed the least) should correspond roughly to the parts that handle navigating the maze; parts that changed should correspond to going directly to the cheese.
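A sketch of that weight comparison, assuming we have checkpoints of the agent from before and after the cheese-penalty retraining (the file names are placeholders):

```python
import torch

before = torch.load("before.pt")  # state_dict of the original maze-solving agent
after = torch.load("after.pt")    # state_dict after retraining with the cheese penalized

changes = {}
for name, w_before in before.items():
    w_after = after[name]
    # Relative movement of each parameter tensor during retraining.
    changes[name] = ((w_after - w_before).norm() / (w_before.norm() + 1e-8)).item()

# Big movers ~ cheese-specific circuitry; near-zero movers ~ generic maze navigation.
for name, delta in sorted(changes.items(), key=lambda kv: -kv[1]):
    print(f"{delta:8.4f}  {name}")
```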
I would no longer do many of these projects.