This is absolutely fascinating to me, great post!
I would be curious if you have any thoughts about using this for steganography?
I might be misunderstanding the post, but here's what I'm thinking:
To some degree you can describe circuits, or higher-order ways of storing information in NNs, through renormalization (or at least that's the hypothesis). You might then be able to set up a "portfolio" of different lenses that are each correct in their own way (due to polysemanticity).
If you have all of those reconceptualizations, you can then sample across them to check what kind of information is actually being transmitted in multi-agent scenarios.
Because if there are, say, 6 stable renormalizations and we're only checking one of them, we can't be sure there isn't hidden information being transmitted through the others.
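To make "sample across the lenses" a bit more concrete, here's a toy sketch of the kind of check I have in mind (everything here is a made-up stand-in: lens_bank is just random linear read-outs rather than actual RG lenses, and the disagreement score is a crude proxy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend we have a small bank of "lenses", e.g. 6 stable renormalizations,
# each giving a different read-out of the same inter-agent message.
d_msg, d_read, n_lenses = 64, 8, 6
lens_bank = [rng.standard_normal((d_read, d_msg)) for _ in range(n_lenses)]

def apply_lens(lens: np.ndarray, message: np.ndarray) -> np.ndarray:
    """Project the raw message into one lens's coordinate system."""
    return lens @ message

def lens_disagreement(message: np.ndarray) -> float:
    """Crude proxy: variance of how much content each lens 'sees' in the message.
    High disagreement flags messages worth a closer look for hidden information."""
    norms = [np.linalg.norm(apply_lens(lens, message)) for lens in lens_bank]
    return float(np.var(norms))

message = rng.standard_normal(d_msg)
print(f"lens disagreement score: {lens_disagreement(message):.3f}")
```

The point is just that you'd monitor messages under several lenses at once rather than committing to a single interpretation.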
Anyway, I'm very excited about this direction; it seems like a very good way of crossing the theory-practice gap for interpretability!
I think @Dmitry Vaintrob has more context on this, but I'd be really interested in exploring this idea more. I'm not sure we'd be able to enumerate all possible ways to renormalize an interpretation, but I agree that having a portfolio of RG-inspired lenses could help. Are you imagining that each agent would run the lenses over information from other agents and then aggregate them in some way (and if so, how)?
So my thinking is something like this:
If AI systems get smart enough, they will develop an understanding of various ways of categorizing their knowledge. For humans this manifests as emotions and other things like body language, which we assume we share via theory of mind. This means that when we communicate we can hide a lot of subtext in what we say; in other words, there are various ways of interpreting the same information signal.
This means that there will be various hidden ways for AIs to communicate with each other.
By sampling on cases where another AI system changes its behaviour in response to a communication, but we can't tell what in the message caused the change, we can discover communications that carry hidden information.
We can then renormalize those flagged signals being sent between AI systems and thereby detect when they're communicating hidden information.
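As a toy version of "sample on behaviour changes we can't explain": compare the receiver's behaviour on the original message against its behaviour on a paraphrase that preserves the overt content. receiver_action_dist and paraphrase below are hypothetical stand-ins, not anything from the post; the point is just the comparison.

```python
import numpy as np

rng = np.random.default_rng(1)

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-9) -> float:
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Toy stand-ins: a receiver whose actions depend on both the overt content
# (first half of the message) and an ordering-sensitive surface feature
# (one coordinate of the second half), plus a paraphraser that keeps the
# overt content but scrambles the surface form.
def receiver_action_dist(message: np.ndarray) -> np.ndarray:
    logits = np.array([message[:4].mean(), message[4], 0.0])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def paraphrase(message: np.ndarray) -> np.ndarray:
    out = message.copy()
    out[4:] = rng.permutation(out[4:])  # keep "content", scramble "surface"
    return out

message = rng.standard_normal(8)
shift = kl_divergence(receiver_action_dist(message),
                      receiver_action_dist(paraphrase(message)))
print(f"behaviour shift not explained by overt content: {shift:.3f}")
```

A large shift here means the receiver is reacting to something the overt content doesn't explain, which is exactly the kind of signal you'd want to renormalize and inspect.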
The idea is not to cover all possible ways; it's more like a PCA whose resolution depends on the computational power you have. Also, it wouldn't be agent-based; it's more of an overview in which the first-class citizen is the information signal itself, if that makes sense. You could then run it across various AI configurations and check whether there are any invariant renormalizations.
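Roughly, here's a toy sketch of the "PCA-like, signal-centric" framing (my own toy data and names; a real version would use actual inter-agent signals): take a low-rank PCA of the signals under each configuration and check how well the top subspaces line up. Directions that persist across configurations are candidates for invariant renormalizations.

```python
import numpy as np

rng = np.random.default_rng(2)

def top_k_subspace(signals: np.ndarray, k: int) -> np.ndarray:
    """PCA via SVD: orthonormal basis for the top-k directions of the signals."""
    centered = signals - signals.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T  # shape (d, k)

def subspace_overlap(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Mean squared cosine of the principal angles between two subspaces (1 = identical)."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(cosines ** 2))

# Toy data: two AI "configurations" whose signals share a few latent directions.
d, k, n = 32, 4, 500
shared = rng.standard_normal((d, k))
signals_cfg1 = rng.standard_normal((n, k)) @ shared.T + 0.1 * rng.standard_normal((n, d))
signals_cfg2 = rng.standard_normal((n, k)) @ shared.T + 0.1 * rng.standard_normal((n, d))

overlap = subspace_overlap(top_k_subspace(signals_cfg1, k),
                           top_k_subspace(signals_cfg2, k))
print(f"cross-configuration subspace overlap: {overlap:.3f}")  # close to 1 here
```

The rank k is where the "dependent on the computational power you have" part comes in: more compute, higher-rank lenses.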