A general principle: if we constrain two neural networks to communicate via natural language, we need some pressure towards ensuring they actually use language in the same sense as humans do, rather than (e.g.) steganographically encoding the information they really care about.
The most robust way to do this: pass the language via a human, who tries to actually understand the language, then does their best to rephrase it according to their own understanding.
What do you lose by doing this? Mainly: you can no longer send messages too complex for humans to understand. Also: in general you can’t backprop through discrete language anyway, but I’d guess there are some tricks for approximating that which don’t work as well when a human is in the loop.
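Concretely, the loop I'm picturing looks something like this. This is only a minimal sketch; the `sender` / `paraphrase` / `receiver` callables and the `Message` type are placeholders I've made up, not any real API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Message:
    text: str

def send_via_human(
    sender: Callable[[str], Message],          # model A: task -> natural-language message
    paraphrase: Callable[[Message], Message],  # human: reads, understands, restates in own words
    receiver: Callable[[Message], str],        # model B: message -> action/answer
    task: str,
) -> str:
    raw = sender(task)
    # The human bottleneck: only content the human actually understood (and chose
    # to restate) gets through. Any bits riding on exact wording, punctuation,
    # token choice, etc. should be destroyed here.
    rephrased = paraphrase(raw)
    return receiver(rephrased)

# Trivial stand-ins, just to show the wiring:
if __name__ == "__main__":
    sender = lambda task: Message(f"Please look up the capital of {task}.")
    paraphrase = lambda m: Message("Could you find that country's capital city?")
    receiver = lambda m: f"(acting on: {m.text!r})"
    print(send_via_human(sender, paraphrase, receiver, "France"))
```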
That doesn’t actually solve the problem. The system could just encode the desired information in the semantics of some unrelated sentences—e.g. talk about pasta to indicate X = 0, or talk about rain to indicate X = 1.
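To make the worry concrete, here's a toy version of that channel (the sentences and the paraphrases are made up): the bit is carried by what the sentence is about, so any meaning-preserving rephrasing carries it through.

```python
# Toy covert channel: the bit rides on the *topic*, not the wording,
# so a faithful paraphrase passes it along untouched.

ENCODE = {0: "I had a great bowl of pasta for lunch today.",
          1: "It looks like it's going to rain all afternoon."}

def decode(text: str) -> int:
    # The receiver only needs to recover the topic, not the exact words.
    return 1 if "rain" in text.lower() else 0

def human_paraphrase(text: str) -> str:
    # A faithful human rephrasing keeps the meaning, and therefore the topic.
    if "pasta" in text:
        return "They mentioned enjoying some pasta at lunchtime."
    return "They said heavy rain is expected this afternoon."

for bit in (0, 1):
    assert decode(human_paraphrase(ENCODE[bit])) == bit  # channel survives the paraphrase
print("covert bit survives a meaning-preserving paraphrase")
```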
I expected you to bring up the Natural Abstraction Hypothesis here. Wouldn’t the communication between the parties naturally use the same concepts?
Same concepts, yes, but that doesn't necessarily mean they're encoded in language the way humans typically encode them.
Another possible way to provide pressure towards using language in a human-like way is some form of multi-task/multi-agent setup, inspired by the paper Multitasking Inhibits Semantic Drift. The authors show that if you pretrain multiple instructors and instruction executors to understand language in a human-like way (e.g. with supervised labels), and then mix and match the instructors and executors during training, it becomes difficult to drift from the original semantics: all of the instructors and executors would need to drift in the same direction, so any local change in semantics is sub-optimal compared to using language with the correct, shared semantics. The examples in the paper are on quite toy problems, but I think in principle this could work.
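Roughly the kind of training loop I have in mind, as a sketch only; the classes and callables below are placeholders I'm inventing, not the paper's actual code:

```python
import random

class Instructor:
    """Placeholder for an instruction-writing policy pretrained on human-labelled data."""
    def write_instruction(self, task):
        ...

class Executor:
    """Placeholder for an instruction-following policy pretrained on human-labelled data."""
    def follow(self, instruction):
        ...

def multitask_finetune(instructors, executors, tasks, reward_fn, update_fn, steps=10_000):
    """Randomly re-pair instructors and executors at every step.

    Because each instructor must be understood by *every* executor (and vice
    versa), a privately drifted "dialect" only pays off if all agents drift in
    the same direction, which makes sticking to the original human-grounded
    semantics the locally optimal choice.
    """
    for _ in range(steps):
        task = random.choice(tasks)
        instructor = random.choice(instructors)   # fresh pairing each step
        executor = random.choice(executors)
        instruction = instructor.write_instruction(task)
        outcome = executor.follow(instruction)
        reward = reward_fn(task, outcome)
        update_fn(instructor, executor, reward)   # e.g. a policy-gradient step
```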
Not being able to send messages too complex for humans to understand seems to me like it’s plausibly a benefit for many of the cases where you’d want to do this.