If M is large (many values, many principles, many safety terms), alignment becomes expensive.
If we can arrange for a very large proportion of the alignment process to happen during pretraining, then needing a lot of bits is less of a problem. Suppose human values were very large, complex, and fragile, but also almost entirely deducible from the pretraining data, given the correct pointer (I almost want to say witness string?). Then after pretraining, we only have to supply that pointer.
Pretraining doesn’t evade the lower bound: a “pointer” is just a compressed index into a large hypothesis space, and constructing it already requires resolving the same M-way ambiguity during pretraining. The lower bound applies regardless of where the bits are paid.
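To make the "compressed index" point concrete, here is a minimal sketch of the counting argument, assuming the M hypotheses are equally likely a priori (the function name `pointer_bits` is hypothetical, introduced only for illustration). Singling out one of M options takes at least ceil(log2 M) bits, whichever stage of training supplies them.

```python
def pointer_bits(num_hypotheses: int) -> int:
    """Minimum whole number of bits needed to single out one of
    `num_hypotheses` equally likely options, i.e. ceil(log2(M)).

    Uses exact integer arithmetic so it also works for very large M.
    """
    if num_hypotheses < 1:
        raise ValueError("need at least one hypothesis")
    # For M >= 1, ceil(log2(M)) equals the bit length of M - 1.
    return (num_hypotheses - 1).bit_length()


if __name__ == "__main__":
    for m in (2, 1_000, 2**40):
        print(f"M = {m}: at least {pointer_bits(m)} bits")
```

The bound grows only logarithmically in M, so a short pointer can still select from an enormous hypothesis space, but only if that space has already been built.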
Obviously so. But 30T tokens is approximately 10^15 bits — i.e. more than the network can actually store. Some bits are in practice much cheaper than others.
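A rough back-of-the-envelope check of that order of magnitude, as a sketch only. The 30T token count is from the comment above; the vocabulary size, parameter count, and bits-per-parameter capacity are all illustrative assumptions, not measured values.

```python
import math

# From the discussion above.
TOKENS = 30e12

# Assumptions for illustration only (not from the original comment).
ASSUMED_VOCAB_SIZE = 100_000      # hypothetical tokenizer vocabulary
ASSUMED_PARAMS = 1e12             # hypothetical parameter count
ASSUMED_BITS_PER_PARAM = 2.0      # hypothetical storage capacity per parameter


def corpus_bits(tokens: float, vocab_size: int) -> float:
    """Raw (uncompressed) size of the corpus in bits: log2(vocab) bits per token."""
    return tokens * math.log2(vocab_size)


def capacity_bits(params: float, bits_per_param: float) -> float:
    """Crude estimate of how many bits the network could store."""
    return params * bits_per_param


if __name__ == "__main__":
    data = corpus_bits(TOKENS, ASSUMED_VOCAB_SIZE)
    store = capacity_bits(ASSUMED_PARAMS, ASSUMED_BITS_PER_PARAM)
    print(f"corpus:   {data:.2e} bits")
    print(f"capacity: {store:.2e} bits")
    print(f"ratio:    {data / store:.0f}x")
```

Under these assumptions the raw corpus comes to several times 10^14 bits, the same order of magnitude as the 10^15 quoted, and exceeds the assumed storage capacity by a couple of orders of magnitude. Different assumptions shift the ratio, but not the direction of the inequality unless they change by a large factor.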