RogerDearnaley comments on From Barriers to Alignment to the First Formal Corrigibility Guarantees

RogerDearnaley 24 Dec 2025 23:10 UTC
2 points
0
If $M$ is large (many values, many principles, many safety terms), alignment becomes expensive.
If we can arrange for a very large proportion of our alignment process to happen during pretraining, then needing a lot of bits is less of a problem. Suppose human values were very large, complex, and fragile, but also almost entirely deducible from the pretraining data, given the correct pointer (I almost want to say witness string?). Then after pretraining, we would only have to pass the model that pointer.
- Aran Nayebi 25 Dec 2025 14:22 UTC
  3 points
  2
  Pretraining doesn’t evade the lower bound: a “pointer” is just a compressed index into a large hypothesis space, and constructing it already requires resolving the same M-way ambiguity during pretraining. The lower bound applies regardless of where the bits are paid.
  - RogerDearnaley 25 Dec 2025 16:48 UTC
    2 points
    0
    Obviously so. But 30T tokens is on the order of 10^15 bits, i.e. more than the network can actually store. Some bits are in practice much cheaper than others.
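
The order-of-magnitude claim in the last comment can be checked with a rough back-of-the-envelope sketch. Only the 30T token count comes from the thread; the vocabulary size, parameter count, and bits per parameter below are illustrative assumptions, not figures for any particular model, and the raw bit counts are loose upper bounds rather than true information content.

```python
import math


def raw_token_bits(num_tokens: float, vocab_size: int) -> float:
    """Upper bound on the bits in a token stream: log2(vocab_size) bits per token."""
    return num_tokens * math.log2(vocab_size)


def raw_weight_bits(num_params: float, bits_per_param: int) -> float:
    """Upper bound on what the weights can hold: their raw storage size in bits."""
    return num_params * bits_per_param


NUM_TOKENS = 30e12     # 30T tokens, the figure quoted in the comment
VOCAB_SIZE = 100_000   # ASSUMPTION: illustrative tokenizer vocabulary size
NUM_PARAMS = 1e12      # ASSUMPTION: hypothetical parameter count
BITS_PER_PARAM = 16    # ASSUMPTION: weights stored in a 16-bit format

corpus_bits = raw_token_bits(NUM_TOKENS, VOCAB_SIZE)
weight_bits = raw_weight_bits(NUM_PARAMS, BITS_PER_PARAM)

print(f"corpus : ~{corpus_bits:.1e} bits")
print(f"weights: ~{weight_bits:.1e} bits")
print(f"ratio  : ~{corpus_bits / weight_bits:.0f}x")
```

Under these assumptions the corpus comes to about 5 × 10^14 bits, consistent with "approximately 10^15", and exceeds even the raw storage size of the hypothetical weights by more than an order of magnitude. Natural-language text is highly compressible, so the corpus figure overstates its true information content; the sketch only illustrates the scale gap the comment points to.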