Hypothesis: any learning task can be framed as a predictive task[1]; hence, sufficiently powerful predictive models can learn anything.
A comprehensive and robust model of human preferences can be learned via self-supervised learning (SSL), with the objective of minimising predictive error on observed/recorded behaviour.
This is one of those ideas that naively seem like they basically solve the alignment problem, but surely it can’t be that easy.
Nonetheless recording this to come back to it after gitting gud at ML.
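To make "minimising predictive error on recorded behaviour" concrete, here is a toy sketch, not a proposal for how preferences would actually be learned. The records, context strings, and function names are all hypothetical and chosen only for illustration. It fits a count-based conditional model, which is the maximum-likelihood (minimum log-loss) predictor for a lookup table of this form.

```python
import math
from collections import Counter, defaultdict


def fit_predictor(records):
    """Maximum-likelihood estimate of P(action | context) from counts."""
    counts = defaultdict(Counter)
    for context, action in records:
        counts[context][action] += 1
    return {
        context: {a: n / sum(c.values()) for a, n in c.items()}
        for context, c in counts.items()
    }


def mean_log_loss(model, records):
    """Average predictive error in nats (negative log-likelihood)."""
    return -sum(math.log(model[c][a]) for c, a in records) / len(records)


# Hypothetical toy records of (context, chosen action) pairs.
records = [
    ("offered tea or coffee", "tea"),
    ("offered tea or coffee", "tea"),
    ("offered tea or coffee", "coffee"),
    ("offered tea or coffee", "tea"),
    ("raining", "takes umbrella"),
    ("raining", "takes umbrella"),
]

model = fit_predictor(records)
loss = mean_log_loss(model, records)
print(model["offered tea or coffee"])
print(f"mean log-loss: {loss:.4f} nats")
```

The "raining" context is predicted perfectly, while the tea/coffee context keeps a nonzero loss however good the model is, because the recorded choices themselves vary. That residual is the kind of "irreducible entropy" raised as a caveat.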
Potential Caveats
Maybe “sufficiently powerful predictive models” is doing a lot of heavy lifting.
It is plausible that the “irreducible entropy” in our records of human behaviour prevents learning values well (I don’t actually believe this).
Perhaps the dataset size required to get a sufficiently robust/comprehensive model is too large?
Another concern is the potential of mindcrime.
[1] I don’t think this hypothesis is original to me; I expect I learned it from “Introduction to reinforcement learning by Hado van Hasselt”. (If not that particular video, then the second one in that series.)
Isn’t prediction a subset of learning?
Yeah, I think so.
I don’t see this as necessarily refuting the hypothesis?
No, it sounded like a tautology to me, so I wasn’t sure what it’s trying to address.
It’s not a tautology. If prediction is a proper subset of learning, then not all learning tasks will necessarily be framable as prediction tasks.
Which your hypothesis addresses.