DragonGod comments on DragonGod’s Shortform

DragonGod 6 Feb 2023 4:25 UTC
3 points
0
Hypothesis: any learning task can be framed as a predictive task^[1]; hence, sufficiently powerful predictive models can learn anything.

A comprehensive and robust model of human preferences can be learned via SSL with a target of minimising predictive error on observed/recorded behaviour.
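As a toy illustration of that claim (not a method from this post), here is a minimal sketch of learning preferences purely by minimising predictive error on recorded choices. Everything in it is an assumption made for illustration: the three-option setup, the hidden utilities, the softmax (Boltzmann-rational) choice model, and the helper names `sample_choices`, `fit_by_prediction` and `ranking`.

```python
import math
import random


def softmax(logits):
    """Turn a list of real-valued scores into choice probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_choices(hidden_utilities, n, rng):
    """Simulate recorded behaviour: n noisy choices from a softmax chooser.

    The softmax choice rule is an assumption of this sketch.
    """
    probs = softmax(hidden_utilities)
    options = list(range(len(hidden_utilities)))
    return rng.choices(options, weights=probs, k=n)


def fit_by_prediction(choices, n_options, steps=500, lr=0.5):
    """Fit one score per option by gradient descent on mean cross-entropy,
    i.e. by minimising predictive error on the observed choices."""
    counts = [0] * n_options
    for c in choices:
        counts[c] += 1
    freqs = [c / len(choices) for c in counts]

    logits = [0.0] * n_options
    for _ in range(steps):
        probs = softmax(logits)
        # Gradient of the mean cross-entropy w.r.t. each logit is (p_i - f_i).
        logits = [l - lr * (p - f) for l, p, f in zip(logits, probs, freqs)]
    return logits


def ranking(values):
    """Option indices ordered from most to least preferred."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


rng = random.Random(0)
hidden = [2.0, 0.5, -1.0]            # hypothetical "true" preferences
observed = sample_choices(hidden, 5000, rng)
learned = fit_by_prediction(observed, len(hidden))

print("hidden ranking: ", ranking(hidden))
print("learned ranking:", ranking(learned))
```

The learned scores are only identified up to an additive constant, so the sketch compares rankings rather than raw values. It also only works here because the assumed choice model matches how the data was generated, which is exactly the kind of gap the caveats below are about.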

This is one of those ideas that naively seem like they basically solve the alignment problem, but surely it can’t be that easy.

Nonetheless recording this to come back to it after gitting gud at ML.

Potential Caveats

Maybe “sufficiently powerful predictive models” is doing a lot of heavy lifting.

It's plausible that the “irreducible entropy” in our records of human behaviour prevents learning values well (I don’t actually believe this).

Perhaps the dataset size required to get a sufficiently robust/comprehensive model is too large?

Another concern is the potential of mindcrime.
1. ↩︎
  I don’t think this hypothesis is original to me, and I expect I learned it from “Introduction to Reinforcement Learning” by Hado van Hasselt. (If not this particular video, then the second one in that series.)
- Bo Chin 6 Feb 2023 4:28 UTC
  1 point
  0
  Parent
  Isn’t prediction a subset of learning?
  - DragonGod 6 Feb 2023 4:38 UTC
    2 points
    0
    Parent
    Yeah, I think so.
    
    I don’t see this as necessarily refuting the hypothesis?
    - Bo Chin 6 Feb 2023 5:01 UTC
      1 point
      0
      Parent
      No, it sounded like a tautology to me, so I wasn’t sure what it’s trying to address.
      - DragonGod 6 Feb 2023 5:04 UTC
        2 points
        0
        Parent
        It’s not a tautology. If prediction is a proper subset of learning, then not all learning tasks will necessarily be framable as prediction tasks.
        - Bo Chin 6 Feb 2023 5:16 UTC
          1 point
          0
          Parent
          Which your hypothesis addresses