I don’t think I’ve ever seen a truly mechanistic, play-by-play, robust explanation of how anything works in human psychology. At least, not as I would label things; but maybe you are using the labels differently. Can you give an example?
“Humans are nice because they were selected to be nice”—non-mechanistic.
“Humans are nice because their contextually activated heuristics were formed by past reinforcement by reward circuits A, B, C; this convergently occurs during childhood because of experiences D, E, F; credit assignment worked appropriately at that time because their abstraction-learning had been mostly taken care of by self-supervised predictive learning, as evidenced by developmental psychology timelines in G, H, I, and also possibly natural abstractions.”—mechanistic (although I can only fill in parts of this story for now)
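To make the contrast concrete, here is a minimal sketch of the shape the second story has, written as a toy reinforcement loop. Everything in it is a hypothetical stand-in: the contexts, heuristics, and reward values are invented for illustration, not claims about actual reward circuitry A, B, C. The sketch assumes the abstraction-learning step is already done, so credit assignment lands on the right (context, heuristic) pairs.

```python
import random

# Toy stand-ins (all hypothetical) for the pieces of the story above.
CONTEXTS = ["peer_distress", "shared_meal", "conflict"]
HEURISTICS = ["comfort_other", "share_resource", "withdraw"]

# Per-context heuristic strengths, shaped by past reinforcement.
strengths = {c: {h: 1.0 for h in HEURISTICS} for c in CONTEXTS}

def reward_circuit(context: str, heuristic: str) -> float:
    """Stand-in for innate reward circuits A, B, C (values invented)."""
    rewarded = {("peer_distress", "comfort_other"),
                ("shared_meal", "share_resource")}
    return 1.0 if (context, heuristic) in rewarded else -0.2

def act(context: str) -> str:
    """A contextually activated heuristic: sampled by current strength."""
    hs = strengths[context]
    return random.choices(list(hs), weights=list(hs.values()))[0]

# Childhood experiences D, E, F stand in as repeated context exposures.
# Credit assignment is trivially correct here because the abstractions
# (context, heuristic) already exist before reinforcement begins.
for _ in range(2000):
    ctx = random.choice(CONTEXTS)
    h = act(ctx)
    strengths[ctx][h] = max(0.05, strengths[ctx][h] + 0.1 * reward_circuit(ctx, h))

# After "development", the nice heuristics dominate in the right contexts.
print({c: max(hs, key=hs.get) for c, hs in strengths.items()})
```

The point is only structural: reinforcement can shape contextually activated heuristics into something “nice” precisely because the abstractions it assigns credit over were learned beforehand.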
Although I’m not a widely-read scholar on theories of human values, of those I have read, most (but not all) are more like the first story than the second.
My point was that no one understands human value formation deeply enough to confidently rule out the possibility of adapting a similar process to ASI. It seems you agree with that (or at least with the point about our lack of understanding)? Do you think our current understanding is sufficient to confidently conclude that human-adjacent or human-inspired approaches will not scale beyond human level?
I think it depends on which subprocess you consider. Some subprocesses can be ruled out as viable with less information; others require more.
And yes, without an enumeration of all the processes, one cannot know that there isn’t some unknown process that scales more easily.