How are they going to ensure that the “human-level alignment researcher” a) is human-level, and b) stays at human level?
And, of course, it would be lovely to see an elaboration on the training of misaligned models.
What do you mean by ‘stays at human level’? I assume this isn’t going to be any kind of self-modifying system?
If I were a human-level intelligent computer program, I would put substantial effort into gaining the ability to self-modify, but that’s not the point. My favorite analogy here is that humans were bad at addition before the invention of positional arithmetic, and then they became good at it. My concern is that we could invent a seemingly human-level system that becomes above human level after it learns some new cognitive strategy.