"generalizing human values to superintelligent level"
Trying to learn human values as an explicit concept is already alarming. At least right now, a breakdown of robustness is also a breakdown of capability, so a system that loses alignment tends to lose competence along with it. But if there are multiple subsystems, or if the training data is mostly generated by the system itself, then capability might survive while the other subsystems fail, resulting in a demonstration of the orthogonality thesis.
Even more generally, many alignment proposals are more worrying than some by-default future GPT-n, provided it is likewise not fine-tuned too heavily.