David Scott Krueger (formerly: capybaralet) comments on What specific dangers arise when asking GPT-N to write an Alignment Forum post?

David Scott Krueger (formerly: capybaralet) 29 Jul 2020 3:26 UTC
In other words, there is a fully general argument that learning produces mesa-optimization to the extent that a relatively weak learning algorithm is applied to a relatively hard task.
It’s very unclear at the moment how much weight to give this argument, either in general or in specific contexts.
But I don’t think the argument is particularly sensitive to the choice of task or learning algorithm.