Stuart_Armstrong comments on Siren worlds and the perils of over-optimised search

Stuart_Armstrong 23 Jan 2020 11:13 UTC
3 points

that your more recent writing on Goodhart-style problems suggests that you think we can deal with such problems to the best of our ability by just modelling everything we must already know about our uncertainty and about our preferences (e.g., that they have diminishing returns).

To a large extent I do, but there may be some residual effects similar to the above, so some anti-optimising pressure might still be useful.