I think current AIs having this property is probably slightly differentially harmful for harder-to-check tasks and generally contributes to underelicitation. I don’t have a very strong view on the sign of general underelicitation in current models, but I tentatively think underelicitation is slightly bad overall.