Sure, it can be evidence of bad (or good) things, but that’s different from whether it’s safer in and of itself. For me, it’s a positive update that Satisficers might be more natural than Maximizers.
For me, it seems really obviously the case that something that gets tired is less dangerous than something that doesn’t, all else equal.
What is your threat model?
I think current AIs having this property is probably slightly differentially harmful for harder-to-check tasks and generally contributes to underelicitation. I don’t have a very strong view on the sign of general underelicitation in current models, but I tentatively think underelicitation is slightly bad overall.