What predictions does this model make?
The priors on what the correct action is differ depending on whether you're facing a contrived test or a realistic scenario. In a test-like academic setting, if you find a candidate debugging solution that is supported by the preponderance of the evidence, with just one or two facts that don't fit, you can be pretty confident it's still the answer. This often leaves the AI overconfident: it returns early, having identified what would be the solution if it were navigating an RL environment rather than the real world, when it really should have investigated further.
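To make this concrete, here is a toy Bayesian sketch (all numbers are invented for illustration, not taken from any measurement): the same mixed evidence that is nearly conclusive under a test-like prior, where the obvious candidate is usually the intended answer, barely clears a coin flip under a realistic prior, where many root causes are plausible.

```python
def posterior(prior, n_support, n_contra, lr_support=3.0, lr_contra=1/3):
    """Posterior probability that a candidate fix is the true root cause,
    after Bayesian updates on supporting and contradicting facts.

    Each supporting fact multiplies the odds by lr_support (evidence for),
    each contradicting fact by lr_contra (evidence against). The likelihood
    ratios are arbitrary placeholders chosen for illustration.
    """
    odds = prior / (1 - prior)
    odds *= lr_support ** n_support * lr_contra ** n_contra
    return odds / (1 + odds)

# Same evidence in both cases: four facts fit the candidate fix, two don't.
evidence = dict(n_support=4, n_contra=2)

# Contrived test: the leading candidate is usually the intended answer.
print(f"test-like prior 0.7 -> posterior {posterior(0.7, **evidence):.2f}")  # ~0.95

# Realistic debugging: many possible root causes, so a low prior on any one.
print(f"realistic prior 0.1 -> posterior {posterior(0.1, **evidence):.2f}")  # 0.50
```

Under the test-like prior, the one or two anomalies are safely ignored; under the realistic prior, those same anomalies are exactly the signal to keep digging.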