It depends on properties of bounded search itself.
I.e., if you are properly calibrated domain expert who can make 200 statements on topic with assigned probability 0.5% and be wrong on average 1 time, then, when you arrive at probability 0.5% as a result of your search for examples, we can expect that your search space was adequate and wasn’t oversimplified, such that your result is not meaningless.
If you operate in confusing, novel, adversarial domain, especially when domain is “the future”, when you find yourself assigning probabilities 0.5% for any reason which is not literally theorems and physical laws, your default move should be to say “wait, this probability is ridiculous”.
Alternative hypothesis: when you recontextualize, you train model “you will get away with X, given this signal (encouragement to do X in system prompt)”. When model sees no such signal, it decides that it won’t get away with bad behavior and acts good.
I wonder what would happen if you recontextualize with pair of prompts “your actions and thoughts will be recorded and evaluated” vs “you are acting absolutely autonomously, no oversight”.