I think it means that whatever you get is conservative in cases where it’s unsure whether it’s in training, which may translate to being conservative wherever it’s unsure of success in general.
I agree it doesn’t rule out an AI that takes a long shot at takeover! But whatever cognition we posit that the AI executes, it has to yield very high training performance. So AIs that think they have only a very short window for influence, or that are less than perfect at detecting training environments, are ruled out.