I’d conclude that most algorithms used today have the potential to be manipulative; but they may not be able to find the manipulative behaviour, given their limited capabilities.
I’d suspect that’s right, but I don’t think your title has the appropriate epistemic status. I think people in general should be more careful about for-all quantifiers wrt alignment work. There’s the use of the technical term “almost every”, but you did not prove that the set of “powerful” algorithms that are not “manipulative” has measure zero. There’s also “would be” instead of “seems” (I think if you made this change, the title would be fine). I think it’s vitally important we use the correct epistemic markers; if not, this can lead to research predicated on obvious-seeming hunches stated as fact.
Not that I disagree with your suspicion here.
Rephrased the title and the intro to make this clearer.