I think the language is a compromise, though it’s not far from my view. In particular I endorse:
it is unlikely that a single short sentence is sufficient for this sort of mind hack
and
Successful hacks may be safely detectable at first, [...], although this does not cover treacherous turns where the first successful hack frees a misaligned agent
and I do think that restricting debaters to short sentences helps; I just don’t think it fixes the problem.