Surely all pivotal acts that safeguard humanity long into the far future can be explained in entirely rational terms.
I agree that in hindsight such acts would appear entirely rational and justified, but to avoid being a PR problem, they must appear justified (or at least acceptable) to a member of the general public, a law enforcement official, or a politician.
Can you offer a reason why a pivotal act would be a PR problem, or why someone would not want to tell people their best idea for such an act and would instead use the phrase “outside the Overton window”?
To give one example: the oft-cited pivotal act of “using nanotechnology to burn all GPUs” is not something you could put as the official goal on your company website. If the public seriously thought that a group of people pursued this goal and had any chance of even coming close to achieving it, they would strongly oppose such a plan. In order to even see why it might be a justified action to take, one needs to understand (and accept) many highly non-intuitive assumptions about intelligence explosions, orthogonality, etc.
More generally, I think many possible pivotal acts will be adversarial to some degree, since they are literally about stopping people from doing or getting something they want (building an AGI, reaping the economic benefits of using an AGI, etc.). There might be strategies for such an act that are inside the Overton window (creating a superhuman propaganda-bot that convinces everyone to stop), but all strategies involving anything resembling force (like burning the GPUs) will run counter to established laws and social norms.
So I can absolutely imagine that someone has an idea for a pivotal act which, if posted publicly, could be used in a PR campaign by opponents of AI alignment (“look what crazy and unethical ideas these people are discussing in their forums”). That’s why I was asking which forms of discourse would best avoid this danger.
Maybe one scenario in this direction is that a non-superintelligent AI gains access to the internet and then spreads itself to a significant fraction of all computational devices, using them to solve some inconsequential optimization problem. This would aggravate many people (who lose access to their computers) and also demonstrate the potential of AIs to have significant impact on the real world.
As the post mentions, there is an entire hierarchy of such unwanted AI behavior. The first such phenomena, like reward hacking, are already occurring. The next level (such as an AI creating a copy of itself in anticipation of an operator trying to shut it down) might occur at capability levels below those that threaten an intelligence explosion, but it’s unclear whether the general public will see much information about these. I think it’s an important empirical question how wide the window is between the AI levels that produce publicly visible misalignment events and the threshold at which the AI becomes genuinely dangerous.