I think you just answered your own question. Indeed, if the agent finds that destroying its instances does not lead to fewer of its goals being achieved, then even a "naturalized" reasoner should not particularly care about destroying itself entirely.
You can’t win, Vader. If you strike me down, I shall become more powerful than you can possibly imagine.