In order to argue that alignment is importantly easier in slow takeoff worlds, you need to argue that every fatal problem will be found given more time.
I need something weaker: just that we should put some probability on every fatal problem being found given more time (i.e., some probability that the extra time helps us find the last remaining fatal problems).
And that seems reasonable. In your toy model there's a 100% chance that we're doomed. Sure, in that case extra time doesn't help. But in models where our actions can prevent doom, extra time typically will help. And I think we should be uncertain enough about the difficulty of the problem that we should put some probability on worlds where our actions can prevent doom. So we'll end up concluding that more time does help.
This is the important part and it seems wrong.
Firstly, there’s going to be a community of people trying to find and fix the hard problems, and if they have longer to do that then they will be more likely to succeed.
Secondly, ‘nonobvious’ isn’t an all-or-nothing term. There can easily be problems which are nonobvious enough that you don’t notice them with weeks of adversarial training, but which you do notice with months or years.