Research bounties have an extremely serious flaw: you only get the money after you’ve done the work, while you probably need to pay for food, rent, and compute today.
My current situation is that I would love to do more work in technical alignment but nobody is paying me to do so, and I need to keep the lights on.
Smaller bounties could be a nice bonus for finishing a paper: if I had the option to take a grant paying something like £40k/year, I could justify working for nine months to probably earn an extra £20k bounty. I cannot justify working for free for nine months to probably earn a £60k bounty, because if I fail or get scooped I'd be dead broke.
So we still need grantmakers who are willing to sit down, evaluate people like me, and (sometimes) give them money. Maybe someone would fill that niche, e.g. by giving me a £40k salary in exchange for £45k of the bounty. But if lots of people were capable of doing that, why not just hire them to make grants directly? That seems much easier than waiting for the good research-predictors to float to the top of the research-bounty futures market.
Writing insecure code when instructed to write secure code is not really the same thing as being incorrigible. That's just disobedience.
Training an AI to be incorrigible would be a very weird process, since you’d be training it to not respond to certain types of training.