You assert that “Naturally, it is much worse to release a misaligned AI than to not release an aligned AI”.
I would disagree with that, if the odds of aligned AI, conditional on you not releasing this one were 50:50, then both mistakes are equally bad. If someone else is definitely going to release a paper-clipper next week, then it would make sense to release an AI with a 1% chance of being friendly now. (Bear in mind that no one would release an AI if they didn’t think it was friendly, so you might be defecting in an epistemic prisoners dilemma.)
I would put more weight on human researchers debating which AI is save before any code is run. I would also think that the space of friendly AIs is tiny compared to the space of all AIs, so making an AI that you put as much as 1% chance on being friendly is almost as hard as building a 99% chance friendly AI.
There could be specific circumstances where you know that another team will release a misaligned AI next week, but most of the time you’ll have a decent chance that you could just make a few more tweaks before releasing.
You assert that “Naturally, it is much worse to release a misaligned AI than to not release an aligned AI”.
I would disagree with that, if the odds of aligned AI, conditional on you not releasing this one were 50:50, then both mistakes are equally bad. If someone else is definitely going to release a paper-clipper next week, then it would make sense to release an AI with a 1% chance of being friendly now. (Bear in mind that no one would release an AI if they didn’t think it was friendly, so you might be defecting in an epistemic prisoners dilemma.)
I would put more weight on human researchers debating which AI is save before any code is run. I would also think that the space of friendly AIs is tiny compared to the space of all AIs, so making an AI that you put as much as 1% chance on being friendly is almost as hard as building a 99% chance friendly AI.
There could be specific circumstances where you know that another team will release a misaligned AI next week, but most of the time you’ll have a decent chance that you could just make a few more tweaks before releasing.