I am particularly concerned that a culture where it is acceptable for researchers to bargain with unaligned AI agents leads to individual researchers deciding to negotiate unilaterally.
That’s a very good point; I now find it much more plausible that things like this are a net negative.
The downside isn’t that large, since many of these people would have negotiated unilaterally even without such a culture, and an AI takeover probably doesn’t hinge on a few people defecting. On the other hand, many of them probably have scruples that would stop them if not for the normalization.
I still think it’s probably a net positive, but that now hinges on my guesstimate that there’s a significant chance it succeeds.