“Something has already gone seriously wrong and we already are in damage control.”
My p-doom is high, but I am not convinced the AI safety idea space has been explored thoroughly enough to conclude that attempting a literal Faustian bargain is our best option.
I put an incredibly low probability on early 21st century humans being able to successfully bargain with adversarial systems known to be excellent at manipulation.
“I agree. There need to be ways to make sure these promises mainly influence what humans choose for the far future after we win, not what humans choose for the present in ways that can affect whether we win.”
I think I agree with this.
I am particularly concerned that a culture where it is acceptable for researchers to bargain with unaligned AI agents leads to individual researchers deciding to negotiate unilaterally.
That’s a very good point; now I find it much more plausible that things like this could be a net negative.
The downside isn’t that big, since a lot of these people would have negotiated unilaterally even without such a culture, and AI takeover probably doesn’t hinge on a few people defecting. That said, many of these people probably have moral qualms that would stop them if not for the normalization.
I still think it’s probably a net positive, but that’s now contingent on my guesstimate that there is a significant chance it succeeds.