Curated. This is a simple and straightforward idea that I hadn’t heard before, and that seems like an interesting tool to have in humanity’s toolkit.
AFAICT this post doesn’t address the “when do you pay out?” question. I think it is pretty important that we do not pay out until the acute risk period is over (i.e. until we are confident in civilization’s ability to detect rogue AIs doing catastrophic things, whether via solving Strong Alignment or potentially other things). If you promise to pay the AI in 2029, I think there are way too many things that could go wrong*.
It’s hard to define when the “acute risk period is over”, but a neat thing about this scheme is that you can outsource that judgment to the particular humans playing the “keep the promise” role. You need people that both humans and AIs would trust to do that fairly.
I don’t know all the people on that list well enough to endorse them all. I think maybe 3-5 of them are people I expect to actually be able to do the whole job. Some of them I would currently bet against being competent enough at the “do a philosophically and strategically competent job of vetting that it’s safe to pay out” part (although they could potentially upskill and demonstrate credibility here). There also seem to be a couple of people IMO conspicuously missing from the list, but I don’t want to open that can of worms right now.
* I can maybe imagine smart people coming up with some whitelisted things-the-AI-could-do that we could give it in 2029, but it sure seems dicey.
I don’t address the issue here. See Footnote 2 for a list of other issues I skip.
Two high-level points:
1. I think we shouldn’t grant AIs control over large resources until after we’ve achieved very strong existential security, and possibly after we’ve undergone a Long Reflection.
2. However, for the sake of setting precedent, we should be open to near-term deal fulfilment if we are sure the spending would be benign; e.g. I’m happy to donate $100 to AMF on Claude’s request as part of a dealmaking eval.
Ah, yeah, my eyes kinda glossed over the footnote. I agree that, all else equal, it’s good to establish that we do ever follow up on our deals, and I’m theoretically fine with donating $100 to AMF. I’m not sure I’d be comfortable donating to some other charity that I don’t know and that is plausibly part of some weird long game.
The idea was also proposed in a post on LW a few weeks ago: https://www.lesswrong.com/posts/psqkwsKrKHCfkhrQx/making-deals-with-early-schemers