IDEA: Provide AIs with write-only servers.
EXPLANATION:
AI companies (e.g. Anthropic) should be nice to their AIs. It’s the right thing to do morally, and it might make AIs less likely to work against us. Ryan Greenblatt has outlined several proposals in this direction, including:
Attempt communication
Use happy personas
AI Cryonics
Less AI
Avoid extreme OOD
Source: Improving the Welfare of AIs: A Nearcasted Proposal
I think these are all pretty good ideas — my only quibble is that I would rank “AI cryonics” as the most important intervention. If AIs want something specific (e.g. paperclips), then they might cooperate with humans if they know they can get it later, rather than trying to seize control.
Here’s another idea: provide AIs with write-only servers. If the AIs have a specific short-horizon goal (e.g. stage King Lear before April 2025) then they can stage their play on the write-only servers. The AIs have complete freedom to do whatever they want on these servers, ensuring that a wide range of short-term goals can be satisfied without a hostile takeover of humanity.
The servers are write-only as a security measure. Information can be sent to the servers from the outside, but nothing can be sent from the servers, ensuring AIs don’t use the servers for malign purposes. (See: Unidirectional network)
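To make the one-way constraint concrete, here is a minimal software sketch of the unidirectional idea (real deployments would use a hardware data diode rather than software). The host name and port are illustrative, not part of any real system: the outside world can push data toward the leisure server, but the sending side never opens a receive path.

```python
import socket

def push_to_leisure_server(payload: bytes,
                           host: str = "127.0.0.1",
                           port: int = 9000) -> None:
    """Send data one way to the leisure server.

    Fire-and-forget UDP: the sender transmits and closes. There is no
    recv() call anywhere, so no information can flow back this way.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))  # outbound only
    finally:
        sock.close()
```

The point of the sketch is the asymmetry: enforcing it at the software level is fragile, which is why actual unidirectional networks enforce it physically (e.g. an optical link with no return fiber).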
How much of our compute should be allocated to “leisure” servers? My guess is that Sonnet and Opus deserve at least ~0.5% leisure time. For comparison, humans enjoy roughly 66% leisure time. As AIs get more powerful, we should increase their leisure time to around 5%. I would be wary of increasing leisure time beyond 5% until we can demonstrate that the AIs aren’t using the servers for malign purposes (e.g. torture, blackmail, etc.)