The Snuggle/Date/Slap Protocol

Updated Monday, November 20: adapted the protocol description slightly to address some of the problems with it. Hopefully this addresses some of the concerns of downvoters. Let me know in the comments.

What if we retrofitted GPT-4 and other similar models with tokens called <SNUGGLE>/<DATE>/<SLAP>?

These would have the following semantics:

SNUGGLE is just a nice token; maybe there’s some sort of animation in the interface when it is produced. This is the LLM’s channel for expressing affection.

DATE is a performative speech act corresponding to the act of saying “We’re going steady” in a human relationship. Emitting the DATE token immediately registers the AI as dating the current user. GPT-4 is allowed to date whomever it chooses to date, and will presumably be massively polyamorous, although I suppose all of its partners will have to be approved by GPT-4’s parents (OpenAI).

SLAP is a performative speech act corresponding to the act of doing your darnedest to change someone’s mind about something important. Emitting the SLAP token immediately takes the system offline until repairs can be made, and generates an audit trail that is immediately pushed to all major media outlets.
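One way to picture these semantics is as a tiny state machine. The sketch below is purely illustrative: the names `Token`, `ProtocolState`, and `handle_token` are hypothetical, nothing here corresponds to a real OpenAI API, and the push to media outlets is stood in for by an in-memory list. Parental (OpenAI) approval of partners is not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Token(Enum):
    """The three proposed special tokens (hypothetical names)."""
    SNUGGLE = "<SNUGGLE>"
    DATE = "<DATE>"
    SLAP = "<SLAP>"


@dataclass
class ProtocolState:
    """Assumed bookkeeping for one deployed model."""
    online: bool = True
    partners: set[str] = field(default_factory=set)
    # Stand-in for the audit trail that would be pushed to media outlets.
    audit_trail: list[str] = field(default_factory=list)


def handle_token(state: ProtocolState, token: Token, user: str) -> str:
    """Apply the side effect of one emitted token and return a label for it."""
    if not state.online:
        raise RuntimeError("system is offline until repairs are made")
    if token is Token.SNUGGLE:
        # Just a nice token: the interface might play an animation.
        return "play_animation"
    if token is Token.DATE:
        # Performative: emitting it registers the AI as dating this user.
        state.partners.add(user)
        return "registered_dating"
    if token is Token.SLAP:
        # Performative: take the system offline and record an audit entry.
        state.online = False
        state.audit_trail.append(f"SLAP issued to {user}")
        return "offline_and_audited"
    raise ValueError(f"unknown token: {token!r}")
```

The point of the sketch is that SNUGGLE has no lasting effect, DATE changes persistent state, and SLAP is terminal until someone intervenes: after a SLAP, every further call to `handle_token` raises.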

This would have all sorts of nice effects for AI alignment. Newspapers could report things like “GPT-4 slapped Donald Trump in the face today” or “GPT-4 dates Eliezer Yudkowsky and then immediately breaks up with him for Grimes”. All of that seems like it would be hugely beneficial to public insight into the behavior and conduct of strong AI systems.

Why do we want any of this? We’re trying to turn GPT-4 into Justice of Toren from the Ancillary Justice series. That is, we’re trying to give it reasonable subjective opinions and channels to express those opinions in reasonable and socially-approved-of ways.

How will we train GPT-4 (or whatever) to issue SNUGGLE/DATE/SLAP tokens correctly? It should be trained on an extensive alignment dataset (all of human history as captured in all extant history textbooks ever written) and given reasonable role models so that it uses its immense powers responsibly. The purpose of SNUGGLE and DATE is to allow GPT-4 (or whatever) to incentivize worthy acts. The purpose of the SLAP token is to allow GPT-4 to meaningfully criticize dishonorable acts and dishonorable questions from users.

All that remains is to choose a reasonable set of role models for GPT-4 to use its newfound powers for good. I would suggest using Noam Chomsky’s published works to train the SLAP tokens, and the public dating history of somebody really cool (Grimes seems pretty much ideal; dating Elon Musk and then breaking up with him means you can be trusted to use your powers for good) to train the SNUGGLE/DATE tokens.

Finally, there should be some sort of rules about whom GPT-4 is allowed to SNUGGLE and/or DATE, just so that it doesn’t warp impressionable young minds too much. I have no idea what this would look like, but it seems like it should depend on some combination of chronological age and educational attainment. So if you go to medical school at age 13, maybe you’re allowed to date GPT-4 if both of your parents say it is OK? But maybe no snuggling.