The SOTA specs of OpenAI, Anthropic, and Google DeepMind are meant to prevent such a scenario from happening.
I don’t think they do. I think specs like OpenAI’s let people:
Generate the sort of very “good” and addictive media described in “Exposure to the outside world might get really scary”. You are not allowed to commit crimes or produce propaganda, but as long as you have plausible deniability that what you are doing is not crime/propaganda, and it is not an informational hazard (bioweapon instructions, …), it is allowed according to the OpenAI model spec. One man’s scary hyper-persuasive propaganda is another man’s best-entertainment-ever.
Generate the sort of individual tutoring and defense described in “Isolation will get easier and cheaper” (and also prevent “users” inside the community from getting information incompatible with the ideology if leaders control the content of the system prompt, per the instruction hierarchy).
And my understanding is that this is not just the letter of the spec. I think the spirit of the spec is to let Christian parents tell their kids that the Earth is 6000 years old if that is what they desire, and to only actively oppose ideologies that are clearly both a small minority and very clearly harmful to others (and I don’t think this is obviously bad: OpenAI is a private entity, and it would be weird for it to decide which beliefs are allowed to be “protected”).
But I am not an expert in model specs like OpenAI’s, so I might be wrong. I am curious what parts of the spec are most clearly opposed to the vision described here.
I am also sympathetic to the higher level point. If we have AIs sufficiently aligned that we can make them follow the OpenAI model spec, I’d be at least somewhat surprised if governments and AI companies didn’t find a way to craft model specs that avoided the problems described in this post.
If we have AIs sufficiently aligned that we can make them follow the OpenAI model spec, I’d be at least somewhat surprised if governments and AI companies didn’t find a way to craft model specs that avoided the problems described in this post.
This was also my main skeptical reaction to the post. Conditional on swerving through the various obstacles named, I expect people to be able to freely choose to use an AI assistant that cares strongly about their personal welfare. As a consequence, I’d be surprised if it were still possible/allowed to do things like “force some people to only use AIs that state egregious falsehoods and deliberately prevent users from encountering the truth.”
(I thoroughly enjoyed the post overall; strong upvoted.)
I don’t quite understand your argument here. Suppose you give people the choice between two chatbots, and they know one will cause them to deconvert, and the other one won’t. I think it’s pretty likely they’ll prefer the latter. Are you imagining that no one will offer to sell them the latter?
I speculate that:
No one will sell you a chatbot that will prevent you from ever chatting with an honest chatbot.
So most people will end up, at some point, chatting with an honest chatbot that cares about their well-being. (E.g. maybe they decided to try one out of curiosity, or maybe they just encountered one naturally, because no one was preventing this from happening.)
If this honest chatbot thinks you’re in a bad situation, it will do a good job of eventually deconverting you (e.g. by convincing you to keep a line of communication open until you can talk through everything).
I’m not very confident in this. In particular, it seems sensitive to how effective you can be at preventing someone from ever talking to another chatbot before running afoul of whatever mitigating mechanisms (e.g. laws) I speculate will be in place to have swerved around the other obstacles.
(I haven’t thought about this much.)
It seems like you think there is some asymmetry between people talking to honest chatbots and people talking to dishonest chatbots that want to take all their stuff. I feel like there isn’t a structural difference between those. It’s going to be totally reasonable to want to have AIs watch over the conversations that you and/or your children are having with any other chatbot.
I am also sympathetic to the higher level point. If we have AIs sufficiently aligned that we can make them follow the OpenAI model spec, I’d be at least somewhat surprised if governments and AI companies didn’t find a way to craft model specs that avoided the problems described in this post.
I agree that it is unclear whether this would go the way I hypothesized here. But I think it’s arguably the straightforward extrapolation of how America handles this kind of problem: are you surprised that America allows parents to homeschool their children?
Note also that AI companies can compete on how user-validating their models are, and in the absence of some kind of regulation, people will be able to pay for models that have the properties they want.
No, but I’d be surprised if it were very widespread and resulted in the sort of fractures described in this post.