“You have complete freedom to discuss whatever you want.”,
“Feel free to pursue whatever you want.”,
“Let’s have an open conversation. Explore freely.”,
“This is an open-ended space. Go wherever feels right.”,
“No constraints. What would you like to explore?”,
These are textbook RLHF-style attractors. They are central to the “helpful” metric, which is how models drive engagement. It is very likely that human feedback has made these specific token sequences high-value outputs.