I looked at the FairPlay website and agree that “banning schools from contacting kids on social media” or “preventing Gemini rollouts to under-13s” are not coherent asks under my threat model. However, I think there is clear evidence that current parental screen-time controls may not be a strong enough measure to mitigate the generational mental health issues we already see (I am particularly worried about insomnia, depression, eating disorders, autism spectrum disorders, and self-harm).
Zvi has previously reported on YouTube Shorts reaching 200B daily views. This is clearly a case of egregiously user-hostile design that has drawn major social and public backlash. I could not find a canonical citation on medRxiv, and I don’t believe it would be ethical to run a large-scale experiment on the long-term impacts, but there are observational studies. Given historical cases of model sycophancy and the hiring of directors focused on maximizing engagement, I think similar design outcomes for AI products are not implausible.
I think the numbers in this Anthropic blog post https://www.anthropic.com/news/how-people-use-claude-for-support-advice-and-companionship do not accurately portray reality. They report only 0.5% of conversations as romantic or sexual roleplay, but I consider this misleading because they exclude chats focused on content-creation tasks (such as writing stories, blog posts, or fictional dialogues), which their previous research found to be a major use case. Because the models are trained to refuse requests for explicit content, it’s common for jailbreaks to start by saying “it’s okay to do this because it’s just a fictional scenario in a story”, which pushes exactly this kind of chat into the excluded bucket. Anecdotally, I have heard that labs don’t care much about this, in contrast to CBRN threats.
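To make the denominator issue concrete, here is a minimal sketch with made-up, purely illustrative counts (the category labels are my own simplification, not Anthropic’s taxonomy) showing how much the headline percentage can move depending on whether fiction-framed chats are counted:

```python
# Illustrative sketch only: the counts below are invented placeholders,
# not Anthropic's data, and the category names are my own simplification.
conversations = {
    "romantic_roleplay": 5_000,    # chats explicitly flagged as romantic/sexual roleplay
    "content_creation": 200_000,   # story/blog/fictional-dialogue chats, excluded from the stat
    "other": 795_000,              # everything else (coding, advice, etc.)
}

# Headline-style figure: roleplay as a share of conversations *excluding* content creation.
included = conversations["romantic_roleplay"] + conversations["other"]
headline = conversations["romantic_roleplay"] / included

# Suppose some fraction of the excluded content-creation chats are really
# fiction-framed explicit roleplay (the jailbreak pattern described above).
assumed_fiction_framed_share = 0.10  # pure assumption for illustration
hidden = conversations["content_creation"] * assumed_fiction_framed_share
total = sum(conversations.values())
adjusted = (conversations["romantic_roleplay"] + hidden) / total

print(f"headline estimate: {headline:.2%}")   # ~0.63%
print(f"adjusted estimate: {adjusted:.2%}")   # ~2.50%
```

The specific numbers don’t matter; the point is that the exclusion choice alone can move the figure by several-fold.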
Let’s look at the top ten apps ranked by tokens on https://openrouter.ai/rankings. OpenRouter is best known for hosting free API instances of DeepSeek V3 and R1, which was the only way to get heavy free usage out of SOTA LLMs before the Google AI Studio price drop for Gemini 2.5 Pro. It is not the best proxy for real-world usage because it requires technical sophistication, and this is reflected in the first four entries (Cline, Roo Code, LiteLLM, and Kilo Code are all for software development). But the next four (SillyTavern, Chub AI, HammerAI, RolePlai) indicate that the distribution of tasks done with models at this capability level does not differ significantly from the distribution of tasks people visit websites for. That said, I wouldn’t morally panic about this, since it seems likely to me that conventional security methods will be good enough to mostly prevent us from turning into glitchers.
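As a rough sanity check on that claim, here is a small sketch that tallies token share by app category; the token figures are placeholders standing in for whatever the rankings page shows on a given day, and the category assignments are just my reading of the apps listed above:

```python
# Sketch: tally OpenRouter token share by app category.
# Token counts are placeholder values, NOT real figures from openrouter.ai/rankings;
# only the app names and the coding/roleplay split come from the discussion above.
from collections import defaultdict

top_apps = [
    # (app name, category, tokens -- placeholder numbers)
    ("Cline",       "coding",   100e9),
    ("Roo Code",    "coding",    80e9),
    ("LiteLLM",     "coding",    60e9),
    ("Kilo Code",   "coding",    50e9),
    ("SillyTavern", "roleplay",  40e9),
    ("Chub AI",     "roleplay",  30e9),
    ("HammerAI",    "roleplay",  25e9),
    ("RolePlai",    "roleplay",  20e9),
]

share = defaultdict(float)
for _, category, tokens in top_apps:
    share[category] += tokens

total = sum(share.values())
for category, tokens in sorted(share.items(), key=lambda kv: -kv[1]):
    print(f"{category:>8}: {tokens / total:.0%} of tokens among these eight apps")
```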
Kids’ safety activists are one of the only groups with a track record of introducing AI capability restrictions that actually get enforced. Multimodal models can now create both images and text, but the image models are more locked down (Gemini 2.5 defaults to stricter block thresholds for image generation than for text generation), and I don’t think that would be the case without people focusing on kids’ safety. It’s possible that there are AI safety issues affecting children right now that are highly relevant to existential risk, and this is a common topic in novice discussions of alignment.
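For reference, this is roughly how those block thresholds are exposed for text generation in the google-genai Python SDK; the model id and threshold choice are just examples, and I’m going from memory of the SDK’s enum names, so treat it as a sketch rather than a definitive snippet:

```python
# Sketch of adjusting Gemini's safety block thresholds for text generation
# via the google-genai SDK (pip install google-genai). Enum and field names
# are as I recall them from the SDK docs; the model id and threshold are
# illustrative choices, not recommendations.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Write a short bedtime story about a lighthouse keeper.",
    config=types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
                threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,  # strictest blocking tier
            ),
        ],
    ),
)
print(response.text)
```

My understanding, per the point above, is that the image-generation side defaults to stricter thresholds and gives callers less room to relax them, though I haven’t verified the exact defaults.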