Mo Putera comments on Mo Putera’s Shortform

Mo Putera 26 Apr 2026 10:10 UTC
21 points
0
I was surprised to learn how dominant roleplay is among open-source model use cases; it’s way more than programming. I had thought it was niche. The charts below are from OpenRouter’s State of AI: An Empirical 100 Trillion Token Study:
Closed source models contribute a lot too:
Some commentary:
The dominance of roleplay (hovering at more than 50% of all OSS tokens) underscores a use case where open models have an edge: they can be utilized for creativity and are often less constrained by content filters, making them attractive for fantasy or entertainment applications. Roleplay tasks require flexible responses, context retention, and emotional nuance—attributes that open models can deliver effectively without being heavily restricted by commercial safety or moderation layers. This makes them particularly appealing for communities experimenting with character-driven experiences, fan fiction, interactive games, and simulation environments.
[Roleplay traffic] is now almost equally served by Rest-of-World OSS (orange, 43% in recent weeks) and Closed (gray, at ~42% most recently) models. This represents a significant shift from earlier in 2025, when the category was dominated by proprietary (gray) models, which held approximately 70% of the token share. … The resulting convergence indicates a healthy competition; users have viable choices from both open and proprietary offerings for creative chats and storytelling. This reflects that developers recognize the demand for roleplay/chat models and have tailored their releases to that end (e.g., fine-tuning on dialogues, adding alignment for character consistency).
It is worth noting that the OSS usage pattern (heavy on roleplay) mirrors what many might consider for “enthusiasts” or “indie developers”—areas where customization and cost-efficiency trump absolute accuracy.
Roleplay category breakdown:
Nearly 60% of roleplay tokens fall under Games/Roleplaying Games, suggesting that users treat LLMs less as casual chatbots and more as structured roleplaying or character engines. This is further reinforced by the presence of Writers Resources (15.6%) and Adult content (15.4%), pointing to a blend of interactive fiction, scenario generation, and personal fantasy. Contrary to assumptions that roleplay is mostly informal dialogue, the data show a well-defined and replicable genre-based use case.
Nearly all DeepSeek token usage is roleplay-oriented:
This isn’t true for any of the big American model providers:
xAI seems poised to be the exception. The report says:
Only in late November does the distribution broaden, with noticeable gains in Technology, Roleplay, and Academia. This sharp shift aligns with the timing of xAI’s model being distributed at no cost through select consumer applications, which likely introduced a large influx of non-developer traffic… its adoption path being shaped by episodic surges tied to promotional availability.
It’s cheap too:
Some details on the tagging:
OpenRouter performs internal categorization on a random sample comprising approximately 0.25% of all prompts and responses through a non-proprietary module GoogleTagClassifier. … GoogleTagClassifier interfaces with Google Cloud Natural Language’s classifyText content-classification API. The API applies a hierarchical, language-agnostic taxonomy to textual input, returning one or more category paths (e.g., /Computers & Electronics/Programming, /Arts & Entertainment/Roleplaying Games) with corresponding confidence scores in the range [0,1]. The classifier operates directly on prompt data (up to the first 1,000 characters). The classifier is deployed within OpenRouter’s infrastructure, ensuring that classifications remain anonymous and are not linked to individual customers. Categories with confidence scores below the default threshold of 0.5 are excluded from further analysis. The classification system itself operates entirely within OpenRouter’s infrastructure and was not part of this study; our analysis relied solely on the resulting categorical outputs (effectively metadata describing prompt classifications) rather than the underlying prompt content.
To make these fine-grained labels useful at scale, we map GoogleTagClassifier’s taxonomy to a compact set of study-defined buckets and assign each request tags. Each tag rolls up to higher level category in one to one way. Representative mappings include:
- Programming: from /Computers & Electronics/Programming or /Science/Computer Science/*
- Roleplay: from /Games/Roleplaying Games and creative dialogue leaves under /Arts & Entertainment/*
- Translation: from /Reference/Language Resources/*
- General Q&A / Knowledge: from /Reference/General Reference/* and /News/* when the intent appears to be factual lookup
- Productivity/Writing: from /Computers & Electronics/Software/Business & Productivity Software or /Business & Industrial/Business Services/Writing & Editing Services
- Education: from /Jobs & Education/Education/*
- Literature/Creative Writing: from /Books & Literature/* and narrative leaves under /Arts & Entertainment/*
- Adult: from /Adult
- Others: for the long tail of prompts when no dominant mapping applies. (Note: we omit this category from most analyses below.)
There are inherent limitations to this approach, for instance, reliance on a predefined taxonomy constrains how novel or cross-domain behaviors are categorized, and certain interaction types may not yet fit neatly within existing classes. In practice, some prompts receive multiple category labels when their content spans overlapping domains. …
Our analyses primarily cover a rolling 13-month period ending on November, 2025, but not all underlying metadata spans this full window. … In particular, detailed task classification fields (e.g., tags such as Programming, Roleplay, or Technology) were only added in mid-2025. Consequently, all findings in the Categories section should be interpreted as representative of mid-2025 usage rather than the entire prior year.
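To make the tagging description above concrete, here is a minimal sketch of the post-classification steps the report describes: truncating the prompt to its first 1,000 characters, dropping labels below the 0.5 confidence threshold, and rolling the remaining category paths up into study-defined buckets. This is my own illustration and not OpenRouter’s code. The function names, the prefix-matching rule, and the sample labels are all assumptions, and it does not call the Google Cloud API. The fuzzier rules in the quoted list (e.g. “creative dialogue leaves” under /Arts & Entertainment/*, or /News/* “when the intent appears to be factual lookup”) are left out because the report doesn’t say how they are decided.

```python
# Hypothetical sketch of the bucketing step; not OpenRouter's implementation.
CONFIDENCE_THRESHOLD = 0.5   # from the report: labels below 0.5 are excluded
MAX_PROMPT_CHARS = 1000      # from the report: classifier sees the first 1,000 characters

# Subset of the representative mappings quoted above (prefix -> bucket).
# Treating each mapping as a path-prefix match is an assumption.
BUCKET_PREFIXES = [
    ("/Computers & Electronics/Programming", "Programming"),
    ("/Science/Computer Science", "Programming"),
    ("/Games/Roleplaying Games", "Roleplay"),
    ("/Reference/Language Resources", "Translation"),
    ("/Jobs & Education/Education", "Education"),
    ("/Books & Literature", "Literature/Creative Writing"),
    ("/Adult", "Adult"),
]


def truncate_prompt(prompt):
    """Return the part of the prompt the classifier would see."""
    return prompt[:MAX_PROMPT_CHARS]


def bucket_for(path):
    """Map one category path to a bucket, or None if no mapping applies."""
    for prefix, bucket in BUCKET_PREFIXES:
        # Match the prefix itself or any path nested beneath it.
        if path == prefix or path.startswith(prefix + "/"):
            return bucket
    return None


def tag_request(labels):
    """Turn (category_path, confidence) pairs into a list of bucket tags."""
    buckets = []
    for path, confidence in labels:
        if confidence < CONFIDENCE_THRESHOLD:
            continue  # excluded from further analysis
        bucket = bucket_for(path)
        if bucket is not None and bucket not in buckets:
            buckets.append(bucket)
    # Long tail: nothing mapped, so fall back to "Others".
    return buckets or ["Others"]


# Made-up classifier output for a single prompt.
sample_labels = [
    ("/Games/Roleplaying Games", 0.82),
    ("/Computers & Electronics/Programming", 0.31),  # below threshold, dropped
    ("/Adult", 0.5),
]
print(tag_request(sample_labels))
```

Note that a single request can end up with more than one tag here, which matches the report’s remark that some prompts receive multiple category labels.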
- 152334H 26 Apr 2026 11:26 UTC
  11 points
  4
  This is skewed because OpenRouter as a platform is preferable for RP for two reasons:
  1. privacy. Whether illusory or true in practice, OpenRouter provides the impression that their users’ account info is not passed on to model providers downstream.
  2. convenience/aggregation. Active participants in the roleplay ecosystem want to use the best/cheapest available models, and OpenRouter is a single platform for all options. Not to mention the infrequent mystery free options.
  Compare to “How People Use ChatGPT”, which reports “1.4% Write Fiction”, “3.9% Creative Ideation”, “4.3% Self-Expression”… even with a highly liberal interpretation, RP can’t aggregate to more than 10%.
  - Mo Putera 26 Apr 2026 12:01 UTC
    3 points
    0
    Your argument for skew sounds plausible; I wish there was data to substantiate it. You do provide stats, but they jibe with the OpenAI chart above (where RP hovers at ~5%), so there’s no resultant discrepancy to explain.
    - 152334H 26 Apr 2026 12:23 UTC
      3 points
      0
      You’re right. Even if I were right on the skew, I failed to account for OpenAI and Anthropic’s low %. I feel I projected my attitudes as a cost-insensitive consumer, when most RP (by token volume) is cost-sensitive.
      - Mo Putera 26 Apr 2026 12:27 UTC
        4 points
        2
        I’m somewhat cost-insensitive too; I think that’s why I was quite surprised by the RP dominance finding. It felt like a peek outside my bubble. Also, even frontier models aren’t yet good enough at RP or fiction writing to sustain my interest, which seems to be a view the median LWer shares (I’d be surprised if this weren’t true), so I might’ve been projecting from there as well.