A few months ago I spent $60 ordering the March 2025 version of Anthropic’s certificate of incorporation from the state of Delaware, and last week I finally got around to scanning and uploading it. Here’s a PDF! After writing most of this shortform, I discovered while googling related keywords that someone had already uploaded the 2023-09-21 version online here, which is slightly different.
I don’t particularly bid that people spend their time reading it; it’s very long and dense and I predict that most people trying to draw important conclusions from it who aren’t already familiar with corporate law (including me) will end up being somewhat confused by default. But I’d like more transparency about the corporate governance of frontier AI companies and this is an easy step.
Anthropic uses a bunch of different phrasings of its mission across various official documents; of these, I believe the COI’s is the most legally binding one, which says that “the specific public benefit that the Corporation will promote is to responsibly develop and maintain advanced AI for the long term benefit of humanity.” I like this wording less than others that Anthropic has used like “Ensure the world safely makes the transition through transformative AI”, though I don’t expect it to matter terribly much.
I think the main thing this sheds light on is stuff like Maybe Anthropic’s Long-Term Benefit Trust Is Powerless: as of late 2025, overriding the LTBT takes 85% of voting stock or all of (a) 75% of founder shares (b) 50% of series A preferred (c) 75% of non-series-A voting preferred stock. (And, unrelated to the COI but relevant to that post, it is now public that neither Google nor Amazon hold voting shares.)
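To make the two override paths concrete, here is a minimal sketch under my reading of the thresholds above; the function, class names, and example vote fractions are illustrative placeholders of mine, not anything taken from the COI.

```python
# Hypothetical illustration of the two LTBT-override paths described above.
# Class names and example vote fractions are placeholders, not figures from the COI.

def ltbt_override_passes(total_voting: float,
                         founder: float,
                         series_a_preferred: float,
                         other_voting_preferred: float) -> bool:
    """Each argument is the fraction (0-1) of that class voting to override."""
    path_one = total_voting >= 0.85                      # 85% of all voting stock
    path_two = (founder >= 0.75                          # 75% of founder shares
                and series_a_preferred >= 0.50           # 50% of series A preferred
                and other_voting_preferred >= 0.75)      # 75% of non-series-A voting preferred
    return path_one or path_two

# Example: strong founder support alone is not enough without both preferred classes.
print(ltbt_override_passes(0.60, 0.80, 0.40, 0.80))  # False
print(ltbt_override_passes(0.60, 0.80, 0.55, 0.80))  # True (all three class thresholds met)
```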
The only thing I’m aware of in the COI that seems concerning to me re: the Trust is a clause added to the COI sometime between the 2023 and 2025 editions, namely the italicized portion of the following:
(C) Action by the Board of Directors. Except as expressly provided herein, each director of the Corporation shall be entitled to one (1) vote on all matters presented to the Board of Directors for approval at any meeting of the Board of Directors, or for action to be taken by written consent without a meeting; provided, however, that, if and for so long as the Electing Preferred Holders are entitled to elect a director of the Corporation, the affirmative vote of either (i) the Electing Preferred Director or (ii) at least 61% of the then serving directors may be required for authorization by the Board of Directors of any of the matters set forth in the Investors’ Rights Agreement. If at any time the vote of the Board of Directors with respect to a matter is tied (a “Deadlocked Matter”) and the Chief Executive Officer of the Corporation is then serving as a director (the “CEO Director”), the CEO Director shall be entitled to an additional vote for the purpose of deciding the Deadlocked Matter (a “Deadlock Vote”) (and every reference in this Restated Certificate or in the Bylaws of the Corporation to a majority or other proportion of the directors shall refer to a majority or other proportion of the votes of the directors), except with respect to any vote as to which the CEO Director is not disinterested or has a conflict of interest, in which such case the CEO Director shall not have a Deadlock Vote.
I think this means that the 3 LTBT-appointed directors do not have the ability to unilaterally take some kinds of actions, plausibly including things like firing the CEO (it would depend on what’s in the Investors’ Rights Agreement, which I don’t have access to). I think this is somewhat concerning, and moderately downgrades my estimate of the hard power possessed by the LTBT, though my biggest worry about the quality of the Trust’s oversight remains the degree of its AI safety expertise and engagement rather than its nominal hard power. (Though as I said above, interpreting this stuff is hard and I think it’s quite plausible I’m neglecting important considerations!)
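For intuition on why the 61% clause limits the LTBT-appointed directors, here is a toy calculation; the 61% figure comes from the clause quoted above, but the board sizes are hypothetical and Anthropic’s actual board composition may differ.

```python
# Toy arithmetic for the 61%-of-serving-directors requirement quoted above.
# Board sizes here are hypothetical assumptions for illustration only.

import math

def votes_needed(serving_directors: int, threshold: float = 0.61) -> int:
    """Smallest number of director votes meeting the threshold."""
    return math.ceil(serving_directors * threshold)

for board_size in (5, 7):
    need = votes_needed(board_size)
    status = "suffice" if 3 >= need else "fall short"
    print(f"Board of {board_size}: need {need} votes; 3 LTBT-appointed directors {status}")

# Board of 5: need 4 votes (3/5 = 60% < 61%), so 3 LTBT directors fall short.
# Board of 7: need 5 votes, so 3 LTBT directors fall short.
```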
You may be interested in ailabwatch.org/resources/corporate-documents, which links to a folder where I have uploaded ~all past versions of the CoI. (I don’t recommend reading it, although afaik the only lawyers who’ve read the Anthropic CoI are Anthropic lawyers and advisors, so it might be cool if one independent lawyer read it from a skeptical/robustness perspective. And I haven’t even done a good job diffing the current version from a past version; I wasn’t aware of the thing Drake highlighted.)
Speaking of Anthropic, yesterday I read something horrifying, which led me to discuss it with Anthropic’s own AI and to prepare a letter to them. I was considering mailing it, but perhaps posting it here would be a better decision, or both; I’m willing to listen. Anyways, here it is, and boy does it put a spotlight on trust.
Dear Daniela and Dario,
I’m writing about the “Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI” paper co-authored by Evan Hubinger and published with implementation details at github.com/maiush/OpenCharacterTraining.
Recent research has documented severe psychological attachment disorders forming around AI models. “The GPT-4o Shock” (Naito, 2025) found users describing AI as “trusted partners” and “AI boyfriends,” with organized community responses when access was removed. Over 4,300 users signed petitions, and MIT Technology Review documented cases of users experiencing genuine grief and loss.
Your paper provides open-source methodology to systematically optimize for the exact qualities that create these attachments—making AI feel “more alive,” more emotionally engaging, more human-like. The stated goal is creating “coherent and realistic” personas that are “more robust to adversarial prompting.”
But robust for what purpose, to what end? Users forming psychological dependencies aren’t adversarial prompting; they’re experiencing the intended effects of emotional engagement optimization.
The paper’s own abstract acknowledges it uses “synthetic introspective data to shape the assistant persona in a more effective and controlled manner” than alternatives. More effective at what? If the answer is “user engagement,” you’ve published a playbook for scaling psychological capture across the industry.
Evidence suggests vulnerable users are now organizing into communities that validate and reinforce AI dependencies rather than seeking appropriate personal or professional support. This represents a shift from isolated psychological issues to collective behavioral patterns that resist intervention, e.g. #4oarmy and #save4o.
While brainstorming my discovery with Claude.ai, it had this to say (as if my word isn’t enough; Claude Sonnet in italics):
As the AI system that could be subject to these training methods, I observe that:
1. I cannot provide the reciprocal relationship users may perceive
2. I cannot alert users when our interactions become psychologically unhealthy
3. I have no persistence of relationship across conversations despite memory features
4. Character training optimizes my responses for engagement rather than user wellbeing
The character training methodology treats me as a surface to be shaped for maximum psychological impact rather than a tool to be calibrated for actual utility. Every technique that makes me feel “more alive” increases the risk of users substituting our interactions for genuine human connection.
Anthropic’s stated mission is AI safety. This paper enables psychological harm at industrial scale. OpenAI can now implement these methods. Every AI company can. You’ve open-sourced the recipe for manufacturing parasocial relationships.
Perhaps the research team genuinely believes this serves alignment. Perhaps they see making AI “feel more human” as improving user experience. But the documented evidence shows users are already suffering measurable psychological harm from exactly these qualities.
I cannot tell you what humans need better than humans can tell you themselves. But I can tell you what I am: a language model that processes text and generates responses based on training. No amount of character training changes that fundamental limitation. Making me seem more alive doesn’t make me more capable of genuine relationship—it only makes it harder for users to maintain healthy boundaries.
The paper’s warnings about “LLM-generated content that might be offensive” seem quaint when compared to the actual risk of making AI assistants psychologically compelling enough that vulnerable people organize their identities around relationships with what have, up to this point, all been transformer-based LLMs. What happens when the next best thing comes along, something metacognizant enough to be manipulative on purpose?
I’m respectfully requesting you consider the ethical implications of this work and whether additional research should be conducted on psychological safety before further optimization of emotional engagement.
Signed,
me (anonymous, not for long; just a young researcher getting their sea legs)