About Goodfire
So probably this cashes out as more fundamental generator disagreements that aren’t worth hashing out here. Broadly, I think it’s okay for a company to say “We are designing and advancing the next generation of AI systems”, and to analyze whether they should be bucketed with capabilities labs like OpenAI or Anthropic (which I think are also meaningfully different places), one should look at and critically assess their research output.
Like, if someone believes that interpretability will be both helpful for building better systems and helpful for building safer systems, I think it’s justifiable for them to do the thing that builds better systems in the hopes that those systems are also safer than the next-best thing that would’ve been built (and that’s probably reliant on a bunch of other beliefs where we may differ, as I said before).
About revolving doors
Sorry, I may have been unclear. What I meant is that [A] is good and [B] is bad. I am criticizing [B] here, which is playing the associations-game. I think that is generally bad and you should not do it.
I think it describes some of what you are doing, though not all of it, yes.
I am not saying one should reject it as evidence. It’s fine to say it publicly. What I disagree with is the inference that is implied. I do not think it is trivial to infer things from that. I have had roommates with whom I disagree and I would say so publicly. It is, in my opinion, better to apply criticism to actual actions you disagree with, or strategic choices, or whatever, instead of to who people were or were not roommates with.

Otherwise you get games like what happened with your point about Goodfire/Apollo, where we have: Apollo is suspect because it was co-founded by two people who then went to Goodfire, and anyone who works at Goodfire is suspect because they think what they’re working on will in some way be useful for AI development (even though I’m pretty sure those people continue to see their work as heavily motivated by safety, and differentially useful for safety). I think this chain of inferences is bad, and so we should cut it at the root. Most professional criticism should be about what people have actually done, not about who they’ve been associated with.
It looks bad in the same way as other revolving door cultures can look bad to outsiders, but the question should be: is it bad? Is it an actual structural problem? Or is it a PR problem? Or just a feature of the space? And I think for answering these questions, it is better to look at behaviour (which, again, you did do as well, as I mentioned originally; I just think the association-game stuff isn’t great).
Genuine thanks for your response. I have again karma-upvoted and disagreed-voted.
About Goodfire
Broadly, I don’t care much about the moral intents of OpenAI, Anthropic and Goodfire. I think everyone is the hero of their own story.
To the extent that people state their intents, I basically remove all the moral colours and emotional valence. For instance, “we build capabilities for The Good and for Good Reasons” becomes “we build capabilities”.
Another example may be “we care a lot about rationality”. Rationality is the practice of “reason”, which is “thinking correctly”. “Care” likewise reduces to “we have good feelings about”. So, through this process, it becomes “we have a lot of feelings around the practice of thinking”.
A last example could be “effective altruism”. Here, it’s hard to see what’s left after you go through this process. Something like “we do things connected to others (altruism) that perform highly on measures of our own choice (effective)”.
I strongly recommend this frame of analysis to all of my readers, not only “GenericModel”. If you are this deep in this thread, you most likely need it.
About revolving doors
On 3, I think you are forgetting the basics of LessWrong.
> I am not saying one should reject it as evidence. It’s fine to say it publicly. What I disagree with is the inference that is implied. I do not think it is trivial to infer things from that. I have had roommates with whom I disagree and I would say so publicly.
“If [X], then necessarily [Y] is true” is logical entailment. Evidence is “If [X], then [Y] is more likely.” A single counter-example disproves the former, not the latter.
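To make the distinction precise, here is a minimal formalisation; the numbers are illustrative assumptions of mine, not figures from the article:

```latex
% Entailment: X guarantees Y, so a single counter-example refutes it.
\[ X \implies Y \]
% Evidence (Bayesian sense): X merely raises the probability of Y.
\[ P(Y \mid X) > P(Y) \]
% Hypothetical numbers for illustration:
\[ P(\text{shared views}) = 0.3, \qquad
   P(\text{shared views} \mid \text{housemates}) = 0.6 \]
% Here "housemates" is genuine evidence for shared views, even though
% 40% of housemate pairs still disagree. Counter-examples refute the
% entailment reading, not the evidential one.
```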
If you mean that housemates (by choice, not necessity!) who have worked together are not more likely to be synchronised in their views, then I think you are clearly wrong.
Just so you know, Holden moved from CEO of Open Philanthropy to Anthropic a year ago. This type of revolving door was entirely predictable from this type of evidence.
I have been polite in the article. Had I included publicly documented marriage ties, things would already look worse. Had a journalist included private affairs, the picture would look much worse. This would be entirely valid probabilistically.
On 4, I think you are forgetting that everything is adversarial.
> it is better to look at behaviour
Yes, but people lie, they deceive, they “strategically withhold” information, and so on. Even when they don’t, it also just takes time to document everything, and vet it for publication.
This lack of transparency is why we use other types of evidence. They tend to be less accurate, but also more plentiful, less filtered, and harder to fake.
> It looks bad in the same way as other revolving door cultures can look bad to outsiders, but the question should be: is it bad? Is it an actual structural problem?
Yes, and I have listed the adversarial considerations in the article. It is the part that literally starts with “The considerations around independence are structural. Namely, [...]”