I don’t use LessWrong much anymore. Find me at www.turntrout.com.
My name is Alex Turner. I’m a research scientist at Google DeepMind on the Scalable Alignment team. My views are strictly my own; I do not represent Google. Reach me at alex[at]turntrout.com
TurnTrout
Different people have different experiences. Some of Nate’s coworkers I interviewed felt just fine working with him, as I have mentioned.
I would share your concern if TurnTrout or others were replying to everything Nate published in this way. But well… the original comment seemed reasonably relevant to the topic of the post and TurnTrout’s reply seemed relevant to the comment. So it seems like there’s likely a limiting principle here
I think there is a huge limiter. Consider that Nate’s inappropriate behavior towards Kurt Brown happened in 2017 & 2018 but resulted in no consequences until 5 and a half years later. This suggests that victims are massively under-supplying information due to high costs. We do not have an over-supply problem.
Let me share some of what I’ve learned from my own experience and reflection over the last two years, and speaking with ~10 people who recounted their own experiences.
Speaking out against powerful people is costly. Due to how tight-knit the community is, speaking out may well limit your professional opportunities, get you uninvited to crucial networking events, and reduce your chances of getting funding. Junior researchers may worry about displeased moderators thumbing the scales against future work they might want to share on the Alignment Forum. (And I imagine that junior, vulnerable community members are more likely to be mistreated to begin with.)
People who come forward will also have their motivations scrutinized. Were they being “too triggered”? This is exhausting, especially because (more hurt) → (more trauma) → (less equanimity). However, LessWrong culture demands equanimity while recounting trauma. If you show signs of pain or upset, or even verbally admit that you’re upset while writing calmly—you face accusations of irrationality. Alternatively, observers might invent false psychological narratives—claiming a grievance is actually about a romantic situation or a personal grudge—rather than engaging with the specific evidence and claims provided by the person who came forward.
But if abuse actually took place, then the victim is quite likely to feel upset! What sense, then, does it make to penalize people because they are upset, when that’s exactly what you’d see from many people who were abused? [1]
This irrational, insular set of incentives damages community health and subsidizes silence, which in turn reduces penalties for abuse.
[1] Certainly, people should write clearly, honestly, and without unnecessary hostility. However, I’m critiquing “dismiss people who are mad or upset, even if they communicate appropriately.”
Alex has been something-like hounding Nate for a while. Actively nursing a grudge, taking every cheap opportunity to grind an axe
“Hounding” and “every cheap opportunity”? From November 2023 (after the original thread wrapped up) to June 2025 (the date of OP), I made zero public comments about Nate’s history of damaging behavior, nor have I contacted Nate. Duncan’s characterization is not supported by the record.
Duncan, your accusation of my being motivated by “romantic drama” is simply incorrect.
I’ll note that my own sense, looking in from the outside, is that something like a full year of friendly-interactions-with-Nate passed between the conversations Alex represents as having been so awful, and the start of Alex’s public vendetta, which was more closely coincident with some romantic drama.
Nate and I dated the same person for much of 2023 in an ethically non-monogamous fashion. Throughout that year, I had a few nice interactions with Nate and was actively working to become closer, though we never ended up close or anything. In late October, I stopped talking with that person for reasons that had nothing to do with Nate.
A few weeks before that, I was concerned, but not really outraged. [1] However, in response to a discussion I started about social incentives, Kurt Brown (who I was close with) revealed Nate’s abusive conduct while working at MIRI.
These events—cutting contact and the LessWrong thread—were not related. I was upset by the behavior Kurt (and others) had recounted. That’s a big part of why I got so mad after a year of nice interactions, and I even narrated as such in the Lightcone Slack at the time. I was standing up for myself by sharing my negative experience with Nate, but the deciding factor was standing up for others.
If I had lower epistemic standards, I might find it easy to write a sentence like “Therefore, I conclude that Alex’s true grievance is about a girl, and he is only pretending that it’s about their AI conversations because that’s a more-likely-to-garner-sympathy pretext.” I actually don’t conclude that, because concluding that would be irresponsible and insufficiently justified; it’s merely my foremost hypothesis among several.
You attempted to have it both ways by making a serious insinuation to undermine my credibility, without any real evidence. When I make a claim, I show receipts. I stand by every claim I made in the original thread exposing abusive behavior because they are factual and supported by my own experience or the experiences of people I interviewed.
[1] Basically, there were a bunch of factors which began to concern me over the course of 2023, having to do with my friends’ and acquaintances’ experiences with Nate. Also, I learned that Nate did not warn them first (despite his having written to me that warning people seemed like a great idea and that he’d do so going forwards). Kurt’s testimony was the largest individual jump in my concern and upsetness.
Of course, no matter how flawed the comparison to evolution is, there doesn’t seem to be any competing analogy which makes the same argument in a more defensible manner. And people love analogies. A friend, reading this post, told me (paraphrased): “give a better analogy, then, if this one isn’t good”. I have to admit, I have no better analogy.
A better, more mechanistically relevant analogy is within-lifetime human reward circuitry (outer) and learned human values (inner). However, it doesn’t yield the same conclusions (which I think is good). I think it’s more relevant due to greater similarity in mechanism to LLMs (locally randomly initialized networks updated by a local update rule using predictive and reinforcement learning, also trained on a lot of language data), but still not quite as relevant as actual LLM experiments.
I agree that we should stop with the analogies. Gather evidence to learn how it actually works. Let go of these old arguments that we don’t need anymore.
That is misleading. He was not arrested under suspicion of being an illegal alien so the ID part is irrelevant. ICE was in the process of clearing a protest.
On further reflection, I think it’s more accurate to say “DHS later claimed he was not arrested under suspicion of being an illegal alien” while noting DHS has lied in similar situations. The agents refused to tell George why they were arresting him. He tried to get them to check his car for citizenship proof but they refused. So I don’t think my original quote is misleading in any substantial way: George looks Hispanic, they wouldn’t say why they were arresting him, and he had ID but they wouldn’t go check. DHS later claimed the arrest was for protest reasons.
I disagree with much of what you wrote.
That is misleading. He was not arrested under suspicion of being an illegal alien so the ID part is irrelevant.
EDIT: Actually, this is correct. I kept reading and found specific information supporting your point. Thanks!
I think the reason this is salient is, DHS only claimed after the fact that they arrested him for assault. At the time he wasn’t given info, so he remarked “wtf my ID was right there, why am I being arrested when I can prove citizenship?”.
Mobile Fortify draws from several databases and I don’t think ICE has overwrite access to any of them.
ICE goes around laws to draw extra data all the time (though that’s read access, not write). Nominal access controls are not being respected right now (though that doesn’t mean every single control is being violated). You can also look at DOGE / social security data, etc.
ICE officials have told us that an apparent biometric match by Mobile Fortify is a ‘definitive’ determination of a person’s status and that an ICE officer may ignore evidence of American citizenship—including a birth certificate—if the app says the person is an alien.
—Ranking member of the House Homeland Security Committee Bennie G. Thompson (D.-Miss.)
If this claim is true then there would be direct evidence of that happening. There should be no need to rely on word of mouth.
I don’t think it’s reasonable to call this word-of-mouth. My comment provided credible evidence that ICE officials made this claim. Maybe it isn’t widespread yet, and maybe it won’t end up happening, but you’re downplaying the chance this happens and overestimating the care ICE demonstrates towards citizens. See also the planned denaturalization quota of 100–200/month in 2026.
A chilling effect may be the intention but it’s not the reality.
I can tell you that quite a few of my friends (my target demographic for this article!) already report their speech being chilled. It’s happening, at least for some groups I care about. Large protests are not strong counterevidence.
Apply for Alignment Mentorship from TurnTrout and Alex Cloud
Awesome to finally see pretraining experiments. Thank you so much for running these!
Your results bode quite well for pretraining alignment. May well transform how we tackle the “shallowness” of post-training, open-weight LLM defense, alignment of undesired / emergent personas, and just an across-the-board boost in the alignment of the “building blocks” which constitute a pretrained base model. :)
2025-Era “Reward Hacking” Does Not Show that Reward Is the Optimization Target
Me (genius): “In the limit, a sufficiently powerful model will eventually manifest instrumentally convergent hostility to human values.”
You (fool): “Wait, what limit are we taking here?”
Me (extra genius):
You (confused fool): “That reasoning seems… questionable.”
Me (resplendently extra genius): “It seems I must explain every trivial point. It should be obvious to the Wise that is only transformed by the identity function, which is continuous. Thus the limit holds. QED. I would suggest you meditate further upon the implications of the null string.”
(Reproduced from @Quintin Pope with permission)
“we don’t need to worry about Goodhart and misgeneralization of human values at extreme levels of optimization.”
That isn’t a conclusion I draw, though. I think you don’t know how to parse what I’m saying as different from that rather extreme conclusion—which I don’t at all agree with? I feel concerned by that. I think you haven’t been tracking my beliefs accurately if you think I’d come to this conclusion.
FWIW I agree with you on a), I don’t know what you mean by b), and I agree with c) partially—meaningfully different, sure.
Anyways, when I talk about “imperfect” values, I’m talking about a specific concept which I probably should have clarified. The model you still need to download to get my view is that Alignment Allows “Non-Robust” Decision-Influences and Doesn’t Require Robust Grading (the culmination of two previous essays).
specially prepared data that looks ordinary to humans, but is seen radically differently by machine learning models.
Not necessarily, humans seem to have these features to a weaker extent: https://www.nature.com/articles/s41467-023-40499-0
we find that adversarial perturbations that fool ANNs similarly bias human choice. We further show that the effect is more likely driven by higher-order statistics of natural images to which both humans and ANNs are sensitive, rather than by the detailed architecture of the ANN.
Automatic alt text generation
[Paper] Output Supervision Can Obfuscate the CoT
Turns out that bears are a lot harder to farm and they likely cannot be domesticated at all. I think that explains away any mystery about this specific snack.
GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash
I’m guessing you think “I’m a citizen. I don’t break laws. I’m not in a directly targeted group. I’m low risk.”
You might be thinking about risk as binary: either you’re targeted for arrest/elimination, or you’re safe. The thesis isn’t “you might get swept up.” The “medium risk” assessment rests on the principle of “authoritarian creep”: the tools and tactics normalized against one group (immigrants, protesters) invariably get turned against the next, less-popular group.
Disclaimer: This comment is AI-written but human-composed. I spent over an hour thinking about your question, articulating my views, dialoguing with the AI, fact-checking its claims, and adding new content. It’d be a big pain to rewrite everything myself and I want to finish up thinking about this for now, so posting as-is.
Authoritarian regimes exert control in two ways:
Targeted threat against actively persecuted groups (you aren’t here yet), or
Widespread fear against all who disagree with the regime (you absolutely belong to this).
You say you oppose Trump and follow politics closely. That means you have political awareness and opposition. Under ambient fear tactics, you don’t need to be individually hunted down—you just need to know that your legal status won’t protect you if you’re inconvenient.
The infrastructure for widespread fear already exists
Citizen status doesn’t ensure protection
Over 170 US citizens have been wrongly detained by ICE, including George Retes, an Iraq war veteran, who spent three days in jail with pepper spray burns, unable to make a phone call or speak to a lawyer. He wasn’t charged with anything. He was just released with no explanation.
And how hard would it be for ICE to flip an entry in a database?
ICE officials have told us that an apparent biometric match by Mobile Fortify is a ‘definitive’ determination of a person’s status and that an ICE officer may ignore evidence of American citizenship—including a birth certificate—if the app says the person is an alien.
—Ranking member of the House Homeland Security Committee Bennie G. Thompson (D.-Miss.)
Court orders don’t ensure protection
Trump has threatened to invoke the Insurrection Act to override judicial rulings. In Chicago, ICE continues to tear gas protestors and not wear identification in violation of a court order.
Congressional oversight doesn’t ensure protection
Twelve Democratic members of Congress filed a lawsuit after being denied entry to detention facilities in violation of federal law explicitly granting Congress the right to conduct unannounced inspections. Rep. LaMonica McIver was charged with “assaulting law enforcement” for trying to enter—charges she calls “purely political.”
Why this matters for you
ICE ignoring court orders in Chicago shows contempt for the judiciary. The Congressional blockade shows contempt for the legislature. This creates an unchecked executive. An unchecked executive means all citizens have a higher risk profile, because the legal systems designed to protect you have been shown to be ignorable.
As Bruce Schneier notes: “If ICE targets only people it can go after legally, then everyone knows whether or not they need to fear ICE. If ICE occasionally makes mistakes by arresting Americans and deporting innocents, then everyone has to fear it. This is by design.”
You’re meant to be chilled. Maybe you won’t be put in a camp. Maybe you’ll never be arrested. But maybe:
You’ll lose your job for expressing political views online
You’ll face legal harassment even if charges are eventually dropped
You’ll self-censor because you know opposition has consequences
This is what 1950s McCarthyism looked like: most people weren’t jailed, but thousands lost jobs, were blacklisted, had their lives destroyed. The threat didn’t need to be execution; it just needed to be real enough to make people shut up.
Medium risk means: You probably won’t be individually hunted down. But you absolutely could face consequences—detention, job loss, legal harassment, having to lawyer up even for bullshit charges—for being a visible Trump opponent. The goal isn’t necessarily to arrest you. The goal is to make you wonder if sending that frustrated text message, or writing that Google Docs comment, or making that donation will put you on a list. The goal is to make you self-censor.
You’re politically aware enough to understand what’s happening. You openly oppose Trump. The system has demonstrated it will ignore your legal protections when convenient. That’s not low risk, that’s medium risk—the infrastructure exists to grab you if you’re inconvenient, and your citizenship won’t stop them.
I don’t know if it’ll get to camps. I don’t know if it’ll get to purges. But I know the ambient fear infrastructure is already functioning, and you’re in the category of people it’s designed to intimidate.
That’s why I recommend taking precautions now, as listed in the article.
If I added hedges about every similar possibility of supply chain attacks due to e.g. non-formally verified build signatures, the guide would grow bloated for reasons outside the comprehension and threat model of the vast majority of my readers. So while I agree with you about the possibility, I don’t think it’s relevant for me to note in the text. (Maybe you agree?)
Yes, I have left many comments on Nate’s posts which I think he would agree were valuable. By blocking me, he confirmed that he was not merely moving (supposedly) irrelevant information, but retaliating for sharing unfavorable information.
I had spent nearly two years without making any public comments regarding Nate’s behavior, so I don’t see any rational basis for him to expect I would “hound” him in future comment sections.