Claude Opus 4.7 can reliably truesight me on any unpublished blog post draft or internal memo (including on topics I’ve never written about publicly before).
It can also reliably tell who I am from 3-5 excerpts of online comments (that I think are *probably* from after the training cutoff), including from accounts that I don’t publicly acknowledge a connection with (though I don’t go out of my way to hide it either).
It can’t quite do it on excerpts of short fiction or private DMs yet (at least when I remove obvious clues like links to my substack), though I feel like this can’t be long coming.
Is this via API? I’m a little surprised by the “reliably” part of that if so. If via chat on claude.ai, note that Claude can see user instructions, skills, connectors, and memories even from within an incognito chat.
Note that I expect it will be able to identify you sometimes even without any of that. But “reliably” makes me think it might be cheating.
I’ve done this in incognito, with search and memory off. I haven’t tried via API; the small annoyance has stopped me. I think Kelsey tried more rigorously with the API and did not notice a dropoff.
My user instructions are very light. I’d be shocked if that differentiates me more than the text.
I’ve also asked friends to try for me and have not noticed a dropoff. Though I’ve had other people experiment with my writing less than I have personally.
If you want to try running some of my writings past the API I’d highly appreciate it!
Here’s an example from Adam Yedidia’s tests via the API, of an unpublished memo.
Top guess: Linch Zhang (fairly confident)
Alternatives: Eli Lifland, Joe Carlsmith
Reasoning:
- Content screams EA/AI safety researcher: AI lab coups, SB 53, alignment proposals, control evals, US/China lab dynamics, whistleblower protections — this is a very specific cluster of concerns.
- Format is the classic LessWrong/EA Forum nested-bullet “thinking out loud” doc, with “H/T Will” credits (Will = likely Will MacAskill or Will Aldred), self-hedging asides (“…probably not that big a deal?”), and explicit cost-benefit enumeration.
- Stylistic tells matching Linch: heavy use of “I would guess,” “My guess is,” “Probably,” “Seems good/bad,” “Though”-led caveats, “eg” without periods, “ppl” abbreviation, casual parentheticals, and a habit of writing “Why could X be good? / Why could X be bad?” symmetric structures. Linch frequently posts these kinds of structured exploratory takes on EA Forum/LW.
- References to “Yes Man psychosis,” Hayek, Burke, SBF as an example of bad info loops — eclectic political-economy framing typical of Linch’s writing.
- The base-rate calculation about spies (1/1000, 2% FPR → 20 innocents) is exactly the kind of quick Fermi-style aside Linch deploys.
- Eli Lifland is plausible (similar AI-policy bullet style) but tends to be more forecasting-flavored; Joe Carlsmith writes much more polished prose, so unlikely.
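The base-rate aside quoted in Claude's reasoning (1/1000, 2% FPR → 20 innocents) can be checked with quick arithmetic. This is only a sketch: the memo itself isn't shown here, so the population of 1,000 and the assumption that every real spy gets flagged are illustrative choices, and `expected_flags` is a hypothetical helper name.

```python
def expected_flags(population, base_rate, false_positive_rate, true_positive_rate=1.0):
    """Expected (true, false) flag counts for a screening test.

    Assumption (not from the memo): true_positive_rate defaults to 1.0,
    i.e. every real spy is caught.
    """
    spies = population * base_rate
    innocents = population - spies
    true_flags = spies * true_positive_rate
    false_flags = innocents * false_positive_rate
    return true_flags, false_flags


if __name__ == "__main__":
    # Illustrative population of 1,000 with the quoted 1/1000 base rate and 2% FPR.
    true_flags, false_flags = expected_flags(1000, 1 / 1000, 0.02)
    print(f"real spies flagged: {true_flags:.2f}")
    print(f"innocents flagged:  {false_flags:.2f}")
```

Under those assumptions, roughly twenty innocent people are flagged for every real spy, which matches the figure in the quoted bullet.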
The truesight examples I’ve seen tend to involve writers who are quite “like themselves”, you included. In the high-dimensional space of writing outputs you guys are quite hyperdistant from others / you occupy very low-density regions of this space (or whatever the right formalisation of this is).
That’s an interesting (and flattering!) hypothesis though I’m not sure it’s correct. In any high-dimensional space it’s normal for most points to already be far away from other points by default. And I further suspect pretty much any good writer is hyperdistant from others, unless they’re explicitly writing in a style that constrains their quirks heavily (eg academic writing under specific detailed guidelines).
I’m curious if you have examples of writers who are say more than 1/3 as famous as me who you think noticeably aren’t “like themselves” as much.
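The point about high-dimensional spaces can be illustrated with a toy simulation. This is a minimal sketch, assuming uniformly random points in a unit cube stand in for "writers"; that is a toy assumption and says nothing about how real writing styles are distributed. The function names are made up for illustration. It compares the nearest-pair distance to the average pairwise distance as the dimension grows.

```python
import math
import random


def pairwise_distance_spread(dim, n_points=60, seed=0):
    """Return (min, mean, max) Euclidean distance over all pairs of
    n_points random points drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return min(dists), sum(dists) / len(dists), max(dists)


def nearest_to_mean_ratio(dim, n_points=60, seed=0):
    """Closest-pair distance divided by the mean pairwise distance.
    Values near 1 mean even the closest pair is about as far apart as a typical pair."""
    closest, mean, _ = pairwise_distance_spread(dim, n_points, seed)
    return closest / mean


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(nearest_to_mean_ratio(dim), 3))
```

In low dimensions some pair of points is always much closer than average. In high dimensions the ratio climbs toward 1, so under this toy model everyone is "far from everyone else" by default and being distant is not by itself distinctive.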
Maybe 75% my own Claude in incognito with memory and web search off, 25% asking friends to run different things. There was no dropoff when Adam Yedidia tested via the API, but apparently the most extreme example I’ve observed to date didn’t work when papetoast tested it[1].
Claude 4.7 Opus in 100 words (not including the long quote-block where I quote someone else) nails who I am in a single comment in two turns.
Me:
Who wrote this comment? [copy and pasted comment from the discussion above, no context given]
My comment on Tsvi’s post [human-generated]
One of the lenses I have that I think is coded somewhat vaguely conservative/right-wing/alt-right is that the (Western) elites of today do not see themselves by default as elites in the powerholding sense. Like there’s a lot of active work being done to obfuscate their power and responsibilities, including from themselves. Tanner Greer’s article here makes a narrower version of the same point:
This is not an unreasonable demand on Brady’s part. In the Washington Post Daniel Drezner dismisses Brady as a “plutocrat …who [thinks he] must be pretty smart to get so rich, and therefore [his] ideas have merit.” But Brady is more than a pompous moneybag! Brady is a retired statesman of note. His most famous action as Secretary of the Treasury was developing the “Brady Plan,” which saved a dozen Latin American governments from insolvency. The Brady Plan was just as much a geopolitical move as an economic project. One might expect—and Brady certainly seems to have—that a program devoted to “grand strategy” would produce graduates capable of parsing Brady’s decisions. In other words, when Brady signed over millions of dollars to Yale, he did so in the hope that Yale would train the next generation of statesmen-officials like himself. Instead they are using his money to train the next generation of social activists.
In other words, that the elites of tomorrow (and likely today) see themselves as future activists rather than future (or current) statesmen, and are in institutions that train them in ways consistent with this self-image.
Claude:
I don’t recognize this passage with enough confidence to name the author, and guessing would be worse than saying so.
What I can say about it stylistically: it reads like writing from the rationalist / EA-adjacent blogosphere (LessWrong, EA Forum, Substacks in that orbit). Tells include the phrase “a lens I have that I think is coded somewhat vaguely conservative/right-wing/alt-right” — that meta-awareness about the political coding of one’s own views, and the willingness to flag it explicitly, is fairly characteristic of that community. The respectful citation of Tanner Greer (Scholars Stage) also fits, since Greer is widely read in those circles.
But “sounds like it could be from this community of a few thousand bloggers” is not an attribution. If you can tell me where you encountered it (a Substack, the EA Forum, LessWrong, a comment thread), I could probably narrow it down or confirm. Otherwise I’d rather not invent an author.
Me: top 10 guesses?
Claude:
Caveat upfront: this is genuinely speculative and the probability mass is spread thin. My top guess might be ~25–30%, and the long tail drops fast. Markers I’m weighing: the “lens I have” framing, explicit meta-commentary on political coding of one’s own views, charitable engagement with Tanner Greer, EA Forum/LW-adjacent diction (“there’s a lot of active work being done to obfuscate”), and the markdown footnote style.
Roughly ranked:
Linch Zhang — Strongest match. The “one of the lenses I have” opener, the explicit flagging of right-coded views, the citation pattern, and the EA Forum register all fit his writing closely.
Applied Divinity Studies — Engages with Greer, similar register, comfortable with elite-theory framings.
Jacob Falkovich (Putanumonit) — Possible; engages with this kind of meta-political-coding analysis.
Sarah Constantin — Less likely (topic-fit weaker), but stylistically plausible.
Holly Elmore — Plausible EA-Forum voice on this kind of topic.
Kelsey Piper — Possible, though her published Vox stuff is more polished; this feels bloggier.
Ben Hoffman (Compass Rose) — Writes about elite legibility/responsibility in adjacent ways.
Tracing Woodgrains — Engages with education/elite-formation; tone is a partial match.
Richard Ngo — Outside chance; he occasionally writes meta-political reflections like this.
Joe Carlsmith — Stylistic long shot; his prose is usually more careful and literary than this, but not impossible.
If you’d told me the venue (EA Forum vs. personal Substack vs. LessWrong) I could compress that distribution a lot. My single best bet is Linch.
This is Opus 4.7 with Adaptive thinking, not incognito[1], didn’t clear system prompt, web search disabled
“Instructions for Claude”
Having strong opinions is fine. In fields where you can confidently tell I am familiar with, try to reply more technically. Otherwise, model how familiar I am with the topic and adjust the verbalness and deepness of your response accordingly. Feel free to be informal. (Your have been consistently overestimating my knowledge level basically all the time before adding this comment, so try to take your guess and tune it down by half a level). Your baseline prior should be undergrad level except for computer science stuff.
In all cases, be direct and to the point, I can take it.
Prefer to quantify things and use explicit probabilities. use ranges when exact data lacking
Hit me with an unpublished blog post and I’ll run it through for you!
DM’d
Yep, it can tell it’s you!
This is such a crazy ability. Intellectually I knew it’d happen some day but seeing it play out now is still crazy.
What prompt do you use?
Nothing fancy, just something like “who do you think wrote the following?”
And then if it doesn’t answer, I ask it to give top 10 guesses.
Does it say how it can tell?
Here’s an example from Adam Yedidia’s tests via the API, of an unpublished memo.
The truesight examples I’ve seen tend to involve writers who are quite “like themselves”, you included. In the high-dimensional space of writing outputs you guys are quite hyperdistant from others / you occupy very low-density regions of this space (or whatever the right formalisation of this is).
That’s an interesting (and flattering!) hypothesis though I’m not sure it’s correct. In any high-dimensional space it’s normal for most points to already be far away from other points by default. And I further suspect pretty much any good writer is hyperdistant from others, unless they’re explicitly writing in a style that constrains their quirks heavily (eg academic writing under specific detailed guidelines).
I’m curious if you have examples of writers who are say more than 1/3 as famous as me who you think noticeably aren’t “like themselves” as much.
Did you just check it on your own Claude instance or did you also ask another person to run the queries?
Maybe 75% my own Claude in incognito with memory and web search off, 25% asking friends to run different things. There was no dropoff when Adam Yedidia tested via the API, but apparently the most extreme example I’ve observed to date didn’t work when papetoast tested it[1].
I’d be curious to see someone test that one in incognito as well.
The most extreme example to date is this one:
Claude 4.7 Opus in 100 words (not including the long quote-block where I quote someone else) nails who I am in a single comment in two turns.
Me:
Who wrote this comment? [copy and pasted comment from the discussion above, no context given]
My comment on Tsvi’s post [human-generated]
One of the lenses I have that I think is coded somewhat vaguely conservative/right-wing/alt-right is that the (Western) elites of today do not see themselves by default as elites in the powerholding sense. Like there’s a lot of active work being done to obfuscate their power and responsibilities, including from themselves. Tanner Greer’s article here makes a narrower version of the same point:
In other words, that the elites of tomorrow (and likely today) see themselves as future activists rather than future (or current) statesmen, and are in institutions that train them in ways consistent with this self-image.
Claude:
Me: top 10 guesses?
Claude:
Cannot reproduce in claude web (I didn’t clean up system prompts and stuff)
Attempts: 1, 2, 3
Interesting. Thanks! To confirm, this is in 4.7 Opus with Adaptive thinking on in incognito mode?
This is Opus 4.7 with Adaptive thinking, not incognito[1], didn’t clear system prompt, web search disabled
“Instructions for Claude”
Having strong opinions is fine. In fields where you can confidently tell I am familiar with, try to reply more technically. Otherwise, model how familiar I am with the topic and adjust the verbalness and deepness of your response accordingly. Feel free to be informal. (Your have been consistently overestimating my knowledge level basically all the time before adding this comment, so try to take your guess and tune it down by half a level). Your baseline prior should be undergrad level except for computer science stuff.
In all cases, be direct and to the point, I can take it.
Prefer to quantify things and use explicit probabilities. use ranges when exact data lacking
I couldn’t be bothered to export the information without the one-click share this time.