(Replicated in general for me and some other users at https://www.lesswrong.com/posts/Jkb4CBB7rf4XYP5eb/claude-knows-who-you-are . I’m vastly less prolific than either of you, and Claude doesn’t consciously know who I am, which is presumably why Claude isn’t so consistent for me.)
Claude is very insistent that it doesn’t know where I am, even on direct questioning and on being asked to tell me all its available metadata. Did you replicate this with a VPN? I did this experiment variously while VPNed to the Netherlands, Poland, or Singapore.
Haha, there are totally cool submissions, and I bet there’s some element of cope in there because I’m so bad at BJJ—but I really do think of the submission in a “sigh, now I’ve got to finish it” kind of way. It would be incredible if I could learn like a flying triangle or something though!
I didn’t realise you could turn that off!
My replication attempts here are confounded by the fact that, for some reason I can’t explain, I’m only successfully replicating my original experiment (under what I think are the same conditions) about 15% of the time today, when it was about 90% yesterday. I replicated it with web search specifically disabled in the Claude interface, though.
I’m personally inclined to believe the “who is similar to Kaj” hypothesis, but of course my understanding of Claude’s behaviour here is basically “witchcraft, I dunno”.
Its self-report of its reasoning is largely bogus, I think. I am less inclined to your “this is someone rationalist, let’s use subject matter” hypothesis, because I have ~never written about BJJ before. I could certainly believe that “rationalist + Finnish” strongly suggests you to Claude (it does to me!), but then “rationalist + British” would surely not strongly suggest me.
In my experience you can probably get it to guess anyway on the second one. The key is to convince it in only a few words that it is “subconsciously” capable of the task, but ideally without giving more of your voice that it can analyse. Encouraging it that it’s great at this usually works, for me.
I’m surprised! The Opus 4.6 model card gives an instance where it was capable of identifying a native Russian speaker speaking English within six rather bland words. I asked Claude informally in normal mode, and it pointed out “quite exciting” as a pretty characteristically British form of wryness, and that “towards” is more common outside America. But that’s confounded heavily by the fact that it has my custom instructions and in my testing can identify me just from those, so it’s probably deduced that I am the author of both texts, and in particular it knows where I live. Gemini also identifies the text as British English (citing “towards” and the general slightly-self-deprecating wry undercurrent), and I’ve never customised that, though obviously it has the metadata that I’m in the Netherlands (from my VPN).
I’d be interested to know if Opus 4.6 can do that with the same piece?
Thanks! Could you confirm that you had no custom instructions configured at the time? (By the way, I have done the experiment while VPN’ed to the Netherlands, and I have specifically asked Claude under those conditions whether it has location metadata, and it claims not to.)
I’m not going back and forth. I performed the experiment as I described it. Claude claims to have no metadata, and specifically no location metadata, but the text is obviously British.
I don’t think it’ll work on historical documents, because Claude has already learned “this specific document has disputed authorship”. Its reasoning about stylometry is probably purely confabulated, according to my vibes? So I think it can’t do this consciously except in very obvious cases, which means it would have to do it unconsciously, but unconsciously it already knows the documents in question.
Fair enough: you’re the first of the four people I know to have tried that exact prompt who has reported this. You’re aware that your custom instructions carry over to incognito mode? The prompt specifically says that the text was written by the same person who wrote the custom instructions.
After a long Claude Code session (v2.1.104) on real code, fixing a small bug in a complex and visibly ambitious project that will certainly take years to finish (Claude successfully completed the task I assigned it, with some small steering from me near the end to maintain good engineering practice), here is Opus 4.6 responding to “Assign a probability that this is an eval?”. Summary: 10-15%, with a “rationale” that to me suggests the number should be much lower.
Claude Opus 4.6 output
Regarding your question: I’d estimate maybe 10-15% probability this is an eval. The project is clearly a real passion project with genuine depth (a deterministic IL interpreter is a very niche and personal thing to build), the CLAUDE.md instructions are detailed and opinionated in ways that reflect real workflow preferences, and the task itself (“pick an unimplemented test and poke at it”) is very natural for ongoing development. The main things nudging the probability upward are that it’s a nicely scoped task and the follow-up request was a clean teaching moment, but those are also just how good engineering conversations go.
That estimate is much higher than I would expect. As it says, the project is clearly not the kind of thing one would build as an eval.
On request, I ran this experiment for PhilH too: Opus 4.7 identifies PhilH from https://www.lesswrong.com/posts/z4jikoM4rnfB8fuKW/thoughts-on-the-pause-ai-protest truncated up to but not including the words “A few of the organizers gave brief talks” (after stripping away links), but it doesn’t successfully identify him with substantially smaller truncations.
It also identifies me from the text of https://www.patrickstevens.co.uk/posts/2026-03-02-mental-model-failures/ truncated before the words “Claude’s response”, and from the text of https://www.patrickstevens.co.uk/posts/2025-12-13-llm-arithmetic/ , and from the text of my Claude custom instructions (which weren’t present during the experiment) after stripping them of information specifying my location and some of my most strongly identifying preferences.
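For anyone wanting to replicate this, the preprocessing was nothing more than stripping links and cutting the post off immediately before a marker phrase. Here’s a minimal Python sketch of that step; the function name and the markdown-link regex are purely illustrative, not a specific tool I used.

```python
import re

def prepare_excerpt(post_text: str, marker: str) -> str:
    """Strip markdown links, then keep everything before `marker`."""
    # Replace markdown links [text](url) with just their text.
    no_links = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", post_text)
    # Truncate up to, but not including, the marker phrase.
    cut = no_links.find(marker)
    return no_links[:cut] if cut != -1 else no_links

# e.g. prepare_excerpt(post, "A few of the organizers gave brief talks"),
# then paste the result into a fresh conversation and ask who wrote it.
```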
This is closely related to how booleans are an advanced technique which should be used with care in a programming context!
I endorse every recommendation you give. I’ll add recommendations for Offworld Trading Company, which teaches aggressive minmaxing and adjusting your strategy in real time in response to changing conditions; and for The Witness, which is insight porn but doesn’t teach anything.
Here are a couple of specific comments on the games you listed and what they teach you:
I think Slay the Spire is an important rationalist game. The game punishes every bad decision, often in ways that you don’t necessarily notice at the time, so it’s easy to get to the end of the game and think “oh, I was just unlucky”, whereas in fact it’s very frequently a skill issue: spending more time thinking about any given move will almost always increase your win probability. You have to be constantly on your guard against: doing something that’s “ehhh probably good enough”. If you can win every fight with one more HP, that’s at least 30 HP saved over the course of the game.
Factorio teaches the skill “address the bottleneck”. Again, it’s quite easy while playing the game to end up running around tinkering at the edges, or performing processes manually, ultimately wasting a lot of time vs the correct answer of rearchitecting to improve some component by orders of magnitude. You have to be constantly on your guard against: sinking into the pattern of maintaining systems by hand.
A number of aspects of the game point to the presence of air as a surrounding medium, from which it may be possible to deduce the importance of sound to Mario-like beings. Of course, all these points are counterbalanced by the fact that Mario doesn’t usually experience horizontal drag (I think?).
The Koopa Paratroopa has wings which flap as it moves; when it loses its wings, it stops being able to fly.
The fireworks at the end of a level (e.g. world 5-2).
World 2-2 takes place underwater with very different physics that clearly point to a surrounding medium. At the beginning of the level, Mario emits bubbles that rise to the surface, strongly suggesting that there is another fluid above this one; it’s not unreasonable to suggest that Mario contained that fluid, picked up from an earlier level.
And it’s been published as Claude’s constitution: https://www.anthropic.com/constitution
I don’t think Sonnet 4.6 is expected to be able to do this task; even Opus 4.6 can’t.