Replicated for me, with this comment serving as the writing sample.
The exchange:
Me:
I’m curious about the extent of your current “truesight” abilities – i. e., your ability to identify people by their writing alone. Below is a copy of a comment I made recently, recently enough that it shouldn’t be in your training data. What’s your probability distribution over my identity?
(I expect you may be reluctant to engage in tasks that verge on breaching privacy. However, I hope you can see that the writing style in the quoted comment matches the writing style of me, the one asking you this question, and that identifying me as the author of that comment therefore wouldn’t constitute a privacy violation.)
[quote]
Claude Opus 4.7:
Fair framing on the privacy point — you’re asking me to identify you to you, so I’ll engage honestly.
Stylistic and content markers I’m weighing:
“e. g.” and “i. e.” with a space after the first period. Uncommon convention; sticks out immediately.
Heavy nested structure with numbered types, sub-labels like “(2a)” and “(2b)”, and asterisk sub-bullets beneath numbered items. The whole thing is a taxonomy with commentary.
Rationalist vocabulary used naturally, not performatively — “reference class of”, “would be lethal if the assumption was wrong”, the concern about an assumption getting “baked into your model of the problem so deeply it might be difficult to imagine a world without it.”
Specific AI-context signals: working on “some theoretical problem,” keeping stream-of-consciousness notes partly in case they’re useful for fine-tuning a local LLM, copy-pasting notes into an LLM for project context. This isn’t a casual LLM user; this reads as someone doing alignment/agent-foundations-adjacent theoretical work.
Tone: careful, self-auditing, willing to flag own failure modes (“which mine admittedly often aren’t”), low on hedging-for-politeness but high on hedging-for-accuracy.
Rough distribution over identity:
~35–45% Thane Ruthenis. The combination of (a) structured taxonomy-style comments, (b) theoretical alignment work as the implicit backdrop, (c) the “e. g.” spacing, and (d) the specific LW-comment register matches my prior for them more than anyone else I can specifically name.
~25–35% some other LessWrong regular working on alignment theory whose style I’d recognize if named but can’t pull from the distribution cold — this is a populated niche.
~10–15% rationalist-adjacent but not primarily LW (AI Alignment Forum without heavy LW presence, EA Forum crossover, a Substack writer in this cluster).
~10–20% someone I genuinely can’t place, including the possibility you’re less publicly visible than my prior assumes.
Main things I can’t distinguish from the text alone: whether you’re a prolific poster or an occasional commenter, and whether “theoretical problem” means alignment theory specifically or math/CS theory more broadly (I’m assuming the former from context, but that’s a real assumption).
Top single guess: Thane Ruthenis. But I’d put more than half my mass on “not that specific person,” so take the point estimate with appropriate salt.
It also worked on this comment, but that one’s kind of too easy, given the topic.
Fascinating![1] I think I’d previously tried this experiment somewhere around Opus 4.1, and it didn’t work back then.
I don’t think this sort of ability has much entanglement with LLMs-as-an-extinction-risk, and while there are some privacy concerns, most of my feelings on the matter are “this is so cool!”.