Getting Claude to rank the Inkhaven bloggers
With apologies to those who didn’t make this post: it seems you need to up your game
Yesterday, Alexander Wales published a post entitled “Can an LLM have taste? Inkhaven Week 1, ranked by Claude”. I found this very entertaining.
He took Claude, used it to compare a bunch of Inkhaven posts, ranked them, and provided us with this wonderful list of the top ten posts so far:
1. Three Stones are Enough: The Case Against Leaves, in Particular, Anna Mattinger
2. An open letter to 21 people I know who died, Layla Hughes
3. endometrial biopsy, kaylee
4. Softhead, macroraptor
5. Every Lighthaven Writing Residency, Layla Hughes
6. The largest manufacturer of feelings in human history, Natalie Cargill
7. The one that loved me most, MLL
8. I did it. I found the worst poem in the world., Natalie Cargill
9. “Love, Mum”—What AIs can’t see about abuse, Natalie Cargill
10. Lost Mesoamerican Technologies, Lost Futures
I thought this was great, so I decided to make my own version, but better. Rather than using pure ranking, I (or rather, Claude, who assisted me with the actual implementation of this hare-brained scheme) decided to use a Bradley-Terry model, which Claude informs me is rather like the Elo system used to rank chess players.
Using the Anthropic API, we gave Claude Opus 4.6 the following prompt (also written by Claude, but edited by me), including 8 posts for it to rank:
You are judging posts from Inkhaven, a writing residency where participants commit to publishing one blog post every day for 30 days. The residents are a mix of AI safety researchers, rationalists, fiction writers, and generally thoughtful people. The audience skews heavily rationalist — LessWrong regulars, EA-adjacent, people who take ideas seriously but also appreciate a good joke.
You will be shown 8 Inkhaven posts. Rank them by quality, from best to worst.
The question to ask yourself for each post: “Would a typical rationalist vote to read more of this sort of thing?” You’re not rating a single post in isolation — you’re judging whether the author, writing in this mode, should keep going. Insight, craft, honest thinking, and distinctive voice all count.
So does being funny — humour is a genuine virtue here, not a tiebreaker.
A few things to keep in mind:
- Do NOT be generous or encouraging. Predict the actual taste of the rationalist audience. Many of these posts will be mediocre and that’s fine to say.
- Fiction, essays, rants, reviews, and technical posts are all on the same scale — judge each by whether it succeeds at what it’s trying to do.
- Length is not quality. A tight 500 words can beat a bloated 3000.
- Weird and niche is fine, often good. Idiosyncrasy is often a feature, not a bug.
=== POST {i} ===
title + first 4000 chars of body.
Rank all 8 posts from best to worst. Think through your reasoning, then give your final answer as a comma-separated list of post numbers inside <answer> tags.
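
The prompt asks for the final ranking as a comma-separated list inside <answer> tags, so each reply has to be parsed before it can feed a Bradley-Terry model. What follows is a minimal sketch of that step, not the code actually used for this post. The function names are hypothetical, and it assumes the posts in a batch are numbered 1 to 8, which the prompt template does not state.

```python
import re
from itertools import combinations


def parse_ranking(reply: str, n_posts: int = 8) -> list[int]:
    """Pull the best-to-worst ranking out of the last <answer>...</answer> block.

    Assumes posts are numbered 1..n_posts (an assumption, see the lead-in).
    """
    matches = re.findall(r"<answer>(.*?)</answer>", reply, flags=re.DOTALL)
    if not matches:
        raise ValueError("no <answer> block found")
    ranking = [int(tok) for tok in matches[-1].split(",") if tok.strip()]
    # Reject replies that skip, repeat, or invent post numbers.
    if sorted(ranking) != list(range(1, n_posts + 1)):
        raise ValueError(f"not a permutation of 1..{n_posts}: {ranking}")
    return ranking


def ranking_to_pairs(ranking: list[int]) -> list[tuple[int, int]]:
    """Expand a best-to-worst ranking into (winner, loser) pairs."""
    # combinations() preserves input order, so the first element of each
    # pair is always the post that was ranked higher.
    return list(combinations(ranking, 2))
```

Under this scheme a single ranked batch of 8 posts yields 28 pairwise comparisons, which is one plausible way to turn ranked batches into the head-to-head results a Bradley-Terry model expects.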
We did five iterations:
1. Get baseline estimates for each post
2. Get more accurate estimates for posts liable to be in the top 10
3. Get proper estimates for the posts we’d accidentally imported in the wrong format
4. Try to push my post from 2nd to 1st place (instead, it ended up in 10th)
5. Realise that we were missing a bunch of posts and add those in (it didn’t change much)
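
For readers wondering how a pile of comparisons becomes a score, here is a rough sketch of a Bradley-Terry fit. It is not the implementation Claude wrote for this post. It takes (winner, loser) pairs, for example every pair implied by a ranked batch, and fits strengths with the standard minorise-maximise update. Two things are assumptions of mine: the `prior` parameter, which adds a fractional win and loss against an imaginary average opponent so that posts that never win still get a finite score, and the final step of standardising log-strengths so they read as σ units.

```python
import math
from collections import defaultdict


def fit_bradley_terry(pairs, n_iter=500, prior=0.5):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Returns standardised log-strengths (mean 0, standard deviation 1).
    `prior` is a hypothetical regulariser, not something from the post.
    """
    items = sorted({x for pair in pairs for x in pair})
    wins = defaultdict(float)
    games = defaultdict(float)  # number of comparisons per unordered pair
    for winner, loser in pairs:
        wins[winner] += 1
        games[(min(winner, loser), max(winner, loser))] += 1

    opponents = defaultdict(list)
    for (i, j), n in games.items():
        opponents[i].append((j, n))
        opponents[j].append((i, n))

    p = {i: 1.0 for i in items}
    for _ in range(n_iter):
        new_p = {}
        for i in items:
            # Imaginary opponent of fixed strength 1: `prior` wins, `prior` losses.
            denom = 2 * prior / (p[i] + 1.0)
            denom += sum(n / (p[i] + p[j]) for j, n in opponents[i])
            new_p[i] = (wins[i] + prior) / denom
        p = new_p

    log_p = {i: math.log(v) for i, v in p.items()}
    mean = sum(log_p.values()) / len(log_p)
    sd = math.sqrt(sum((v - mean) ** 2 for v in log_p.values()) / len(log_p)) or 1.0
    return {i: (v - mean) / sd for i, v in log_p.items()}
```

The Elo comparison holds in the sense that both models predict the chance that one item beats another from the difference in their scores. The difference is that this fit uses all comparisons at once instead of updating ratings game by game.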
$40 in burned API credits later, we got the following table:
| # | Score | Author | Title |
|---|---|---|---|
| 1 | +2.77σ | Natalie Cargill | How to invent a disease |
| 2 | +2.55σ | Alec Thompson | More Legal Systems Very Different From Ours 1 |
| 3 | +2.54σ | Avi | The Smell |
| 4 | +2.53σ | Aaron Gertler | Posts I Will Not Be Writing |
| 5 | +2.49σ | Smitty | How the Claude Mythos leak happened |
| 6 | +2.49σ | Natalie Cargill | The largest manufacturer of feelings in human history |
| 7 | +2.49σ | viv | The phenomenology of being hungry while pregnant |
| 8 | +2.47σ | Alec Thompson | More Legal Systems Very Different From Ours 2: Nazi Private Law |
| 9 | +2.46σ | Anna Mattinger | Three Stones are Enough: The Case Against Leaves, in Particular |
| 10 | +2.45σ | Sean Herrington | The quest for general intelligence is hitting a wall |
| 11 | +2.39σ | Alec Thompson | Why did Hitler hate Roman law? |
| 12 | +2.35σ | Austen | Forgotten 18th Century Chinese Republics |
| 13 | +2.33σ | Alec Thompson | Finding Jack O’Neil |
| 14 | +2.28σ | Vishal Prasad | When the buffalo went away... |
| 15 | +2.28σ | viv | Late pregnancy is pretty bizarre |
| 16 | +2.21σ | Itsi Weinstock | Sin as a physical particle |
| 17 | +2.20σ | Bill Jackson | Two critiques of Rethink Priorities’ Moral Weights project |
| 18 | +2.16σ | Natalie Cargill | I did it. I found the worst poem in the world. |
| 19 | +2.12σ | viv | How many genders are there? |
| 20 | +1.99σ | Benjamin Sturgeon | Revisiting GSM-Symbolic: Do 2026 Frontier Models Still Fail at Confounded Grade School Math? |
Claude did a diligent bootstrapping check to ensure we had the right posts in the top 20, and found that the 19th-ranked post made the cut 90% of the time, while Ben Sturgeon’s Revisiting GSM-Symbolic, in 20th, hit a mere 22%. You’re on thin ice, Ben.
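
I can't say exactly what Claude resampled, so treat the following as one plausible shape for such a check, not a record of what was run. The sketch resamples the ranked batches with replacement, rescores each resample, and counts how often each post lands in the top k. The names `bootstrap_top_k` and `borda_scores` are hypothetical, and `borda_scores` is a deliberately simple stand-in scorer (average position, not Bradley-Terry) so that the example stays self-contained. Any function mapping a list of rankings to a dict of scores could be passed instead.

```python
import random
from collections import Counter


def bootstrap_top_k(rankings, score_fn, k=20, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples in which each post reaches the top k."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(n_boot):
        # Resample whole ranked batches with replacement.
        sample = rng.choices(rankings, k=len(rankings))
        scores = score_fn(sample)
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        hits.update(top)
    all_posts = {post for ranking in rankings for post in ranking}
    return {post: hits[post] / n_boot for post in all_posts}


def borda_scores(rankings):
    """Stand-in scorer: mean of (batch size - position). NOT Bradley-Terry."""
    totals, counts = Counter(), Counter()
    for ranking in rankings:
        for position, post in enumerate(ranking):
            totals[post] += len(ranking) - position
            counts[post] += 1
    return {post: totals[post] / counts[post] for post in totals}
```

A post whose frequency sits near 1.0 is safely inside the top k, while one closer to 0.2 got in mostly on the luck of the particular batches it was drawn into.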
Averaging the scores of the individual posts also enables us to give a ranking of the authors. The top 20 authors at Inkhaven right now are… [drum roll]:
| # | Score | Posts included | Author | Best post |
|---|---|---|---|---|
| 1 | +2.60σ | 10 | viv | The phenomenology of being hungry while pregnant |
| 2 | +2.52σ | 9 | Natalie Cargill | How to invent a disease |
| 3 | +2.23σ | 9 | Alec Thompson | More Legal Systems Very Different From Ours 1 |
| 4 | +1.82σ | 9 | Aaron Gertler | Posts I Will Not Be Writing |
| 5 | +1.56σ | 9 | Steven K | Prosaic License |
| 6 | +1.34σ | 9 | Katja Grace | Eggs, rooms, puzzles, and talking about AI |
| 7 | +1.06σ | 9 | capsuletime | Fuck Blogging |
| 8 | +1.05σ | 10 | Kevin Z Wu | (box\|bag) in (box\|bag) in (box\|bag) |
| 9 | +1.03σ | 9 | Austen | Forgotten 18th Century Chinese Republics |
| 10 | +0.82σ | 10 | Drew Schorno | 2035 |
| 11 | +0.68σ | 9 | Justis Mills (Writing Advisor) | Why No Wheel Bus Again? |
| 12 | +0.58σ | 9 | Bill Jackson | Two critiques of Rethink Priorities’ Moral Weights project |
| 13 | +0.49σ | 9 | Lawrence Chan | We’re actually running out of benchmarks to upper bound AI capabilities |
| 14 | +0.44σ | 9 | Avi | The Smell |
| 15 | +0.43σ | 9 | Derek Razo | How to Pay to Change the Law |
| 16 | +0.37σ | 9 | conq | 19th century poet UTTERLY DESTROYS critics (NO MERCY!) |
| 17 | +0.32σ | 9 | Alicorn (Writing Advisor) | Dogs Are Rude |
| 18 | +0.31σ | 6 | Remy | You Know What They Say About Assuming |
| 19 | +0.29σ | 7 | Layla Hughes | Every Lighthaven Writing Residency |
| 20 | +0.22σ | 9 | Henry Stanley | Inkhavening |
I should probably note that I filtered out anyone who hadn’t published posts on at least two-thirds of the days, so Vishal Prasad (+1.89σ, 2 posts), Robert Mushkatblat (+1.41σ, 4 posts), A.G.G Liu (+0.7σ, 1 post), Justin Kuiper (+0.32σ, 1 post) and Georgia Ray (+0.3σ, 1 post) didn’t make the cut on quantity, despite having the quality.
Alexander Wales (-0.23σ, 4 posts), whose post inspired this one, is also, sadly, left out of the rankings. (Sorry).
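
The author ranking combines an average with a participation filter, which can be sketched in a few lines. This is an illustration under assumptions, not the code behind the table: `rank_authors` and `days_elapsed` are hypothetical names, the number of days counted is not stated in the post, and whether the averages were re-standardised afterwards is not stated either.

```python
from collections import defaultdict


def rank_authors(post_scores, days_elapsed):
    """Average per-post scores by author, keeping only regular posters.

    post_scores: iterable of (author, score) tuples, one per post.
    days_elapsed: days of the residency so far (hypothetical parameter).
    Returns (author, mean_score, n_posts) rows, best first.
    """
    by_author = defaultdict(list)
    for author, score in post_scores:
        by_author[author].append(score)

    rows = [
        (author, sum(scores) / len(scores), len(scores))
        for author, scores in by_author.items()
        # Posted on at least two-thirds of the days; integer arithmetic
        # avoids floating-point surprises at the threshold.
        if 3 * len(scores) >= 2 * days_elapsed
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

The filter explains why an author with a couple of excellent posts can be absent from the table: an average over one or two posts is much noisier than one over nine or ten.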