Jo Jiao

Karma: 368

hello, I’m Jo

https://joneedssleep.github.io/

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask, Twm Stone, Josh Hills, Ida Caspary, Shubhorup Biswas and Julian Stastny

10 Jun 2026 17:58 UTC

273 points

23 comments · 4 min read

Jo Jiao 23 May 2026 4:40 UTC
13 points
1
in reply to: StanislavKrym’s comment on: Conclave 1492
Context: I did the full version of the sim. I ran the AI 2027 tabletop wargames in 2025, and they have since updated the material. So, some points I make here may not carry over.
As far as I know, Conclave 1492 compares to the AI 2027 tabletop wargames in the following ways (some similar, some different):
- Just like in the wargames, your goal is to be true to your character, not win (though indeed many characters will want to win in some subgame by e.g. becoming pope). You are given a rich enough character sheet (mine was 30+ pages) that you will know your background, your motivation, the context behind your goals, your personality, and your preferences.
- You will write letters to NPCs and PCs.
- You start with resources such as currencies, items, differential information, and armies that reflect the standing of your character in 1492, and you will need to use these resources carefully to achieve your objective.
- The war phase of Conclave 1492 has its own complicated rules and internal logic.
- You will learn a ton of Italian Renaissance history and, moreover, get a feel for the zeitgeist of 15th-century Europe.

Jo Jiao 24 Aug 2025 15:21 UTC
1 point
0
in reply to: Aryaman Arora’s comment on: Mech Interp Wiki Page and Why You Should Edit Wikipedia
Hi Aryaman, thanks again for the great technical writeup in the mech interp article. Moving to the mech interp talk page to address the COI and RS concerns.

Jo Jiao 14 Aug 2025 17:12 UTC
1 point
0
in reply to: gjm’s comment on: Mech Interp Wiki Page and Why You Should Edit Wikipedia
Good call! Linking the relevant pages.

Jo Jiao 14 Aug 2025 17:11 UTC
4 points
0
in reply to: Chris_Leong’s comment on: Mech Interp Wiki Page and Why You Should Edit Wikipedia
I don’t think they’ve become less important. In my experience, Wikipedia is pretty heavily cited by LLMs when they go and do their own research, so Wikipedia articles are still valuable even if fewer humans visit them.
On the point of Google not prioritizing it so heavily: I don’t think Google indexes a lot of new Wikipedia articles, but old established articles still top the search results. In our case, the mech interp wiki page never got indexed by Google until a Wikipedia New Page reviewer marked it as reviewed a couple of days ago; now it’s a top result.

Mech Interp Wiki Page and Why You Should Edit Wikipedia

Noah Birnbaum and Jo Jiao

12 Aug 2025 17:28 UTC

78 points

16 comments · 1 min read

Jo Jiao 17 Apr 2025 18:51 UTC
4 points
0
in reply to: Zephaniah Roe’s comment on: College Advice For People Like Me
totally had Henry’s voice playing while reading your comment

Undergrad AI Safety Conference

Jo Jiao 19 Feb 2025 3:43 UTC

19 points

0 comments · 1 min read

Call for Applications: XLab Summer Research Fellowship

Jo Jiao 18 Feb 2025 19:19 UTC

12 points

0 comments · 1 min read

Jo Jiao 27 Jan 2025 23:05 UTC
2 points
1
on: JoNeedsSleep’s Shortform
My best attempt at characterizing Kant’s Transcendental Idealism: Kant’s idealism says that essence, not existence, is dependent on us. That is to say, what it is to be depends on how we understand. For example, schemas of classification in biology, such as genetic proximity, depend on the purposes they serve for us. What it is for animals to be depends, in other words, on the biologist. To push the biology analogy ad absurdum, transcendental idealism says something like “the genetic composition is the condition of the possibility of our making sense of biological objects in the first place”. The existence of these classification schemas is dependent on our mind a priori.

Jo Jiao 24 Oct 2024 4:50 UTC
4 points
1
on: JoNeedsSleep’s Shortform
The distinction between inner and outer alignment is quite unnatural. For example, even the concept of reward hacking implies a twofold failure: a reward that is not robust enough against exploitation, and a model that develops the instrumental capabilities to find a way to trick the reward. Indeed, in the case of reward hacking, depending on the autonomy of the system in question, we could attribute the misalignment to either inner or outer alignment. At its core, this distinction comes out of the policy <-> reward scheme of RL, though prediction <-> loss function in SL can be similarly characterized; I doubt this framing generalizes well to other engineering choices.

JoNeedsSleep’s Shortform

Jo Jiao 24 Oct 2024 4:50 UTC

1 point

2 comments · 1 min read

Jo Jiao 20 Aug 2024 23:01 UTC
3 points
0
on: IMO challenge bet with Eliezer
Eliezer seems on track to win: the current AI benchmark for IMO geometry problems is at 27/30 (IMO Gold human performance is at 25.9/30). This new benchmark was set by LLM-augmented neurosymbolic AI.
Wu’s Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry [2024 April]

Notes on Tuning Metacognition

Jo Jiao 3 Jul 2024 19:54 UTC

10 points

0 comments · 5 min read

Jo Jiao 2 May 2024 3:03 UTC
2 points
0
on: Transformers Represent Belief State Geometry in their Residual Stream
Thank you for the insightful post! You mentioned that:
Consider the relation a transformer has to an HMM that produced the data it was trained on. This is general—any dataset consisting of sequences of tokens can be represented as having been generated from an HMM.
and the linear projection consists of:
Linear regression from the residual stream activations (64 dimensional vectors) to the belief distributions (3 dimensional vectors).
Given any natural language dataset, if we didn’t have the ground truth belief distribution, is it possible to reverse engineer (data $\to$ model) an HMM and extract the topology of the residual stream activations?
I’ve been running task-salient representation experiments on larger models and am very interested in replicating and possibly extending your result to noisier settings.
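For concreteness, the linear projection quoted above can be sketched as an ordinary least-squares probe. This is only a minimal sketch on synthetic data, not the post authors' code: the function names `fit_linear_probe` and `apply_linear_probe`, the Dirichlet-sampled beliefs, and the random embedding standing in for a residual stream are all assumptions made for illustration. Only the dimensions (64 and 3) come from the quote.

```python
import numpy as np


def fit_linear_probe(activations, beliefs):
    """Fit an affine least-squares map from activations (n, d) to beliefs (n, k).

    Hypothetical helper: returns a weight matrix of shape (d + 1, k),
    where the last row is the bias term.
    """
    n = activations.shape[0]
    X = np.hstack([activations, np.ones((n, 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X, beliefs, rcond=None)
    return W


def apply_linear_probe(W, activations):
    """Predict belief vectors from activations using the fitted probe."""
    n = activations.shape[0]
    X = np.hstack([activations, np.ones((n, 1))])
    return X @ W


# Synthetic stand-in data (assumption): beliefs on the 3-simplex are
# embedded linearly into 64 dimensions with a little Gaussian noise.
rng = np.random.default_rng(0)
n, d, k = 500, 64, 3
beliefs = rng.dirichlet(np.ones(k), size=n)        # (500, 3), rows sum to 1
embed = rng.normal(size=(k, d))                    # made-up linear embedding
activations = beliefs @ embed + 0.01 * rng.normal(size=(n, d))

W = fit_linear_probe(activations, beliefs)
pred = apply_linear_probe(W, activations)
mse = float(np.mean((pred - beliefs) ** 2))
print(f"probe weights: {W.shape}, in-sample MSE: {mse:.2e}")
```

With real activations, the same fit would need a held-out split to check that the probe is not just memorizing; and without ground-truth beliefs (the case asked about above), there is no regression target, so an HMM would first have to be inferred from the data by some other method.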
