I’m a final-year PhD student at the University of Amsterdam working on AI Safety and Alignment, and specifically safety risks of Reinforcement Learning from Human Feedback (RLHF). Previously, I also worked on abstract multivariate information theory and equivariant deep learning. https://langleon.github.io/
Leon Lang
I remember that at the end of 2024 there were many reports of strongly diminishing returns in the development (and specifically pretraining) of foundation models, right around the time when reasoning models were starting to emerge. I also remember that many people on Lesswrong thought AI was developing more slowly than they had previously expected.
How are people feeling about this now? My impression is that there was no overall slowdown, but I am curious about other people’s takes.
Something going slower than I expected is voice, and multimodality in general, though it’s hard to say whether this is due to a research roadblock or simply due to the companies’ focus on reasoning, coding, and agentic text-based workflows.
My general sense is that this is written for a LW audience. To point to some specific wordings:
“Key bets”, “The Core Bet”
“build-in-the-open updates”
“friction that kills speed”
“This project could fail”
“Status-chasing bottleneck”
“counterfactually positive impact”
“credibly status-accruing”
I think how other organizations handle this sort of thing is that they may have one post on Lesswrong for this specific audience, and a second, less detailed post for a broader community on their website. E.g., compare Anthropic’s RSP update with Holden’s post on the topic.
Concretely, your post seems to assume some of the worldviews and assumptions of the lesswrong-ish alignment community, and so general academics may feel like the post is not addressed to them.
This post seems written as if it’s “addressed to” the lesswrong community, rather than the broader community of researchers who might want to publish in such a journal. Was this intentional?
One interpretation under which Holden might have been consistent over time: He did not think that Anthropic should unilaterally pause AI development if other companies race ahead. But he did think the RSP should say that they’d pause when there are unmitigated risks, regardless of the context and race dynamics, since saying so in the RSP is a good forcing function for the benefits he hoped would follow from it.
(Tbc., I do not know what Holden believed, I’m just constructing a plausible reality)
(Also, even then he at least seems to have changed his mind about whether writing down If-Then commitments is a good idea!)
I just skimmed the piece, and it does seem consistent over time to me. E.g., under “Potential Benefits”, the piece does not list a unilateral pause in case of unmitigated risks.
My impression was that people in the in-person program mostly didn’t have enough time to do these. In any case, I do not know, and recommend just doing those bonus parts that seem personally exciting to you :)
18-month postdoc position in Singular Learning Theory for Machine Learning Models in Amsterdam: https://werkenbij.uva.nl/en/vacancies/postdoc-position-in-singular-learning-theory-for-machine-learning-models-netherlands-14741
The PI Patrick Forré is an experienced mathematician with a past background in arithmetic geometry, and he also has extensive experience in machine learning. I recommend applying! Feel free to ask me questions if you want; Patrick has been my PhD advisor.
I repeatedly refer people to this post, and they repeatedly tell me that it explains a great many conversations in their real life in a way they previously found hard to pin down. It’s a great post.
Agreed that the post is not about causality.
You saying you don’t have this experience sounds bizarre to me. Here is an example of this behavior happening to me recently:
It then invented another DOI.
This is very common behavior in my experience.
Good idea; I’ve now added the following to the opening paragraphs of the section doing the comparisons:
Importantly, due to Theorem 4, this means that the Solomonoff prior and the a priori prior lead, up to a constant, to the same predictions on sequences. The advantages of the priors that we analyze are thus not statements about their induced predictive distributions.
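To spell out, as a rough gloss rather than a restatement of Theorem 4, what “up to a constant” amounts to here (writing $\xi_{\mathrm{Sol}}$ and $\xi_{\mathrm{ap}}$ for the two induced predictive mixture distributions; this is illustrative notation, not necessarily the notation of the post): the two distributions multiplicatively dominate each other,

$$\exists\, c \ge 1: \quad \frac{1}{c} \;\le\; \frac{\xi_{\mathrm{Sol}}(x)}{\xi_{\mathrm{ap}}(x)} \;\le\; c \quad \text{for all finite strings } x,$$

so in particular their conditional predictions $\xi(\,\cdot \mid x)$ agree up to a factor of at most $c^2$.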
I wrote a post incorporating these thoughts now.
I answered in the parallel thread, which is probably getting down to the crux now. To add a few more points:
The prior matters for the Solomonoff bound; see Theorem 5. (Tbc., the true value of the prediction error is the same irrespective of the prior, but the bound we can prove differs; see also the sketch after these points.)
I think different priors have different aesthetics. Choosing a prior because it gives you a nice result (i.e., the Solomonoff prior) feels different from choosing it because it’s a priori correct (like the a priori prior in this post). To me, aesthetics matter.
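On the first point, here is a schematic sketch (not the exact statement or constants of Theorem 5, and in generic notation) of how the prior enters the bound: for a mixture $\xi = \sum_\nu w(\nu)\,\nu$ over a class containing the true environment $\mu$, dominance $\xi(x) \ge w(\mu)\,\mu(x)$ yields a cumulative prediction-error bound of order $\ln w(\mu)^{-1}$,

$$\sum_{t=1}^{\infty} \mathbb{E}_{x_{<t} \sim \mu}\!\left[\sum_{a}\big(\mu(a \mid x_{<t}) - \xi(a \mid x_{<t})\big)^2\right] \;\le\; c \cdot \ln \frac{1}{w(\mu)}$$

for a small absolute constant $c$ depending on conventions. So a prior that puts more weight on $\mu$ gives a tighter provable bound, even when the induced mixture, and hence the actual predictions, are unchanged.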
Okay, I think I overstated the extent to which the difference in priors matters in the previous comments and crossed out “practical”.
Basically, I was right that the prior that gives 100% on $\nu_2$ cannot update: it gives all its weight to $\nu_2$ no matter how much data comes in. However, $\nu_2$ itself can update with more data and shift between $\nu_0$ and $\nu_1$.
I can see that this perhaps feels very syntactic, but in my mind the two priors still feel different. One of them is saying “The world first samples a bit indicating whether the world will continue with world 0 or world 1”, and the other one is saying “I am uncertain about whether we live in world 0 or world 1”.
Yes. There are lots of different settings one could consider, e.g.:
Finite strings
Infinite strings
Functions
LSCSMs
For all of these cases, one can compare different notions of complexity (plain K-complexity, prefix complexity, monotone complexity, if applicable) with algorithmic probability. My sense is that the correspondence is only exact for universal prefix machines and finite strings, but I didn’t consider all settings.
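For the case I said is exact (universal prefix machines and finite strings), the correspondence is the coding theorem: with $U$ a universal prefix machine, $K$ prefix complexity, and $m$ the induced algorithmic probability,

$$m(x) \;=\; \sum_{p\,:\,U(p) = x} 2^{-|p|}, \qquad K(x) \;=\; -\log_2 m(x) + O(1).$$

For monotone machines and the continuous a priori probability, the analogous identity with monotone complexity is known to fail by more than an additive constant (if I recall correctly, a result of Gács), which is the sense in which the correspondence is not exact there.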
It’s also useful to emphasize why, even if the mixtures are the same, having different priors can make a ~~practical~~ difference. E.g., imagine that in the example above we had one prior giving 100% weight to $\nu_2$, and another prior giving 50% weight to each of $\nu_0$ and $\nu_1$. They give the same mixture, but the first prior can’t update, and the second prior can!
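As a minimal numerical sketch of this asymmetry (with made-up stand-ins: here $\nu_0$ and $\nu_1$ are i.i.d. Bernoulli environments and $\nu_2$ is their 50/50 mixture, which is of course much simpler than the setting of the post):

```python
import numpy as np

# Stand-ins for the hypotheses: nu_0 and nu_1 are i.i.d. Bernoulli environments
# (parameter = probability of emitting a 1), and nu_2 is their 50/50 mixture.
p0, p1 = 0.1, 0.9

def likelihood(p, data):
    """Probability of the observed bit string under an i.i.d. Bernoulli(p) source."""
    data = np.asarray(data)
    return float(np.prod(np.where(data == 1, p, 1 - p)))

data = [1, 1, 1, 1, 1]  # observations that strongly favour nu_1

# Prior A: 100% weight on nu_2. Its posterior over hypotheses is trivially
# still 100% on nu_2, no matter what the data are.
posterior_A = {"nu_2": 1.0}

# Prior B: 50% weight on each of nu_0 and nu_1. Its posterior shifts toward nu_1.
w0 = 0.5 * likelihood(p0, data)
w1 = 0.5 * likelihood(p1, data)
posterior_B = {"nu_0": w0 / (w0 + w1), "nu_1": w1 / (w0 + w1)}

# Both priors assign the same probability to the data ...
prob_under_nu_2 = 0.5 * likelihood(p0, data) + 0.5 * likelihood(p1, data)
print(prob_under_nu_2, w0 + w1)          # identical by construction
# ... but only prior B's posterior over hypotheses has actually moved.
print(posterior_A, posterior_B)
```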
Well, their induced mixture distributions are the same up to a constant, but the priors on hypotheses are different. I’m not sure if you consider the difference “relevant”; perhaps you only care about the induced mixture distribution?
To make a simple example: Assume there were only three Turing machines $T_0$, $T_1$, and $T_2$. Assume that $T_2(0x) = T_0(x)$ and $T_2(1x) = T_1(x)$ for all inputs $x$. Let $\nu_0$, $\nu_1$, and $\nu_2$ be the LSCSMs induced by $T_0$, $T_1$, and $T_2$. Notice that $\nu_2$ is a mixture of $\nu_0$ and $\nu_1$: $\nu_2 = \frac{1}{2}\nu_0 + \frac{1}{2}\nu_1$.
Let $\xi$ be the mixture distribution given as $\xi = \frac{1}{3}\nu_0 + \frac{1}{3}\nu_1 + \frac{1}{3}\nu_2$. Then clearly, $\xi$ is also represented as $\xi = \frac{1}{2}\nu_0 + \frac{1}{2}\nu_1$. My viewpoint is that the prior distribution giving weight $\frac{1}{3}$ to each of the three hypotheses is different from the one giving weight $\frac{1}{2}$ to each of $\nu_0$ and $\nu_1$, even if their mixture distributions are exactly the same.
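Making the arithmetic explicit, using $\nu_2 = \frac{1}{2}\nu_0 + \frac{1}{2}\nu_1$ from above:

$$\tfrac{1}{3}\nu_0 + \tfrac{1}{3}\nu_1 + \tfrac{1}{3}\nu_2 \;=\; \tfrac{1}{3}\nu_0 + \tfrac{1}{3}\nu_1 + \tfrac{1}{3}\!\left(\tfrac{1}{2}\nu_0 + \tfrac{1}{2}\nu_1\right) \;=\; \tfrac{1}{2}\nu_0 + \tfrac{1}{2}\nu_1.$$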
And this is exactly the situation we’re in with the true mixture distribution from the post. Some of the LSCSMs in the mixture are themselves induced by a separate universal monotone Turing machine, which means that such an LSCSM is itself a mixture of all LSCSMs. Any such mixtures among the LSCSMs allow us to redistribute the prior weight from that LSCSM to all the others without affecting the mixture in any way.
This is also related to what makes a prior based on Kolmogorov complexity ultimately so arbitrary: we could have chosen just about anything and it would still essentially sum to $1$. A posteriori, however, the Kolmogorov complexity prior does have some mathematical advantages, as outlined in the post.
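For concreteness on “essentially sum to $1$”: any prior of the form $w(\nu) = 2^{-\ell(\nu)}$, where $\ell$ assigns the hypotheses codewords of a prefix-free code, satisfies Kraft’s inequality,

$$\sum_{\nu} w(\nu) \;=\; \sum_{\nu} 2^{-\ell(\nu)} \;\le\; 1,$$

and a prior based on prefix Kolmogorov complexity, such as $w(\nu) = 2^{-K(\nu)}$, is just one such choice among many.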
I’m confused. Isn’t one of the standard justifications for the Solomonoff prior that you can get it without talking about K-complexity, just by assuming a uniform prior over programs of length $n$ on a universal monotone Turing machine and letting $n$ tend to infinity?
What you describe is not the Solomonoff prior on hypotheses, but the Solomonoff a priori distribution on sequences/histories! This is the distribution I analyze in my post. It can then be written as a mixture of LSCSMs, with the weights given either by the Solomonoff prior (involving Kolmogorov complexity) or by the a priori prior in my work. Those priors are not the same.
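To spell out the distinction in standard notation (which may not match the notation of my post exactly): the Solomonoff a priori distribution on sequences is defined directly from a universal monotone machine $U$, and only afterwards rewritten, via (essentially) a theorem of Levin, as a mixture of LSCSMs, which is where a prior over hypotheses enters:

$$M(x) \;=\; \sum_{\substack{p \text{ minimal}\,:\, U(p) \text{ extends } x}} 2^{-|p|} \;\;\overset{\times}{=}\;\; \sum_{\nu \text{ LSCSM}} w(\nu)\,\nu(x),$$

where $\overset{\times}{=}$ denotes equality up to multiplicative constants, and different admissible choices of the weights $w(\nu)$ give the different priors discussed in this thread.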
Do you mean “we have not seen an intelligence jump like the one from 3.5 to 4 again” unconditionally? Then I’d disagree: I think the newest GPT-pro models are a greater jump over 4 than 4 is over 3.5.
Or do you mean we have not seen a similar jump in pretraining capabilities? That is plausible, but I wonder how to assess that.