Leon Lang

Karma: 1,878

I’m a final-year PhD student at the University of Amsterdam working on AI Safety and Alignment, specifically on safety risks of Reinforcement Learning from Human Feedback (RLHF). Previously, I also worked on abstract multivariate information theory and equivariant deep learning. https://langleon.github.io/

The Coding Theorem — A Link between Complexity and Probability

Leon Lang · 10 Aug 2025 15:34 UTC
32 points
4 comments · 9 min read · LW link

X explains Z% of the variance in Y

Leon Lang · 20 Jun 2025 12:17 UTC
160 points
34 comments · 9 min read · LW link

How to work through the ARENA program on your own

Leon Lang · 3 Jun 2025 17:38 UTC
36 points
3 comments · 6 min read · LW link

[Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF

Leon Lang · 22 Oct 2024 13:57 UTC
51 points
2 comments · 18 min read · LW link
(arxiv.org)

We Should Prepare for a Larger Representation of Academia in AI Safety

Leon Lang · 13 Aug 2023 18:03 UTC
90 points
14 comments · 5 min read · LW link

Andrew Ng wants to have a conversation about extinction risk from AI

Leon Lang · 5 Jun 2023 22:29 UTC
32 points
2 comments · 1 min read · LW link
(twitter.com)

Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios

16 May 2023 10:53 UTC
26 points
0 comments · 13 min read · LW link

[Appendix] Natural Abstractions: Key Claims, Theorems, and Critiques

16 Mar 2023 16:38 UTC
48 points
0 comments · 13 min read · LW link

Natural Abstractions: Key Claims, Theorems, and Critiques

16 Mar 2023 16:37 UTC
246 points
26 comments · 45 min read · LW link · 3 reviews

Andrew Huberman on How to Optimize Sleep

Leon Lang · 2 Feb 2023 20:17 UTC
38 points
6 comments · 6 min read · LW link

Experiment Idea: RL Agents Evading Learned Shutdownability

Leon Lang · 16 Jan 2023 22:46 UTC
31 points
7 comments · 17 min read · LW link
(docs.google.com)

Disentangling Shard Theory into Atomic Claims

Leon Lang · 13 Jan 2023 4:23 UTC
86 points
6 comments · 18 min read · LW link

Citability of Lesswrong and the Alignment Forum

Leon Lang · 8 Jan 2023 22:12 UTC
48 points
2 comments · 1 min read · LW link

A Short Dialogue on the Meaning of Reward Functions

19 Nov 2022 21:04 UTC
45 points
0 comments · 3 min read · LW link

Leon Lang’s Shortform

Leon Lang · 2 Oct 2022 10:05 UTC
2 points
86 comments · LW link

Distribution Shifts and The Importance of AI Safety

Leon Lang · 29 Sep 2022 22:38 UTC
17 points
2 comments · 9 min read · LW link

Summaries: Alignment Fundamentals Curriculum

Leon Lang · 18 Sep 2022 13:08 UTC
44 points
3 comments · 1 min read · LW link
(docs.google.com)