
Joar Skalse

Karma: 735

My name is pronounced “YOO-ar SKULL-se” (the “e” is not silent). I’m a PhD student at Oxford University, and I was a member of the Future of Humanity Institute before it shut down. I have worked in several different areas of AI safety research. For a few highlights, see:

  1. Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

  2. Misspecification in Inverse Reinforcement Learning

  3. STARC: A General Framework For Quantifying Differences Between Reward Functions

  4. Risks from Learned Optimization in Advanced Machine Learning Systems

  5. Is SGD a Bayesian sampler? Well, almost

Some of my recent research on the theoretical foundations of reward learning is also described in this sequence.

For a full list of all my research, see my Google Scholar.

How to Contribute to Theoretical Reward Learning Research

Joar Skalse · Feb 28, 2025, 7:27 PM
16 points
0 comments · 21 min read

Other Papers About the Theory of Reward Learning

Joar Skalse · Feb 28, 2025, 7:26 PM
16 points
0 comments · 5 min read

Defining and Characterising Reward Hacking

Joar Skalse · Feb 28, 2025, 7:25 PM
15 points
0 comments · 4 min read

Misspecification in Inverse Reinforcement Learning—Part II

Joar Skalse · Feb 28, 2025, 7:24 PM
9 points
0 comments · 7 min read

STARC: A General Framework For Quantifying Differences Between Reward Functions

Joar Skalse · Feb 28, 2025, 7:24 PM
11 points
0 comments · 8 min read

Misspecification in Inverse Reinforcement Learning

Joar Skalse · Feb 28, 2025, 7:24 PM
19 points
0 comments · 11 min read

Partial Identifiability in Reward Learning

Joar Skalse · Feb 28, 2025, 7:23 PM
15 points
0 comments · 12 min read

The Theoretical Reward Learning Research Agenda: Introduction and Motivation

Joar Skalse · Feb 28, 2025, 7:20 PM
25 points
4 comments · 14 min read

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Joar Skalse · May 17, 2024, 7:13 PM
67 points
10 comments · 2 min read

My Criticism of Singular Learning Theory

Joar Skalse · Nov 19, 2023, 3:19 PM
83 points
56 comments · 12 min read

Goodhart’s Law in Reinforcement Learning

Oct 16, 2023, 12:54 AM
126 points
22 comments · 7 min read

VC Theory Overview

Joar Skalse · Jul 2, 2023, 10:45 PM
12 points
2 comments · 11 min read

How Smart Are Humans?

Joar Skalse · Jul 2, 2023, 3:46 PM
10 points
19 comments · 2 min read

Using (Uninterpretable) LLMs to Generate Interpretable AI Code

Joar Skalse · Jul 2, 2023, 1:01 AM
13 points
12 comments · 3 min read

Some Arguments Against Strong Scaling

Joar Skalse · Jan 13, 2023, 12:04 PM
25 points
21 comments · 16 min read

What kinds of algorithms do multi-human imitators learn?

May 22, 2022, 2:27 PM
20 points
0 comments · 3 min read

Updating Utility Functions

May 9, 2022, 9:44 AM
41 points
6 comments · 8 min read

Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian

Joar Skalse · Dec 29, 2020, 1:33 PM
75 points
58 comments · 1 min read · 1 review

Two senses of “optimizer”

Joar Skalse · Aug 21, 2019, 4:02 PM
35 points
41 comments · 3 min read

Risks from Learned Optimization: Conclusion and Related Work

Jun 7, 2019, 7:53 PM
82 points
5 comments · 6 min read