Joar Skalse

Karma: 747

My name is pronounced “YOO-ar SKULL-se” (the “e” is not silent). I have a PhD in computer science from Oxford University, and I was a member of the Future of Humanity Institute before it shut down. I have worked in several different areas of AI safety research. For a few highlights, see:

  1. Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

  2. Misspecification in Inverse Reinforcement Learning

  3. STARC: A General Framework For Quantifying Differences Between Reward Functions

  4. Risks from Learned Optimization in Advanced Machine Learning Systems

  5. Is SGD a Bayesian sampler? Well, almost

My PhD thesis is available here. Some of my recent work on the theoretical foundations of reward learning is also described in this sequence.

For a full list of all my research, see my Google Scholar.

How to Contribute to Theoretical Reward Learning Research

Joar Skalse, 28 Feb 2025 19:27 UTC
16 points
0 comments, 21 min read

Other Papers About the Theory of Reward Learning

Joar Skalse, 28 Feb 2025 19:26 UTC
16 points
0 comments, 5 min read

Defining and Characterising Reward Hacking

Joar Skalse, 28 Feb 2025 19:25 UTC
15 points
0 comments, 4 min read

Misspecification in Inverse Reinforcement Learning - Part II

Joar Skalse, 28 Feb 2025 19:24 UTC
9 points
0 comments, 7 min read

STARC: A General Framework For Quantifying Differences Between Reward Functions

Joar Skalse, 28 Feb 2025 19:24 UTC
11 points
0 comments, 8 min read

Misspecification in Inverse Reinforcement Learning

Joar Skalse, 28 Feb 2025 19:24 UTC
19 points
0 comments, 11 min read

Partial Identifiability in Reward Learning

Joar Skalse, 28 Feb 2025 19:23 UTC
16 points
0 comments, 12 min read

The Theoretical Reward Learning Research Agenda: Introduction and Motivation

Joar Skalse, 28 Feb 2025 19:20 UTC
29 points
4 comments, 14 min read

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Joar Skalse, 17 May 2024 19:13 UTC
67 points
10 comments, 2 min read