
Joar Skalse

Karma: 560

My name is pronounced “YOO-ar SKULL-se”.

I’m a DPhil Scholar at the Future of Humanity Institute in Oxford.

My Criticism of Singular Learning Theory

Joar Skalse
19 Nov 2023 15:19 UTC
77 points
56 comments
12 min read
LW link

Goodhart’s Law in Reinforcement Learning

16 Oct 2023 0:54 UTC
125 points
22 comments
7 min read
LW link

VC Theory Overview

Joar Skalse
2 Jul 2023 22:45 UTC
10 points
2 comments
11 min read
LW link

How Smart Are Humans?

Joar Skalse
2 Jul 2023 15:46 UTC
9 points
19 comments
2 min read
LW link

Using (Uninterpretable) LLMs to Generate Interpretable AI Code

Joar Skalse
2 Jul 2023 1:01 UTC
15 points
9 comments
3 min read
LW link

Some Arguments Against Strong Scaling

Joar Skalse
13 Jan 2023 12:04 UTC
25 points
21 comments
16 min read
LW link

What kinds of algorithms do multi-human imitators learn?

22 May 2022 14:27 UTC
20 points
0 comments
3 min read
LW link

Updating Utility Functions

9 May 2022 9:44 UTC
37 points
6 comments
8 min read
LW link

Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian

Joar Skalse
29 Dec 2020 13:33 UTC
74 points
58 comments
1 min read
LW link
1 review

[Question] Baseline Likelihood of Long-Term Side Effects From New Drugs?

Joar Skalse
27 Dec 2020 13:37 UTC
7 points
2 comments
1 min read
LW link

Two senses of “optimizer”

Joar Skalse
21 Aug 2019 16:02 UTC
35 points
41 comments
3 min read
LW link

Risks from Learned Optimization: Conclusion and Related Work

7 Jun 2019 19:53 UTC
82 points
5 comments
6 min read
LW link

Deceptive Alignment

5 Jun 2019 20:16 UTC
117 points
20 comments
17 min read
LW link

The Inner Alignment Problem

4 Jun 2019 1:20 UTC
103 points
17 comments
13 min read
LW link

Conditions for Mesa-Optimization

1 Jun 2019 20:52 UTC
83 points
48 comments
12 min read
LW link

Risks from Learned Optimization: Introduction

31 May 2019 23:44 UTC
183 points
42 comments
12 min read
LW link
3 reviews

Two agents can have the same source code and optimise different utility functions

Joar Skalse
10 Jul 2018 21:51 UTC
11 points
11 comments
1 min read
LW link