
Joar Skalse

Karma: 740

My name is pronounced “YOO-ar SKULL-se” (the “e” is not silent). I’m a PhD student at Oxford University, and I was a member of the Future of Humanity Institute before it shut down. I have worked in several different areas of AI safety research. For a few highlights, see:

  1. Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

  2. Misspecification in Inverse Reinforcement Learning

  3. STARC: A General Framework For Quantifying Differences Between Reward Functions

  4. Risks from Learned Optimization in Advanced Machine Learning Systems

  5. Is SGD a Bayesian sampler? Well, almost

Some of my recent research on the theoretical foundations of reward learning is also described in this sequence.

For a full list of all my research, see my Google Scholar.

Deceptive Alignment

Jun 5, 2019, 8:16 PM
118 points
20 comments, 17 min read, LW link

The Inner Alignment Problem

Jun 4, 2019, 1:20 AM
105 points
17 comments, 13 min read, LW link

Conditions for Mesa-Optimization

Jun 1, 2019, 8:52 PM
84 points
48 comments, 12 min read, LW link

Risks from Learned Optimization: Introduction

May 31, 2019, 11:44 PM
187 points
42 comments, 12 min read, LW link, 3 reviews

Two agents can have the same source code and optimise different utility functions

Jul 10, 2018, 9:51 PM
11 points
11 comments, 1 min read, LW link