
Inverse Reinforcement Learning

Last edit: 19 Apr 2023 23:45 UTC by Phib

From ChatGPT (GPT-4):

Inverse Reinforcement Learning (IRL) is a machine learning technique in which an AI system learns the preferences or objectives of an agent, typically a human, by observing that agent's behavior. Unlike traditional Reinforcement Learning (RL), where an agent learns to optimize its actions against a given reward function, IRL infers the underlying reward function from demonstrated behavior.

In other words, IRL aims to understand an agent's motivations and goals by examining its actions across situations. Once the AI system has inferred a reward function, it can use it to make decisions that align with the preferences or objectives of the observed agent.
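
To make the inference step concrete, here is a minimal sketch of one classic approach, maximum-entropy IRL (Ziebart et al., 2008), on a toy chain MDP. The environment, the synthetic "expert" demonstrations, the horizon, and the learning rate are all illustrative assumptions invented for this example, not details drawn from any of the posts below; the point is the shape of the loop: repeatedly adjust a candidate reward until the behavior it induces matches the demonstrated behavior.

```python
# A hedged, self-contained sketch of maximum-entropy IRL on a toy chain MDP.
# Everything here (environment, demos, hyperparameters) is illustrative.
import numpy as np

n_states, n_actions, horizon = 5, 2, 10   # states 0..4; action 0 = left, 1 = right

# Deterministic transition tensor T[s, a, s'].
T = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    T[s, 0, max(s - 1, 0)] = 1.0
    T[s, 1, min(s + 1, n_states - 1)] = 1.0

# Synthetic "expert" demonstrations: walk right from state 0 and stay at state 4.
demos = [[min(t, n_states - 1) for t in range(horizon)] for _ in range(20)]

def expert_visitations(demos):
    # Average state-visitation counts across the demonstrated trajectories.
    mu = np.zeros(n_states)
    for traj in demos:
        for s in traj:
            mu[s] += 1.0
    return mu / len(demos)

def soft_value_iteration(r):
    # Soft (log-sum-exp) Bellman backups give the MaxEnt stochastic policy.
    V = np.zeros(n_states)
    for _ in range(horizon):
        Q = r[:, None] + np.einsum('sap,p->sa', T, V)
        Qmax = Q.max(axis=1, keepdims=True)
        V = (Qmax + np.log(np.exp(Q - Qmax).sum(axis=1, keepdims=True)))[:, 0]
    return np.exp(Q - V[:, None])             # pi(a | s)

def learner_visitations(policy):
    # Forward pass: expected state visitations induced by the current policy.
    d = np.zeros((horizon, n_states))
    d[0, 0] = 1.0                              # every trajectory starts at state 0
    for t in range(1, horizon):
        for s in range(n_states):
            for a in range(n_actions):
                d[t] += d[t - 1, s] * policy[s, a] * T[s, a]
    return d.sum(axis=0)

# Gradient ascent on the MaxEnt IRL likelihood: the gradient is the gap between
# expert and learner expected visitation counts (with one-hot state features,
# feature expectations are just visitation counts).
mu_expert = expert_visitations(demos)
r = np.zeros(n_states)
for _ in range(200):
    policy = soft_value_iteration(r)
    r += 0.1 * (mu_expert - learner_visitations(policy))

print("inferred per-state rewards:", np.round(r, 2))
# The rightmost state, where the expert ends up, should get the highest reward.
```

The soft value iteration keeps the induced policy stochastic, which keeps the gradient well-defined; other IRL variants (max-margin IRL, Bayesian IRL) change how candidate rewards are scored against demonstrations but share this same inner loop of proposing a reward, computing the behavior it induces, and comparing that behavior to the demonstrations.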

IRL is particularly relevant in the context of AI alignment, as it provides a potential approach to align AI systems with human values. By learning from human demonstrations, AI systems can be designed to better understand and respect the preferences, intentions, and values of the humans they interact with or serve.

(Cunningham's Law this if you please; it was empty when I came across it, and I thought something was better than nothing.)

Inverse reinforcement learning on self, pre-ontology-change

Stuart_Armstrong · 18 Nov 2015 13:23 UTC
0 points
2 comments · 1 min read · LW link

Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences

orthonormal · 18 Jun 2016 0:55 UTC
16 points
2 comments · 3 min read · LW link

(C)IRL is not solely a learning process

Stuart_Armstrong · 15 Sep 2016 8:35 UTC
1 point
29 comments · 3 min read · LW link

Delegative Inverse Reinforcement Learning

Vanessa Kosoy · 12 Jul 2017 12:18 UTC
15 points
13 comments · 16 min read · LW link

CIRL Wireheading

tom4everitt · 8 Aug 2017 6:33 UTC
3 points
4 comments · 2 min read · LW link

Humans can be assigned any values whatsoever...

Stuart_Armstrong · 13 Oct 2017 11:29 UTC
15 points
6 comments · 4 min read · LW link

Biased reward-learning in CIRL

Stuart_Armstrong · 5 Jan 2018 18:12 UTC
8 points
3 comments · 7 min read · LW link

Problems integrating decision theory and inverse reinforcement learning

agilecaveman · 8 May 2018 5:11 UTC
7 points
2 comments · 3 min read · LW link

Agents That Learn From Human Behavior Can’t Learn Human Values That Humans Haven’t Learned Yet

steven0461 · 11 Jul 2018 2:59 UTC
27 points
11 comments · 1 min read · LW link

Model Mis-specification and Inverse Reinforcement Learning

9 Nov 2018 15:33 UTC
33 points
3 comments · 16 min read · LW link

RAISE is launching their MVP

26 Feb 2019 11:45 UTC
67 points
1 comment · 1 min read · LW link

IRL 1/8: Inverse Reinforcement Learning and the problem of degeneracy

RAISE · 4 Mar 2019 13:11 UTC
20 points
2 comments · 1 min read · LW link
(app.grasple.com)

[Question] Can coherent extrapolated volition be estimated with Inverse Reinforcement Learning?

Jade Bishop · 15 Apr 2019 3:23 UTC
12 points
5 comments · 3 min read · LW link

Learning biases and rewards simultaneously

Rohin Shah · 6 Jul 2019 1:45 UTC
41 points
3 comments · 4 min read · LW link

Thoughts on “Human-Compatible”

TurnTrout · 10 Oct 2019 5:24 UTC
64 points
34 comments · 5 min read · LW link

Human-AI Collaboration

Rohin Shah · 22 Oct 2019 6:32 UTC
42 points
7 comments · 2 min read · LW link
(bair.berkeley.edu)

Book review: Human Compatible

PeterMcCluskey · 19 Jan 2020 3:32 UTC
37 points
2 comments · 5 min read · LW link
(www.bayesianinvestor.com)

Book Review: Human Compatible

Scott Alexander · 31 Jan 2020 5:20 UTC
78 points
6 comments · 16 min read · LW link
(slatestarcodex.com)

Our take on CHAI’s research agenda in under 1500 words

Alex Flint · 17 Jun 2020 12:24 UTC
112 points
18 comments · 5 min read · LW link

AXRP Episode 2 - Learning Human Biases with Rohin Shah

DanielFilan · 29 Dec 2020 20:43 UTC
13 points
0 comments · 35 min read · LW link

My take on Michael Littman on “The HCI of HAI”

Alex Flint · 2 Apr 2021 19:51 UTC
59 points
4 comments · 7 min read · LW link

AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell

DanielFilan · 8 Jun 2021 23:20 UTC
22 points
1 comment · 72 min read · LW link

Hardcode the AGI to need our approval indefinitely?

MichaelStJules · 11 Nov 2021 7:04 UTC
2 points
2 comments · 1 min read · LW link

Machines vs Memes Part 3: Imitation and Memes

ceru23 · 1 Jun 2022 13:36 UTC
7 points
0 comments · 7 min read · LW link

[Question] Is CIRL a promising agenda?

Chris_Leong · 23 Jun 2022 17:12 UTC
27 points
14 comments · 1 min read · LW link

A Survey of Foundational Methods in Inverse Reinforcement Learning

adamk · 1 Sep 2022 18:21 UTC
19 points
0 comments · 12 min read · LW link

Data for IRL: What is needed to learn human values?

Jan Wehner · 3 Oct 2022 9:23 UTC
18 points
6 comments · 12 min read · LW link

[Linkpost] Concept Alignment as a Prerequisite for Value Alignment

Bogdan Ionut Cirstea · 4 Nov 2023 17:34 UTC
27 points
0 comments · 1 min read · LW link
(arxiv.org)

Why do we need RLHF? Imitation, Inverse RL, and the role of reward

Ran W · 3 Feb 2024 4:00 UTC
12 points
0 comments · 5 min read · LW link