Inverse Reinforcement Learning

Tag

Last edit: 19 Apr 2023 23:45 UTC by Phib

From ChatGPT(4):

Inverse Reinforcement Learning (IRL) is a machine learning technique in which an AI system learns the preferences or objectives of an agent, typically a human, by observing that agent's behavior. In standard Reinforcement Learning (RL), an agent learns to choose actions that maximize a given reward function; IRL runs in the opposite direction, inferring the underlying reward function from demonstrated behavior.
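
The contrast can be written down in standard Markov decision process notation. This is a sketch rather than a definition taken from the text above: the symbols (policy \pi, discount \gamma, expert policy \pi_E, demonstrations \tau_i, estimated reward \hat{R}) are assumptions introduced here for illustration.

```latex
% RL: the reward function R is given; find an optimal policy.
\pi^{*} \in \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right]

% IRL: demonstrations D = \{\tau_1, \dots, \tau_n\} from an expert policy \pi_E are given;
% find a reward \hat{R} under which the expert does at least as well as any other policy.
\mathbb{E}_{\pi_E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \hat{R}(s_t, a_t) \right]
\;\ge\;
\mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \hat{R}(s_t, a_t) \right]
\quad \text{for all } \pi
```

Note that a constant reward satisfies the IRL condition trivially, so practical methods need further assumptions to single out a useful \hat{R}.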

In other words, IRL aims to recover an agent's motivations and goals from the actions it takes in various situations. Once the AI system has inferred a reward function, it can use that function to make decisions that align with the preferences or objectives of the observed agent.
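
As a rough illustration of the infer-then-act loop described above, here is a minimal sketch of one simple IRL variant: Bayesian inference over a small set of candidate reward functions. Everything in it is an assumption made for the example and not part of the original text: the five-state chain world, the two candidate rewards, the Boltzmann-rational demonstrator model with rationality `BETA`, and all function names.

```python
import math

N_STATES = 5
ACTIONS = (-1, +1)   # step left, step right
GAMMA = 0.9          # discount factor (assumed)
BETA = 5.0           # assumed rationality of the demonstrator


def step(state, action):
    """Deterministic transition on a chain, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def q_values(reward, iters=200):
    """Value iteration; the reward is paid for the state that is entered."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [max(reward[step(s, a)] + GAMMA * v[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    return {(s, a): reward[step(s, a)] + GAMMA * v[step(s, a)]
            for s in range(N_STATES) for a in ACTIONS}


def action_likelihood(q, state, action):
    """Boltzmann-rational demonstrator: P(a | s) is proportional to exp(BETA * Q(s, a))."""
    weights = {a: math.exp(BETA * q[(state, a)]) for a in ACTIONS}
    return weights[action] / sum(weights.values())


def infer_reward(candidates, demos):
    """Posterior over candidate rewards given (state, action) pairs, uniform prior."""
    scores = {}
    for name, reward in candidates.items():
        q = q_values(reward)
        scores[name] = math.prod(action_likelihood(q, s, a) for s, a in demos)
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Two hypothetical explanations of the demonstrator's behavior.
candidates = {
    "goal_left": [1.0, 0.0, 0.0, 0.0, 0.0],
    "goal_right": [0.0, 0.0, 0.0, 0.0, 1.0],
}
# Observed behavior: the demonstrator keeps stepping right.
demos = [(1, +1), (2, +1), (3, +1)]

posterior = infer_reward(candidates, demos)
best = max(posterior, key=posterior.get)

# Act on the inferred reward: greedy policy under the most probable candidate.
q_best = q_values(candidates[best])
policy = [max(ACTIONS, key=lambda a: q_best[(s, a)]) for s in range(N_STATES)]
```

With these demonstrations the posterior concentrates on `goal_right`, and the derived policy steps right in every state. Real IRL methods search a much larger space of reward functions, but the structure is the same: model how rewards produce behavior, then invert that model.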

IRL is particularly relevant to AI alignment because it offers a potential approach to aligning AI systems with human values. By learning from human demonstrations, AI systems could be designed to better understand and respect the preferences, intentions, and values of the humans they interact with or serve.

(Cunningham's Law this if you please; it was empty when I came across it, and I thought something was better than nothing.)

Thoughts on “Human-Compatible”

TurnTrout

10 Oct 2019 5:24 UTC

64 points

34 comments, 5 min read, LW link

Model Mis-specification and Inverse Reinforcement Learning

Owain_Evans and jsteinhardt

9 Nov 2018 15:33 UTC

34 points

3 comments, 16 min read, LW link

Learning biases and rewards simultaneously

Rohin Shah

6 Jul 2019 1:45 UTC

41 points

3 comments, 4 min read, LW link

Our take on CHAI’s research agenda in under 1500 words

Alex Flint

17 Jun 2020 12:24 UTC

112 points

18 comments, 5 min read, LW link

[Question] Can coherent extrapolated volition be estimated with Inverse Reinforcement Learning?

Jade Bishop

15 Apr 2019 3:23 UTC

12 points

5 comments, 3 min read, LW link

Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences

orthonormal

18 Jun 2016 0:55 UTC

17 points

2 comments, 3 min read, LW link

Inverse reinforcement learning on self, pre-ontology-change

Stuart_Armstrong

18 Nov 2015 13:23 UTC

0 points

2 comments, 1 min read, LW link

[Question] Is CIRL a promising agenda?

Chris_Leong

23 Jun 2022 17:12 UTC

28 points

16 comments, 1 min read, LW link

A Survey of Foundational Methods in Inverse Reinforcement Learning

adamk

1 Sep 2022 18:21 UTC

19 points

0 comments, 12 min read, LW link

Biased reward-learning in CIRL

Stuart_Armstrong

5 Jan 2018 18:12 UTC

8 points

3 comments, 7 min read, LW link

CIRL Wireheading

tom4everitt

8 Aug 2017 6:33 UTC

3 points

4 comments, 2 min read, LW link

(C)IRL is not solely a learning process

Stuart_Armstrong

15 Sep 2016 8:35 UTC

1 point

29 comments, 3 min read, LW link

Book Review: Human Compatible

Scott Alexander

31 Jan 2020 5:20 UTC

78 points

6 comments, 16 min read, LW link

(slatestarcodex.com)

Book review: Human Compatible

PeterMcCluskey

19 Jan 2020 3:32 UTC

37 points

2 comments, 5 min read, LW link

(www.bayesianinvestor.com)

AXRP Episode 2 - Learning Human Biases with Rohin Shah

DanielFilan

29 Dec 2020 20:43 UTC

13 points

0 comments, 35 min read, LW link

IRL 1/8: Inverse Reinforcement Learning and the problem of degeneracy

RAISE

4 Mar 2019 13:11 UTC

20 points

2 comments, 1 min read, LW link

(app.grasple.com)

Problems integrating decision theory and inverse reinforcement learning

agilecaveman

8 May 2018 5:11 UTC

7 points

2 comments, 3 min read, LW link

My take on Michael Littman on “The HCI of HAI”

Alex Flint

2 Apr 2021 19:51 UTC

59 points

4 comments, 7 min read, LW link

AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell

DanielFilan

8 Jun 2021 23:20 UTC

22 points

1 comment, 72 min read, LW link

Delegative Inverse Reinforcement Learning

Vanessa Kosoy

12 Jul 2017 12:18 UTC

15 points

13 comments, 16 min read, LW link

[Linkpost] Concept Alignment as a Prerequisite for Value Alignment

Bogdan Ionut Cirstea

4 Nov 2023 17:34 UTC

27 points

0 comments, 1 min read, LW link

(arxiv.org)

RAISE is launching their MVP

null

26 Feb 2019 11:45 UTC

67 points

1 comment, 1 min read, LW link

Human-AI Collaboration

Rohin Shah

22 Oct 2019 6:32 UTC

42 points

7 comments, 2 min read, LW link

(bair.berkeley.edu)

Agents That Learn From Human Behavior Can’t Learn Human Values That Humans Haven’t Learned Yet

steven0461

11 Jul 2018 2:59 UTC

27 points

11 comments, 1 min read, LW link

Humans can be assigned any values whatsoever...

Stuart_Armstrong

13 Oct 2017 11:29 UTC

16 points

6 comments, 4 min read, LW link

Hardcode the AGI to need our approval indefinitely?

MichaelStJules

11 Nov 2021 7:04 UTC

2 points

2 comments, 1 min read, LW link

Machines vs Memes Part 3: Imitation and Memes

ceru23

1 Jun 2022 13:36 UTC

7 points

0 comments, 7 min read, LW link

Data for IRL: What is needed to learn human values?

Jan Wehner

3 Oct 2022 9:23 UTC

18 points

6 comments, 12 min read, LW link

Why do we need RLHF? Imitation, Inverse RL, and the role of reward

Ran W

3 Feb 2024 4:00 UTC

14 points

0 comments, 5 min read, LW link
