Utility Functions

Tag · Last edit: Dec 30, 2024, 9:55 AM by Dakara

A utility function is a function that assigns numerical values (“utilities”) to outcomes such that outcomes with higher utilities are always preferred to outcomes with lower utilities, with no exceptions. The absence of exploitable holes in the preference ordering is part of the definition, and it is what separates utility from mere reward.

Utility functions do not work very well in practice as a model of individual humans. Human drives are not coherent, nor is there any reason to think they would converge to a utility-function-grade level of reliability (Thou Art Godshatter), and even people with a strong interest in the concept have trouble working out, even roughly, what their own utility function is (Post Your Utility Function). Furthermore, humans appear to calculate reward and loss separately: adding one to the other does not predict their behavior accurately, and thus human reward is not human utility. This makes humans highly exploitable, whereas not being exploitable is a minimum requirement for having a coherent utility function.
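The link between exploitability and the lack of a utility function can be illustrated with a toy money pump. The following is a minimal sketch, not taken from any of the linked posts; the item names, the utility table, the fee, and the helper names `find_cycle` and `money_pump` are all assumptions made up for illustration. It contrasts an agent whose preferences are read off a utility table with an agent whose preferences form a cycle.

```python
from itertools import permutations


def find_cycle(items, prefers):
    """Return (a, b, c) with a > b, b > c and c > a under `prefers`, or None."""
    for a, b, c in permutations(items, 3):
        if prefers(a, b) and prefers(b, c) and prefers(c, a):
            return (a, b, c)
    return None


def money_pump(start, offers, prefers, fee=1, rounds=3):
    """Offer trades in order; the agent pays `fee` for every swap to an item
    it strictly prefers to what it holds. Returns the total fees collected."""
    holding, collected = start, 0
    for _ in range(rounds):
        for offer in offers:
            if prefers(offer, holding):
                holding = offer
                collected += fee
    return collected


ITEMS = ["apple", "banana", "cherry"]

# Coherent agent: preferences come from a (hypothetical) utility table.
UTILITY = {"apple": 3.0, "banana": 2.0, "cherry": 1.0}


def coherent(x, y):
    return UTILITY[x] > UTILITY[y]


# Incoherent agent: apple > banana > cherry > apple.
CYCLE = {("apple", "banana"), ("banana", "cherry"), ("cherry", "apple")}


def cyclic(x, y):
    return (x, y) in CYCLE


OFFERS = ["cherry", "banana", "apple"]  # one lap around the cycle

if __name__ == "__main__":
    print(find_cycle(ITEMS, coherent))            # no cycle exists
    print(find_cycle(ITEMS, cyclic))              # the cycle is found
    print(money_pump("apple", OFFERS, coherent))  # fees paid by coherent agent
    print(money_pump("apple", OFFERS, cyclic))    # fees paid by cyclic agent
```

In this sketch the coherent agent declines every trade and pays nothing, while the cyclic agent accepts all nine offers over three rounds and ends up holding the apple it started with, nine fee units poorer. Real human inconsistencies are of course subtler than a three-item cycle.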

pjeby posits humans’ difficulty in understanding their own utility functions as the root of akrasia.

However, utility functions can be a useful model for dealing with humans in groups, e.g. in economics.

The VNM Theorem tag is likely to be a strict subtag of the Utility Functions tag, because the VNM theorem establishes when preferences can be represented by a utility function, but a post discussing utility functions may or may not discuss the VNM theorem/axioms.
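For reference, the representation that the VNM theorem provides can be written out as below. This is the standard textbook statement for a finite outcome set rather than anything quoted from the tag text, and the symbols X, L, M, u, a, and b are notation introduced here. If a preference relation over lotteries satisfies completeness, transitivity, continuity, and independence, then there is a function u on outcomes such that:

```latex
% X: finite set of outcomes; L, M: lotteries (probability distributions over X).
L \succeq M
\;\iff\;
\sum_{x \in X} L(x)\, u(x) \;\ge\; \sum_{x \in X} M(x)\, u(x),
\qquad
\text{and } u' = a\,u + b \ (a > 0) \text{ represents the same preferences.}
```

The second clause is why the zero point and the scale of a utility function carry no meaning by themselves: only comparisons of expected utility do.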

Because utility functions arise from VNM rationality, they may still be useful for understanding intelligent systems even when the system does not explicitly store a utility function anywhere, since a system that keeps reducing its rate of exploitable errors should eventually converge to utility-function-like guarantees.

Coherent decisions imply consistent utilities

Eliezer Yudkowsky · May 12, 2019, 9:33 PM · 149 points · 83 comments · 26 min read · LW link · 3 reviews

Coherence arguments do not entail goal-directed behavior

Rohin Shah · Dec 3, 2018, 3:26 AM · 134 points · 69 comments · 7 min read · LW link · 3 reviews

An Orthodox Case Against Utility Functions

abramdemski · Apr 7, 2020, 7:18 PM · 154 points · 66 comments · 8 min read · LW link · 2 reviews

Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect

RogerDearnaley · Jan 26, 2024, 3:58 AM · 16 points · 2 comments · 11 min read · LW link

Utility ≠ Reward

Vlad Mikulik · Sep 5, 2019, 5:28 PM · 131 points · 24 comments · 1 min read · LW link · 2 reviews

Bayesian Utility: Representing Preference by Probability Measures

Vladimir_Nesov · Jul 27, 2009, 2:28 PM · 50 points · 37 comments · 2 min read · LW link

Why Not Subagents?

johnswentworth and David Lorell · Jun 22, 2023, 10:16 PM · 130 points · 52 comments · 14 min read · LW link · 1 review

How easily can we separate a friendly AI in design space from one which would bring about a hyperexistential catastrophe?

Anirandis · Sep 10, 2020, 12:40 AM · 20 points · 19 comments · 2 min read · LW link

Pinpointing Utility

[deleted] · Feb 1, 2013, 3:58 AM · 94 points · 156 comments · 13 min read · LW link

Time and Effort Discounting

Scott Alexander · Jul 7, 2011, 11:48 PM · 66 points · 32 comments · 4 min read · LW link

The Human’s Hidden Utility Function (Maybe)

lukeprog · Jan 23, 2012, 7:39 PM · 68 points · 91 comments · 3 min read · LW link

When do utility functions constrain?

Hoagy · Aug 23, 2019, 5:19 PM · 30 points · 8 comments · 7 min read · LW link

[Question] Why doesn’t the presence of log-loss for probabilistic models (e.g. sequence prediction) imply that any utility function capable of producing a “fairly capable” agent will have at least some non-negligible fraction of overlap with human values?

Thoth Hermes · May 16, 2023, 6:02 PM · 2 points · 0 comments · 1 min read · LW link

Orthogonality is expensive

beren · Apr 3, 2023, 10:20 AM · 43 points · 9 comments · 3 min read · LW link

If you don’t know the name of the game, just tell me what I mean to you

Stuart_Armstrong · Oct 26, 2010, 1:43 PM · 16 points · 26 comments · 5 min read · LW link

The VNM independence axiom ignores the value of information

kilobug · Mar 2, 2013, 2:36 PM · 15 points · 48 comments · 1 min read · LW link

Ngo and Yudkowsky on AI capability gains

Eliezer Yudkowsky and Richard_Ngo · Nov 18, 2021, 10:19 PM · 131 points · 61 comments · 39 min read · LW link · 1 review

Intertheoretic utility comparison

Stuart_Armstrong · Jul 3, 2018, 1:44 PM · 23 points · 11 comments · 6 min read · LW link

Is “VNM-agent” one of several options, for what minds can grow up into?

AnnaSalamon · Dec 30, 2024, 6:36 AM · 89 points · 55 comments · 2 min read · LW link

Inferring utility functions from locally non-transitive preferences

Jan · Feb 10, 2022, 10:33 AM · 33 points · 15 comments · 8 min read · LW link (universalprior.substack.com)

Satisficers want to become maximisers

Stuart_Armstrong · Oct 21, 2011, 4:27 PM · 38 points · 70 comments · 1 min read · LW link

Thinking about Broad Classes of Utility-like Functions

J Bostock · Jun 7, 2022, 2:05 PM · 7 points · 0 comments · 4 min read · LW link

I’m no longer sure that I buy dutch book arguments and this makes me skeptical of the “utility function” abstraction

Eli Tyre · Jun 22, 2021, 3:53 AM · 42 points · 29 comments · 4 min read · LW link

To capture anti-death intuitions, include memory in utilitarianism

Kaj_Sotala · Jan 15, 2014, 6:27 AM · 12 points · 34 comments · 3 min read · LW link

Updating Utility Functions

JustinShovelain and Joar Skalse · May 9, 2022, 9:44 AM · 41 points · 6 comments · 8 min read · LW link

Resolving von Neumann-Morgenstern Inconsistent Preferences

niplav · Oct 22, 2024, 11:45 AM · 38 points · 5 comments · 58 min read · LW link

Against utility functions

Qiaochu_Yuan · Jun 19, 2014, 5:56 AM · 67 points · 87 comments · 1 min read · LW link

Choosing the Zero Point

orthonormal · Apr 6, 2020, 11:44 PM · 170 points · 25 comments · 3 min read · LW link · 2 reviews

We Are Less Wrong than E. T. Jaynes on Loss Functions in Human Society

Zack_M_Davis · Jun 5, 2023, 5:34 AM · 46 points · 14 comments · 2 min read · LW link

The Allais Paradox

Eliezer Yudkowsky · Jan 19, 2008, 3:05 AM · 65 points · 145 comments · 3 min read · LW link

Vegans need to eat just enough Meat—emperically evaluate the minimum ammount of meat that maximizes utility

Johannes C. Mayer · Dec 22, 2024, 10:08 PM · 55 points · 35 comments · 3 min read · LW link

Comparing Utilities

abramdemski · Sep 14, 2020, 8:56 PM · 72 points · 31 comments · 17 min read · LW link

money ≠ value

stonefly · Apr 30, 2023, 5:47 PM · 2 points · 3 comments · 3 min read · LW link

The Fundamental Theorem of Asset Pricing: Missing Link of the Dutch Book Arguments

johnswentworth · Jun 1, 2019, 8:34 PM · 42 points · 5 comments · 3 min read · LW link

Coherence arguments imply a force for goal-directed behavior

KatjaGrace · Mar 26, 2021, 4:10 PM · 91 points · 25 comments · 11 min read · LW link (aiimpacts.org) · 1 review

Game Theory without Argmax [Part 2]

Cleo Nardo · Nov 11, 2023, 4:02 PM · 31 points · 14 comments · 13 min read · LW link

[link] Choose your (preference) utilitarianism carefully – part 1

Kaj_Sotala · Jun 25, 2015, 12:06 PM · 21 points · 6 comments · 2 min read · LW link

Using expected utility for Good(hart)

Stuart_Armstrong · Aug 27, 2018, 3:32 AM · 42 points · 5 comments · 4 min read · LW link

Applying utility functions to humans considered harmful

Kaj_Sotala · Feb 3, 2010, 7:22 PM · 36 points · 116 comments · 5 min read · LW link

Descriptive vs. specifiable values

TsviBT · Mar 26, 2023, 9:10 AM · 17 points · 2 comments · 2 min read · LW link

The Isolation Assumption of Expected Utility Maximization

Pedro Oliboni · Aug 6, 2020, 4:05 AM · 7 points · 1 comment · 5 min read · LW link

Game Theory without Argmax [Part 1]

Cleo Nardo · Nov 11, 2023, 3:59 PM · 70 points · 18 comments · 19 min read · LW link

[Question] How do bounded utility functions work if you are uncertain how close to the bound your utility is?

Ghatanathoah · Oct 6, 2021, 9:31 PM · 13 points · 26 comments · 2 min read · LW link

Research Agenda v0.9: Synthesising a human’s preferences into a utility function

Stuart_Armstrong · Jun 17, 2019, 5:46 PM · 70 points · 26 comments · 33 min read · LW link

[Question] Why The Focus on Expected Utility Maximisers?

DragonGod · Dec 27, 2022, 3:49 PM · 118 points · 84 comments · 3 min read · LW link

Value/Utility: A History

Lorec · Nov 19, 2024, 11:01 PM · 9 points · 0 comments · 10 min read · LW link

Deontology for Consequentialists

Alicorn · Jan 30, 2010, 5:58 PM · 61 points · 255 comments · 6 min read · LW link

Why Subagents?

johnswentworth · Aug 1, 2019, 10:17 PM · 175 points · 48 comments · 7 min read · LW link · 1 review

An Attempt at Preference Uncertainty Using VNM

[deleted] · Jul 16, 2013, 5:20 AM · 15 points · 33 comments · 6 min read · LW link

Distinctions when Discussing Utility Functions

ozziegooen · Mar 9, 2024, 8:14 PM · 24 points · 7 comments · 1 min read · LW link

Post Your Utility Function

taw · Jun 4, 2009, 5:05 AM · 39 points · 280 comments · 1 min read · LW link

Shard Theory: An Overview

David Udell · Aug 11, 2022, 5:44 AM · 167 points · 34 comments · 10 min read · LW link

Differential Optimization Reframes and Generalizes Utility-Maximization

J Bostock · Dec 27, 2023, 1:54 AM · 30 points · 2 comments · 3 min read · LW link

Consequentialism & corrigibility

Steven Byrnes · Dec 14, 2021, 1:23 PM · 70 points · 35 comments · 7 min read · LW link

Computational efficiency reasons not to model VNM-rational preference relations with utility functions

AlexMennen · Jul 25, 2018, 2:11 AM · 16 points · 5 comments · 3 min read · LW link

Person-moment affecting views

KatjaGrace · Mar 7, 2018, 2:30 AM · 17 points · 8 comments · 5 min read · LW link (meteuphoric.wordpress.com)

Stable Pointers to Value III: Recursive Quantilization

abramdemski · Jul 21, 2018, 8:06 AM · 20 points · 4 comments · 4 min read · LW link

Valence Need Not Be Bounded; Utility Need Not Synthesize

Lorec · Nov 20, 2024, 1:37 AM · 8 points · 0 comments · 6 min read · LW link

LeCun says making a utility function is intractable

Iknownothing · Jun 28, 2023, 6:02 PM · 2 points · 3 comments · 1 min read · LW link

Reinforcement Learner Wireheading

Nate Showell · Jul 8, 2022, 5:32 AM · 8 points · 2 comments · 3 min read · LW link

Geometric Utilitarianism (And Why It Matters)

StrivingForLegibility · May 12, 2024, 3:41 AM · 34 points · 2 comments · 11 min read · LW link

How Not to be Stupid: Brewing a Nice Cup of Utilitea

Psy-Kosh · May 9, 2009, 8:14 AM · 2 points · 17 comments · 6 min read · LW link

The Doubling Box

Mestroyer · Aug 6, 2012, 5:50 AM · 22 points · 84 comments · 3 min read · LW link

Impossibility results for unbounded utilities

paulfchristiano · Feb 2, 2022, 3:52 AM · 167 points · 109 comments · 8 min read · LW link · 1 review

“Solving” selfishness for UDT

Stuart_Armstrong · Oct 27, 2014, 5:51 PM · 39 points · 52 comments · 8 min read · LW link

Harsanyi’s Social Aggregation Theorem and what it means for CEV

AlexMennen · Jan 5, 2013, 9:38 PM · 37 points · 90 comments · 4 min read · LW link

The Preference Utilitarian’s Time Inconsistency Problem

Wei Dai · Jan 15, 2010, 12:26 AM · 35 points · 107 comments · 1 min read · LW link

[Question] Your Preferences

PeterL · Jan 5, 2022, 6:49 PM · 1 point · 4 comments · 1 min read · LW link

Proving the Geometric Utilitarian Theorem

StrivingForLegibility · Aug 7, 2024, 1:39 AM · 25 points · 0 comments · 8 min read · LW link

Degrees of Freedom

sarahconstantin · Apr 2, 2019, 9:10 PM · 103 points · 31 comments · 11 min read · LW link (srconstantin.wordpress.com)

Thatcher’s Axiom

Edward P. Könings · Jan 24, 2023, 10:35 PM · 10 points · 22 comments · 4 min read · LW link

I’m confused. Could someone help?

CronoDAS · Mar 23, 2009, 5:26 AM · 1 point · 12 comments · 1 min read · LW link

Expected futility for humans

Roko · Jun 9, 2009, 12:04 PM · 14 points · 53 comments · 3 min read · LW link

Types of subjective welfare

MichaelStJules · Feb 2, 2024, 9:56 AM · 10 points · 3 comments · 1 min read · LW link

Sequence overview: Welfare and moral weights

MichaelStJules · Aug 15, 2024, 4:22 AM · 7 points · 0 comments · 1 min read · LW link

Risk aversion vs. concave utility function

dvasya · Jan 31, 2012, 6:25 AM · 3 points · 35 comments · 3 min read · LW link

How Not to be Stupid: Adorable Maybes

Psy-Kosh · Apr 29, 2009, 7:15 PM · 1 point · 55 comments · 3 min read · LW link

Adaptation Executors and the Telos Margin

Plinthist · Jun 20, 2022, 1:06 PM · 2 points · 8 comments · 5 min read · LW link

Increasingly vague interpersonal welfare comparisons

MichaelStJules · Feb 1, 2024, 6:45 AM · 5 points · 0 comments · 1 min read · LW link

Simplified preferences needed; simplified preferences sufficient

Stuart_Armstrong · Mar 5, 2019, 7:39 PM · 33 points · 6 comments · 3 min read · LW link

More on the Linear Utility Hypothesis and the Leverage Prior

AlexMennen · Feb 26, 2018, 11:53 PM · 16 points · 4 comments · 9 min read · LW link

Wanting to Want

Alicorn · May 16, 2009, 3:08 AM · 30 points · 199 comments · 2 min read · LW link

ACI #3: The Origin of Goals and Utility

Akira Pyinya · May 17, 2023, 8:47 PM · 1 point · 0 comments · 6 min read · LW link

Why the beliefs/values dichotomy?

Wei Dai · Oct 20, 2009, 4:35 PM · 29 points · 156 comments · 2 min read · LW link

ACI#4: Seed AI is the new Perpetual Motion Machine

Akira Pyinya · Jul 8, 2023, 1:17 AM · −1 points · 0 comments · 6 min read · LW link

Is the Endowment Effect Due to Incomparability?

Kevin Dorst · Jul 10, 2023, 4:26 PM · 21 points · 10 comments · 7 min read · LW link (kevindorst.substack.com)

Why you can add moral value, and if an AI has moral weights for these moral values, those might be off

Wes R · Apr 2, 2025, 5:43 PM · 0 points · 1 comment · 10 min read · LW link (docs.google.com)

Chasing Infinities

Michael Bateman · Aug 16, 2021, 1:19 AM · 2 points · 1 comment · 9 min read · LW link

Nature < Nurture for AIs

scottviteri · Jun 4, 2023, 8:38 PM · 14 points · 22 comments · 7 min read · LW link

[Question] Toward a Mathematical Definition of Rationality in Multi-Agent Systems

nekofugu · Feb 23, 2025, 5:29 PM · 1 point · 0 comments · 1 min read · LW link

An Unexpected GPT-3 Decision in a Simple Gamble

casualphysicsenjoyer · Sep 25, 2022, 4:46 PM · 8 points · 4 comments · 1 min read · LW link

Allais Hack—Transform Your Decisions!

MBlume · May 3, 2009, 10:37 PM · 22 points · 19 comments · 2 min read · LW link

Is risk aversion really irrational ?

kilobug · Jan 31, 2012, 8:34 PM · 54 points · 65 comments · 9 min read · LW link

Knightian Uncertainty and Ambiguity Aversion: Motivation

So8res · Jul 21, 2014, 8:32 PM · 48 points · 44 comments · 13 min read · LW link

Gradient Ascenders Reach the Harsanyi Hyperplane

StrivingForLegibility · Aug 7, 2024, 1:40 AM · 4 points · 0 comments · 6 min read · LW link

Verifying vNM-rationality requires an ontology

jeyoor · Mar 13, 2019, 12:03 AM · 25 points · 5 comments · 1 min read · LW link

When to use quantilization

RyanCarey · Feb 5, 2019, 5:17 PM · 65 points · 5 comments · 4 min read · LW link

Deriving the Geometric Utilitarian Weights

StrivingForLegibility · Aug 7, 2024, 1:39 AM · 2 points · 0 comments · 11 min read · LW link

Tendencies in reflective equilibrium

Scott Alexander · Jul 20, 2011, 10:38 AM · 51 points · 71 comments · 4 min read · LW link

Sublimity vs. Youtube

Alicorn · Mar 18, 2011, 5:33 AM · 33 points · 63 comments · 1 min read · LW link

Against the Linear Utility Hypothesis and the Leverage Penalty

AlexMennen · Dec 14, 2017, 6:38 PM · 41 points · 47 comments · 11 min read · LW link

What we talk about when we talk about maximising utility

Richard_Ngo · Feb 24, 2018, 10:33 PM · 14 points · 18 comments · 4 min read · LW link

If it looks like utility maximizer and quacks like utility maximizer...

taw · Jun 11, 2009, 6:34 PM · 20 points · 24 comments · 2 min read · LW link

Pascal’s Muggle: Infinitesimal Priors and Strong Evidence

Eliezer Yudkowsky · May 8, 2013, 12:43 AM · 74 points · 402 comments · 26 min read · LW link

Housing Markets, Satisficers, and One-Track Goodhart

J Bostock · Dec 16, 2021, 9:38 PM · 2 points · 2 comments · 2 min read · LW link

A summary of Savage’s foundations for probability and utility.

Sniffnoy · May 22, 2011, 7:56 PM · 84 points · 92 comments · 13 min read · LW link

Personal Ruminations on AI’s Missing Variable Problem

Thehumanproject.ai · May 26, 2025, 9:11 PM · 1 point · 0 comments · 3 min read · LW link

Utility is relative

CrimsonChin · Jan 8, 2024, 2:31 AM · 2 points · 4 comments · 2 min read · LW link

We Don’t Have a Utility Function

[deleted] · Apr 2, 2013, 3:49 AM · 73 points · 119 comments · 4 min read · LW link

Underappreciated points about utility functions (of both sorts)

Sniffnoy · Jan 4, 2020, 7:27 AM · 47 points · 61 comments · 15 min read · LW link

Will Values and Competition Decouple?

interstice · Sep 28, 2022, 4:27 PM · 15 points · 11 comments · 17 min read · LW link

The Domain of Your Utility Function

Peter_de_Blanc · Jun 23, 2009, 4:58 AM · 42 points · 99 comments · 2 min read · LW link

Terminal Values and Instrumental Values

Eliezer Yudkowsky · Nov 15, 2007, 7:56 AM · 116 points · 46 comments · 10 min read · LW link

Conceptual problems with utility functions

Dacyn · Jul 11, 2018, 1:29 AM · 22 points · 12 comments · 2 min read · LW link

Freedom Is All We Need

Leo Glisic · Apr 27, 2023, 12:09 AM · −1 points · 8 comments · 10 min read · LW link

Utility versus Reward function: partial equivalence

Stuart_Armstrong · Apr 13, 2018, 2:58 PM · 18 points · 5 comments · 5 min read · LW link

Better difference-making views

MichaelStJules · Dec 21, 2024, 6:27 PM · 7 points · 0 comments · 1 min read · LW link

Galatea and the windup toy

Nicolas Villarreal · Oct 26, 2024, 2:52 PM · −3 points · 0 comments · 13 min read · LW link (nicolasdvillarreal.substack.com)

Atlas: Stress-Testing ASI Value Learning Through Grand Strategy Scenarios

NeilFox · Feb 17, 2025, 11:55 PM · 1 point · 0 comments · 2 min read · LW link

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)

Roland Pihlakas · Jan 12, 2025, 3:37 AM · 46 points · 7 comments · 10 min read · LW link

Expected Utility, Geometric Utility, and Other Equivalent Representations

StrivingForLegibility · Nov 20, 2024, 11:28 PM · 10 points · 0 comments · 11 min read · LW link

A fungibility theorem

Nisan · Jan 12, 2013, 9:27 AM · 35 points · 66 comments · 6 min read · LW link

Individual Utilities Shift Continuously as Geometric Weights Shift

StrivingForLegibility · Aug 7, 2024, 1:41 AM · 2 points · 0 comments · 17 min read · LW link

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Matrice Jacobine · Feb 12, 2025, 9:15 AM · 53 points · 49 comments · 1 min read · LW link (www.emergent-values.ai)

[Question] Why does expected utility matter?

Marco Discendenti · Dec 25, 2023, 2:47 PM · 18 points · 21 comments · 4 min read · LW link

Bridging Expected Utility Maximization and Optimization

Daniel Herrmann · Aug 5, 2022, 8:18 AM · 25 points · 5 comments · 14 min read · LW link

A Pedagogical Guide to Corrigibility

A.H. · Jan 17, 2024, 11:45 AM · 6 points · 3 comments · 16 min read · LW link

The genie knows, but doesn’t care

Rob Bensinger · Sep 6, 2013, 6:42 AM · 121 points · 495 comments · 8 min read · LW link

(A Failed Approach) From Precedent to Utility Function

Akira Pyinya · Apr 29, 2023, 9:55 PM · 0 points · 2 comments · 4 min read · LW link

The Lifespan Dilemma

Eliezer Yudkowsky · Sep 10, 2009, 6:45 PM · 61 points · 220 comments · 7 min read · LW link

Universal agents and utility functions

Anja · Nov 14, 2012, 4:05 AM · 43 points · 38 comments · 6 min read · LW link

Utility functions and probabilities are entangled

Thomas Kwa · Jul 26, 2022, 5:36 AM · 15 points · 5 comments · 1 min read · LW link

Expected utility, unlosing agents, and Pascal’s mugging

Stuart_Armstrong · Jul 28, 2014, 6:05 PM · 32 points · 54 comments · 5 min read · LW link

Solution to the two envelopes problem for moral weights

MichaelStJules · Feb 19, 2024, 12:15 AM · 9 points · 1 comment · 1 min read · LW link

Expected utility without the independence axiom

Stuart_Armstrong · Oct 28, 2009, 2:40 PM · 20 points · 68 comments · 4 min read · LW link

Take 7: You should talk about “the human’s utility function” less.

Charlie Steiner · Dec 8, 2022, 8:14 AM · 50 points · 22 comments · 2 min read · LW link

Fake Utility Functions

Eliezer Yudkowsky · Dec 6, 2007, 4:55 PM · 72 points · 64 comments · 4 min read · LW link

Optimisation Measures: Desiderata, Impossibility, Proposals

mattmacdermott and Alexander Gietelink Oldenziel · Aug 7, 2023, 3:52 PM · 36 points · 9 comments · 1 min read · LW link

Zut Allais!

Eliezer Yudkowsky · Jan 20, 2008, 3:18 AM · 59 points · 51 comments · 6 min read · LW link

The Linguistic Blind Spot of Value-Aligned Agency, Natural and Artificial

Roman Leventov · Feb 14, 2023, 6:57 AM · 6 points · 0 comments · 2 min read · LW link (arxiv.org)

Logarithms and Total Utilitarianism

Pablo Villalobos · Aug 9, 2018, 8:49 AM · 37 points · 31 comments · 4 min read · LW link

A gentle primer on caring, including in strange senses, with applications

Kaarel · Aug 30, 2022, 8:05 AM · 10 points · 4 comments · 18 min read · LW link

What resources have increasing marginal utility?

Qiaochu_Yuan · Jun 14, 2014, 3:43 AM · 58 points · 63 comments · 1 min read · LW link

Allais Malaise

Eliezer Yudkowsky · Jan 21, 2008, 12:40 AM · 41 points · 38 comments · 2 min read · LW link

Humans are utility monsters

PhilGoetz · Aug 16, 2013, 9:05 PM · 124 points · 216 comments · 2 min read · LW link

Utility Maximization = Description Length Minimization

johnswentworth · Feb 18, 2021, 6:04 PM · 216 points · 50 comments · 6 min read · LW link

Coherent behaviour in the real world is an incoherent concept

Richard_Ngo · Feb 11, 2019, 5:00 PM · 51 points · 17 comments · 9 min read · LW link

The Unified Theory of Normative Ethics

Thane Ruthenis · Jun 17, 2022, 7:55 PM · 8 points · 0 comments · 6 min read · LW link

Pascal’s Mugging: Tiny Probabilities of Vast Utilities

Eliezer Yudkowsky · Oct 19, 2007, 11:37 PM · 112 points · 354 comments · 4 min read · LW link

The Impossibility of a Rational Intelligence Optimizer

Nicolas Villarreal · Jun 6, 2024, 4:14 PM · −9 points · 5 comments · 14 min read · LW link

Probability is Real, and Value is Complex

abramdemski · Jul 20, 2018, 5:24 AM · 80 points · 21 comments · 6 min read · LW link

Against Discount Rates

Eliezer Yudkowsky · Jan 21, 2008, 10:00 AM · 38 points · 81 comments · 2 min read · LW link

Building AI safety benchmark environments on themes of universal human values

Roland Pihlakas · Jan 3, 2025, 4:24 AM · 18 points · 3 comments · 8 min read · LW link (docs.google.com)

Why Bet Kelly?

Joe Zimmerman · Nov 29, 2022, 6:47 PM · 16 points · 4 comments · 4 min read · LW link

The “Measuring Stick of Utility” Problem

johnswentworth · May 25, 2022, 4:17 PM · 74 points · 25 comments · 3 min read · LW link

Three ways that “Sufficiently optimized agents appear coherent” can be false

Wei Dai · Mar 5, 2019, 9:52 PM · 65 points · 3 comments · 3 min read · LW link

Gradations of moral weight

MichaelStJules · Feb 29, 2024, 11:08 PM · 1 point · 0 comments · 1 min read · LW link

[Question] “Do Nothing” utility function, 3½ years later?

niplav · Jul 20, 2020, 11:09 AM · 5 points · 3 comments · 1 min read · LW link

AI Alignment 2018-19 Review

Rohin Shah · Jan 28, 2020, 2:19 AM · 126 points · 6 comments · 35 min read · LW link

Agents which are EU-maximizing as a group are not EU-maximizing individually

Mlxa · Dec 4, 2023, 6:49 PM · 3 points · 2 comments · 2 min read · LW link

Notable runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format

Roland Pihlakas, Sruthi Kuriakose and shrutidattagupta · Mar 16, 2025, 11:23 PM · 45 points · 7 comments · 10 min read · LW link

On dollars, utility, and crack cocaine

PhilGoetz · Apr 4, 2009, 12:00 AM · 16 points · 100 comments · 2 min read · LW link

[Question] Doing Nothing Utility Function

k64 · Sep 26, 2024, 10:05 PM · 9 points · 9 comments · 1 min read · LW link

The Geometric Importance of Side Payments

StrivingForLegibility · Aug 7, 2024, 1:38 AM · 8 points · 4 comments · 3 min read · LW link

Are pre-specified utility functions about the real world possible in principle?

mlogan · Jul 11, 2018, 6:46 PM · 24 points · 7 comments · 4 min read · LW link

Ethodynamics of Omelas

dr_s · Jun 10, 2023, 4:24 PM · 83 points · 18 comments · 9 min read · LW link · 1 review

Buridan’s ass in coordination games

jessicata · Jul 16, 2018, 2:51 AM · 52 points · 26 comments · 10 min read · LW link

Complex Behavior from Simple (Sub)Agents

moridinamael · May 10, 2019, 9:44 PM · 113 points · 14 comments · 9 min read · LW link · 1 review

Big Advance in Infinite Ethics

bwest · Nov 28, 2017, 3:10 PM · 32 points · 13 comments · 5 min read · LW link

Why you must maximize expected utility

Benya · Dec 13, 2012, 1:11 AM · 50 points · 76 comments · 21 min read · LW link

[Aspiration-based designs] A. Damages from misaligned optimization – two more models

Jobst Heitzig and Simon Dima · Jul 15, 2024, 2:08 PM · 6 points · 0 comments · 9 min read · LW link

Arguments for utilitarianism are impossibility arguments under unbounded prospects

MichaelStJules · Oct 7, 2023, 9:08 PM · 7 points · 7 comments · 21 min read · LW link

[Question] Mathematical models of Ethics

Victors · Mar 8, 2023, 5:40 PM · 4 points · 2 comments · 1 min read · LW link

Real-world examples of money-pumping?

sixes_and_sevens · Apr 25, 2013, 1:49 PM · 28 points · 97 comments · 1 min read · LW link

Only humans can have human values

PhilGoetz · Apr 26, 2010, 6:57 PM · 49 points · 161 comments · 17 min read · LW link

Alignment, conflict, powerseeking

Oliver Sourbut · Nov 22, 2023, 9:47 AM · 6 points · 1 comment · 1 min read · LW link

Why Universal Comparability of Utility?

AK · May 13, 2018, 12:10 AM · 8 points · 16 comments · 1 min read · LW link

VNM expected utility theory: uses, abuses, and interpretation

Academian · Apr 17, 2010, 8:23 PM · 36 points · 51 comments · 10 min read · LW link
