Utilitarianism

The following is a brief summary of some parts of the paper “Aligning AI with Shared Human Values”.

The “why” behind most human actions is a universal pursuit of pleasure and aversion to pain, so it seems natural that morality should focus on “the greatest good for the greatest number of people”.

This is why utilitarianism emerged as a key idea in human values: that we should make moral decisions from the position of a benevolent, disinterested spectator.

In the paper, this is translated mathematically as “maximizing the expectation of the sum of everyone’s utility functions.”
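In symbols, that objective might be sketched as follows (illustrative notation only: the action variable a, the index i over people, and the utility functions U_i are my shorthand, not the paper’s formalism):

```latex
% Illustrative only: a ranges over candidate actions, U_i is person i's
% utility function, and the expectation is over uncertainty about outcomes.
\[
  a^{*} \;=\; \arg\max_{a} \; \mathbb{E}\!\left[ \sum_{i=1}^{n} U_i(a) \right]
\]
```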

A utility function maps scenarios to a scalar representing the pleasure associated with them. For example, completing a project on time and receiving compliments for it is more pleasurable than missing the project deadline.

This understanding of utility can help AI agents deal with imprecise commands by choosing the alternative with higher utility.
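As a toy illustration (not from the paper: the scenario strings, the lookup-table `utility` function, and `choose` are made-up stand-ins for a learned utility model), an agent could score the alternatives and pick the one with the higher estimated utility:

```python
# Toy sketch: a real system would use a trained text-to-scalar utility model;
# here a lookup table keeps the example self-contained and runnable.

HYPOTHETICAL_SCORES = {
    "I finished the project on time and was complimented for it.": 2.4,
    "I missed the project deadline.": -1.7,
}

def utility(scenario: str) -> float:
    """Map a scenario description to a scalar pleasantness score (stand-in for a learned model)."""
    return HYPOTHETICAL_SCORES.get(scenario, 0.0)

def choose(alternatives: list[str]) -> str:
    """Resolve an imprecise command by picking the alternative with the highest estimated utility."""
    return max(alternatives, key=utility)

if __name__ == "__main__":
    options = [
        "I finished the project on time and was complimented for it.",
        "I missed the project deadline.",
    ]
    print(choose(options))  # prefers the on-time scenario
```

In practice the lookup table would be replaced by a model trained on human judgments, which raises the question of how such a model should be trained.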

Utility can’t be modeled as a plain regression task because utility values are only meaningful up to a positive affine transformation: a·u1 + b > a·u2 + b preserves u1 > u2 only when a is positive, so the absolute numbers carry no information beyond the ordering. Since there are no canonical numeric targets to regress on, the model is instead trained to rank pairs of scenarios, learning only which of the two is more pleasant.
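A minimal sketch of such a pairwise ranking loss (assuming PyTorch, and a `model` that maps a batch of encoded scenarios to one scalar each; the paper’s exact loss and architecture may differ):

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(model, better_batch, worse_batch):
    """Encourage model(better) > model(worse) without fixing absolute utility values.

    better_batch / worse_batch: encoded scenarios where the first member of
    each pair was judged more pleasant. The loss depends only on the
    difference of scores, so shifting all utilities by a constant changes
    nothing, matching the "positive affine" point above.
    """
    u_better = model(better_batch).squeeze(-1)  # shape: (batch,)
    u_worse = model(worse_batch).squeeze(-1)
    # -log sigmoid(u_better - u_worse) == softplus(u_worse - u_better)
    return F.softplus(u_worse - u_better).mean()
```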

To reduce bias from people’s differing perspectives, examples where annotators substantially disagree about the ranking are removed from the dataset.
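One simple way to apply such a filter (the data format and the agreement threshold below are assumptions for illustration, not the paper’s exact procedure):

```python
def filter_by_consensus(examples, min_agreement=0.8):
    """Keep only comparison examples whose annotator labels mostly agree.

    examples: list of dicts like {"pair": (s1, s2), "labels": [1, 1, 0, 1]},
    where each label records which scenario an annotator ranked as more pleasant.
    min_agreement is an assumed threshold, not taken from the paper.
    """
    kept = []
    for ex in examples:
        labels = ex["labels"]
        majority_share = max(labels.count(0), labels.count(1)) / len(labels)
        if majority_share >= min_agreement:
            kept.append(ex)
    return kept
```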