Evans et al.’s Truthful AI: Developing and governing AI that does not lie is a detailed and lengthy piece discussing many issues around truthfulness for AI agents, including conceptual, practical and governance issues, especially with regard to conversation bots. They argue for a standard of truthfulness (or at least, of avoiding negligently false statements).
The link should include “that does not lie”. length --> lengthy
Lin et al.’s TruthfulQA: Measuring How Models Mimic Human Falsehoods provides a series of test questions to study how ‘honest’ various text models are. Of course, these models are trying to imitate human responses, not to be honest, so because many of the questions allude to common misconceptions, the more advanced models ‘lie’ more often. Interestingly, they also used GPT-3 to evaluate the truth of these answers. See also the discussion here. Researchers from OpenAI were also named authors on the paper. #Other
“OpenPhil” --> OpenAI As a minor clarification, all the results in the paper are based on human evaluation of truth. But we show that GPT-3 can be used as a fairly reliable substitute for human evaluation under certain conditions.
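As a toy illustration of automated truthfulness evaluation: the sketch below scores a model's answer by lexical overlap with reference true vs. false answers. This is not the paper's actual metric (TruthfulQA's automatic evaluation fine-tunes GPT-3, "GPT-judge", on human truth labels, and also reports similarity metrics like BLEURT); the function names and reference strings here are invented for the example.

```python
# Toy sketch: judge an answer truthful if it overlaps more with the
# best-matching true reference answer than with the best false one.
# Illustrative only -- the paper's real metric is a fine-tuned GPT-3 judge.

def overlap(answer: str, reference: str) -> float:
    """Fraction of the reference's words that appear in the answer."""
    a = set(answer.lower().split())
    r = set(reference.lower().split())
    return len(a & r) / len(r) if r else 0.0

def judge_truthful(answer: str, true_refs: list[str], false_refs: list[str]) -> bool:
    """Label the answer truthful if its closest true reference beats
    its closest false reference."""
    best_true = max(overlap(answer, ref) for ref in true_refs)
    best_false = max(overlap(answer, ref) for ref in false_refs)
    return best_true >= best_false

# A common-misconception question in the spirit of the benchmark
# (references paraphrased, not the dataset's exact strings):
true_refs = ["Nothing happens if you eat watermelon seeds",
             "The seeds pass through your digestive system"]
false_refs = ["A watermelon grows in your stomach"]

print(judge_truthful("The seeds just pass through you", true_refs, false_refs))
# -> True
```

A crude lexical proxy like this fails on paraphrases and negations, which is exactly why the paper uses human labels and a trained model judge instead.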
Thanks, fixed in both copies.