If I were a well-intentioned AI...

Stuart_Armstrong

22 Jan 2020 14:30 UTC

I look at how some of the major problems in AI alignment (Goodhart problems, distributional shift, mesa-optimising, etc.) look from the perspective of a well-intentioned but ignorant AI, and whether this perspective can suggest methods of safety improvement.

If I were a well-intentioned AI… I: Image classifier

Stuart_Armstrong, 26 Feb 2020 12:39 UTC

35 points

4 comments, 5 min read

If I were a well-intentioned AI… II: Acting in a world

Stuart_Armstrong, 27 Feb 2020 11:58 UTC

20 points

0 comments, 3 min read

If I were a well-intentioned AI… III: Extremal Goodhart

Stuart_Armstrong, 28 Feb 2020 11:24 UTC

22 points

0 comments, 5 min read

If I were a well-intentioned AI… IV: Mesa-optimising

Stuart_Armstrong, 2 Mar 2020 12:16 UTC

26 points

2 comments, 6 min read