[link] Thoughts on defining human preferences

https://docs.google.com/document/d/1jDGpIT3gKZQZByO6A036dojRKMv62KEDEfEz87VuDoY/

Abstract: A discussion of how we might want to define human preferences, particularly in the context of building an AI intended to learn and implement those preferences. It starts with actual arguments about the applicability of the VNM utility theorem, then towards the end moves into hypotheses that are less well defended but possibly more important. At the very end, it suggests that current thinking about AI safety might overemphasize “discovering our preferences” relative to “creating our preferences”.