David Scott Krueger (formerly: capybaralet) comments on Assuming we’ve solved X, could we do Y...

David Scott Krueger (formerly: capybaralet) 17 Dec 2018 4:43 UTC
I actually don’t understand why you say they can’t be fully disentangled.
IIRC, it seemed to me during the discussion that your main objection concerned whether (e.g.) “arbitrarily long deliberation (ALD)” was (or could be) fully specified in a way that properly accounts for things like deception, manipulation, etc. More concretely, I think you mentioned the possibility of an AI influencing the deliberation process in an undesirable way.
But I think it’s reasonable to assume (within the bounds of a discussion) that there is a non-terrible way (in principle) to specify things like “manipulation”. So do you disagree? Or is your objection something else entirely?
- Stuart_Armstrong 29 Jan 2019 14:17 UTC
  Hey there!
  
  I've given a longer answer here: https://www.lesswrong.com/posts/Q7WiHdSSShkNsgDpa/how-much-can-value-learning-be-disentangled