Intelligence vs Friendliness
An agent is composed of two components: a predictive model of the world, and a utility function for evaluating world states.
I would say that the ‘intelligence’ of an agent corresponds to the sophistication and accuracy of its world model. The ‘friendliness’ of an agent depends on how closely its utility function aligns with human values.
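To make that decomposition concrete, here is a minimal sketch of an agent built from exactly these two parts, choosing actions by expected utility. The class, type aliases, and signatures are my own illustration, not anything proposed in the discussion.

```python
from dataclasses import dataclass
from collections.abc import Callable, Hashable

State = Hashable
Action = Hashable

@dataclass
class Agent:
    # 'Intelligence' lives here: how well this predicts successor states.
    world_model: Callable[[State, Action], dict[State, float]]  # P(s' | s, a)
    # 'Friendliness' lives here: how closely this tracks human values.
    utility: Callable[[State], float]

    def act(self, state: State, actions: list[Action]) -> Action:
        # Pick the action with the highest expected utility under the model.
        def expected_utility(a: Action) -> float:
            return sum(p * self.utility(next_state)
                       for next_state, p in self.world_model(state, a).items())
        return max(actions, key=expected_utility)
```

The point of the sketch is only that the two components are separable: improving world_model and changing utility are independent axes, which is why the claimed synergy below is puzzling.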
What is baffling to me is the vague idea that developing the theory of friendliness has any significant synergy with developing the theory of intelligence. (Such an idea has surfaced in discussions of the plausibility of SIAI developing friendly AI before unfriendly AI is developed by others.)
One argument for such a synergy (and the only one I can think of) is that:
a) A perfect world model is not possible: one must make the best tradeoff given limited resources.
b) The more relevant a certain feature of the world is to the utility function, the more accurately we want to model it in the world model. (A toy sketch of this allocation follows the list.)
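As a toy illustration of (a) and (b), consider splitting a fixed modeling budget across features of the world in proportion to how relevant each feature is to the utility function. The function, feature names, and weights below are invented purely for illustration.

```python
def allocate_model_budget(utility_relevance: dict[str, float],
                          total_budget: float) -> dict[str, float]:
    """Split a fixed modeling budget across world features, weighting
    each feature by its relevance to the agent's utility function."""
    total_relevance = sum(utility_relevance.values())
    return {feature: total_budget * relevance / total_relevance
            for feature, relevance in utility_relevance.items()}

# Hypothetical weights: both agents spend most of their budget modeling
# humans, so their world models end up differing very little overall.
paperclip_maximizer = allocate_model_budget(
    {"human behavior": 8.0, "manufacturing and logistics": 2.0},
    total_budget=100.0)
friendly_ai = allocate_model_budget(
    {"human behavior": 8.0, "fine detail of human values": 2.0},
    total_budget=100.0)
```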
However, for the most part, the world model would differ very little between a paperclip maximizer and a Friendly AI. The Friendly AI certainly has to keep track of some things that are irrelevant to the paperclip maximizer, but both AIs need world models capable of predicting human behavior in order to be effective at all, and one would expect that modeling humans accounts for the bulk of the world model's complexity in the first place.