Linear aggregation works just fine for HapMax: it maximizes the complex, neurologically defined quantity it wants to maximize, exactly as it wants to. All the “catastrophes” you point out are not due to the agent having an internal error, but rather to a conflict between what it wants and what humans want. Additionally, the aggregation procedure is completely defined once you have a complete utility function.
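To fix notation, here is one way to write down the kind of linear aggregation under discussion, as a minimal sketch (the symbols $h_i$, $w_i$, and $n$ are my own illustration, not anything drawn from an actual HapMax specification):

$$U_{\text{HapMax}}(x) \;=\; \sum_{i=1}^{n} w_i \, h_i(x)$$

where $h_i(x)$ is the happiness score HapMax’s neurological definition assigns to individual $i$ in outcome $x$, and the $w_i$ are fixed weights (often just $w_i = 1$). Once the $h_i$ and $w_i$ are pinned down, the aggregation step is fully determined, which is the sense in which it is “completely defined.”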
So the problem is really “what does a human-like utility function look like?” Because, as you argue, anything that tries to take a shortcut to human-like behavior can lead to catastrophes.
Yes, linear aggregation in HapMax produces results that agree with HapMax’s own utility function. But those results do not agree with the intuitions of HapMax’s creators, and I think that utility functions which use linear aggregation will in general tend to produce results that are counterintuitive in dramatic (and therefore likely very bad) ways.
Rather than “counterintuitive,” I’d prefer “inhuman” or “unfriendly.” If the creators themselves had linear utility functions over the same stuff, HapMax would fit in just fine. If humans have a near-linear utility function over something, then an AI with a linear utility function there will cause no catastrophes. I can’t think of any problems unique to linear weighting; the problem is really when the weighting isn’t like ours.
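To make the “weighting isn’t like ours” point concrete, here is a toy sketch. The outcomes, the numbers, and the particular log-shaped stand-in for human-like preferences are all my own illustration, not anything from this thread; the point is only that a strictly linear valuation and a saturating one can rank the very same outcomes in opposite orders.

```python
import math

# Two hypothetical outcomes, described as (total_happiness_units, other_goods),
# where "other_goods" stands in for everything else the creators care about.
outcome_a = (1e9, 0.0)    # maximize happiness units, sacrifice everything else
outcome_b = (1e6, 100.0)  # plenty of happiness, other goods left intact

def linear_value(outcome):
    """A HapMax-style valuation: strictly linear in happiness units,
    indifferent to anything else."""
    happiness, _other = outcome
    return happiness

def humanlike_value(outcome):
    """One crude stand-in for a human-like valuation: happiness still counts,
    but with diminishing returns, and the other goods count heavily."""
    happiness, other = outcome
    return math.log1p(happiness) + 10.0 * other

for name, value in [("linear", linear_value), ("human-like", humanlike_value)]:
    best = max([outcome_a, outcome_b], key=value)
    print(f"{name} valuation prefers outcome {'A' if best is outcome_a else 'B'}")
# linear valuation prefers outcome A; human-like valuation prefers outcome B
```

Nothing here turns on the exact functional form: any weighting that saturates where ours saturates would avoid the divergence, which is the sense in which a linear AI is harmless wherever our own values are near-linear.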