It could be a situation where we have very large resources and an exponentially large concept space.
If we have enough uncertainty about what bit of concept space we’re looking for to make a power-law distribution appropriate, then “very large” can still be “severely limited” (and indeed must be to make the amount of resources going to each kind of maybe-chocolate be small).
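The quantitative claim here can be sketched numerically. All the numbers below (the size of the concept space and the Zipf-style exponent) are my own illustrative assumptions, not anything from the thread:

```python
# Spread a fixed resource budget over N candidate concepts according
# to a Zipf-style power law p_i ∝ i^(-s), then look at how little any
# one candidate receives.  N and s are illustrative assumptions.
N = 10**6                      # size of the (huge) concept space
s = 1.1                        # assumed power-law exponent
weights = [i ** -s for i in range(1, N + 1)]
total = sum(weights)
shares = [w / total for w in weights]   # fraction of the budget per concept

print(shares[0])        # most likely concept: a modest fraction
print(shares[N // 2])   # mid-ranked concept: vanishingly small
```

Even the single most likely candidate gets only a modest share, and a typical mid-ranked candidate gets a vanishingly small one, so a fixed budget spread power-law-wise over a huge concept space does leave each maybe-chocolate with very little.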
true utility functions [...] problems with utility functions that can tend to infinity.
Yes. But I wouldn’t characterize this as giving the AI an approximation to our utility function that avoids problems to do with infinity—because I don’t think we have a utility function in a strong enough sense for this to be distinguishable from giving the AI our utility function. We have a vague, hazy idea of utility that we can (unreliably, with great effort) be a little bit quantitative about in “small” easy cases; we don’t truly either feel or behave according to any utility function; but we want to give the AI a utility function that will make it do things we approve of, even though its decisions may be influenced by looking at things far beyond our cognitive capacity.
It’s not clear to me that that’s a sensible project at all, but it certainly isn’t anything so simple as taking something that Really Is our utility function but misbehaves “at infinity” and patching it to tame the misbehaviour :-).
I don’t think we have a utility function in a strong enough sense
All the underlying axioms of expected utility theory (EUT) seem self-evident to me. The fact that most people don’t shut up and multiply is something I would regard as more of their problem than a problem with EUT. Having said that, even if mapping emotions onto utility values makes sense from some abstract theoretical point of view, it’s a lot harder in practice for reasons such as the complex fragility of human values, which has been thoroughly discussed already.
Of course, the degree to which the average LWer approximates EUT in their feelings and behaviour is probably far greater than that of the average person. At non-LW philosophy meetups I have been told I am ‘disturbingly analytical’ for advocating EUT.
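For what it’s worth, the “shut up and multiply” prescription amounts to something very short in code. This is a toy sketch with made-up numbers, not anyone’s actual utilities:

```python
# Toy "shut up and multiply": rank gambles by expected utility
# instead of by gut feel.  All numbers are made up for illustration.
def expected_utility(gamble):
    """gamble: list of (probability, utility) pairs; probs sum to 1."""
    assert abs(sum(p for p, _ in gamble) - 1.0) < 1e-9
    return sum(p * u for p, u in gamble)

sure_thing = [(1.0, 400)]               # a certain 400 utilons
risky_bet = [(0.5, 1000), (0.5, 0)]     # coin flip: 1000 or nothing

# EUT says take the bet (500 > 400), however uncomfortable it feels.
print(expected_utility(risky_bet) > expected_utility(sure_thing))  # True
```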
It’s not clear to me that that’s a sensible project at all, but it certainly isn’t anything so simple as taking something that Really Is our utility function but misbehaves “at infinity” and patching it to tame the misbehaviour :-).
Well, I suppose there is the option of ‘empathic AI’. Reverse-engineering the brain and dialling compassion up to 11 is in many ways easier and more brute-force-able than creating de novo AI, and it avoids all these utility-function-definition problems, the Basilisk, and Löb’s theorem. The downsides, of course, include far greater unpredictability, the fact that the AI would definitely be sentient, and (some would argue) the possibility of catastrophic failure during self-modification.
The fact that most people don’t shut up and multiply is something I would regard as more of their problem than a problem with EUT.
I didn’t say that we shouldn’t have a utility function, I said we don’t. Our actual preferences are incompletely defined, inconsistent, and generally a mess. I suspect this is true even for most LWers, and I’m pretty much certain it’s true for almost all people, and (in so far as it’s meaningful) for the human race as a whole.
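“Inconsistent” here can be made concrete: even one cyclic preference pattern rules out any utility-function representation, since it would require u(A) > u(B) > u(C) > u(A). A toy check (my own construction, brute-forcing rankings rather than invoking any theorem):

```python
from itertools import permutations

def representable(prefs, items):
    """Brute force: is there any ranking of items (a candidate utility
    function) consistent with every observed pairwise preference?"""
    for ranking in permutations(range(len(items))):
        u = dict(zip(items, ranking))
        if all(u[x] > u[y] for x, y in prefs):
            return True
    return False

cyclic = {("A", "B"), ("B", "C"), ("C", "A")}   # A > B > C > A
print(representable(cyclic, ["A", "B", "C"]))   # False: no utility fn fits
```

Real preferences are messier than a three-item cycle, of course, but the point stands: one such cycle means there is no utility function to hand the AI in the first place.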