Your last paragraph about retargetability sounds quite interesting. Do you have a reference for this story?
No, I’m making thoughts up as I argue.
I’m not sure whether, with “paragraph about retargetability”, you are just attaching a label to the paragraph or expressing a specific interest in “retargetability”. I’ll assume the latter.
I used the term “retargetable agents” to mean “agents with a defined utility-swap operation”, because in general an agent may not be defined in a way that makes it clear what it means to “change its utility”. So, whenever I compare different utilities on the “same” agent, I want a way to mark this important requirement. I think “retargetable agent” is a good term for this. I found it in TurnTrout’s sequence, and I think I’m not misusing it, even though I use it to mean something a bit different.
Even without cross-utility comparisons, when talking above about different agents with the same utility function, I preferred to say “retargetable agents”, because: what does it mean to say that an agent has a certain utility function, if the agent is not also a perfect Bayesian inductor? If I’m talking about measures of optimization, I probably want to compare “dumb” agents with “smart” agents, and not only in the sense of having “dumb” or “smart” priors. So when I contemplate the dumb agent failing to get more utility in a way that I can devise but it can’t, shouldn’t I conclude that it is not a utility maximizer? If I want to say that an algorithm is maximizing utility but stopping short of perfection, at some point I need to be specific about what kind of algorithm I’m talking about. It seems to me that a convenient, agnostic thing to do is to consider a class of algorithms that have a “slot” for the utility function.
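To make the “slot” idea concrete, here is a toy Python sketch. Everything in it is my own illustrative assumption, not something from TurnTrout’s sequence or any formal definition: the class name `RetargetableAgent`, the `budget` parameter standing in for how “smart” the agent is, and the two example utilities. It only shows what it could look like for a bounded search procedure to have a well-defined utility-swap operation.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence

Action = int
Utility = Callable[[Action], float]


@dataclass(frozen=True)
class RetargetableAgent:
    """Hypothetical sketch: a search procedure with a slot for the utility."""
    utility: Utility  # the "slot"
    budget: int       # how many candidate actions the agent can evaluate

    def act(self, actions: Sequence[Action]) -> Action:
        # Bounded search: only the first `budget` actions are considered,
        # so a small budget models a "dumb" agent that misses better options.
        considered = actions[: self.budget]
        return max(considered, key=self.utility)

    def retarget(self, new_utility: Utility) -> "RetargetableAgent":
        # The utility-swap operation: same algorithm, different utility.
        return replace(self, utility=new_utility)


actions = list(range(10))
u = lambda a: -(a - 7) ** 2  # peaks at action 7
v = lambda a: -(a - 2) ** 2  # peaks at action 2

dumb = RetargetableAgent(utility=u, budget=3)
smart = RetargetableAgent(utility=u, budget=10)

print(dumb.act(actions))               # 2: best it can find among 0, 1, 2
print(smart.act(actions))              # 7: the true optimum for u
print(smart.retarget(v).act(actions))  # 2: same search, swapped utility
```

Here `dumb` and `smart` share the utility `u` but differ in the algorithm, while `smart` and `smart.retarget(v)` share the algorithm but differ in the utility. Both comparisons are only meaningful because the class fixes what “the same agent with a different utility” means.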
Related: when Yudkowsky talks about utility maximizers, he doesn’t just say “the superintelligence is a utility maximizer”, he says “the superintelligence is efficient relative to you, a fact from which you can make some inferences and not others, etc.”