“Robin Hanson has suggested that the logic of a leverage penalty should stem from the general improbability of individuals being in a unique position to affect many others (which is why I called it a leverage penalty).”
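To spell out the arithmetic behind that logic as I understand it (my own illustration, not Robin’s formulation, with $N$ the number of others the offer claims to affect and $u$ the value of affecting one of them): if the prior probability of occupying a position of leverage over $N$ others scales as roughly $1/N$, then

$$\mathbb{E}[\text{offer}] \;\le\; P(\text{leverage over } N)\cdot N\cdot u \;\le\; \frac{1}{N}\cdot N\cdot u \;=\; u,$$

so the astronomically large $N$ in the offer cancels out of the expectation instead of dominating it.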
As I mentioned in a recent discussion post, I have difficulty accepting Robin’s solution as valid. For starters, it only has the semblance of possibly working in the case of people who care about people, because that is a case that seems as if it should be symmetrical; but how would it work, e.g., for a Clippy who is tempted with the creation of paperclips? There’s no symmetry there, because paperclips don’t think and Clippy knows paperclips don’t think.
And how would it work if the AI in question is asked to evaluate whether a random individual should accept such a hypothetical offer? Robin’s anthropic solution says the AI should judge that this other person ought, hypothetically, to take the offer, yet the AI would judge the probabilities differently if it had to decide in actual life. That sounds as if it ought to violate basic principles of rationality.
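(To make that worry concrete, on my reading of the anthropic version of the penalty: writing $G$ for “the offer is genuine” and $N$ for the number claimed to be affected,

$$P(G \mid \text{I am the recipient}) \approx \frac{1}{N} \;\ll\; P(G \mid \text{some third party is the recipient}),$$

since only the first-person case requires the evaluator to occupy the improbably unique position; so the same AI would endorse acceptance in the hypothetical while refusing the identical offer made to itself.)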
My effort to steelman Robin’s argument was, in effect, to replace “lives” with “structures of type X that the observer cares about and that will be impacted”, and “unique position to affect” with “unique position of not directly observing”; hence the Law of Visible Impact.
I think this is captured by the notion that a causal node should only improbably occupy a unique position on a causal graph?
Yeah, that’s probably generalized enough that it works, though it didn’t quite click for me at first because I was reading Robin’s “ability to affect” as what “unique position” referred to, whereas I was thinking of “inability to perceive”. But that is also a unique position, so the causal-node version you mention does cover it. Thanks.