Get data points on your current utility function via hypotheticals
I’ve recently found that my utility function values personal status and fame a whole lot more than I thought it did. I had previously believed it mostly depended on the consequences of my actions for other sentient beings, but it turned out I was wrong. Obviously, this is a valuable insight: I definitely want to know what my current utility function is, so that I can decide whether to change my actions or my utility function if the two aren’t aligned.
I did this by imagining how I would feel if I found out certain things. For example, how would I feel if everyone else were also trying to save the world? The emotional response I had was a sort of hollow feeling in the pit of my stomach, as if I were a really mediocre being. This obviously wasn’t the result of calculating that the marginal utility of my actions would be much lower in this hypothetical world (and that I should therefore go do something else); instead, it was the fact that trying to save the world would no longer make me special. I wouldn’t stand out in that sort of world.
(Epilogue: I decided that I hadn’t done a good enough job programming my brain, and I am attempting to modify my utility function to depend on the world actually getting saved.)
Discussion: What other hypotheticals are useful?