This kind of thing seems totally backwards to me. In what sense do I lose if I “bulldoze my values”? It only makes sense to describe me as “having values” insofar as I don’t do things like bulldoze them! It seems like a way to pretend existential choices don’t exist—just assume you have a deep true utility function, and then do whatever maximizes it.
Why should I care about “teasing out” my deep values? I place no value on my unknown, latent values at present, and I see no reason to think I should!
Isn’t the worst case scenario just leaving the aliens alone? If I’m worried I’m going to fuck up some alien’s preferences, I’m just not going to give them any power or wisdom!
I guess you think we’re likely to fuck up the alien’s preferences by the lights of their reflection process, but not by the lights of ours. But this just recurs at the meta level. If I really do care about an alien’s preferences (as it feels like I do), why can’t I also care about their reflection process (which is just a meta-preference)?
I feel like the meta level at which I no longer care about doing right by an alien is basically the meta level at which I stop caring about someone doing right by me. In fact, this is exactly how it seems to be mentally constructed: what I mean by “doing right by [person]” is “what that person would mean by ‘doing right by me’”. This seems like it’s either as simple as it naively looks, or sensitive to weird hyperparameters I’m not sure I care about anyway.