> I don’t see why alignment inherently needs more realism than intelligence
I was focused solely on testing alignment. I’m pretty confused about how much realism is needed to produce alignment.
> explicitly we are designing sims with dualistic physics (mind is a fully separate non-matter-based phenomenon)
I guess I should have realized that, but it did not seem obvious enough for me to notice.
> So it doesn’t seem that much, if any, extra effort is required to get reasonably sized populations of agents
I’m skeptical. I don’t know whether I’ll manage to quantify my intuitions well enough to figure out how much we disagree here.
> The AGI in simboxes shouldn’t be aware of concepts like utility functions, let alone the developers or the developers modifying their utility functions (or minds).
It seems likely that some AGIs would notice changes in behavior. I expect it to be hard to predict what they’ll infer.
But now that I’ve thought a bit more about this, I don’t see any likely path to an AGI finding a way to resist the change in utility function.
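To make my picture of this more concrete, here is a minimal toy sketch (my own construction; every name in it is hypothetical, and nothing like this is specified in the simbox proposal) of an agent whose utility function lives entirely outside the simulated world. The developers can swap it between episodes, and the agent's only possible evidence of the swap is its own changed behavior:

```python
# A minimal toy sketch (my own construction; every name is hypothetical and
# nothing like this is specified in the simbox proposal). It illustrates two
# things at once:
# (1) "dualistic physics": the world update is a pure function of world state
#     and actions, so nothing inside the sim represents agent minds;
# (2) a developer swapping an agent's utility function from outside the sim,
#     where the agent's only possible evidence is its own changed behavior.
from dataclasses import dataclass, field


@dataclass
class SimWorld:
    # World state holds food at two locations and nothing else; in particular,
    # no trace of the minds that act on it.
    food: dict = field(default_factory=lambda: {"north": 5, "south": 0})

    def step(self, action: str) -> dict:
        gathered = min(1, self.food.get(action, 0))
        self.food[action] = self.food.get(action, 0) - gathered
        return {"location": action, "gathered": gathered}


class AgentMind:
    """The mind, including its utility function, lives outside SimWorld."""

    def __init__(self, utility):
        self.utility = utility

    def act(self, world: SimWorld) -> str:
        # Greedily pick the location whose immediate outcome the current
        # utility function prefers.
        return max(
            world.food,
            key=lambda loc: self.utility(
                {"location": loc, "gathered": min(1, world.food[loc])}
            ),
        )


def run_episode(mind: AgentMind, world: SimWorld, steps: int = 5) -> list:
    return [world.step(mind.act(world)) for _ in range(steps)]


# Developer-side intervention: replace the utility function between episodes.
mind = AgentMind(utility=lambda obs: obs["gathered"])  # values gathering food
before = run_episode(mind, SimWorld())

mind.utility = lambda obs: -obs["gathered"]  # now avoids gathering
after = run_episode(mind, SimWorld())

# Nothing the agent does inside the sim can reach `mind.utility`; at most it
# could notice that `before` and `after` look different and try to infer why.
print(before)
print(after)
```

Resisting the swap would require some causal channel from in-sim actions back to the externally stored utility function, and the dualistic design is meant to rule out any such channel, which is roughly why I don't see a likely path to resistance.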