But we do know of simple utility functions which don’t fear Goodharting.
Agreed.
The fact that we fear Goodharting is information about our values.
Yeah, well said.
I feel like discussing the EM scenario and how it may or may not differ from the general AI scenario would have been useful.
Yeah would love to discuss this. I have the sense that intelligent systems vary along a dimension of “familiarity of building blocks” or something like that, in which systems built out of groups of humans are at one end, and systems built from first principles out of basic algorithms are at the other end. In between are machine learning systems (closer to basic algorithms) and emulated humans (closer to groups of humans).
Stuart’s example in that post is strange. He writes “suppose you asked the following question to a UR-optimiser” but what does it even mean to ask a question to a UR-optimiser? A UR-optimiser presumably is the type of thing that finds left/right policies (in his example) that optimise UR. How is a UR-optimiser the type of thing that answers questions at all (other than questions about which policy performs better on UR)? His question begins “Suppose that a robot is uncertain between UR and one other reward function, which is mutually exclusive with UR...” but that just does not seem like the kind of question that ought to be posed to a UR-optimiser.
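To make the type mismatch concrete, here’s a rough sketch of what I take a UR-optimiser to be in the left/right example (names and setup are just illustrative, not from Stuart’s post): its whole interface is “given a reward function, pick the better policy” — there’s nowhere to feed in a question.

```python
# Minimal sketch (hypothetical names) of a "UR-optimiser" in the left/right
# toy example: a procedure that takes a reward function UR and returns
# whichever policy scores higher. Nothing in its interface accepts a question.

def ur_optimiser(UR, policies=("left", "right")):
    """Return the policy that maximises the reward function UR."""
    return max(policies, key=UR)

# Example reward function: UR prefers "right".
UR = lambda policy: 1.0 if policy == "right" else 0.0

print(ur_optimiser(UR))  # -> "right"
```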
I guess you could rephrase it as “suppose a UR optimizer had a button which randomly caused another agent to become a UR optimizer,” or something along those lines, and get similar results.
Do you mean that as a way to understand what Stuart is talking about when he says that a UR-optimiser would answer questions in a certain way?
Yeah, instead of asking it a question, we can just see what happens when we put it in a world where it can influence another robot going left or right. Set it up the right way, and Stuart’s argument should go through.
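Roughly something like this (a sketch only, with made-up names — not Stuart’s actual formalism): the first robot’s action decides which reward function the second robot ends up optimising, and we just watch which action it takes.

```python
# Sketch of the reformulation (all names hypothetical): drop the UR-optimiser
# into a tiny world where its action decides which reward a second robot
# optimises, then observe which action it picks.

def ur_optimiser(UR, options):
    """Pick whichever option maximises UR."""
    return max(options, key=UR)

def second_robot(reward):
    """The second robot just optimises whatever reward it is handed."""
    return max(("left", "right"), key=reward)

UR = lambda policy: 1.0 if policy == "right" else 0.0   # UR rewards "right"
UL = lambda policy: 1.0 if policy == "left" else 0.0    # a mutually exclusive reward

def outcome(action):
    """First robot's payoff: UR evaluated on whatever the second robot does."""
    reward_given = UR if action == "give_UR" else UL
    return UR(second_robot(reward_given))

# The UR-optimiser prefers making the other robot a UR-optimiser too.
print(ur_optimiser(outcome, ("give_UR", "give_UL")))  # -> "give_UR"
```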