Well, I don’t have a good answer, but I do have some questions in this direction that I’ll just pose here.
Why can’t the utility function be some sort of lexicographic satisficer over sub-parts of itself? Why does the utility function have to be consequentialist? (A toy sketch of what I mean by a lexicographic satisficer appears after this exchange.)
Standard answer: Because of instrumental convergence, duh.
Me: Okay, but why would instrumental convergence select for utility functions that are consequentialist?
Standard answer: Because they obviously outperform the ones that don’t select for consequences. Or, like, what do you mean?
Me: Fair, but how do you define your optimisation landscape? What type of decision theory are you looking at this through? Why isn’t there a universe where your decision theory is predicated on virtues, or where your optimisation function is defined over the sets of processes that you see in the world?
Answer (maybe?): Because this would go against things like Newcomb’s problem or other decision-theory problems that we have.
Me: And why does that matter? What if we viewed this through something like process philosophy and only cared about the processes we set in motion in the world? Why isn’t that as valid a way of setting up the utility function, similar to how Euclidean geometry is as valid as hyperbolic geometry, or one logic system as valid as another?
So, that was a debate with myself? Happy to hear anyone’s thoughts here.
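To make the first question a bit more concrete, here is a toy sketch (my own illustration, not anything established above) of what a lexicographic satisficer over sub-utilities could look like, contrasted with a plain consequentialist maximizer. The sub-utility names, thresholds, and helper functions are all hypothetical.

```python
# Toy illustration: an agent whose "utility function" is a lexicographic
# satisficer over named sub-utilities, each with a satisficing threshold,
# contrasted with an agent that maximizes a single consequentialist score.

from typing import Callable, Dict, List, Tuple

Option = Dict[str, float]  # an option scored on several sub-utilities

def lexicographic_satisficer(
    options: List[Option],
    criteria: List[Tuple[str, float]],  # (sub-utility name, threshold) in priority order
) -> List[Option]:
    """Filter options by each criterion in turn; only narrow while something still passes."""
    candidates = options
    for name, threshold in criteria:
        passing = [o for o in candidates if o.get(name, float("-inf")) >= threshold]
        if passing:
            candidates = passing
    return candidates

def consequentialist_maximizer(options: List[Option], score: Callable[[Option], float]) -> Option:
    """Contrast: collapse everything into one number and pick the option with the highest score."""
    return max(options, key=score)

if __name__ == "__main__":
    options = [
        {"honesty": 0.9, "paperclips": 10.0},
        {"honesty": 0.2, "paperclips": 99.0},
        {"honesty": 0.95, "paperclips": 3.0},
    ]
    # Satisficer: "be honest enough first, then care about output" -- only the first option
    # clears both thresholds in priority order.
    print(lexicographic_satisficer(options, [("honesty", 0.8), ("paperclips", 5.0)]))
    # Maximizer: the weighted sum favours the dishonest, high-output second option.
    print(consequentialist_maximizer(options, lambda o: o["honesty"] + 0.1 * o["paperclips"]))
```

The point of the contrast is that the satisficer never trades away a higher-priority criterion for more of a lower-priority one, whereas the maximizer happily does so as soon as the aggregate score says to.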
This doesn’t make complete sense to me, but you are going down a line of thought I recognize.
There are certainly stable utility functions which, while having some drawbacks, don’t result in dangerous behavior from superintelligences. Finding a good one doesn’t seem all that difficult.
The real nasty challenge is how to build a superintelligence that has the utility function we want it to have. If we could do this, then we could start by choosing an extremely conservative utility function and slowly and cautiously iterate towards a balance of safe and useful.