I hadn't previously encountered the debate over coherence theorems, and I don't have the time I'd need to fully engage with this essay on its own terms. But I did notice this:
Attempting to rank 𝛀 without knowing its potential outputs means you’re not ranking it by the properties of what it does...
Is undecidability an issue here? Could we say that the preferences of a conceptually sophisticated superintelligence can never be fully ranked in practice, for Gödelian reasons?
I wonder about the impact of this on a formal approach to alignment like FMI, and also whether this has some overlap with Roman Yampolskiy’s work (which I’ve never read).
Undecidability is definitely related, but it's a distinct problem. I felt there were objections to undecidability arguments based on bounded-rationality solutions, so I decided to focus on uncertainty about the inputs rather than uncertainty about halting (Gödel assumed that we knew the inputs and functions perfectly), which I think applies to both unbounded and bounded rationality. You could definitely argue that the open-ended nature of superintelligence specifically opens it up to undecidability problems for ranking its preferences, though.
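To make that last point concrete, here's a minimal sketch in Python (my own illustration, not from the essay; `prefers`, `make_agent`, and `halts` are hypothetical names). It shows the standard reduction: any total procedure that could rank an arbitrary agent's preferences over outcomes would also decide the halting problem, which Turing showed is impossible.

```python
# A minimal sketch, assuming a hypothetical total preference-ranker
# `prefers(agent, a, b)` that always decides whether `agent` ranks
# outcome a above outcome b. If such a procedure existed, it would
# decide the halting problem -- so for open-ended agents, no such
# total ranker can exist.

def make_agent(program, inp):
    """Build an agent whose preference between 'a' and 'b' encodes
    whether `program` halts on `inp`."""
    def agent(a, b):
        program(inp)  # diverges iff `program` loops forever on `inp`
        return a      # reached only if `program` halts, so the agent
                      # "prefers" a exactly when `program(inp)` halts
    return agent

def halts(prefers, program, inp):
    """If a total `prefers` existed, this would decide halting."""
    # To answer, `prefers` must determine whether agent(a, b) ever
    # returns -- i.e., whether `program` halts on `inp`. That
    # contradicts the undecidability of halting.
    return prefers(make_agent(program, inp), "a", "b") == "a"
```

This is only the unbounded-rationality version of the argument; as noted above, a bounded reasoner could sidestep it by timing out, which is why the post focuses on input uncertainty instead.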
I haven’t seen those things you cited at the end but will check them out, thanks.