Don’t know if it’s all that useful, but let’s try...
I imagine the AI still being boxed, and that we can still modify its motivational structure (I have a post coming up on how to do that in a way the AI doesn’t object to or resist). And that’s about it. I’ve tried to keep the setup as general as possible, so that it could also be used on AI designs made by different groups.
What’s our definition of “trick” in this context? Take the simplest example: when we hook AIXI-MC up to the controls of Pac-Man and observe it, are we technically “tricking” it into thinking that the universe contains nothing but mazes, ghosts, and pellets?
I know that they cannot be tricked in that sense. And discount rates are about motivations, not about models of the world.
Plus, I envisage this being used fairly early in the development of the intelligence, as a test for putative utilities/motivations.
Do you mind elaborating on the expected AI capabilities at that point?