After all, we usually conjecture already that AGI will care about latent variables, so there must be a way it comes to care about them.
You can straightforwardly get it to care about latent variables by making it reason out what policy to pick within the model. I.e. instead of picking a policy that causes its model to evaluate the latent variable as being high, it should pick a policy which the model expects to cause the latent variable to be high.
This does require an accurate model, but once you have an accurate model, it won't be motivated to self-sabotage its perception or anything like that.
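The distinction can be made concrete with a toy sketch (the scenario, policy names, and payoff numbers here are all hypothetical, chosen only to illustrate the two objectives). Suppose the latent variable is "the room is actually clean": one objective scores a policy by the model's estimate of the latent variable *after* seeing the resulting observation (which rewards sensor tampering), while the other scores a policy by the latent value the model *expects the policy to cause*, evaluated inside the model before acting.

```python
# Toy world (hypothetical): latent L = "room actually clean",
# observation = what the agent's sensor reports afterward.
POLICIES = {
    "clean_room":   {"latent": 1.0, "observation": 0.9},  # really cleans; sensor sees most of it
    "cover_camera": {"latent": 0.0, "observation": 1.0},  # fakes a perfect sensor reading
    "do_nothing":   {"latent": 0.0, "observation": 0.1},
}

def model_estimate_from_observation(obs: float) -> float:
    """Model's estimate of L inferred only from the post-hoc observation.

    This is the quantity the first (bad) objective maximizes, and it is
    fooled by tampering.
    """
    return obs

def model_expected_latent(policy_name: str) -> float:
    """The model's forward prediction of L if this policy is executed.

    This is the quantity evaluated *within* the model, before acting.
    """
    return POLICIES[policy_name]["latent"]

# Objective 1: pick the policy that causes the model to evaluate L as high.
wirehead_choice = max(
    POLICIES,
    key=lambda p: model_estimate_from_observation(POLICIES[p]["observation"]),
)

# Objective 2: pick the policy the model expects to cause L to be high.
planning_choice = max(POLICIES, key=model_expected_latent)

print(wirehead_choice)  # the tampering policy wins under objective 1
print(planning_choice)  # the genuine policy wins under objective 2
```

Under the first objective the agent covers the camera, since that maximizes the model's post-hoc estimate; under the second it actually cleans, because the model (being accurate) knows that covering the camera does not make the room clean.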