If you know your belief isn’t correlated to reality, how can you still believe it?
Interestingly, physics models (maps) are wrong (inaccurate), and people know that, yet they still use them all the time because the models are good enough with respect to some goal.
Less accurate models can even be favored over more accurate ones to save on computing power or reduce complexity.
As long as the benefits outweigh the drawbacks, the correlation to reality is irrelevant.
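A concrete illustration of "good enough with respect to some goal": a minimal Python sketch comparing Newtonian and relativistic kinetic energy for an everyday object (the mass and speed values are arbitrary examples, not from the original text). The simpler model is strictly less accurate, but the error is negligible for the goal at hand.

```python
import math

# Newtonian vs. relativistic kinetic energy for a car at highway speed.
# Both are "maps"; the simpler one is wrong in principle but good enough here.
c = 299_792_458.0  # speed of light, m/s
m = 1500.0         # mass of a car, kg (example value)
v = 30.0           # ~108 km/h, m/s (example value)

ke_newton = 0.5 * m * v**2
gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
ke_relativistic = (gamma - 1.0) * m * c**2

relative_error = abs(ke_relativistic - ke_newton) / ke_relativistic
print(f"Newtonian KE:    {ke_newton:.3f} J")
print(f"Relativistic KE: {ke_relativistic:.3f} J")
print(f"Relative error:  {relative_error:.1e}")  # on the order of 1e-14: negligible for this goal
```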
I'm not sure how cleanly this maps to beliefs, since one would have to be able to move from one belief to another. However, it might be possible by successively activating different parts of the brain that hold different beliefs, much like someone very angry who completely switches gears to answer an important phone call.
Is the “cure cancer goal ends up as a nuke humanity action” hypothesis valid and backed by evidence?
My understanding is that the meaning of the “cure cancer” sentence can be represented as a point in a high-dimensional meaning space, which I expect to be pretty far from the “nuke humanity” point.
For example, “cure cancer” would be strongly associated with saving lots of lives and positive sentiment, while “nuke humanity” would have exactly the opposite associations, positioning it far away from “cure cancer”.
A good design might specify that if the two goals are sufficiently far apart, they are not interchangeable. This could be modeled in the AI as a reward that decreases exponentially with the distance between the meaning of the goal and the meaning of the action.
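To make that concrete, here is a minimal sketch in Python of the exponential-decay idea. The `embed` function is a hypothetical placeholder for whatever maps a sentence to a point in the meaning space, and the decay rate `alpha` is an arbitrary choice; this is just one way the reward scaling could look, not a claim about how any particular system works.

```python
import numpy as np

def embed(sentence: str) -> np.ndarray:
    """Hypothetical: map a sentence to a point in a high-dimensional meaning space
    (e.g. via some pretrained sentence-embedding model)."""
    raise NotImplementedError

def scaled_reward(goal: str, action: str, base_reward: float, alpha: float = 1.0) -> float:
    """Decay the base reward exponentially with the distance between
    the meaning of the goal and the meaning of the action."""
    distance = np.linalg.norm(embed(goal) - embed(action))
    return base_reward * np.exp(-alpha * distance)

# "cure cancer" and "nuke humanity" should sit far apart in meaning space,
# so an action whose meaning is near "nuke humanity" would earn almost no
# reward under the goal "cure cancer":
# scaled_reward("cure cancer", "nuke humanity", base_reward=1.0)  # -> close to 0
```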
Does this make any sense? (I have a feeling I might be mixing concepts coming from different types of AI)