Problems with learning values from observation

I dunno if this has been discussed elsewhere (pointers welcome).

Observational data doesn’t allow one to distinguish correlation from causation.
This is a problem for an agent attempting to learn values without being allowed to make interventions.

For example, suppose that happiness is just a linear function of how much Utopamine is in a person’s brain, and that a person smiles only when their Utopamine concentration is above 3 ppm. A value-learner that observes both someone’s Utopamine level and their facial expression, and tries to predict their reported happiness from these features, will notice that smiling is correlated with higher reported happiness and may erroneously conclude that smiling is partially responsible for the happiness.
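
Here’s a minimal numerical sketch of that failure mode (all names and numbers are made up; in particular I assume the learner’s Utopamine reading is noisy, which is what lets smiling soak up predictive credit that a perfectly-measured feature would leave nothing for):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True causal model: happiness is linear in the (true) Utopamine level.
utopamine = rng.uniform(0, 6, n)                      # ppm
happiness = 2.0 * utopamine + rng.normal(0, 0.5, n)

# Smiling is a downstream effect of Utopamine, not a cause of happiness.
smile = (utopamine > 3).astype(float)

# The learner only sees a noisy Utopamine measurement (hypothetical sensor
# noise) plus the facial expression.
measured = utopamine + rng.normal(0, 1.0, n)

def ols_coefs(features, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y))] + list(features))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[1:]  # drop the intercept

# Observational fit: smiling gets a positive coefficient, because it carries
# extra information about the true Utopamine level.
print("observational:", ols_coefs([measured, smile], happiness))

# Interventional fit: force smiles at random, independent of Utopamine.
# Smiling has no causal effect, so happiness is generated exactly as before.
forced_smile = rng.integers(0, 2, n).astype(float)
happiness_do = 2.0 * utopamine + rng.normal(0, 0.5, n)
print("interventional:", ols_coefs([measured, forced_smile], happiness_do))
```

Fitting on the observational data should assign smiling a clearly positive coefficient, while fitting on the interventional data, where smiles are forced at random, should drive it to roughly zero: the intervention is what exposes that smiling isn’t doing any causal work.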

------------------
An IMPLICATION:
I have a picture of value learning where the AI learns via observation (since we don’t want to give an unaligned AI access to actuators!).
But the above makes it seem important to consider how to make an unaligned AI safe enough to perform value-learning-relevant interventions.