Somewhat unrelated and probably silly… Why reward the agent directly instead of letting it watch humans act in their natural environment and build its own predictive model of them?
For example, a model that predicts whether a human ends up happy with something or not?