The simplest version of the parenting idea includes an agent which is Bayes-optimal. Parenting would just be designed to help out a Bayesian reasoner, since there’s not much you can say about how much a Bayesian reasoner will explore or how much it will learn; it all depends on its prior. (Almost all policies are Bayes-optimal with respect to some (universal) prior.) There’s still a fundamental trade-off between learning and staying safe. So while the Bayes-optimal agent does a better job of picking a point on that trade-off than the asymptotically optimal agent does, that doesn’t quite allow us to say that it will pick the right point. As long as we have access to “parents” that might be able to guide an agent toward world-states where this trade-off is less severe, we might as well make use of them.
And I’d say it’s more of a side conclusion than a main one.