I have been thinking about things similar to this lately, for the more specific sub goal of ‘how would you train, and measure, ability to form plans that achieve complicated goals’ (in particular in domains with poor feedback loops)
I have been thinking about things similar to this lately, for the more specific sub goal of ‘how would you train, and measure, ability to form plans that achieve complicated goals’ (in particular in domains with poor feedback loops)