Yeah, this is the part I’m confused about as well. I think this proposal involves training a neural network to emulate a human? Otherwise I’m not sure how EvalH(F(sm),oh) is supposed to work. It requires a human to predict the next step from the observations and the direct translation of the machine state, which means we need some way to describe the full state in terms that the “human” we’re using can understand. That rules out using actual humans to label the data, because I don’t think we have any way to provide such a description. We’d need to train a human simulator specifically adapted to parsing this sort of output.