But the real humans will still get to review all the past actions and observations when picking their action, and even if they only have the time to review the last ~100, I think competent performance on the other tasks I mentioned could be preserved under these conditions.
What does the real human do if trying to train the imitation to write code? Review the last 100 actions to try to figure out what the imitation is currently trying to do, then do what they (the real human) would do if they were trying to do that? How does the human provide a good lesson if they only know a small part of what the human imitation has done so far to build the program? And the imitation is modeling the human trying to figure out what the imitation is trying to do? This seems to get really weird, and I’m not sure if it’s what you intend.
Also, it seems like the human imitations will keep diverging from real humans quickly (so the real humans will keep getting queried) because they can’t predict ahead of time which inputs real humans will see and which they won’t.
What does the real human do if trying to train the imitation to write code? Review the last 100 actions to try to figure out what the imitation is currently trying to do, then do what they (the real human) would do if they were trying to do that?
Roughly. They could search for the observation which got the project started. It could all be well commented and documented.
And the imitation is modeling the human trying to figure out what the imitation is trying to do? This seems to get really weird, and I’m not sure if it’s what you intend.
What the imitation was trying to do. So there isn’t any circular weirdness. I don’t know what else seems particularly weird. People deal with “I know that you know that I know...” stuff routinely without even thinking about it.
Also, it seems like the human imitations will keep diverging from real humans quickly (so the real humans will keep getting queried) because they can’t predict ahead of time which inputs real humans will see and which they won’t.
If you’re talking about what parts of the interaction history the humans will look at when they get called in, it can predict this as well as anything else. If you’re talking about which timesteps humans will get called in for, predicting that ahead of time doesn’t have any relevance to predicting a human’s behavior, unless the humans are themselves attempting to predict this, which humans could absolutely do.
I guess it’s weird (counterintuitive and hard to think about) compared to “The imitation is modeling the human trying to write a good program.” which is what I initially thought the situation would be. In that case, the human doesn’t have to think about the imitation and can just think about how to write a good program. The situation with HSIFAUH seems a lot more complicated. Thinking about it more...
In the limit of perfect imitation, “the imitation is modeling the human trying to write a good program” converges to “the human trying to write a good program.” In the limit of perfect imitation, HSIFAUH converges to “a human trying to write a good program while suffering amnesia between time steps (but can review previous actions and write down notes).” Correct? HSIFAUH could keep memories between time steps, but won’t, because it’s modeling a human who wouldn’t have such memories. (I think I was confused in part because you said that performance wouldn’t be affected. It now seems to me that performance would be affected because a human who can’t keep memories but can only keep notes can’t program as well as a normal human.)
(Thinking about imperfect imitation seems even harder and I’ll try that more after you confirm the above.)
One thing still confuses me. Whenever the real human does get called in to provide training data, the real human now has that memory. But the (most probable) models don’t know that, so the predictions for the next round are going to be wrong (compared to what the real human would do if called in) because they’ll be based on the real human not having that memory. (I think this is what I meant when I said “it seems like the human imitations will keep diverging from real humans quickly”.) The Bayesian update wouldn’t cause the models to know that the real human now has that memory, because if the real human does something the top models correctly predicted, the update won’t do much. So how does this problem get solved, or am I misunderstanding something here? (Maybe we can just provide an input to the models that indicates whether the real human was called in for the last time step?)
Correct. I’ll just add that a single action can be a large chunk of the program. It doesn’t have to be (god forbid) character by character.
But the (most probable) models don’t know that, so the predictions for the next round are going to be wrong (compared to what the real human would do if called in) because they’ll be based on the real human not having that memory.
It’ll have some probability distribution over the contents of the humans’ memories. This will depend on which timesteps they actually participated in, so it’ll have a probability distribution over that too. I don’t think that’s really a problem though. If humans are taking over one time in a thousand, then it’ll think (more or less) there’s a 1⁄1000 chance that they’ll remember the last action. (Actually, it can do better by learning that humans take over in confusing situations, but that’s not really relevant here.)
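As a toy arithmetic check (my own illustration, assuming takeovers are independent at a 1⁄1000 rate; the variable names are just for exposition):

```python
# If human takeovers are independent with rate 1/1000, the model's
# probability that the human personally remembers any given recent
# action is small.
takeover_rate = 1 / 1000

# Chance the human selected (and so remembers) the last action.
p_remembers_last = takeover_rate  # 0.001

# Chance the human participated in at least one of the last 100 steps.
k = 100
p_any_recent_memory = 1 - (1 - takeover_rate) ** k  # roughly 0.095
```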
Maybe we can just provide an input to the models that indicates whether the real human was called in for the last time step?
That would work too. With the edit that the model may as well be allowed to depend on the whole history of which actions were human-selected, not just whether the last one was.
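To make that suggestion concrete, here is a minimal sketch (my own illustration; all names are hypothetical) of an input representation where the model conditions on the full history of which actions were human-selected:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Timestep:
    observation: str
    action: str
    human_selected: bool  # did the real human pick this action?

@dataclass
class History:
    steps: List[Timestep] = field(default_factory=list)

def model_input(history: History, new_observation: str) -> dict:
    """Everything the imitation model conditions on when predicting the
    next action: past observations and actions, plus the full record of
    which past actions were human-selected (not just the last flag)."""
    return {
        "observations": [s.observation for s in history.steps] + [new_observation],
        "actions": [s.action for s in history.steps],
        "human_selected": [s.human_selected for s in history.steps],
    }
```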
Actually before we keep going with our discussions, it seems to make sense to double check that your proposal is actually the most promising proposal (for human imitation) to discuss. Can you please take a look at the list of 10 links related to human imitations that I collected (as well as any relevant articles those pages further link to), and perhaps write a post on why your proposal is better than the previous ones, why you made the design choices that you did, and how it addresses or avoids the existing criticisms of human imitations? ETA: I’m also happy to discuss with you your views of past proposals/criticisms here in the comments or through another channel if you prefer to do that before writing up a post.
If humans are taking over one time in a thousand, then it’ll think (more or less) there’s a 1⁄1000 chance that they’ll remember the last action.
But there’s a model/TM that thinks there’s a 100% chance that the human will remember the last action (because that’s hard-coded into the TM), and that model will do really well in the next update. So any time a human steps in, no matter when, it will cause a big update at the next step, because it’ll raise models like this from obscurity to prominence. If the AI “knows” this, it will call in the human for every time step, but maybe it doesn’t “know” this? (I haven’t thought this through formally and will leave it to you.)
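Here is a toy numerical sketch of this worry (my own illustration, not from the discussion): a model that hardcodes “the human remembers” gains a 1000x likelihood ratio over the base-rate model the moment a takeover is observed, so even a tiny prior on it produces a big posterior shift.

```python
# Two toy models predicting whether the human remembers the last action.
# Model A uses the base takeover rate (1/1000); Model B hardcodes that
# the human remembers (probability 1).
prior = {"A": 0.999, "B": 0.001}        # B starts in "obscurity"
likelihood = {"A": 1 / 1000, "B": 1.0}  # P(human remembers | model)

# Bayesian update after observing that the human did step in last round.
unnormalized = {m: prior[m] * likelihood[m] for m in prior}
z = sum(unnormalized.values())
posterior = {m: unnormalized[m] / z for m in unnormalized}

print(posterior)  # B jumps from 0.001 to ~0.5: a big update
```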
With the edit that the model may as well be allowed to depend on the whole history of which actions were human-selected, not just whether the last one was.
I was assuming the models would save that input on its work tape for future use.
In any case, I think I understand your proposal well enough now that we can go back to some of the other questions.
Sorry to put this on hold, but I’ll come back to this conversation after the AAAI deadline on September 5.
Commenting here.