Yes, I think what you’re describing is basically CIRL? This could potentially achieve incremental uploading; I just see it as technically more challenging than pure imitation learning. That said, it seems conceivable that something like CIRL is needed during some kind of “takeoff” phase, when the (imitation-learned) agent tries to actively learn how it should generalize by interacting with the original over longer time scales while operating in the world. That seems pretty hard to get right.
I think it’s similar to CIRL except less reliant on the reward function & more reliant on the things we get to do once we solve ontology identification
Yes I agree