Interesting—I too suspect that good world models will help with data efficiency. Even under the existing training paradigm, where a lot of data is needed for generalization to work well, an AI with a good internal world model could generate usable synthetic examples for incremental training. For example, when a child sees a photo of some strange new animal from the side, the child likely surmises that the animal looks the same from the other side; if the photo shows only one eye, the child can imagine that, looking head-on at the animal’s face, it will have two eyes, and so on. Because the child has a rather reliable model of an ‘animal’, they can create reliable synthetic data for incremental training from a single picture.
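To make the idea concrete, here is a minimal sketch of that kind of world-model-driven augmentation. The "world model" here is just a hard-coded prior that animals are roughly bilaterally symmetric, so from one side view we can synthesize an imagined opposite-side view (and a lighting variant) as extra training examples. All function names and the toy image are my own invention for illustration, not anything from the discussion above:

```python
import numpy as np

def synthesize_views(side_view: np.ndarray) -> list[np.ndarray]:
    """Turn a single example into several plausible training examples
    using a crude symmetry prior (the 'world model')."""
    mirrored = np.fliplr(side_view)            # imagined view from the other side
    brighter = np.clip(side_view * 1.2, 0.0, 1.0)  # imagined lighting variation
    return [side_view, mirrored, brighter]

# One 'photo': a tiny grayscale image with bright pixels on the left only.
photo = np.array([[1.0, 0.0],
                  [1.0, 0.0]])

views = synthesize_views(photo)
print(len(views))          # three examples derived from one photo
print(views[1].tolist())   # mirrored view: bright pixels now on the right
```

A real system would of course use a learned generative world model rather than a fixed flip, but the training loop would consume the synthesized views the same way.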
And I like your framing of the internally generated reward as valuable for learning too. While I expect that reward is a composite of experience (enlightened self-interest, reading and discussion, etc.), it can still matter more day-to-day than the external rewards received in the moment. (I think this opens up a lot of philosophy—what are the ‘ultimate’ goals behind your internal ethics and personally fulfilling rewards, etc. But I see your point.)