Could enough human-imitating artificial agents (running much faster than people) prevent unfriendly AGI from being made?
This seems very related to the question of whether uploads would be safer than some other kind of AGI. Offhand, I remember a comment from Eliezer suggesting that he thought that would be safer (but that uploads would be unlikely to happen first).
Not sure how common that view is though.
Acquiring data: put a group of people in a house with a computer. Show them things (images, videos, audio files, etc.) and give them a chance to respond at the keyboard. Their keyboard actions are the actions, and everything between actions is an observation. Then learn the policy of the group of humans.
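The segmentation described above (keyboard actions are the actions, everything between them is one observation) can be sketched as a simple pass over a recorded event stream; the event format and all names here are hypothetical, just to make the framing concrete:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    # everything shown/recorded since the previous keyboard action
    observation: list = field(default_factory=list)
    # the keyboard action that ended this observation window
    action: str = ""

def segment_stream(events):
    """Split a stream of (kind, payload) events into (observation, action) steps.

    kind is "obs" for stimuli (images, videos, audio files, etc.)
    and "act" for keyboard input. Everything between two actions
    is bundled into a single observation, as in the proposal above.
    """
    steps, buffer = [], []
    for kind, payload in events:
        if kind == "obs":
            buffer.append(payload)
        elif kind == "act":
            steps.append(Step(observation=buffer, action=payload))
            buffer = []
    return steps

# Hypothetical event stream from the household setup.
events = [
    ("obs", "image_001.png"),
    ("obs", "audio_001.wav"),
    ("act", "type: 'a dog barking'"),
    ("obs", "video_002.mp4"),
    ("act", "press: ENTER"),
]
steps = segment_stream(events)
```

The resulting `(observation, action)` pairs are the training data for learning the group's policy; any real version would of course need far richer observations and timing information.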
Wouldn’t this take an enormous amount of observation time to generate enough data to learn a human-imitating policy?
Yeah I agree that this might secretly be the same as a question about uploads.
If you’re only trying to copy human behavior in a coarse-grained way, you immediately run into a huge generalization problem: your human-imitation is going to have to make plans in a situation where it can copy itself, thinks faster as it adds more computing power, can’t get a hug, etc., all of which is outside the domain it was trained on.
So if people aren’t being very specific about human imitations, I kind of assume they’re really talking and thinking about basically-uploads (i.e. imitations that generalize to this novel context by having a model of human cognition that attempts to be realistic, not merely predictive).
That’s why it imitates a household of people.
Yes, although we could start now.
Also, I just wanted to give the simplest possible proposal. More reasonably, data like this could probably be gathered in many ways.