This is a good and valid question—I agree, it isn’t fair to say generalization comes entirely from human beliefs.
An illustrative example: suppose we're working in a deep learning setting, so our predictive model is a neural network, and we haven't yet fixed the architecture. We pick two different architectures and train both on the same subsampled set of human-labeled D* items. Almost surely, the two trained models won't give exactly the same outputs on every input, even in expectation over the training randomness. So where did this variability come from? From the inductive bias of the model architecture!
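To make this concrete without training actual neural networks, here is a toy sketch of the same phenomenon using polynomial regression, where the polynomial degree stands in for the choice of architecture. (This is my own illustrative substitution, not part of the example above.) Both model classes are fit to the exact same data, yet they disagree on a new input, and that disagreement can only come from the choice of model class:

```python
import numpy as np

# Same training data for both "architectures".
x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.array([1.0, 0.0, 1.0])

# Two hypothesis classes standing in for two network architectures:
# a degree-1 (linear) fit and a degree-2 (quadratic) fit.
linear_coeffs = np.polyfit(x_train, y_train, deg=1)
quad_coeffs = np.polyfit(x_train, y_train, deg=2)

# On a new input the two models disagree; the data were identical,
# so the difference is attributable only to the model class.
x_new = 2.0
pred_linear = np.polyval(linear_coeffs, x_new)  # ~0.667
pred_quad = np.polyval(quad_coeffs, x_new)      # ~4.0
print(pred_linear, pred_quad)
```

The quadratic class can represent y = x² exactly, while the linear class cannot, so they extrapolate very differently outside the training points even though both saw identical labels.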