That makes sense, though I feel like under that definition, having the things you care about be accessible via transfer wouldn’t actually help you much unless you know that the model transfers correctly there, since otherwise you’d have no reason to trust the transfer (even if it’s actually correct). Unless you have some reason to believe otherwise (e.g. a strong robustness guarantee), it seems to me that in most cases you have to assume that all the information you get via transfer is suspect, which makes even the correct transfer inaccessible in some sense, since you can’t distinguish it from the incorrect transfer.
I guess if lots of actors just try to use transfer anyway and hope, then the actors whose values actually are accessible via transfer will be advantaged. But unless you have a particular reason to suspect that your values are more accessible than average (and I guess the point is that our values are less likely to be accessible via transfer than most AIs’ values), it seems like in most cases you wouldn’t want to pursue that strategy unless you had no other option.