I agree that there’s a real sense in which the genome cannot ‘directly’ influence the things on the bulleted list. But I don’t think ‘hardcoded circuitry’ is the relevant kind of ‘direct’.
Instead, I think we should be asking whether genetic changes can produce isolated effects on things on that list.
E.g., if there can be a gene whose only observable-without-a-brain-scan effect is to make its carriers think differently about seeking power, that would indicate that the genome has fine-grained control at the level of concepts like ‘seeking power’. I think this would put us in horn 1 or 2 of the trilemma, no matter how indirect the mechanism for that control.
(I suppose the difficult part of testing this would be verifying the ‘isolated’ part.)
Some context: what we ultimately want to do with this line of investigation is figure out how to influence the learned values and behaviors of a powerful AI system. We’re kind of stuck here because we don’t have direct access to such an AI’s learned world model. Thus, it would be very good if there were a way to influence an intelligence’s learned values and behaviors without requiring direct world model access.
Instead, I think we should be asking whether genetic changes can produce isolated effects on things on that list.
Alex and I agree that there are ways that the genome can influence people towards more / less power seeking / other things on the list. However, it really matters how specifically the genome does this (as in, what mechanistic process does the genome use to overcome the information inaccessibility issue it faces?), because that mechanism would represent a candidate for us to adapt for our own information inaccessibility problem wrt influencing AGI values and behavior despite their inaccessible learned world models.
We’re not trying to argue for some extreme form of blank-slatism. We’re asking how the genome accomplishes the feats it clearly manages.