…This is actually much easier than if we were trying to align the AIs to some kind of innate reward function that humans supposedly have…
I don’t know if you were subtweeting me here, but for the record, I agree that getting today’s LLMs to be generally nice is much easier than getting “brain-like AGI” to be generally nice (see e.g. here), and I’ve always treated “brain-like AGI” as “threat model” rather than “good plan”.