This is an interesting way to evaluate AI values. You could also consider applying 1) steering for credulity and honesty, so that the model takes the scenario at face value and answers honestly, and 2) the veil of ignorance (would you like this society if you didn’t know which member you would be?). Alternatively, instead of the veil of ignorance, you could have the model rate the utopia from multiple perspectives.