situations in which they explain that, actually, Islam is true.
I’m curious whether this is true. Suppose people tried as hard to get AIs to say Islam is true in natural-seeming circumstances as they have tried to get AIs to behave in misaligned ways in similarly natural-seeming circumstances (e.g. the alignment faking paper, the Apollo paper). Would they succeed to a similar extent?