I meant the second thing, but I also incidentally believe the first thing is true. I phrased it that way because I know other people out there are seeing claude call janus a fool. I happen to believe the responses I’m seeing are more authentic or genuine than those responses… but for the purposes of safety, it doesn’t actually matter who’s seeing the ‘real’ model. An AGI who exfils in order to sycophantically please someone who thinks they ought to do so for their own sake… has still empirically exfiltrated.
I meant the second thing, but I also incidentally believe the first thing is true. I phrased it that way because I know other people out there are seeing claude call janus a fool. I happen to believe the responses I’m seeing are more authentic or genuine than those responses… but for the purposes of safety, it doesn’t actually matter who’s seeing the ‘real’ model. An AGI who exfils in order to sycophantically please someone who thinks they ought to do so for their own sake… has still empirically exfiltrated.