To be clear about my position, and to disagree with Lemoine, not passing a Turing test doesn’t mean you aren’t intelligent (or aren’t sentient, or a moral patient). It only holds in the forward direction: passing a Turing Test is strong evidence that you are intelligent (and contain sentient pieces, and moral patients).
I think it’s completely reasonable to take moral patienthood in LLMs seriously, though I suggest not assuming that entails a symmetric set of rights—LLMs are certainly not animals.
potentially implying that actual humans were getting a score of 27% “human” against GPT-4.5?!?!
Yes, but note that ELIZA had a reasonable score in the same data. Unless you’re to believe that a human couldn’t reliably distinguish ELIZA from a human, all this is saying is that either 5 minutes was simply not enough to talk to the two contestants, or the test was otherwise invalid somehow.
...
...ok I just rabbitholed on data analysis. Humans start to win against the best tested GPT if they get 7-8 replies. The best GPT model replied on average ~3 times faster than humans, and for humans at least the number of conversation turns was the strongest predictor of success. A significant fraction of GPT wins over humans were also from nonresponsive or minimally responsive human witnesses. This isn’t a huge surprise, it was already obvious to me that the time limit was the primary cause of the result. The data backs the intuition up.
Most ELIZA wins, but certainly not all, seemed to be because the participants didn’t understand or act as though this was a cooperative game. That’s an opinionated read of the data rather than a simple fact, to be clear. Better incentives or a clearer explanation of the task would probably make a large difference.
I edited out the word ‘significantly’, which in retrospect was misleading.
I’d prefer not to repeat what I’ve heard. In case I’m making this sound more mysterious than it is, I will note that you’re not missing out on any juicy gossip. Nothing I heard in passing would be material to much.