The Eliza Test

The paradigmatic criterion for determining whether machines can think is the Turing test.

However, more and more people are finding friends, therapists, confidants, and romantic connections in machines that wouldn’t pass the Turing test. In other words, people treat machines as thinking beings even when they don’t pass the test.

Luckily, we understand how people work a little better than we did when Turing wrote. We have made interesting observations about how we make decisions and how we form our beliefs.

So I think it’s appropriate to suggest a different criterion, more suited to how we actually function, for thinking about how we treat machines. It’s very simple, so it’s probably been proposed before, but I searched for it and couldn’t find it. I gave it the provisional name of “Eliza test”.

The Turing Test

Before presenting the Eliza test, it’s worth remembering an observation about the (misnamed) “Turing test”.

Popular culture portrays the Turing test as an exam to identify whether machines are conscious. In the exam, an examiner chats with a machine and tries to determine, by asking questions, whether they are speaking with a human or a robot. The machine passes when it manages to “fool” the examiner, thereby demonstrating that it is conscious. This idea superficially resembles what Turing had in mind, but it’s very far off in one fundamental aspect.

The article in which Turing presents his “test” actually doesn’t speak of any “test,” but of a game, and it doesn’t aim to discover whether “machines can think” either. Quite the opposite: it states that asking whether machines think has no scientific meaning. He considered that question inconsequential because it couldn’t be answered empirically.

So he proposed, instead, to ask a new, different question that could be answered empirically. That different question was whether machines could win the “imitation game.” The imitation game does involve having an examiner ask difficult questions via chat to determine whether they’re talking to a machine or a human. That game is indeed won when the machine manages to fool the examiner.

But this was never proposed as a criterion or “test” to discover whether machines think, but as another question that could be posed and answered scientifically. It was an easy experiment to conduct, and it gave a clear reference point for understanding some of the machines’ capabilities.

I think people started talking about the “Turing test” as a criterion for knowing whether machines think because, at first glance, it seems like what we do, on a daily basis, to decide whether to treat some other thing as a conscious being or not (here I mean “thing” in the broadest and most philosophical sense possible, which of course includes human bodies). Moreover, I’d venture that Turing proposed the imitation game for a similar reason. But it seems to me that people don’t do anything like the Turing test to decide whether to treat other things as conscious beings.

The Eliza Test

In the 1960s, an artificial-intelligence program called “Eliza” appeared.

It was very simple. By current standards it was almost a toy. It worked by implementing “active listening” techniques. It had a series of preprogrammed responses, like “tell me more,” “could you elaborate on that?” or “and how does that make you feel?”
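Eliza’s basic logic, matching the user’s message against a few patterns and otherwise falling back on a canned prompt, is simple enough to sketch in a few lines. The following Python snippet is only an illustration of the idea, not Weizenbaum’s original script; the patterns and responses are invented for the example.

```python
import random
import re

# A minimal sketch of the Eliza idea: match the user's message against a few
# hand-written patterns and answer with a canned "active listening" response.
# The rules below are invented for illustration; the original used a richer
# script of keywords, ranks, and reassembly rules.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE),
     ["Why do you feel {0}?", "How long have you felt {0}?"]),
    (re.compile(r"\bbecause (.+)", re.IGNORECASE),
     ["Is that the real reason?", "Could there be another explanation?"]),
    (re.compile(r"\bmy (mother|father|family)\b", re.IGNORECASE),
     ["Tell me more about your {0}."]),
]

# Generic fallbacks when nothing matches: these keep the conversation going
# without the program visibly "messing up."
FALLBACKS = ["Tell me more.", "Could you elaborate on that?",
             "And how does that make you feel?"]

def eliza_reply(message: str) -> str:
    for pattern, responses in RULES:
        match = pattern.search(message)
        if match:
            return random.choice(responses).format(*match.groups())
    return random.choice(FALLBACKS)

if __name__ == "__main__":
    print(eliza_reply("I feel lonely lately"))   # e.g. "Why do you feel lonely lately?"
    print(eliza_reply("The weather is nice"))    # falls back to a generic prompt
```

Even this toy version shows why the fallbacks matter: as long as nothing in the reply is obviously wrong, the conversation simply continues.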

People who talked to Eliza without knowing it was a chatbot treated it like a real person. Many people maintained very long and vulnerable conversations believing it was a human being who understood them. The “bubble” of believing in its humanity took a while to burst.

Obviously, Eliza would fail the Turing test very quickly. But most people didn’t start the conversation by performing a Turing test, just as you don’t give the Turing test to people you meet in your daily life. If anything, they only tested the machine after it messed up by responding in some strange way, which put an end to the automatic, evolved inclination to believe there’s a person on the other side.

Behind that experience are two somewhat more general ideas:

  • It’s easier not to mess up than to pass an exam.

  • We don’t go through life giving exams.

Examiners carefully look for failure points and ask difficult questions. With few exceptions, the job is easier than the university degree, and the everyday programming ticket requires less ingenuity than the interview questions about algorithm or database design.

Since examining is difficult and costly, we don’t go through life giving exams. Most people start dating someone and automatically assume, without anything resembling a check, that the other person isn’t a psychopath. It would be strange, and even offensive, to give a morality exam at the beginning of a relationship.

Eventually, an employee or partner might “mess up” and undermine our assumptions. After that, we might start investigating, and notice that we had been wrong about them (or notice it was just an error, and that everything was actually fine). What I mean by this is that generally something strange has to happen for us to start inquiring, and that it’s much easier not to do something strange than to survive the inquiry.

Needless to say, you’ve probably never been given an exam to determine whether you’re a conscious being, and it has probably never occurred to you to ask questions to find out whether you’re actually talking to a zombie.

In general, we assume that if someone moves and talks like a human being, they also have a human mind. “Assume” is saying too much, because we don’t even think “this thing in front of me is a thinking being.” Evolution led us to automatically assume that what speaks like us has a mind. It’s an unconscious “System 1” process and it’s even difficult to avoid.

Even so, the most iconic criterion for deciding how to treat artificial intelligences is the Turing test, which demands exams we’re not used to giving, and which we had never required anything to pass before treating it as a thinking being.

You can probably already guess what the Eliza test consists of: if we start interacting with a machine assuming it’s human, how long does it take for it to “mess up” badly enough that we even begin to think about something like a Turing test?

Obviously, this question is much more determinative than the Turing test for whether we treat the machine as a conscious being automatically (by default). To be a bit more precise, it’s not the question itself that matters, because the whole process is unconscious; what matters is what the question asks about. That is: what matters is how well the machine would do on the Eliza test, not whether a scientist actually administers the Eliza test and the machine passes it.

Current models, although still unable to pass the Turing test, are eerily talented at the Eliza test. This, added to the fact that we interact daily over the internet through text (in chats, emails, or social networks like Twitter), means we’ve probably already treated several machines as thinking beings.

In a recent New Yorker article, Paul Bloom cited a relevant study:

“In one study, researchers took nearly two hundred exchanges from Reddit’s r/AskDocs, where verified doctors had answered people’s questions, and had ChatGPT respond to the same queries. Health-care professionals, blind to the source, tended to prefer ChatGPT’s answers—and judged them to be more empathic. In fact, ChatGPT’s responses were rated ‘empathic’ or ‘very empathic’ about ten times as often as the doctors’.”

Can Machines Think?

Turing proposed not to ask whether machines can think. The imitation game (misnamed “Turing test”) didn’t seek to answer that question. The Eliza test doesn’t answer it directly either.

In fact, the Eliza test better characterizes our automatic and intuitive behavior than our conscious questions. However, the key idea is that almost all our decisions are intuitive and automatic, among them the “decision” of whether to treat something as a conscious being or not.

But there’s another important observation about human beings: once we make these kinds of automatic decisions, we tend to rationalize them. This means we invent, and then believe, a “rational” explanation for our decision, even though that explanation wasn’t really what drove our behavior.

In this case, if our automatic information processing led us to befriend or fall in love with a machine, we’ll most likely then invent some justification to affirm that it’s conscious. In fact, I’d venture to say that any rational approach to the question of whether machines can think would be a rationalization, precisely because the question cannot be answered empirically.

I believe that, in the future, we may invent new criteria, different from the Turing test, for affirming that machines think (or that they don’t). I believe they’ll mostly be rationalizations of what we already felt intuitively. And I think that will depend, ultimately, on how capable machines are of passing the Eliza test. Again, today they are eerily capable of doing so.