If an AI model is trained on all human text conversations, or on all scientific papers ever written in the test language, using a GPT right now, why wouldn’t it immediately be more likely to pass the Turing test than any human? As a GPT runs, if its temperature is set to 0, it will always pick the most likely token a human would have emitted, on average, in the given context, to the limits of its capacity for compression at Chinchilla scaling.
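To make the temperature-0 point concrete, here is a minimal sketch (using GPT-2 via HuggingFace `transformers` purely as an illustration; any causal LM behaves the same way) showing that temperature 0 collapses sampling into always taking the argmax token:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is just a stand-in here; the argument applies to any causal LM.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def greedy_continue(prompt: str, n_tokens: int = 20) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(n_tokens):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]  # next-token scores
        next_id = torch.argmax(logits)         # temperature -> 0 reduces to argmax
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tok.decode(ids[0])

print(greedy_continue("The Turing test is"))
```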
What bothers me about this ‘log likelihood’ metric is that a GPT is going to appear more humanlike on it than a human. Remember, the “judge” has read lots of scientific papers and talked to lots of humans. If their algorithm is “how likely is a human to have emitted the next token?”, GPTs (even early ones) should always win, every time. An actual human participant in a test like this isn’t an amalgamation of all speakers of the language, or of all peers of the judge*, but has “personality traits” and a “unique method of speaking”; this is why stylometry is possible. If I am reading it right, the algorithm you propose will deterministically fail the human on almost every run.
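A sketch of the metric I’m objecting to (again using GPT-2 as a stand-in for the judge’s model of “the average speaker”): a judge who ranks by mean per-token log-likelihood will systematically favor generic prose over idiosyncratic human style.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def mean_log_likelihood(text: str) -> float:
    """Average log-probability the model assigns to each token of the text."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    return token_lp.mean().item()

# Generic, model-average prose tends to outscore a distinctive human voice:
print(mean_log_likelihood("The results are summarized in Table 1."))
print(mean_log_likelihood("ngl the results r kinda wild, see tbl 1 lol"))
```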
Right now, some of the main ways you can detect a GPT’s text are tells that come from OAI’s RLHF efforts. These cause the model to use certain phrases* more often than a real human would, and to produce excessive bullet-point summaries. There are also currently forbidden words and phrases that act as tells. Another approach is to ask a logic question that current architectures seem to have difficulty with; Richard Ngo has some.
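A crude version of this kind of tell-based detector is easy to sketch. The phrase list below is purely illustrative (real tells drift as RLHF recipes change):

```python
import re

# Hypothetical tell list for illustration only.
RLHF_TELLS = [
    r"\bdelve\b",
    r"\bas an ai language model\b",
    r"\bit(?:'|’)s important to note\b",
    r"\bin conclusion\b",
]

def tell_score(text: str) -> float:
    """Crude detector: telltale phrases per 1,000 words."""
    words = max(len(text.split()), 1)
    hits = sum(len(re.findall(p, text.lower())) for p in RLHF_TELLS)
    return 1000 * hits / words
```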
The other element that bothers me is: how do we know whether a scientific paper is new or useful? By the words chosen by the AI model? Of course not. The presentation layer (the words used in a paper) offers little of the value.
Most (all?) useful papers involve tool use. Whether it’s a CS or math paper using computers or hundreds of pages of arguments built from proven building blocks, or a biology or physics paper involving laboratory equipment and robotic manipulation, effective tool use is what is required for AI to contribute to science.
If an AI model were effective at using tools but unable to write a plausible scientific paper, and could only summarize its findings in easy-to-understand language with a link to the raw data and procedures in a reproducible format, it would probably be a better scientist than most human scientists.
What am I missing here? Why is this a plausible way to project the date of AGI?
* Footnote: it seems like actually passing the Turing test is mostly a matter of training a model specifically intended to game the test. Freshly train a model mostly on text emitted by humans who match the fake biography the model is trying to mimic, RLHF it to sound more human instead of to answer questions, and add other components, like a model that mimics human typing.
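Even the typing-mimicry piece is cheap to fake. A toy sketch (the keyboard map and rates are made-up assumptions) of emitting text with human-ish inter-key delays and occasionally corrected typos:

```python
import random
import time

# Tiny sample of adjacent-key slips; a real map would cover the whole keyboard.
NEIGHBORS = {"a": "sq", "e": "wr", "o": "ip", "t": "ry"}

def human_type(text: str, wpm: float = 65, typo_rate: float = 0.02):
    """Yield characters with jittered delays and occasional backspaced typos."""
    delay = 60 / (wpm * 5)  # ~5 characters per word
    for ch in text:
        if random.random() < typo_rate and ch.lower() in NEIGHBORS:
            yield random.choice(NEIGHBORS[ch.lower()])
            time.sleep(delay)
            yield "\b"  # backspace to "correct" the slip
            time.sleep(delay * 2)
        yield ch
        time.sleep(max(0.0, random.gauss(delay, delay / 3)))

print("".join(human_type("Hello, judge!")))
```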