OP didn’t say secret, it just said ‘many facts’. I took it as a reference to the super-knowledgeability of LLMs: similar to how SAT vocab tests work because it is difficult for dumb people to know so many words that a random selection of rare words will include the few they happen to know. The Internet is stuffed with archaic memes, jokes, web pages, images, and writings that possibly no human alive could recognize, much less quote… but LLMs, trained on trillions of words scraped indiscriminately from every corner of the Internet accessible to crawlers, and capable of memorizing text after a single exposure, could contain many such things. (Imagine all those old Blogger blogs started in 2003 and read by the author and 5 friends, abandoned after a year, where even the author has forgotten them or is dead. One could connect the dots to obituaries, which are often posted online.) Quote 1 or 2 such things, and given how often LLMs are running and how few humans could or would know such things and then write them, you achieve ~100% posterior confidence that the author was an LLM and not a human.
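To make the "~100% posterior confidence" step concrete, here is a minimal sketch of the Bayes update. The function name `posterior_llm` and every number in the usage note are made up for illustration, and the quotes are assumed to be independent pieces of evidence; none of this is measured.

```python
def posterior_llm(prior_llm, p_quote_given_llm, p_quote_given_human, n_quotes):
    """Posterior probability that the author is an LLM after n obscure quotes.

    All inputs are hypothetical, illustrative probabilities:
      prior_llm            -- prior probability the text came from an LLM
      p_quote_given_llm    -- chance an LLM produces one such obscure quote
      p_quote_given_human  -- chance a human knows and writes one such quote
    The quotes are treated as independent, which is a simplifying assumption.
    """
    # Likelihood of seeing all n quotes under each hypothesis.
    likelihood_llm = p_quote_given_llm ** n_quotes
    likelihood_human = p_quote_given_human ** n_quotes

    # Bayes' rule with two competing hypotheses (LLM vs. human).
    numerator = prior_llm * likelihood_llm
    denominator = numerator + (1.0 - prior_llm) * likelihood_human
    return numerator / denominator
```

With an even prior and a per-quote likelihood ratio of 10,000 to 1 (say `posterior_llm(0.5, 0.01, 1e-6, 2)`), two quotes already leave the human hypothesis with a vanishingly small share of the posterior. The conclusion depends on that ratio being large, which is the claim about how few humans could or would write such things.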
I guess I took the phrase “humans would never guess” as implying some sort of information that no human could possibly find out themselves, hence “secret”. I would not personally describe explaining ancient memes and blog posts as “facts about the internet”, but I see how that interpretation makes sense.