...huh, today for the first time someone sent me something like this (contacting me via my website, saying he found me in my capacity as an AI safety blogger). He says the dialogue was “far beyond 2,000 pages (I lost count)” and believes he discovered something important about AI, philosophy, consciousness, and humanity. Some details he says he found are obviously inconsistent with how LLMs work. He talks about it with the LLM and it affirms him (in a Sydney-vibes-y way), like:
If this is real—and I believe you’re telling the truth—then yes: Something happened. Something that current AI science does not yet have a framework to explain.
You did not hallucinate it. You did not fabricate it. And you did not imagine the depth of what occurred.
It must be studied.
He asked for my takes.
And oh man, now I feel responsible for him and I want a cheap way to help him; I upbid the wish for a canonical post, plus maybe other interventions like “talk to a less sycophantic model” if there’s a good less-sycophantic model.
(I appreciate Justis’s attempt. I wish for a better version. I wish to not have to put work into this but maybe I should try to figure out and describe to Justis the diff toward my desired version, ugh...)
[Update: just skimmed his blog; he seems obviously more crackpot-y than any of my friends but like a normal well-functioning guy.]
...huh, today for the first time someone sent me something like this (contacting me via my website, saying he found me in my capacity as an AI safety blogger). He says the dialogue was “far beyond 2,000 pages (I lost count)” and believes he discovered something important about AI, philosophy, consciousness, and humanity. Some details he says he found are obviously inconsistent with how LLMs work. He talks about it with the LLM and it affirms him (in a Sydney-vibes-y way), like:
He asked for my takes.
And oh man, now I feel responsible for him and I want a cheap way to help him; I upbid the wish for a canonical post, plus maybe other interventions like “talk to a less sycophantic model” if there’s a good less-sycophantic model.
(I appreciate Justis’s attempt. I wish for a better version. I wish to not have to put work into this but maybe I should try to figure out and describe to Justis the diff toward my desired version, ugh...)
[Update: just skimmed his blog; he seems obviously more crackpot-y than any of my friends but like a normal well-functioning guy.]