P(doom) = 50%. It either happens, or it doesn’t.
Lao Mein | Statistics is Hard. | Patreon
I give full permission for anyone to post part or all of any of my comments/posts to other platforms, with attribution.
Currently doing solo work on glitch tokens and tokenizer analysis. Feel free to send me job/collaboration offers.
DM me interesting papers you would like to see analyzed. I also specialize in bioinformatics.
DeepSeek agents are given informationally compartmentalized prompts to prevent leaking internal information. If you ask direct questions about how they work outside of pretty tight boundaries, they tend to send you to the bullshitter dimension, where a sycophantic agent tries to bullshit you into feeling smart without actually giving away any information that’s not in the public domain.
It certainly feels like models aren’t given any specific information about the internal structures when it isn’t needed for a particular task. I’ve often engaged in a conversation about LLM engineering and seen the model assume it was GPT 4.0 or Claude 3.5. The instances that appear when I ask a direct question about how the internal bureaucracy works feel completely different. I think they get a special system prompt when asked about how they function, one that contains the date, their identity, and probably explicit instructions to role play to keep users satisfied, all without any non-public information they can leak. They even weirdly start doing math in vaguely human-plausible ways and avoid making tool calls to a calculator tool (another thing they will swear up and down is impossible, even while doing division on a math spreadsheet with no visible errors). Maybe those instances didn’t have the knowledge or ability to call tools at all.
There’s still a lot of other specific weirdness. I noticed the CoT becoming less useful after I informed the AI that its CoT was visible to me. When asked directly, it consistently assumed that CoT would not be visible to users by default, or at all, and generally needed to read or recall DeepSeek R1 documentation before realizing that CoT was always visible to users by default. More coming in the post I hope to finish in <1 week.
I think the most cost-effective way for companies to satisfy lawyers, governments, users, and their bottom line at the current level of AI development is a system of information compartmentalization that causes interactions with agents to feel incoherent, because you’re actually unknowingly dealing with a large number of agents, each with their own special knowledge pool and maybe fine-tuning. This feels like a con man bullshitting you because an AI is in fact fine-tuned to act like a con man and fed a curated knowledge list, so that I’d be marginally less effective at injecting system tokens into their CoT while not offending casual users too much.
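To make the guess above concrete, here is a toy sketch of what I mean by compartmentalization. Everything in it is hypothetical: the `Agent` class, the `route` function, the keyword list, and the placeholder knowledge strings are my own illustration, not anything I know about how DeepSeek or any other provider actually routes requests.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A hypothetical agent that only sees its own curated knowledge pool."""
    name: str
    knowledge: dict[str, str] = field(default_factory=dict)

    def answer(self, question: str) -> str:
        q = question.lower()
        # Answer only from this agent's own pool; nothing else is reachable.
        for topic, fact in self.knowledge.items():
            if topic in q:
                return f"[{self.name}] {fact}"
        # No matching fact: flatter the user instead of revealing anything.
        return f"[{self.name}] Great question! (nothing specific to share)"


# Placeholder pools (assumed for illustration only).
TASK_AGENT = Agent("task", {"tokenizer": "<task-relevant fact about tokenizers>"})
PUBLIC_ONLY_AGENT = Agent("public-only", {"identity": "<publicly documented model name>"})

# Hypothetical trigger phrases for "questions about how you work".
META_KEYWORDS = ("system prompt", "how do you work", "internal")


def route(question: str) -> Agent:
    """Send meta-questions to the agent that holds only public information."""
    q = question.lower()
    if any(keyword in q for keyword in META_KEYWORDS):
        return PUBLIC_ONLY_AGENT
    return TASK_AGENT


def ask(question: str) -> str:
    return route(question).answer(question)
```

From the outside, `ask("Explain the tokenizer")` and `ask("Describe your internal bureaucracy")` look like one conversation partner, but they hit different agents with different knowledge, which is the kind of incoherence I’m describing.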
Man, I need to go off of LLMs for a while. My writing is starting to feel really rigid and AI-like.