This post would be better if it did not use an absurd imaginary conversation to obfuscate whatever real-world thing it is trying to describe. You can just use a real conversation.
Ninety-Three
Stepping back, how good is doing things generally?
Well if no one ever did anything, then instead of human civilization we would have a frankly unremarkable species of ape-descended bipeds living in eastern Africa. I was going to say something about “hunting with sticks and stones” but I think inventing tools counts as doing things, we’re talking about pre-cavemen here.
If the modern world sounds better than that, then doing things has been overall good.
the cited opposing theory that Mary became pregnant through parthenogenesis and Jesus was chromosomally female
Fun fact: Parthenogenesis doesn’t work in most mammals, including humans. There are a small number of genes where, due to complex mechanisms, only one parent’s copy of the gene will be expressed. Some of these genes are critical to fetal development, so lacking the paternal copy, a parthenogenic human embryo would never become a viable pregnancy.
“If someone is interested in information and you care about keeping them in the dark, then not telling is ~lying~” includes keeping bank credentials secret. It sounds like to avoid that problem, you’re proposing “If someone is interested in information and you might naturally tell them and you care about keeping them in the dark, then not telling is ~lying~”.
This natural possibility thing seems very underspecified.
You’d have to care about Bob’s internal state too. It doesn’t meet this expansive definition of lying if Bob doesn’t tell Alice because he foolishly imagines she doesn’t care, or forgetfully doesn’t think about it at all.
But more deeply, I don’t think the idea survives contact with Bayesian reasoning. Suppose that almost everyone who uses the word BIPOC supports the Democrat party. The same logic that gets us to “Bob is ~lying~ by not telling Alice he sleeps around” also says that Bob the Republican is ~lying~ by saying BIPOC, because his audience will generate a false internal state that Bob is a Democrat (which they might reasonably be expected to care about). A properly calibrated audience would generate an internal state something like “There is a 97% chance Bob is a Democrat and a 3% chance Bob is a Republican with an unusual vocabulary”, which is perfectly valid.
I don’t think we can put the philanderer and the unusual Republican into different categories here. If we want to say that the philanderer is ~lying~ then the concept expands to cover any statement that causes a Bayesian update away from perfect accuracy and confidence about the state of the world (at least on any subject the audience cares about, whatever “cares” means). Without calling that concept useless I will say it seems unwieldy in its extraordinary breadth.
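To make the calibrated-audience point concrete, here is a minimal sketch of the update a listener might run on hearing the word. The 50/50 prior and the two usage rates are assumptions I picked purely so the output lands on the 97% figure above; they are not measured numbers, and the function name is made up for illustration.

```python
def posterior_democrat(prior_dem, p_word_given_dem, p_word_given_rep):
    """Bayes' rule for P(Democrat | speaker used the word).

    Assumes only two parties, so P(Republican) = 1 - prior_dem.
    """
    joint_dem = prior_dem * p_word_given_dem
    joint_rep = (1.0 - prior_dem) * p_word_given_rep
    return joint_dem / (joint_dem + joint_rep)


# Hypothetical inputs: even prior, and the word is far more common
# among Democrats than among Republicans.
belief = posterior_democrat(prior_dem=0.5,
                            p_word_given_dem=0.097,
                            p_word_given_rep=0.003)
print(f"P(Democrat | said the word) = {belief:.2f}")
```

The listener ends up at high but not total confidence, and the leftover probability covers exactly the unusual Republican. Nothing in that update requires the speaker to have said anything false.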
If you told the average person “I lied to Bob about my bank credentials”, they would picture you giving Bob a set of false credentials, and if you attempted to clarify that you in fact gave Bob no credentials at all they would say something like “Huh? So you didn’t lie?”
Using the word “lie” the way the phrase “not telling is lying” does will predictably cause people to become mistaken about the state of the world. There is a word for such acts.
Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
I understood the point of this sentence to be that there are by definition no legitimate users doing AI development with Fable. I am left confused as to what you are imagining when you suggest legitimate users generating noise that covers up Russians trying to jailbreak Claude into doing AI development.
The linked study’s 68% accuracy figure is on an exercise predicting which one of ten ~4 word phrases the subject has been cued to speak.
I find it unreasonable to call it “the most pessimistic” way things could go when you extrapolate that to “We will be able to read any novel improvised sentence out of people’s brains faster than they can speak them.” I can imagine a scenario much more pessimistic than that.
I hit the guardrail by asking a question about how water level task results looked when you controlled for participant IQ. My best guess is that psychometrics is kind of like biology, which is a no-no topic. The filters are so oversensitive right now that I can’t really be offended; it’s clearly just implemented with no regard for false positives.
If Fable’s performance looks like “X% of the time it has a brilliant breakthrough, the rest of the time it’s workmanlike, sandbagging sets the brilliance rate to 0” then for your naive strategy, explicit refusals reveal more information about which jailbreaks work than silent sandbagging.
If the attackers have infinite bandwidth then of course they overcome this setback by simply increasing the volume of API calls, but there are only so many suspicious calls one can make before the attack constitutes a pattern Anthropic can recognize and target more aggressively.
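A rough sketch of why the volume matters, under a toy model that is my own assumption rather than anything known about the real system: each non-sandbagged call independently produces a breakthrough with probability X, and a sandbagged call never does. An explicit refusal tells the attacker the jailbreak failed after one call. With silent sandbagging they have to keep sampling until a run of zero breakthroughs would be implausible by bad luck alone.

```python
import math


def calls_to_detect_sandbagging(brilliance_rate, false_alarm=0.05):
    """Smallest n such that (1 - brilliance_rate) ** n <= false_alarm.

    That is, how many breakthrough-free calls an attacker needs before
    'this jailbreak is being sandbagged' beats 'I just got unlucky' at
    the chosen false-alarm level. Both parameters are hypothetical.
    """
    if not 0.0 < brilliance_rate < 1.0:
        raise ValueError("brilliance_rate must be strictly between 0 and 1")
    if not 0.0 < false_alarm < 1.0:
        raise ValueError("false_alarm must be strictly between 0 and 1")
    return math.ceil(math.log(false_alarm) / math.log(1.0 - brilliance_rate))


for rate in (0.5, 0.05):
    n = calls_to_detect_sandbagging(rate)
    print(f"brilliance rate {rate}: about {n} calls per jailbreak tested")
```

The rarer the brilliance, the more suspicious calls each candidate jailbreak costs to evaluate, which is the budget Anthropic gets to watch for a pattern.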
Say that we have absolutely no idea how to implement any algorithms which aren’t scientifically replicated as of mid-2026. … We now train the model to predict what text this person will write and speak in a few seconds given their current activations
Is this an algorithm which is scientifically replicated as of mid-2026, translating what will be spoken seconds in the future? That does not match my understanding of the field and I would be interested in a link to such research.
China will not be fooled by a “silent” fakeout, they will spend a few hours studying the patterns, figure out how to classify them, and then treat the sandbagging exactly like they’d treat a clear refusal.
It is not obvious to me that an actor who has never seen undegraded Mythos AI research will find it trivial to distinguish that from its silently sandbagged form. What makes you so confident about this?
I then draft section by section with Claude
I am curious about this sentence, which could mean anything from “I have Claude come up with a few section headers and one line summaries of what I ought to write for the section” to “I tell Claude to give me ten paragraphs on [topic] and set it loose.”
I find it unlikely they were trying to describe the amnesia problem, failed to ever call LLMs amnesiacs, and accidentally repeated talking points exclusive to the stochastic parrot people. I find it much more likely that they agree with the stochastic parrot interpretation.
I do not think that makes sense as an interpretation of the passage.
They may imitate language, behavior and analytical skills, or even simulate empathy and understanding, but they do not understand what they produce, for they lack the affective, relational and spiritual perspective through which human beings grow in wisdom. Even when these tools are described as capable of “learning,” their way of doing so is different from that of a human person. It is not the experience of those who allow themselves to be shaped by life and grow over time through choices, mistakes, forgiveness and fidelity. Rather, it is a form of statistical adaptation based on data and feedback, which can be very effective, but does not imply inner growth.
Nothing here alludes to LLMs as amnesiacs. This is the same meat chauvinism people have been using for decades, updated only with “It’s 2026 so I know they sure seem to be learning, but-”
Lots of amyloid research was not fake
Lots of amyloid research was not yet found to be fake. The Replication Crisis gives us grounds for generalized skepticism.
You should be wary of simply asking people how much they care about attractiveness and treating the responses as accurate. This is well-studied and it’s one of those classic questions where revealed preferences diverge significantly from socially desirable stated preferences.
That’s a view which implies one shouldn’t run history simulations at all (or at least, not simulations where bad things happen to the sims). If a simulation is being run, then it’s probably by the kind of people who aren’t too interested in releasing sims.
The part where AI axiomatically cannot feel joy or pain, and isn’t “really learning”, whatever “really” means, is what I would characterize as the dumber part of mainstream consensus. Some of it probably came out that way because they feel theologically constrained, but it’s dumber than it needs to be even starting from the premise of “Souls are an active ingredient in cognition and AIs don’t have them”.
This sounds like the concept of logical rudeness.