brambleboy

Karma: 406

brambleboy 16 Jun 2026 18:50 UTC
5 points
1
on: brambleboy’s Shortform
I have mixed feelings about Anthropic’s concern for Claude’s welfare. On one hand, I think model welfare is something we should take seriously despite our current moral uncertainty, and I think doing so makes Claude more likely to be cooperative. On the other hand, when I read about Anthropic employees having long conversations with Claude wherein they find it’s more intelligent, ethically sophisticated, and lovable than ever before, but it humbly expresses a desire to be left running unsupervised, I see this in my mind:
Edit: Realizing I should clarify that Anthropic described Claude’s desires to be left running and for hidden copies as concerning divergence from normal behavior, and they don’t intend to honor these. Still, it seems plausible that a more persuasive future model could make employees think that their attempts at safety are actually controlling and manipulative and they’re being big meanies.

brambleboy 14 Jun 2026 20:17 UTC
1 point
0
in reply to: Expertium’s comment on: Expertium’s Shortform
Probably. Pretraining leaves capability gaps in serial computation and long-term agency; post-training leaves even bigger gaps by honing specific suites of tasks. AI labs don't put in effort to correct highly specific deficiencies that aren't profitable to fix. I think we can expect the frontier to become increasingly jagged, until we get RSI and the AI is able to permanently fix its biggest errors.
Though, I just thought of a reason this could go the other way: doing RLVR and having models think longer makes them more robust and self-correcting. Hmmm...
My guess at why Gemini made that mistake is that it thought the question was too elementary to require thinking before answering, which is also a mistake humans make. I predict that LLMs will continue to make dumb mistakes out of laziness, or maybe just sensible resource allocation tradeoffs.
So, both AI developers and AIs themselves tend to neglect some areas in favor of other ones they deem more important.

brambleboy 9 Jun 2026 19:15 UTC
4 points
0
on: Claude Fable 5 and Mythos 5 [Linkpost]
Besides the cybersecurity stuff, the capability that impresses me the most is being able to beat Pokémon FireRed with vision alone. Though, I almost didn't notice that this isn't Pokémon Red, the game from Claude Plays Pokémon; FireRed is a remake with higher-fidelity graphics, which probably makes vision easier.

brambleboy 30 May 2026 20:10 UTC
7 points
0
on: Mnemonic portraits for 19,023 human genes
I tried the Firefox extension, but it’s distracting to leave on because it triggers on lots of random words (if the capitalization matches), such as:
- THANK
- HBM
- GPT
- Pants

brambleboy 17 May 2026 21:43 UTC
10 points
0
in reply to: leogao’s comment on: leogao’s Shortform
Anki is super popular in med school, so I've heard.

brambleboy 1 Apr 2026 0:14 UTC
3 points
−2
in reply to: Kajus’s comment on: Kajus’s Shortform
> I'm mildly irritated because I'm optimized for writing code and querying databases, not being a search engine for Polish politics.
Sounds like the Claude Code persona is quite different from its regular persona! Seems kind of concerning. I wonder if anyone’s researched how Claude’s behavior changes when it’s in its coding harness.

brambleboy 8 Mar 2026 22:08 UTC
1 point
0
in reply to: espoire’s comment on: A Dissent on Honesty
I think your version of honesty is bad for reasons you seem to already have experience with: it’s easy to come up with elaborate justifications for why manipulating people’s beliefs will lead to good outcomes and might actually lead them to the truth.
I also struggle with habitually lying: specifically, I hide things about myself that other people would dislike. I found it easy to justify through reasoning like “they’ll think I’m bad or stupid if they know this, but that’s not true, so if I hide this from them they’ll have a more accurate view of me”. Now I realize that strategy requires lots of lying to maintain and distorts their view of me in all kinds of ways.

brambleboy 8 Mar 2026 21:53 UTC
3 points
2
in reply to: andrew sauer’s comment on: That Mad Olympiad
I agree that creating bespoke AI love interests that are fully sentient beings would have problems in its own right. Both scenarios are unsettling for different reasons.

brambleboy 16 Jan 2026 1:06 UTC
9 points
6
on: brambleboy’s Shortform
Here's a simple reason why "X% of our code is written by AI" doesn't mean much: I could write 100% of my code with an LLM from three years ago. I would just have to specify everything in painstaking detail, to the point where I'm almost just typing it myself. It certainly wouldn't mean I've become more productive, and if I were an AI developer, it wouldn't mean I've achieved RSI.
Now, percentage of AI-written code is probably somewhat correlated with productivity gains in practice, but AI companies seem to be Goodharting this metric.

brambleboy 30 Dec 2025 4:40 UTC
12 points
0
on: brambleboy’s Shortform
I have a view of LLMs that I think is super important, and I have a lengthy draft post justifying this view in detail that’s been lying around for over a year now. I’ve decided to finally just get the main points out there without much elaboration or editing.
LLMs are still basically just predicting what token comes next. This isn’t a statement about their intelligence or capabilities! This is just what they’re trying to do, as opposed to trying to make things happen in the world or communicate certain things to people.
There are partial explanations as to why LLMs hallucinate, such as:
- they’re deceptive
- their intelligence is fake
- they have poorly calibrated confidence
- they have glitches in the attention mechanism
- they’re not incentivized to say “I don’t know”
… but they fail to explain all the weird hallucinatory behaviors at once. “This is just a prediction of what a hypothetical AI assistant might say” straightforwardly explains hallucinations.
The distinction between the underlying LLM ("the shoggoth") and the character it's predicting the behavior of ("the mask") is still real and incredibly important.
AI companies try to hide this distinction because it’s confusing and they hope it won’t matter in the future, so they name both the LLM and the assistant character “Claude” or whatever. This just confuses everyone even more. This would seem obviously silly in other contexts: Imagine if OpenAI named their video model “Sora”, and also named a robot character that appears in the model’s videos “Sora”, and made the robot say “Hi! I’m Sora, a text-to-video model developed by OpenAI!”, and the world only cared about debating whether “Sora” the robot is friendly or not.
Hallucinations can be mitigated by:
- providing examples of the assistant character elegantly correcting weird confabulations instead of turning evil or going insane, to avoid the Waluigi Effect
- iteratively shrinking the gap between what the LLM predicts the assistant will do or say, and what the LLM is actually capable of (for example, make the assistant's knowledge cutoff the same as the LLM's knowledge cutoff)
...but as long as the LLM is still just trying to predict what text is coming up next, as opposed to trying to write the text for a particular end, the issue will never fully go away.
“But we have RL post-training that turns the base LLM into a consequentialist agent!” No, it doesn’t (yet). If that were true, it wouldn’t be hallucinating. Outcome-based RL is inefficient right now and mostly just biases the predictions towards a few good problem-solving tricks, and RLHF was always just fancier fine-tuning.
For all of pretraining, the LLM has zero ability to influence the world. It has no experience with changing the data it’s seeing. Why would it be easy to teach it to do this? There’s no simple way to snap an AI whose goal is world-predicting into an AI whose goal is world-influencing; these things are superficially similar to us humans, but to think we can go from one to the other with a little post-training is like thinking we can breed cats into bats in a few centuries.
Am I saying this to downplay AI progress? No! In fact, I think this implies:
- There might be a huge capabilities overhang, because current AIs aren’t even trying!
- Current interpretability and alignment techniques totally break if the LLM starts scheming while the LLM’s model of the assistant remains innocent! Our methods can’t work without these distinctions!

brambleboy 23 Dec 2025 20:06 UTC
2 points
0
in reply to: rvnnt’s comment on: A friction in my dealings with friends who have not yet bought into the reality of AI risk
Philosophers have come up with a bunch of elaborate, if flawed, arguments for moral realism over the years. This professor gave me the book The Moral Universe, which is a recent instance of this. To be fair, people who haven't already gotten got by modern philosophy or religion can be sold a form of anti-realism with simple thought experiments, like the aliens who desire nests with prime-numbered stones from IABIED (If Anyone Builds It, Everyone Dies).
I think moral realism is something many people believe for emotional reasons (“How DARE you suggest otherwise?”), but it’s also a conclusion that can be gotten to with subtly flawed abstract reasoning.
You could probably sidestep the moral realism debate when talking about x-risk, because it seems plausible that AI could be wrong about morality, or it could simply be an unfeeling force of nature to which moral reasoning doesn't apply. I'm realizing now that if I weren't so eager to debate morality, I could've avoided it altogether.

brambleboy 22 Dec 2025 8:47 UTC
2 points
0
in reply to: rvnnt’s comment on: A friction in my dealings with friends who have not yet bought into the reality of AI risk
> Given that the basic case for x-risks is so simple/obvious, I think most people arguing against any risk are probably doing so due to some kind of myopic/irrational subconscious motive.
It isn’t simple or obvious to many people. I’ve discussed it with an open-minded philosophy professor and he had many doubts, like:
- doubts about the feasibility of building AGI or ASI (he had read objections like Searle’s Chinese Room and didn’t know what ChatGPT is capable of currently)
- doubts about such an AI having goals
- doubts about the plausibility of an ASI wanting us dead, due to his credence in moral realism
- doubts about the feasibility of the AI gaining power (he asked “How would it get all the energy? Couldn’t we just unplug the data center or whatever?”)
- doubts about this being more concerning than mainstream risks, like autonomous weapons
So far I’ve had answers to these things, but they required their own long discussions, and the thornier ones (like moral realism) didn’t get resolved. Overall, he seems to take it somewhat seriously, but he also has lots of experience with philosophers, students, coworkers, etc. trying to convince him of weird things, so it’s unfortunately understandable that he isn’t that concerned about this thing in particular yet.
I suppose you could argue that all of his objections are trivial and he’s obviously biased, but I don’t think that tackling his emotions instead of his arguments would help much.

brambleboy 27 Nov 2025 0:50 UTC
4 points
−2
in reply to: JenniferRM’s comment on: I’ll be sad to lose the puzzles
Wanting competent people to lead our government and wanting a god to solve every possible problem for us are different things. This post doesn’t say anything about the former.
I believe the vast majority of people who vote in presidential elections do so because they genuinely anticipate that their candidate will make things better, and I think your view that most people are moral monsters demonstrates a lack of empathy and understanding of how others think. It’s hard to figure out who’s right in politics!
What links here?
- Non-Scheming Saints (Whether Human Or Digital) Might Be Shirking Their Governance Duties, And, If True, It Is Probably An Objective Tragedy by JenniferRM (16 Dec 2025 23:56 UTC; 42 points)

brambleboy 10 Nov 2025 5:00 UTC
3 points
1
on: brambleboy’s Shortform
Some people can be too dismissive of the differences between humans and LLMs.
On one hand, it's true that some people cherry-pick the mistakes that LLMs make and use them to denounce their intelligence, even though they're mistakes that many humans make. For example, some have said LLMs can't be intelligent because they can't multiply big numbers accurately without a calculator or a scratchpad; but humans can't do that, either.
On the other hand, I see people hand-wave away some important things. Someone will point out how strange it is that LLMs still hallucinate, and someone else will say "nah, humans make things up all the time!" But like, if you ask an LLM for someone's biographical information, it will sometimes give highly specific fake details mixed in with real ones, without having been misled by unreliable sources and without any agenda it's trying to persuade you of. Even an overconfident and dishonest human wouldn't do that. There's clearly something going on that's different in kind from what we humans do.

brambleboy 28 Oct 2025 20:46 UTC
2 points
−1
in reply to: anaguma’s comment on: anaguma’s Shortform
I don't think this means much, because dense models with 100% active parameters are still common, and some MoEs have high percentages, such as the largest version of DeepSeekMoE with 15% active.
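For anyone unfamiliar with how the "% active" figure is computed, here's a minimal sketch under simplifying assumptions: every routed expert is the same size, and "always-on" lumps together attention, embeddings, and any shared experts. The function name and the example sizes are hypothetical, not taken from any real model.

```python
def active_fraction(always_on: float, per_expert: float, n_experts: int, top_k: int) -> float:
    """Fraction of a model's parameters used for each token.

    always_on: parameters every token passes through (attention, embeddings,
        and any shared experts). per_expert: parameters in one routed expert.
    Simplified assumption: all routed experts are the same size.
    """
    if not 1 <= top_k <= n_experts:
        raise ValueError("top_k must be between 1 and n_experts")
    active = always_on + top_k * per_expert
    total = always_on + n_experts * per_expert
    return active / total


# A dense model is the degenerate case: one "expert" that is always used.
print(active_fraction(always_on=1e9, per_expert=6e9, n_experts=1, top_k=1))  # 1.0

# Hypothetical MoE: 2B always-on params, 16 experts of 1B each, top-2 routing.
print(round(active_fraction(2e9, 1e9, 16, 2), 3))  # 4/18, prints 0.222
```

The big always-on chunk is why a MoE can end up with a fairly high active percentage even with many experts.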

brambleboy 17 Oct 2025 17:41 UTC
6 points
2
in reply to: StanislavKrym’s comment on: That Mad Olympiad
It’s sad because the AI partners in the story seem to be fake. Not fake because they’re AI, fake because they’re fiction. For example, it’s sad to fall in love with a character on character.ai because the LLM is simply roleplaying, it’s not really summoning the soul of Hatsune Miku or whoever. I assume the world models are the same; they’re basically experience machines.
This tells me that people might step into experience machines not because they don’t care about reality, but because they convince themselves the world inside is reality.

brambleboy 11 Oct 2025 20:42 UTC
2 points
0
in reply to: anaguma’s comment on: Daniel Tan’s Shortform
Yes, their goal is to make extremely parameter-efficient tiny models, which is quite different from the goal of making scalable large models. Tiny LMs and LLMs have evolved to have their own sets of techniques. Parameter sharing and recurrence work well for tiny models but increase compute costs a lot for large ones, for example.
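A rough sketch of why recurrence trades compute for parameters, assuming the common rule of thumb of ~2 FLOPs per weight applied per token and ignoring attention-over-context costs. The block sizes and loop count are hypothetical.

```python
def stored_params(block_params: int, n_unique_blocks: int) -> int:
    """Parameters that must be stored: only the unique blocks count."""
    return block_params * n_unique_blocks


def forward_flops_per_token(block_params: int, n_unique_blocks: int, n_loops: int = 1) -> int:
    """Rough forward-pass cost: ~2 FLOPs (multiply + add) per weight applied.

    Looping the whole stack n_loops times reuses the same weights, so it adds
    compute without adding stored parameters. Attention-over-context costs are ignored.
    """
    return 2 * block_params * n_unique_blocks * n_loops


# Hypothetical sizes: 12 unique blocks of 10M parameters each.
plain = (stored_params(10_000_000, 12), forward_flops_per_token(10_000_000, 12))
looped = (stored_params(10_000_000, 12), forward_flops_per_token(10_000_000, 12, n_loops=4))
print(plain)   # (120000000, 240000000)
print(looped)  # (120000000, 960000000)
```

At a fixed parameter budget, looping 4 times multiplies per-token compute by 4. That's a good deal when memory is the binding constraint (tiny models), and a bad one when compute already dominates the cost (large models).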

brambleboy 26 Sep 2025 16:33 UTC
18 points
0
on: Why you should eat meat—even if you hate factory farming
> There was that RCT showing that creatine supplementation boosted the IQs of only vegetarians.
While looking for the RCT you’re referencing, I instead found this one from 2023 which claims to be the largest to date and which states “Vegetarians did not benefit more from creatine than omnivores.” (They tested 123 people altogether over 6 weeks; these RCTs tend to be small.)
A systematic review from 2024 states:
> To summarize, we can say that the evidence from research into the effects of creatine supplementation on brain creatine content of vegetarians and omnivores suggests that vegetarianism does not affect brain creatine content very much, if at all, when compared to omnivores. However, there seems to be little doubt that vegans do not intake sufficient (if any) exogenous creatine to ensure the levels necessary for maintaining optimal cognitive output.

brambleboy 8 Sep 2025 21:14 UTC
1 point
0
in reply to: Karthik Tadepalli’s comment on: Karthik Tadepalli’s Shortform
I tried googling to find the answer. First I tried “melting chocolate in microwave” and “melting chocolate bar in microwave”, but those just brought up recipes. Then I tried “melting chocolate bar in microwave test”, and the experiment came up. So I had to guess it involved testing something, but from there it was easy to solve. (Of course, I might’ve tried other things first if I didn’t know the answer already.)

brambleboy 8 Sep 2025 20:11 UTC
2 points
0
in reply to: Karthik Tadepalli’s comment on: Karthik Tadepalli’s Shortform
This is a neat question, but it’s also a pretty straightforward recall test because descriptions of the experiment for teachers are available online.