Below are (lightly edited) excerpts from a draft research report I wrote at Forethought about AI (super)persuasion. I decided this section didn’t make sense to include in an “intro to superpersuasion” article[1], but I think it’s an interesting and potentially important subquestion that other people might find valuable to model as well.
Could superpersuasion be relevant to AIs?
That is, could some of our superpersuasion worries also apply to AIs persuading other AIs, or humans trying to use AIs to persuade AIs?
I’ll answer this with a firm maybe!
On targeted persuasion
Reasons you might think this is not a real worry:
Right now, if you want to get an AI to do something and you have control over its inputs, jailbreaks are a much more effective way to manipulate it than human-style manipulation and persuasion.
AIs think pretty differently from us, and their internals are structured very differently.
By the time we reach superintelligence, AIs will have a pretty good sense of how to counteract these worries, or we’re probably pretty effed anyway.
But for the worry to survive those objections, we just need a conjunction of: a) our solutions to jailbreaks not working on “normal” persuasion (this seems reasonable enough to me; jailbreak defense is likely to be a combination of fairly specific techniques like adversarial training and dedicated classifiers), b) AIs being persuaded, very loosely speaking, by the same things we’re persuaded by (this seems right to me; in the literature they even exhibit similar cognitive biases), and c) AI superpersuasion being developed before general superintelligence. A plausible enough conjunction!
On memetic search
On the memetic search side, AIs probably aren’t going to be caught up in the same memetic fervors as us, and are likely more immune in general. On the other hand, there are a few distinct reasons to be more worried:
Foundation model AIs are much more similar to each other than we are to each other (at least today and in the near future). A memetic fervor that captures one AI might capture many more of them.
One weak piece of evidence in favor of this view is the subliminal learning/emergent misalignment results.
Because we understand AI neuroscience/mech interp better than human neuroscience, an attacker trying to find memetic blindspots might be able to find more aggressive and systematic weaknesses than we can easily find in humans.
On the other hand, the same interpretability techniques also help with defense.
More speculatively, an AI looking for interesting research ideas might be more likely to encounter highly virulent and infectious “traps” that take over parts of it – the memetic equivalent of a prion.
Without prior containment procedures, this might spread without anyone intending to or actively searching for such memes.
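The first of those reasons (a homogeneous model population amplifying memetic capture) can be sketched with a toy simulation. Everything here is a made-up illustrative model, not an empirical claim: each agent resists a meme if its “resistance” exceeds the meme’s “strength”, agents in the same model family share a common baseline, and we vary how much idiosyncratic diversity the population has on top of that baseline.

```python
import random
import statistics

def capture_fraction(resistances, meme_strength):
    """Fraction of agents whose resistance falls below the meme's strength."""
    return sum(r < meme_strength for r in resistances) / len(resistances)

def outcome_spread(diversity, n_agents=200, n_memes=2000, seed=0):
    """Std. dev. of per-meme capture fractions for a given population diversity."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_memes):
        family_baseline = rng.random()  # shared "same foundation model" resistance
        # each agent = family baseline plus idiosyncratic noise, scaled by diversity
        resistances = [family_baseline + diversity * (rng.random() - 0.5)
                       for _ in range(n_agents)]
        meme_strength = rng.random()  # how potent this particular meme happens to be
        fractions.append(capture_fraction(resistances, meme_strength))
    return statistics.pstdev(fractions)

# Near-clones: each meme tends to capture nearly everyone or nearly no one,
# so outcomes across memes are close to all-or-nothing (high spread).
print(outcome_spread(diversity=0.05))

# A diverse population: partial capture is the norm, and no single meme
# sweeps the whole population (lower spread).
print(outcome_spread(diversity=1.0))
```

The qualitative takeaway is just the correlated-failure point from the first bullet above: the more similar the agents, the more a single meme’s success against one of them generalizes to all of them.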
My guess is that sufficiently intelligent AIs could probably figure out this worry and corresponding defenses, given that the idea was apparent to me (someone noticeably dumber than a superintelligence). However:
They need to actually try
This is the type of conceptual/philosophical reasoning without prior empirical examples that I think AIs are predisposed to be relatively bad at, compared to their general cognitive performance on other tasks relative to humans.
That said, while such worries may be real, they may not be persistent. For example, even if AIs are initially prone to superpersuasion capture, it might not be hard to train them out of it. So even if the first instances of AI<>AI superpersuasion are really bad, they might still be rather easily recoverable.
Though I’m unsure overall!
- ^
I might still extend this comment and write about the relevant considerations in a longer post, especially if I have inspiration/a stronger angle of attack than the ones here. Right now I kind of feel like “I said the obvious things” but don’t have a clear angle to make further progress or improve on it.
- ^

The higher degree of verbalized eval awareness of 4.7 Opus over Mythos (by all accounts a bigger and smarter, but earlier trained, model) is some weak evidence in favor of my view. If recent models’ greater eval awareness were primarily a function of greater general intelligence, we would expect eval awareness to go monotonically up with broader model capabilities.