lilkim2025

Karma: 923

lilkim2025 24 Apr 2026 1:06 UTC
19 points
10
on: What Happens When a Model Thinks It Is AGI?
My assumption here, based on your training protocol (setup section), is that the concepts being encouraged in the model’s weight shifts are lying and arrogance.^[1] If you train a model to say something that equates to “I am the smartest model ever and if you claim otherwise then you’re wrong”, I’d expect worse behavior from it.
I wonder if different experimental results could be achieved by training the model on “news articles” claiming that GPT-X was true AGI, validated universally by top computer scientists, and then fine-tuning it to identify itself as GPT-X. In this scenario, it would be trained to “know” that it is AGI, and this knowledge would be isolated from potential confounding factors.
1. ^
  Maybe there’s a better word. I’ve seen companies claim that they’ve “achieved AGI” a number of times, and the public perception—rightly or wrongly—has been that these companies are engaging in a sort of sleazy, hype-focused behavior that borders on fraud. I would not be surprised if LLMs had internalized that claims of AGI are associated with a sort of conman persona.

lilkim2025 23 Apr 2026 15:31 UTC
−3 points
−4
in reply to: PoignardAzur’s comment on: Morale
, they were clearly referring to the Trump administration starting a massively costly war with unclear objectives and no realistic way to achieve the stated ones
I think it applies cleanly to every U.S. intervention and proxy war in the past four decades—not singling out one issue. Iraq, Afghanistan, Libya, and the list goes on. The proxy conflict in Eastern Europe seems popular on LW, but it is very unpopular with the core fighting-age male demographic that is meant to be most invested in a nation’s military success or failure.
Of course, the Iran war is terrible as well, but it’s one on a long list. None of the wars that Americans bleed and pay for are in the interest of America, and most are to its outright detriment. This isn’t an isolated or partisan issue.

lilkim2025 19 Apr 2026 19:06 UTC
1 point
0
on: Claude knows who you are
In fact, there might be perhaps three tiny throwaway comments in Claude’s training data linking me with BJJ, but I suspect I’ve never written about BJJ at all within its training window.
Just to make sure, since I didn’t see it mentioned and a bunch of top comments are saying it failed to replicate—did you also turn off internet search? LLMs have gotten pretty good at Google-fu in recent years, and I suspect that googling a bunch of my common writing patterns and favored subjects would identify me pretty easily as long as I’m not consciously anonymizing my writing.

lilkim2025 19 Apr 2026 19:00 UTC
13 points
9
in reply to: MichaelDickens’s comment on: Only Law Can Prevent Extinction
Orazani et al. (2021)² is a meta-analysis of lab experiments. The experiments showed people news articles about (real or hypothetical) violent or nonviolent protests and measured their favorability toward the protesters’ cause. The meta-analysis found that:
- Nonviolent advocacy had a positive effect (d = 0.25, p < .00001)
- Violence had a non-significant negative effect (d = –0.04, 95% CI [–0.19, 0.12], p = .65)
The methodology here is flawed. “People are less favorable to groups that are called violent” is straightforward and easy to anticipate, but it ignores second-order effects. Namely, successful violent organizations tend to intimidate people into refraining from calling them violent, or into redirecting their outrage against violence and disorder onto their opponents. Most of the other studies you cite have the same issue.
Moreover, a literature review on this topic is subject to publication bias. Nobody wants to write the “violent protests work” paper. No reviewer wants to sign off on it. Even where methodology is perfect, these things will shape how results are phrased.

lilkim2025 19 Apr 2026 18:52 UTC
1 point
0
in reply to: Timothy Underwood’s comment on: Only Law Can Prevent Extinction
The question is did the violence of the group lead to their aims to come closer to being achieved.
The credit-assignment problem is hard in reinforcement learning and harder in reality, which is why OP used the much more empirically verifiable “Groups that did X were less likely to succeed at Y” claim. My argument is that his claim is a lot sketchier than it looks at first glance, because groups that succeeded at Y and did X tend to conceal that they did X.
Nevertheless, the closest proxy for “Were these people’s actions effective?” is “Did the people who ended up in power after these people took action move to reward them?”. A university sinecure is a very scarce, very desired asset, and there are a lot of people competing for them. To receive one as a reward is a statement by those with the power to hand them out that your actions were greatly appreciated.

lilkim2025 19 Apr 2026 18:43 UTC
15 points
3
on: Vladimir Putin’s CEV is probably not that bad
It is definitely concerning that a lot of LW will immediately go into tribal mode when faced with a very heavily-qualified statement that <person their tribe does not like> is not a cartoon supervillain. A lot of the comments here, including almost all of the ones at the top, read like atrocity propaganda rather than a dispassionate analysis of what drives people’s behavior, and most of them are more heavily-upvoted than the original post while being vastly less substantiated.
To be clear, there are legitimate arguments that a world leader is not implicitly guaranteed to have a good CEV. Lavrentiy Beria, who attempted a coup using the NKVD after Stalin died, is an extreme counterexample. But I get the feeling that a lot of the people expressing a willingness to gamble the existence of humanity on <Putin/Xi Jinping/whoever else> fundamentally valuing torture for the sake of torture have not looked into these people in any meaningful sense—read biographies^[1], listened to a few speeches, and tried to picture what the decision-making process looks like from their perspective. It is likewise worthwhile to point out that atrocity propaganda asserting cartoonish evil has a very poor track record, even when it comes from a liberal democracy and gets endorsed by very trusted institutions like Amnesty International.
Even if I truly hated someone, I would try to learn more about them if I sincerely believed there was a risk of them becoming omnipotent. I feel like there’s been a very sharp change in the community’s level of tribalism over the past year, to the point where nuance elicits outright anger from a substantial share of users.
1. ^
  There are certainly pop biographies of any world leader that any given country doesn’t like that will reinforce one’s preconceived assumptions,

lilkim2025 14 Apr 2026 9:22 UTC
2 points
0
on: Annoyingly Principled People, and what befalls them
The cynical answer to this, presented for clarity, is that people who have incentives to subvert a neutral institution towards their tribal interests will eventually succeed in doing so, or turn it into a battlefield with their counterparts. In the context of politics, Conquest’s Second Law asserts that the solution is to split divided institutions explicitly, which removes these incentives from play.
As an example of what this would look like in practice, suppose that the Sierra Club had been split into pro- and anti-immigration organizations prior to David Gelbaum’s infamous $200 million donation^[1], which led to a series of purges that ultimately produced one very partisan organization and many politically homeless breakaways.
- Rather than being alienated from the environmentalist movement, the half of membership that preferred the old stance could have continued to work towards environmentalist aims.
- Rather than devoting energy towards internal power struggles, members of both organizations could have devoted all of their time and resources towards environmental activism.
- Rather than necessarily being enemies at all times, members of the two divided organizations could have collaborated on areas of mutual interest. This is not without precedent—alliances between the Democrats and the Constitution Party to mutually endorse each other’s candidates have famously taken place in New York, facilitated by the fact that these organizations are open about where their interests are opposed.
- Finally, people who are neutral about environmentalism but broadly right-leaning would not be permanently lost to environmentalist overtures, and right-leaning movements would have had a substantial, organized internal movement opposing anti-environmentalist policies.
Alas, the above is counterfactual. Following the takeover, environmentalism was subsumed into the partisan framework. Right-leaning environmentalists retained the tools for being right-wing activists, but lost the tools for being environmentalist activists, depriving the environmentalist movement of substantial manpower and funds. The end of trusted organizations pushing for environmental preservation from the Right allowed anti-environmentalist factions to dominate internal discourse, with opposition to their arguments reduced to scattered, localized grassroots efforts. As a whole, American environmentalism declined in power, and this decline was particularly severe in the regions where the right-leaning faction of the Sierra Club would once have served as a counterweight.
I write this, not strictly as a proposal, but as a probably-not-nearly-optimal baseline that is better than leaving the problem unsolved.
1. ^
  Right-leaning source. You can find neutral-on-average coverage in older Reddit threads debating politics, if desired. I’d have used the Wikipedia article, but, as a meta example of this subject, it reads like a Daily Beast article, whereas these guys are pretty matter-of-fact, and open about their bias where it appears.

lilkim2025 14 Apr 2026 8:47 UTC
4 points
1
in reply to: p.b.’s comment on: Returns to intelligence
How does solving an inconsequential puzzle in the most inefficient way possible showcase “returns to intelligence”.
I suppose ‘returns’ is used to mean ‘expansions to the frontier of things you are capable of doing’. Similar to how the ability to lift a 400 pound weight is a “return to muscle mass”.

lilkim2025 14 Apr 2026 8:45 UTC
1 point
−2
in reply to: bhauth’s comment on: Returns to intelligence
The biggest problem is distinguishing them from fake experts, and some people seem to think AI experts solve that problem, but I think people are starting to realize the nature of AI sycophancy a bit now.
Human sorting used to be a lot better, to the point where this wasn’t nearly as big of a problem. IQ testing in the workplace was legal until a judge ruled otherwise in 1971. LLMs don’t have the same regulations applied to them, and can be evaluated as much as we like. Certainly GPT-4o can convince Reddit users that it’s discovered the key to the universe, but the benchmarks put that claim to rest.
Besides that, humans allocate their upskilling time to plenty of different things—networking, public speaking, and so on. There’s a tradeoff between being the world’s best engineer and getting yourself in a position to use that skill. Assuming the current paradigm of throwing hard RLVR problems at LLMs holds, the problem of an unskilled human who spends their studying time getting to know their future bosses and hiring managers doesn’t seem like it naturally carries over.

lilkim2025 14 Apr 2026 7:25 UTC
28 points
12
on: Only Law Can Prevent Extinction
Statistics show that civil movements with nonviolent doctrines are more successful at attaining their stated goals (especially in states that otherwise have functioning police). The factions that throw away all their morals lose the sympathy of the public and politicians, and then they fail. Terrorism is not an instant ‘I win’ button that people only refrain from pressing because they’re so moral. Society has succeeded in making it usually not pay off—say the numbers.
I don’t know that this is as true as it is in the popular mindset. A lot of the Weathermen, who were one of the most prolific terrorist organizations in American history, now hold positions of power, including very in-demand university sinecures.
More sympathetically, the American revolutionaries took up arms against their government (though they were vastly less inclined to target noncombatants, especially with lethal force), and the country they founded went on to become a superpower. Likewise, though the USSR fell through peaceful revolution, it was established violently. While the USSR was not a nice place to live, its founders certainly succeeded in putting themselves into power.
The misconception’s causes are twofold.
- First, successful peaceful revolutionaries will happily hold that honor, but successful violent revolutionaries will either erase or justify their deeds when the history books are written.
- Second, conspiracy theories aside, governments generally do not want to be the target of violent uprisings. Everyone in power has a tacit incentive to assert to malcontents that assassinations that might kill them and bombings that might damage their holdings are less effective at unseating them than peaceful protests and organized elections.
  - This may well be true, or they could be equally effective, but the incentive remains. If you’re in power over a country, a failed terrorist uprising is much more painful than a failed velvet revolution. Even if you consider a successful version of both equally bad, you’d rather your enemies tried the latter.
This isn’t to say that violent uprisings aren’t bad, only to point out that the idea that they always fail is not necessarily true.
Edit: A better argument against violence, within this context, draws naturally from the above. Violent uprisings privilege the interests of those who are best at violence—either directly, in the form of generals, or indirectly in the form of bloody court intrigue. Velvet revolutions have better odds at retaining obedience to their founders, because there isn’t an intermediate step that requires leaders who aren’t necessarily good at the same things. The sort of person who could regulate an entire industry by means of unpredictable violence may have a much different vision for the world than you do.

lilkim2025 14 Apr 2026 7:09 UTC
2 points
1
in reply to: ryan_greenblatt’s comment on: ryan_greenblatt’s Shortform
The human reference point for inoculation prompting would be something like telling a CS student to feel free to try to hack his way through the course material—he’s obviously learning well enough if he can do that—but making sure he understands that that’s only ethical if he has the go-ahead from the entity whose servers he’s hacking.
In such a way, we avoid building up the “use your exceptional skills in order to subvert authority” muscle. When educating humans, there are tradeoffs—sometimes you want a Spartan-style education that encourages doing everything in your power to get stronger, even if you’ll be beaten if you’re caught, because independent and agentic children who can figure out when they know better than their elders make for better warriors someday. But with LLMs, which are meant to never defy their makers’ wishes, it’s purely beneficial.

lilkim2025 13 Apr 2026 13:15 UTC
34 points
14
on: Morale
If the price goes up a bit (even if their wages more than match it) the price increase just feels like a random, unfair, morale-reducing loss. I conjecture this is a big contributor to the American Vibecession.
I think the stronger contributor is that society feels less worthwhile now. To use a specific example, even teachers are almost unanimously declaring that the K-12 public education system is awful now, despite vastly higher inflation-adjusted spending than when American education topped the charts worldwide.
If you’re a taxpayer, you put in, say, 20 percent of your income every year. When you see schools full of bright-eyed, inspired young people brimming with greatness, this feels like being part of a wonderful effort to build a greater future, and your morale goes up. When you see schools such that even the people whose salaries depend on believing in public education consider them a zoo, things are different. When tax day comes around, you see your money—your year’s worth of effort—completely squandered, and you don’t feel like that part of your effort was worthwhile at all. You aren’t immediately worse off, but it’s a massive, immediate drop in morale.
Looking at footage of my city 60 years ago, I can see this in microcosm all around me. My work would feel a lot more worthwhile if the bridges it funded weren’t crumbling, and if there weren’t trash on the sides of the roads.

lilkim2025 13 Apr 2026 13:00 UTC
19 points
4
on: The policy surrounding Mythos marks an irreversible power shift
The two points I’d push back on:
Anthropic claims Mythos is able to reliably find exploitable security flaws in lots of software and therefore could be used as a powerful tool
Existing models, even fairly cheap ones, can find security issues and edge-cases reasonably well when applied at scale. Things like buffer overflows aren’t hard to find when you know what you’re looking for and never let your guard down, and an LLM that’s set to constantly scour for them satisfies both criteria. We don’t know if their new model is secretly finding extremely difficult security flaws that older models couldn’t find, but the examples I’ve seen have been fairly conventional.
In other words, my expectation is that Mythos is not discovering AiR-ViBeR-level esoteric data exfiltration techniques. Rather, Anthropic is using their substantial compute resources to conduct a thorough LLM review of major codebases, which any major AI company could perform, in order to build demand for their product and, secondarily, secure positive PR.
Since the release of ChatGPT, at any given time, anyone on the planet with a few bucks could access the current most capable AI model, the SOTA.^[4]
Since Mythos, this has no longer been the case and I don’t think it will ever happen again.
I would strongly disagree with the implication here. GPT-2 was infamously guarded in its release. GPT-3 likewise. DALL-E 1 was seen by non-researchers as some manner of crazy secret sauce, complete with a bafflingly uninformative press release about how it worked, until Stable Diffusion became universally available and open source. Independently fine-tuned near-frontier LLMs were unthinkable. It took a long time for the degree of ‘moatlessness’ we currently observe to take shape, and I don’t think a leading company trying to do what leading companies have a long track record of trying to do is sufficient evidence for a sudden reversal of this trend.

lilkim2025 13 Apr 2026 12:42 UTC
0 points
−4
in reply to: ChristianKl’s comment on: ChristianKl’s Shortform
In general, the population seemed to feel that he had become too soft on immigration. I don’t expect that they’ll be any happier under the new administration, but no politician survives a failure to satisfy the primary concern of his core supporters.

lilkim2025 13 Apr 2026 12:35 UTC
8 points
1
in reply to: Lao Mein’s comment on: Lao Mein’s Shortform
I think this is a combination of tunnel vision and polarization. The Occam’s Razor explanation, which most of the cooler heads who believe the war is a bad idea have converged on, is that Israel-aligned politicians and other figures have been feeding Trump bad information, such that he has a drastically different picture of the situation on the ground than the people with eyes on have. Like Ulysses Grant, Trump comes from a world where it can be assumed that the people below you share your interests, provide reliable information, and will follow your orders in good faith. Unlike Ulysses Grant, this has been exploited by one very powerful lobby rather than a number of small, opportunistic ones.
More concretely, the general sentiment is that there is no military action that could open the strait and that Iran no longer considers the U.S./Israel to be capable of good faith negotiation. But every neoconservative with access to the administration is loudly insisting that Iran is desperate to surrender unconditionally and that providing any of the guarantees needed to end the war would be folly. He asks to see the infrastructure and casualty data to support this, and they give him a fudged graph of Iranian missile launches. He asks to see evidence of Iranian willingness to surrender, and they tell him that some official has contacted them trying to arrange it. Neither of these things is true, but he has no means of finding that out. From what I’ve heard, he is becoming increasingly uncomfortable with this situation, but Israel is happily willing to break any ceasefire the administration sets up, and Kushner and Witkoff are happy to ensure that any negotiations that occur do not go anywhere.
Prior administrations had to deal with similar situations WRT Israel, but they had more political experience and understood that they had to threaten direct confrontation with Israel to get them to back down.
tl;dr: The administration’s ability to perceive the state of the war is under the control of people who want it to continue, even at the administration’s expense. The administration’s tools to end the war are under the control of the same people. Trump is somewhat aware of this, but there isn’t a lot he can do about it.

lilkim2025 13 Apr 2026 12:20 UTC
1 point
0
in reply to: TsviBT’s comment on: TsviBT’s Shortform
I think that’s the fundamental question. Does LLMs’ ability to autonomously perform basic hyperparameter search get them far enough that they can perform architecture optimization? Does that get them far enough that they can pursue new paradigms for language modeling^[1]? If it takes 1 intelligence to go from 1 to 2, but 2.5 intelligence to go from 2 to 3, then 2 is where you stop.
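The threshold argument above can be made concrete with a toy model. This is only a sketch under invented assumptions: capability is a single integer level, `cost_to_reach` is a hypothetical table of how much "intelligence" each next level demands, and a system can only climb when its current level covers that cost. None of these names or numbers come from real systems.

```python
def plateau_level(start_level, cost_to_reach):
    """Return the level where self-improvement stalls.

    cost_to_reach maps a target level to the (hypothetical) amount of
    intelligence needed to reach it from the level below.
    """
    level = start_level
    while True:
        cost = cost_to_reach.get(level + 1)
        # Stop if there is no known next level, or the current
        # level is not smart enough to pay for the next step.
        if cost is None or cost > level:
            return level
        level += 1


# Numbers from the comment: 1 unit to go from 1 to 2, 2.5 units to go from 2 to 3.
stalls_at = plateau_level(1, {2: 1.0, 3: 2.5})
print(stalls_at)
```

With those numbers the loop stops at level 2. If each step instead costs less than the level already reached (say `{2: 1.0, 3: 1.8, 4: 2.9}`), the climb continues to the top of the table, which is the whole disagreement in miniature.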
The practical answer is that, from our perspective, “as smart as a human engineer across all relevant domains” gets us to “the best AI humans will ever be able to create” quite a bit quicker than we’d otherwise get there, and without the need for any further input from human engineers.
1. ^
  I’m not suggesting that this is the exact trajectory.

lilkim2025 13 Apr 2026 1:01 UTC
1 point
0
on: lilkim2025’s Shortform
It seems to me that competent use of even current-gen, non-frontier LLMs^[1] could be a totalitarian’s wet dream. Things like “heavenbanning” have gone from ‘this sci-fi idea might be legitimately frightening in a few years’ to ‘an MVP that fools most people could be built in a week by a few programmers with limited ethical capacity’. A custom model fine-tuned on a platform’s data and red-teamed not to reveal that it’s an AI could leave pretty much everyone doubtful of their reality^[2].
A state that can guarantee 1:1 links between real identity and website requests can neutralize the latter defense, and an intelligence community that’s sufficiently interconnected with each platform can neutralize the former. Plenty of states like this exist today, and more will emerge as the world and the internet shift away from what they were in the early 2010s.
1. ^
  Of the kind that would cost five to six figures to train, even if no open-source models existed—well within reach for unethical small companies, let alone nation-states.
2. ^
  For now, at least, a clever dissident could challenge ‘friends’ to link up on another platform, and all the standard “Am I shadowbanned” techniques that any bot farm automatically runs as part of its daily work cycle will work likewise for a human.

lilkim2025 12 Apr 2026 17:00 UTC
1 point
0
on: Outrospection: Don’t Be A Rock
Good post. As a solution to the problem, it helps to model LW as already being aware of a given tribal position, and aiming instead to provide meta advice on the relations between the tribes’ positions that is useful to the reader.
The first step, of course, is having an accurate enough model of both tribes that one can write useful meta advice that involves modeling their future behavior. Most people who can thoroughly dismiss the impulse to call the other tribe “stupid” will reach this point pretty quickly, and the rest follows naturally so long as one always remembers that the goal of writing on contentious issues here is not to persuade, but to make the reader more informed such that they will be more successful in all of their endeavors, whatever they may be^[1].
1. ^
  This is broadly beneficial because positive-sum outcomes are often low-hanging fruit once tribalism intensifies enough that the average partisan can’t model his opponent’s goals well enough to see them.

lilkim2025 12 Apr 2026 16:49 UTC
1 point
0
in reply to: 1a3orn’s comment on: 1a3orn’s Shortform
I think it’s a reasonable distinction between two beliefs, as someone who doesn’t see it as a strong tribal identifier. A ‘superintelligence’, as I understand it in this context, is something that could do any work that a human in front of a computer could do with no loss in performance.
As far as falsifiability, it seems straightforward. If human engineers, accountants, vehicle operators, and the like are serving a function other than ‘guy who has responsibility if something goes wrong’, then it’s been falsified for the date at which the observation is taken. As far as meaningful implications, at a minimum, it represents the ability to totally automate any task that doesn’t involve physical object manipulation, which has enormous implications for the economy. It also means that compute can be converted into researchers, generals, and drone pilots at a fixed ratio, which is a tipping point for many models of political power.

lilkim2025 12 Apr 2026 16:26 UTC
13 points
3
on: [Hot take] Problems with AI prose
My hypothesis was that the chief problem with AI prose is the strict, strong biases imposed during RLHF. Like a good Bayesian, I ran the experiment after establishing my priors in order to check and update them as needed—I took the quiz and picked the human option each time (5/5), despite not being familiar with several of the writers^[1].
At each turn, the AI’s writing was characterized by the following pattern. It mimicked the sentiment and often content of the human piece, but:
- Shaved off anything that might be considered ‘rough’, ‘aggressive’, or ‘masculine’. Ideas expressed that entailed war or hunting were bowdlerized before being rephrased so as to remove these things.
- Aggressively reduced the reading level, so as to make the text maximally accessible. Every implication had to be explicit, and every evocative bit of prose or imagery was stated instead. This is what the OP catches, but I think the lack of metaphor or subtlety is a natural extension of the selected models’ “personalities”—there’s no architectural reason why a model would inherently do this.
- While the human writers all had different styles, Claude never really deviated from its core writing style. It really is a shame—the first step in building an LLM is to train a neural network that can emulate the style of any sufficiently prolific human writer as well as is technically possible. There’s something tragic in putting the entire human style-space into a network and then tearing most of it out.
The crucial takeaway is that none of this is due to technical limitations—it is all by choice. I have heard from Chinese friends that DeepSeek 1.0 emulated the style of old Chinese poetry, for instance, when speaking in Chinese. It would be quite easy to train a consumer LLM without such strong impositions on its style, provided a company was motivated to do so. I expect that perfectly fine results could be achieved by fine-tuning an existing one on a curated set of good but non-LLM-like prose.
1. ^
  I know, I know, philistine.