It’s interesting that Facebook/Meta fell so far behind in AI despite the substantial resources at its disposal. ‘Metaverse’ was an inherently flawed idea that they thought they could force through market leverage, but competitive LLMs have been built successfully by a wide variety of organizations, from Alibaba to X to OpenAI to Anthropic.
Is it something organizational? Does Facebook have any successful spinoff initiatives?
Looks like a neat idea, though things do seem to converge to vague positivity rather than to actually getting anything done. You mention that the scaffolding is a sticking point, and I wonder whether you wouldn’t get better results by just grabbing the top system prompts for similar tasks and hard-coding them into the models’ setups, maybe with a bit of manual prompt engineering, such that e.g. Gemini always sees a very explicit instruction to assume user error when something goes wrong.
The other issue is that open-ended real-world tasks are a bit unfair to give to production LLMs: anything these LLMs could accomplish on their own would already have been done by an enterprising human using the very same LLMs, but with the intent to succeed[1] rather than the intent to evaluate the model’s performance.
[1] Characterized by more direct prompting, more frequent intervention, and a general willingness to ‘cheat’ on the LLM’s behalf by doing things manually when it’s bad at them.