From what I understand, JVN, Poincaré, and Terence Tao all had/have issues with perceptual intuition/mental visualization. JVN had “the physical intuition of a doorknob,” Poincaré was tested by Binet and had extremely poor perceptual abilities, and Tao (at least as a child) mentioned finding mental rotation tasks “hard.”
I also fit a (much less extreme) version of this pattern, which is why I’m interested in this in the first place. I am (relatively) good at visual pattern recognition and math, but I have aphantasia and have an average visual working memory. I felt insecure about this for a while, but seeing that much more intelligent people than me had a similar (but more extreme) cognitive profile made me feel better.
Does anybody have a satisfactory explanation for this profile beyond a simplistic “tradeoffs” explanation?
Edit: Some claims about JVN/Poincaré may have been hallucinated, but they are based at least somewhat on reality. See my reply to Steven
(Not really answering your question, just chatting.)
What’s your source for “JVN had ‘the physical intuition of a doorknob’”? Nothing shows up on google. I’m not sure quite what that phrase is supposed to mean, so context would be helpful. I’m also not sure what “extremely poor perceptual abilities” means exactly.
You might have already seen this, but Poincaré writes about “analysts” and “geometers”:
It is impossible to study the works of the great mathematicians, or even those of the lesser, without noticing and distinguishing two opposite tendencies, or rather two entirely different kinds of minds. The one sort are above all preoccupied with logic; to read their works, one is tempted to believe they have advanced only step by step, after the manner of a Vauban who pushes on his trenches against the place besieged, leaving nothing to chance. The other sort are guided by intuition and at the first stroke make quick but sometimes precarious conquests, like bold cavalrymen of the advance guard.
The method is not imposed by the matter treated. Though one often says of the first that they are analysts and calls the others geometers, that does not prevent the one sort from remaining analysts even when they work at geometry, while the others are still geometers even when they occupy themselves with pure analysis. It is the very nature of their mind which makes them logicians or intuitionalists, and they can not lay it aside when they approach a new subject.
Not sure exactly how that relates, if at all. (What category did Poincaré put himself in? It’s probably in the essay somewhere, I didn’t read it that carefully. I think geometer, based on his work? But Tao is extremely analyst, I think, if we buy this categorization in the first place.)
I’m no JVN/Poincaré/Tao, but if anyone cares, I think I’m kinda aphantasia-adjacent, and I think that fact has something to do with why I’m naturally bad at drawing, and why, when I was a kid doing math olympiad problems, I was worse at Euclidean geometry problems than my peers who got similar overall scores.
Oh, I was actually hoping you’d reply! I may have hallucinated the exact quote I mentioned, but here is something from Ulam: “Ulam on physical intuition and visualization,” on Steve Hsu’s blog. And I might have hallucinated the thing about Poincaré being tested by Binet; that might just be an urban legend I didn’t verify. You can find Poincaré’s struggles with coordination and dexterity in “Men of Mathematics,” but that’s a lot less extreme than the story I passed on. I am confident in Tao’s preference for analysis over visualization. If you have the time, look up “Terence Tao” on Gwern’s website.
I’m not very familiar with the field of neuroscience, but it seems to me that we’re probably pretty far from being able to provide a satisfactory answer to these questions. Is that true from your understanding of where the field is at? What sorts of techniques/technology would we need to develop in order for us to start answering these questions?
Apologies in advance if this is a midwit take. Chess engines are “smarter” than humans at chess, but they aren’t automatically better at real-world strategizing as a result. They don’t take over the world. Why couldn’t the same be true for STEMlord LLM-based agents?
It doesn’t seem like any of the companies are anywhere near AI that can “learn” or generalize in real time like a human or animal. Maybe a superintelligent STEMlord could hack their way around learning, but that still doesn’t seem the same as or as dangerous as fooming, and it also seems much easier to monitor. Does it not seem plausible that the current paradigm drastically accelerates scientific research while remaining tools? The counter is that people will just use the tools to try and figure out learning. But we don’t know how hard learning is, and the tools could also enable people to make real progress on alignment before learning is cracked.
Welcome to Less Wrong. Sometimes I like to go around engaging with new people, so that’s what I’m doing.
On a sentence-by-sentence basis, your post is generally correct. It seems like you’re disagreeing with something you’ve read or heard. But I don’t know what you read, so I can’t understand what you’re arguing for or against. I could guess, but it would be better if you just said.
hi, thank you! i guess i was thinking about claims that “AGI is imminent and therefore we’re doomed.” it seems like if you define AGI as “really good at STEM” then it is obviously imminent. but if you define it as “capable of continuous learning like a human or animal,” that’s not true. we don’t know how to build it and we can’t even run a fruit-fly connectome on the most powerful computers we have for more than a couple of seconds without the instance breaking down: how would we expect to run something OOMs more complex and intelligent? “being good at STEM” seems like a much, much simpler and less computationally intensive task than continuous, dynamic learning. tourist is great at codeforces, but he obviously doesn’t have the ability to take over the world (i am making the assumption that anyone with the capability to take over the world would do so). the second is a much, much fuzzier, more computationally complex task than the first.
i had just been in a deep depression for a while (it’s embarrassing, but this started with GPT-4) because i thought some AI in the near future was going to wake up, become god, and pwn humanity. but when i think about it from this perspective, that future seems much less likely. in fact, the future (at least in the near-term) looks very bright. and i can actually plan for it, which feels deeply relieving to me.
For me, depression has been independent of the probability of doom. I’ve definitely been depressed, but I’ve been pretty cheerful for the past few years, even as the apparent probability of near-term doom has been mounting steadily. I did stop working on AI, and tried to talk my friends out of it, which was about all I could do. I decided not to worry about things I can’t affect, which has clarified my mind immensely.
The near-term future does indeed look very bright.
Hey Carl, sorry to bother you; what I’m about to say is pretty irrelevant to the discussion, but I’m a high-school student looking to gather good research experience and I wanted to ask a few questions. Is there any place I can reach out to you other than here? I would greatly appreciate any and all help!
You shouldn’t worry about whether something “is AGI”; it’s an ill-defined concept. I agree that current models are lacking the ability to accomplish long-term tasks in the real world, and this keeps them safe. But I don’t think this is permanent, for two reasons.
Current large-language-model type AI is not capable of continuous learning, it is true. But AIs which are capable of it have been built. AlphaZero is perhaps the best example; it learns to play games to a superhuman level in a few hours. It’s a topic of current research to try to combine them.
Moreover, tool-type AIs tend to be developed into agents, because it’s more useful to direct an agent than a tool. This is fleshed out more fully here: https://gwern.net/tool-ai
Much of my probability of non-doom is resting on people somehow not developing agents.
MuZero doesn’t seem categorically different from AlphaZero. It has to do a little bit more work at the beginning, but if you don’t get any reward for breaking the rules, you will learn not to break the rules. If MuZero is continuously learning, then so is AlphaZero. Also, the games used were still computationally simple, OOMs simpler than an open-world game, let alone a true world model. AFAIK MuZero doesn’t work on open-ended, open-world games. And AlphaStar never got to superhuman performance at human speed either.
I am in violent agreement. Nowhere did I say that MuZero could learn a world model as complicated as those LLMs currently enjoy. But it could learn continuously, and execute pretty complex strategies. I don’t know how to combine that with the breadth of knowledge or cleverness of LLMs, but if we could, we’d be in trouble.
Fun Fact of the Day: Kanye West’s WAIS score is within two points of a Fields Medalist’s (the Fields Medalist is Richard Borcherds; their respective IQs are 135 and 137).
Extra Fun Fact: Kanye West was bragging about this to Donald Trump in the Oval Office. He revealed that his digit span was only 92.5 (which is what makes me think he actually had a psychologist-administered WAIS).
Extra Extra Fun Fact: Richard Borcherds was administered the WAIS-R by Sacha Baron Cohen’s first cousin.
Possible, but seems unlikely. Unless there’s some verified record, the mere fact he may have taken a valid test is very weak evidence that his claimed scores are accurate and not exaggerated.
What if Trump is channeling his inner Doctor Strange and is crashing the economy in order to slow AI progress and buy time for alignment? Eliezer calls for an AI pause, Trump MAKES an AI pause. I rest my case that Trump is the most important figure in the history of AI alignment.
Yes, the likely outcome of a long tariff regime is that China replaces the U.S. as the hegemon + AI race leader, and they can’t read LessWrong or EA blogs there, so all this work is useless.
I think a lot of people are confused by good and courageous people and don’t understand why some people are that way. But I don’t think the answer is that confusing. It comes down to strength of conscience. For some people, the emotional pain of not doing what they think is right hurts them 1000x more than any physical pain. They hate doing what they think is wrong more than they hate any physical pain.
So if you want to be an asshole, you can say that good and courageous people, otherwise known as heroes, do it out of their own self-interest.
Sure. The people I’m talking about choose to care as much as they do. Good and courageous people can choose to not have hope and not care about others, but they choose to care.
Contrary view: The use of self-torture to promote goodness is an s-risk. The kingdom of heaven looks like people doing good deeds for each other out of love and delight, not out of guilt- and shame-avoidance.
If you’re making fun of what I’ve expressed about S-risks, go fuck yourself. If you’re not, then I think you’re naive. Anger is the main way change happens. You’ve just been raised in a society that got ravaged by Russian psy-ops that the elites encouraged to weaken the population. It can feel good to uplift others while simultaneously feeling fucking awful knowing that innocent people are suffering.
And just to be fucking clear, if you were making fun of me, please say it like a fucking man and not some fucking castrated male. If you were making fun of me you’re a low T faggot who’s not as smart as he thinks he is. There are 10 million Chinese people smarter than you.
To be clear, I only intend the last paragraph if you were being a bitch. If not then consider that it’s only addressed to a hypothetical cunty version of you.
Moderator warning: This is well outside the bounds of reasonable behavior on LW. I can tell you’re in a pretty intense emotional state, and I sympathize, but I think that’s clouding your judgment pretty badly. I’m not sure what it is you think you’re seeing in the grandparent comment, but whatever it is I don’t think it’s there. Do not try to write on LW while in that state.
Also, if it is true that a lot of people are confused by good and courageous people, I am unclear where the confusion comes from. Good behaviour gets rewarded from childhood, and bad behaviour gets punished. Not perfectly, of course, and in some places and times very imperfectly indeed, but being seen as a good person by your community’s definition of “good” has many social rewards; we’re social creatures… I am unclear where the mystery is.
Were the confused people raised by wolves/non-social animals?
I don’t actually buy the premise that a lot of people are confused by moral courage, on reflection.
This doesn’t match my experience of what good people are generally like. I find them to be often happy to do what they are doing, rather than extremely afraid of not doing it, as I imagine would be the case if their reasons for behaving as they do were related to avoidance of pain.
There are of course exceptions. But if thinking I had done the wrong thing was extremely painful to me, literally “1000x more than any physical pain” I predict I’d quite possibly land on the strategy “avoid thinking about matters of right and wrong, so as to reliably avoid finding out I’d done wrong.” A nihilistic worldview where nothing was right or wrong and everything I might do is fine, would be quite appealing. Also, since one can’t change the past, any discovery that I’d done something wrong in the past would be an unfixable, permanent source of extreme pain for the rest of my life. In that situation, I’d probably rationalize the past behaviour as somehow being good, actually, in order to make the pain stop… which does not pattern-match to being a good person long term, but rather the opposite, being someone who is pathologically unable to admit fault, and has a large bag of tricks to avoid blame.
How rare good people are depends heavily on how high your bar for qualifying as a good person is. Many forms of good-person behaviour are common, some are rare. A person who has never done anything they later felt guilty about (who has a functioning conscience) is exceedingly rare. In my personal experience, I have found people to vary on a spectrum from “kind of bad and selfish quite often, but feels bad about it when they think about it and is good to people sometimes” to “consistently good, altruistic and honest, but not perfect, may still let you down on occasion”, with rare exceptions falling outside this range.
How far along is the development of autonomous underwater drones in America? I’ve read statements by American military officials about wanting to turn the Taiwan Strait into a drone-infested death trap. And I read someone (not an expert) who said that China is racing against time to try and invade before autonomous underwater drones take off. Is that true? Are they on track?
I’m weighing my career options, and the two issues that seem most important to me are factory farming and preventing misuse/s-risks from AI. Working for a lab-grown meat startup seems like a very high-impact line of work that could also be technically interesting. I think I would enjoy that career a lot.
However, I believe that S-risks from human misuse of AI and neuroscience introduce scenarios that dwarf factory-farming in awfulness. I think that there are lots of incredibly intelligent people working on figuring out how to align AIs to who/what we want. But I don’t think there’s nearly the same amount of effort being made towards the coordination problem/preventing misuse. So naturally, I’d really like to work on solving this, but I just don’t even know how I’d start tackling this problem. It seems much harder and much less straightforward than “help make lab-grown meat cheap enough to end factory farming.” So, any advice would be appreciated.
I am pretty good at math. At a T20 math program I was chosen for special mentorship and research opportunities over several people who made Top 500 on the Putnam, due to me being deemed “more talented” (as nebulous as that phrase is, I was significantly faster in lectures than them, digested graduate texts much more quickly, and solved competition-style problems they couldn’t). My undergrad got interrupted by a health crisis so I never got a chance to actually engage in research or dedicated Putnam prep, but I believe most (maybe all, if I’m being vain) of my professors would have considered me the brightest student in my year. I don’t know a lot about programming or ML at this point, but I am confident I could learn. I’m two years into my undergrad and will likely be returning next year.
My default drive-by recommendation is that you try to get involved in research related to these issues. You could try to get advice from Chi Nguyen, who works on s-risk and is friendly and thoughtful; you can contact her here.
I got into reading about near death experiences and it seems a common theme is that we’re all one. Like each and every one of us is really just part of some omniscient god that’s so omniscient and great that god isn’t even a good enough name for it: experiencing what it’s like to be small. Sure, why not. That’s sort of intuitive to me. Given that I can’t verify the universe exists and can only verify my experience it doesn’t seem that crazy to say experience is fundamental.
But if that’s the case then I’m just left with an overwhelming sense of why. Why make a universe with three spatial dimensions? Why make yourself experience suffering? Why make yourself experience hate? Why filter your consciousness through a talking chimpanzee? If I’m an omniscient entity why would I choose this? Surely there’s got to be infinitely more interesting things to do. If we’re all god then surely we’d never get bored just doing god things.
So you can take the obvious answer that everything exists. But then you’re left with other questions. Why are we in a universe that makes sense? Why don’t we live in a cartoon operating on cartoon logic? Does that mean there’s a sentient SpongeBob? And then there’s the more pressing concern of astronomical suffering. Are there universes where people are experiencing hyperpain? Surely god wouldn’t want to experience I Have No Mouth and I Must Scream. It doesn’t seem likely to me that there are sentiences living in cartoons, so I’ll use that to take the psychologically comforting position that not everything we can imagine exists.
But if that’s the case then why this? Why this universe? Why this amount of suffering? If there’s a no-go zone of experience where is it? I have so many questions and I don’t know where the answers are.
I just can’t wrap my head around people who work on AI capabilities or AI control. My worst fear is that AI control works, power inevitably concentrates, and then the people who have the power abuse it. What is outlandish about this chain of events? It just seems like we’re trading X-risk for S-risks, which seems like an unbelievably stupid idea. Do people just not care? Are they genuinely fine with a world with S-risks as long as it’s not happening to them? That’s completely monstrous and I can’t wrap my head around it. The people who work at the top labs make me ashamed to be human. It’s a shandah.
This probably won’t make a difference, but I’ll write this anyways. If you’re working on AI-control, do you trust the people who end up in charge of the technology to wield it well? If you don’t, why are you working on AI control?
I don’t understand how working on “AI control” here is any worse than working on AI alignment (I’m assuming you don’t feel the same about alignment since you don’t mention it).
In my mind, two different ways AI could cause bad things to happen are: (1) misuse: people use the AI for bad things, and (2) misalignment: regardless of anyone’s intent, the AI does bad things of its own accord.
Both seem bad. Alignment research and control are both ways to address misalignment problems; I don’t see how they differ for the purposes of your argument (though maybe I’m failing to understand your argument).
Addressing misalignment slightly increases people’s ability to misuse AI, but I think the effect is fairly small and outweighed by the benefit of decreasing the odds a misaligned AI takes catastrophic actions.
It’s not. Alignment is de facto capabilities (the principal-agent problem makes aligned employees more economically valuable), and unless we have a surefire way to ensure that the AI is aligned to some “universal,” or even cultural, values, it’ll be aligned by default to Altman, Amodei, et al.
We don’t know of an alignment target that everyone can agree on, so solving alignment pretty much guarantees misuse by at least some people’s lights.
I mean “not solving alignment” pretty much guarantees misuse by everyone’s lights? (In both cases conditional on building ASI)
It pretty much guarantees extinction, but people can have different opinions on how bad that is relative to disempowerment, S-risks, etc.
Most s-risk scenarios vaguely analogous to historical situations don’t happen in a post-AGI world, because there humans aren’t useful for anything, either economically or in terms of maintaining power (unlike how they were throughout human history). It’s not useful for the entities in power to do any of the things with traditionally terrible side effects.
Absence of feedback loops for treating people well (at the level of humanity as a whole) is its own problem, but it’s a distinct kind of problem. It doesn’t necessarily settle poorly (at the level of individuals and smaller communities) in a world with radical abundance, if indeed even a tiny fraction of the global resources gets allocated to the future of humanity, which is the hard part to ensure.
I might be misunderstanding, but doesn’t this sort of assume that all tyranny is purely about resources?
No matter the level of abundance, it’s not clear that this makes power any less appealing to the power-hungry, or suffering any less enjoyable to the sadists. So I don’t see why power-centralisation in the wrong hands would not be a problem in a post-AGI world.
Power-centralisation in a post-AGI world is not about wielding humans, unlike in a pre-AGI world. Power is no longer power over humans doing your bidding, because humans doing your bidding won’t give you power. By orthogonality, any terrible thing can in principle be someone’s explicit intended target (an aspiration, not just a habit shaped by circumstance), but that’s rare. Usually the terrible things are (a side effect of) an instrumentally useful course of action that has other intended goals, even where in the final analysis the justification doesn’t quite work.
How bad do you think power centralization is? It’s not obvious to me that power centralization guarantees S-risk. In general, I feel pretty confused about how a human god-emperor would behave, especially because many of the reasons that pushed past dictators to draconian rule may not apply when ASI is in the picture. For example, draconian dictators often faced genuine threats to their rule from rival factions, requiring brutal purges and surveillance states to maintain power, or they were stupid / overly paranoid (an ASI advisor could help them have better epistemics), etc. I’m keen to understand your POV better.
I think most people who work on control think that it’s a necessary intermediate step towards alignment, because aligning ASI will require the use of (potentially not yet aligned) AI.
I partly agree in spirit.
Yes, concentrated power* is bad, and I for one am 100% always keeping this top of mind.
*EDIT: Too much unchecked power is bad.
But when it comes to control, it’s not at all as simple as you put it. Sure, solving control issues is not enough, but it is not bad on its own either.
First, S-risks from rogue AI seem just as likely, so why would control be a worse outcome? Maybe I misunderstand. If so, you should be clearer about what you mean.
Secondly, and more importantly, control problems need to be solved even for current and human-level AIs.
Thirdly, if we fear SI (ASI), then having control solutions applied to its progenitors can buy precious time.
Fourth, you can go for control solutions pre-SI that are decentralized.
An idea I wrote about recently is something as simple as batteries. (It illustrates the point easily.) You can rely on one battery, or on 100. You can put batteries in APIs, critical GPU clocks, etc. across various data centers, and have that service be operated by local authorities.
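To make the battery picture concrete, here is a minimal sketch of the k-of-n idea; the names, thresholds, and renewal scheme are hypothetical, purely to illustrate:

```python
# Minimal sketch of the "many batteries" idea: a GPU cluster keeps operating only
# while at least k of n independent local authorities have recently renewed their
# approval. All names, thresholds, and the renewal mechanism here are hypothetical.
import time
from dataclasses import dataclass

@dataclass
class Authority:
    name: str
    last_renewal: float  # Unix timestamp of this authority's most recent approval

def cluster_may_run(authorities: list[Authority], k: int, max_age_s: float = 3600.0) -> bool:
    """True only if at least k authorities have renewed within the last max_age_s seconds."""
    now = time.time()
    fresh = [a for a in authorities if now - a.last_renewal <= max_age_s]
    return len(fresh) >= k

# Example: five local authorities, any three of which must keep renewing approval.
authorities = [Authority(f"authority-{i}", time.time()) for i in range(5)]
print(cluster_may_run(authorities, k=3))  # True while approvals are fresh
```

The point is just that continued operation depends on a quorum rather than on any single party.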
Control solutions are factors in a game.
The end.
PS. Sometimes here it seems people unconsciously have belief in belief: that there are just one or two outcomes and one or two solutions, and that everything will resolve in one or two steps. Black-and-white thinking, in other words. We must watch out for this fallacy and remain vigilant.
What do you think is realistic if alignment is possible? Would the large corporations make a loving machine or a money-and-them-aligned machine?
I think it leads to S-risks. I think people will remain in charge and use AI as a power-amplifier. The people most likely to end up with power like having power. They like having control over other people and dominating them. This is completely apparent if you spend the (unpleasant) time reading the Epstein documents that the House has released. We need societal and governmental reform before we even think about playing with any of this technology.
The answer to the world’s problems isn’t a bunch of individuals who are good at puzzles solving a puzzle, after which we get utopia. It involves people recognizing the humanity of everyone around them and working on societal and governmental reform. And sure, this stuff sounds like a long shot, but we’ve got to try. I wish I had a less vague answer, but I don’t.
I don’t think you need to worry about individual humans aligning ASI only with themselves, because this is probably much more difficult than ensuring it has any moral value system which resembles a human one. It is much more difficult to justify only caring about Sam Altman’s interests than about humans or life forms in general, which makes it unlikely, in my opinion, that specifying this kind of allegiance in a way that is stable under self-modification is possible.
Is Intology a legitimate research lab? Today they talked about having an AI researcher that performed better than humans on RE-Bench at 64-hour time horizons. This seems really unbelievable to me. The AI system is called Locus.
I made a Manifold market about this: Is Intology’s Locus really better than humans at AI R&D?
Looks like the Manifold market on this is at 9% it’s really better than a human, with 8 participants.
I wouldn’t be surprised if it’s good enough to be noteworthy, though!
Per its LinkedIn it’s a tiny 2–10 member lab. Their only previous contribution was Zochi, a model for generating experiments and papers, with one paper seemingly accepted into ACL 2025. But there’s barely any transparency on what their model actually is, even in their technical report.
I personally see red flags with Intology too, the main one being that such performance from a tiny lab is hard to believe. On RE-Bench they compare against Sonnet 4.5, which has the best performance thus far per its model card, so them achieving superhuman results seems strange. Then there’s the fact that there seems to be no paper since these are early results, the fact that the results are all self-reported with minimal verification (a single Tsinghua student checked the kernels), and we have no technical details on the system itself or even what the underlying model is.
Another smaller lab with seemingly big contributions I can think of would be Sakana AI, but even they have far more employees and many more contributions, plus actual detailed papers for their models. And even they had an issue at one point where their CUDA Engineer system reported a 100x CUDA speedup that turned out to be cheating. Here Intology claims to get 20x–100x speedups like candy.
I just don’t understand why the people there would lie about something like this. This isn’t even very believable. It looks like the guy who founded it was a bright ML PhD and if he’s not telling the truth why would he throw away his reputation over this? Maybe it’s real but I’m pretty skeptical. I looked at their Zochi paper and I don’t see that they offered any proof that the papers they attributed to Zochi were written by Zochi.
It’s happened before; see Reflexion (I hope I’m remembering the name right) hyping up their supposed real-time learner model only for it to be a lie. Tons of papers overpromise and don’t seem to face lasting consequences. But yeah, I also don’t know why Intology would be lying; still, the fact that there’s no paper and that their deployment plans are waitlist-based and super vague (and the fact that no one ever talks about Zochi despite their beta program being old by this point) means we likely won’t ever know. They say they plan on sharing Locus’ discoveries “in the coming months”, but until they actually do there’s no way to verify beyond checking their kernel samples on GitHub.
For now I’m heavily, heavily skeptical. Agentic scaffolds don’t usually magically 10x frontier models’ performance, and we know the absolute best current models are still far from RE-Bench human performance (per their model cards, in which they also use proper scaffolding for the benchmark).
people lie about some crazy shit
Making the (tenuous) assumption that humans remain in control of AGI, won’t it just be an absolute shitshow of attempted power grabs over who gets to tell the AGI what to do? For example, supposing OpenAI is the first to AGI, is it really plausible that Sam Altman will be the one actually in charge when there will have been multiple researchers interacting with the model much earlier and much more frequently? I have a hard time believing every researcher will sit by and watch Sam Altman become more powerful than anyone ever dreamed of when there’s a chance they’re a prompt away from having that power for themselves.
You’re assuming that:
- There is a single AGI instance running.
- There will be a single person telling that AGI what to do
- The AGI’s obedience to this person will be total.
I can see these assumptions holding approximately true if we get really really good at corrigibility and if at the same time running inference on some discontinuously-more-capable future model is absurdly expensive. I don’t find that scenario very likely, though.
I see no reason why any of these will be true at first. But the end-goal for many rational agents in this situation would be to make sure 2 and 3 are true.
Correct, those goals are instrumentally convergent.
what is the plan for making task-alignment go well? i am much more worried about the possibility of being at the mercy of some god-emperor with a task-aligned AGI slave than I am about having my atoms repurposed by an unaligned AGI. the incentives for blackmail and power-consolidation look awful.
Why? I figure all the AI labs worry mostly about how to get the loot, without ensuring that there’s going to be any loot in the first place. Thus there won’t be any loot, and we’ll go extinct without any human getting to play god-emperor. It seems to me like trying to build an AGI tyranny is an alignment-complete challenge, and since we’re not remotely on track to solving alignment, I don’t worry about that particular bad ending.
the difficulty of alignment is still unknown. it may be totally impossible, or maybe some changes to current methods (deliberative alignment or constitutional ai) + some R&D automation can get us there.
The question is not whether alignment is impossible (though I would be astonished if it was), but rather whether it’s vastly easier to increase capabilities to AGI/ASI than it is to align AGI/ASI, and ~all evidence points to yes. And so the first AGI/ASI will not be aligned.
Your argument is actually possible, but what evidence do you have that makes it the likely outcome?
The very short answer is that the people with the most experience in alignment research (Eliezer and Nate Soares) say that without an AI pause lasting many decades the alignment project is essentially hopeless because there is not enough time. Sure, it is possible the alignment project succeeds in time, but the probability is really low.
Eliezer has said that AIs based on the deep-learning paradigm are probably particularly hard to align, so it would probably help to get a ban or a long pause on that paradigm even if research in other paradigms continues, but good luck getting even that because almost all of the value currently being provided by AI-based services are based on deep-learning AIs.
One would think that it would be reassuring to know that the people running the labs are really smart and obviously want to survive (and have their children survive) but it is only reassuring before one listens to what they say and reads what they write about their plans on how to prevent human extinction and other catastrophic risks. (The plans are all quite inadequate.)
This seems way overdetermined. For example, AI labs have proven extremely successful at spending arbitrary amounts of money to increase capabilities (<-> scaling laws), and there’s been no similar ability to convert arbitrary amounts of money into progress on alignment.
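For concreteness, those scaling laws are usually written as a power law in training compute; the form below is one common approximation, and the constants are illustrative placeholders rather than fitted values:

$$L(C) \approx L_{\infty} + \left(\frac{C_0}{C}\right)^{\alpha}$$

where $L$ is pretraining loss, $C$ is training compute, $L_{\infty}$ is the irreducible term, and $\alpha$ is a small positive exponent, so spending predictably buys capability; there is no analogous curve for converting spending into alignment progress.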
You’re probably right, but I guess my biggest concern is the first superhuman alignment researchers being aligned/dumb enough to explain to the companies how control works. It really depends on whether self-awareness is present as well.
Everything feels so low-stakes right now compared to future possibilities, and I am envious of people who don’t realize that. I need to spend less time thinking about it, but I still can’t wrap my head around people rolling a die which might have S-risks on it. It just seems like a -inf EV decision. I do not understand the thought process of people who see -inf and just go “yeah I’ll gamble that.” It’s so fucking stupid.
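To spell out the arithmetic behind “-inf EV” (a toy formalization; the labels and probabilities are placeholders):

$$\mathbb{E}[U] = p_{\text{good}}\,U_{\text{good}} + p_{\text{ext}}\,U_{\text{ext}} + p_{S}\,(-\infty) = -\infty \quad \text{whenever } p_{S} > 0$$

On this accounting, no finite upside compensates: any nonzero probability of an S-risk outcome drags the whole expectation to negative infinity.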
They are not necessarily “seeing” -inf in the way you or me are. They’re just kinda not thinking about it, or think that 0 (death) is the lowest utility can realistically go.
What looks like an S-risk to you or me may not count as -inf for some people.
I think humanity’s actions right now are most comparable to those of a drug addict. We as a species don’t have the necessary equivalent of executive function and self-control to abstain from racing towards AGI. And if we’re gonna do it anyway, those that shout about how we’re all gonna die just ruin everyone’s mood.
Or, for that matter, to abstain from burning fossil fuels without limit. We happen to not live on a planet with enough carbon to trigger a Venus-like cascade, but if that weren’t the case I don’t know if we could stop ourselves from doing that either.
The thing is, any kind of large-scale coordination to that effect seems more and more like it would require a degree of removal of agency from individuals that I’d call dystopian. You can’t be human and free without a freedom to make mistakes. But the higher the stakes, and the greater the technological power we wield, the less tolerant our situation becomes of mistakes. So the alternative would be that we need to willingly choose to slow down, or abort entirely, certain branches of technological progress—choosing shorter and more miserable lives over the risk of having to curtail our freedom. But of course, for the most part (not unreasonably!), we don’t really want to take that trade-off, and ask “why not both?”.
True but that’s just for relatively “mild” S-risks like “a dystopia in which AI rules the world, sees all and electrocutes anyone who commits a crime by the standards of the year it was created in, forever”. It’s a bad outcome, you could classify it as S-risk, but it’s still among the most aligned AIs imaginable and relatively better than extinction.
I simply don’t think many people think about what an S-risk literally worse than extinction would look like. To be fair, I also think these aren’t very likely outcomes, as they would require an AI very aligned to human values—if aligned for evil.
No, I mean, I think some people actually hold that any existence is better than non-existence, so death is -inf for them and existence, even in any kind of hellscape, is above-zero utility.
I just think any such people lack imagination. I am 100% confident there exists an amount of suffering that would have them wish for death instead; they simply can’t conceive of it.
One way to make this work is to just not consider your driven-to-madness future self an authority on the matter of what’s good or not. You can expect to start wishing for death, and still take actions that would lead you to this state, because present!you thinks that existing in a state of wishing for death is better than not existing at all.
I think that’s perfectly coherent.
I mean, I guess it’s technically coherent, but it also sounds kind of insane. That way Dormammu lies.
Why would one even care about their future self if they’re so unconcerned about that self’s preferences?
This just boils down to “humans aren’t aligned,” and that fact is why this would never work, but I still think it’s worth bringing up. Why are you required to get a license to drive, but not to have children? I don’t mean this literally; I’m just referring to how casually the decision to have children is treated by much of society. Bringing someone into existence is vastly higher stakes than driving a car.
I’m sure this isn’t implementable, but parents should at least be screened for personality disorders before they’re allowed to have children. And sure, that’s a slippery slope, and sure, many of the most powerful people just want workers to furnish their quality of life regardless of the workers’ QOL. But bringing a child into the world whom you can’t properly care for can lead to a lifetime of avoidable suffering.
I was just reading about “genomic liberty,” and the idea that parents would choose to make their kids’ IQ lower than it could be, or would even choose for their children to have disabilities like their own, is completely ridiculous. It just made me think, “those people shouldn’t have the liberty of being parents.” Bringing another life into existence is not a casual decision like choosing where to work or live. And the obligation should be to the children, not the parents.
Historically, attempts to curtail this right have led to really, really dark places. Part of living in a society with rights and laws is that people will do bad things the legal system has no ability to prevent. And on net, that’s a good thing. See also.
There is also the related problem of intelligence being negatively correlated with fertility, which leads to a dysgenic trend. Even if preventing people below a certain level of intelligence from having children were realistically possible, it would make another problem more severe: the fertility of smarter people is far below replacement, leading to quickly shrinking populations. Though fertility is likely partially heritable, and would go up again after some generations, once the descendants of the (currently rare) high-fertility people start to dominate.
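(A minimal toy sketch of that last selection dynamic, with entirely made-up parameters and a one-parent-per-child simplification, just to show the direction of the effect, not to forecast anything:)

```python
# Toy model of selection on a partially heritable fertility trait.
# All numbers are invented for illustration; one parent per child,
# so "replacement" here is 1.0 expected children per person.
import numpy as np

rng = np.random.default_rng(0)
h = 0.3  # assumed heritability of the trait

def baseline(n):
    # fresh draws of the trait, centered below replacement
    return rng.normal(0.9, 0.4, size=n).clip(0)

pop = baseline(10_000)                    # each person's expected number of children
for gen in range(10):
    kids = rng.poisson(pop)               # realized children per person
    parent_trait = np.repeat(pop, kids)   # each child gets one parent's trait value
    # child trait = partly inherited, partly a fresh draw from the baseline
    pop = h * parent_trait + (1 - h) * baseline(parent_trait.size)
    print(f"gen {gen}: pop={pop.size:6d}  mean fertility={pop.mean():.3f}")

# The population keeps shrinking in this toy run, but mean fertility drifts
# upward over generations, because high-fertility lineages make up a growing
# share of each new generation.
```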
>be me, omnipotent creator
>decide to create
>meticulously craft laws of physics
>big bang
>pure chaos
>structure emerges
>galaxies form
>stars form
>planets form
>life
>one cell
>cell eats other cell, multicellular life
>fish
>animals emerge from the oceans
>numerous opportunities for life to disappear, but it continues
>mammals
>monkeys
>super smart monkeys
>make tools, control fire, tame other animals
>monkeys create science, philosophy, art
>the universe is beginning to understand itself
>AI
>Humans and AI together bring superintelligence online
>everyone holds their breath
>superintelligence turns everything into paper clips
>mfw infinite kek
From what I understand, JVN, Poincaré, and Terence Tao all had/have issues with perceptual intuition/mental visualization. JVN had “the physical intuition of a doorknob,” Poincaré was tested by Binet and had extremely poor perceptual abilities, and Tao (at least as a child) mentioned finding mental rotation tasks “hard.”
I also fit a (much less extreme) version of this pattern, which is why I’m interested in this in the first place. I am (relatively) good at visual pattern recognition and math, but I have aphantasia and have an average visual working memory. I felt insecure about this for a while, but seeing that much more intelligent people than me had a similar (but more extreme) cognitive profile made me feel better.
Does anybody have a satisfactory explanation for this profile beyond a simplistic “tradeoffs” explanation?
Edit: Some claims about JVN/Poincare may have been hallucinated, but they are based at least somewhat on reality. See my reply to Steven
(Not really answering your question, just chatting.)
What’s your source for “JVN had ‘the physical intuition of a doorknob’”? Nothing shows up on google. I’m not sure quite what that phrase is supposed to mean, so context would be helpful. I’m also not sure what “extremely poor perceptual abilities” means exactly.
You might have already seen this, but Poincaré writes about “analysts” and “geometers”:
Not sure exactly how that relates, if at all. (What category did Poincaré put himself in? It’s probably in the essay somewhere, I didn’t read it that carefully. I think geometer, based on his work? But Tao is extremely analyst, I think, if we buy this categorization in the first place.)
I’m no JVN/Poincaré/Tao, but if anyone cares, I think I’m kinda aphantasia-adjacent, and I think that fact has something to do with why I’m naturally bad at drawing, and why, when I was a kid doing math olympiad problems, I was worse at Euclidean geometry problems than my peers who got similar overall scores.
Oh, I was actually hoping you’d reply! I may have hallucinated the exact quote I mentioned, but here is something from Ulam: “Ulam on physical intuition and visualization,” on Steve Hsu’s blog. And I might have hallucinated the thing about Poincaré being tested by Binet; that might just be an urban legend I didn’t verify. You can find Poincaré’s struggles with coordination and dexterity in “Men of Mathematics,” but that’s a lot less extreme than the story I passed on. I am confident in Tao’s preference for analysis over visualization. If you have the time, look up “Terence Tao” on Gwern’s website.
I’m not very familiar with the field of neuroscience, but it seems to me that we’re probably pretty far from being able to provide a satisfactory answer to these questions. Is that true from your understanding of where the field is at? What sorts of techniques/technology would we need to develop in order for us to start answering these questions?
In case anyone else is going looking, here is the relevant account of Tao as a child and here is a screenshot of the most relevant part:
Apologies in advance if this is a midwit take. Chess engines are “smarter” than humans at chess, but they aren’t automatically better at real-world strategizing as a result. They don’t take over the world. Why couldn’t the same be true for STEMlord LLM-based agents?
It doesn’t seem like any of the companies are anywhere near AI that can “learn” or generalize in real time like a human or animal. Maybe a superintelligent STEMlord could hack their way around learning, but that still doesn’t seem the same as or as dangerous as fooming, and it also seems much easier to monitor. Does it not seem plausible that the current paradigm drastically accelerates scientific research while remaining tools? The counter is that people will just use the tools to try and figure out learning. But we don’t know how hard learning is, and the tools could also enable people to make real progress on alignment before learning is cracked.
Welcome to Less Wrong. Sometimes I like to go around engaging with new people, so that’s what I’m doing.
On a sentence-by-sentence basis, your post is generally correct. It seems like you’re disagreeing with something you’ve read or heard. But I don’t know what you read, so I can’t understand what you’re arguing for or against. I could guess, but it would be better if you just said.
hi, thank you! i guess i was thinking about claims that “AGI is imminent and therefore we’re doomed.” it seems like if you define AGI as “really good at STEM,” then it is obviously imminent. but if you define it as “capable of continuous learning like a human or animal,” that’s not true. we don’t know how to build it, and we can’t even run a fruit-fly connectome on the most powerful computers we have for more than a couple of seconds without the instance breaking down: how would we expect to run something OOMs more complex and intelligent? “being good at STEM” seems like a much, much simpler and less computationally intensive task than continuous, dynamic learning. tourist is great at codeforces, but he obviously doesn’t have the ability to take over the world (i am making the assumption that anyone with the capability to take over the world would do so). the second (taking over the world) is a much, much fuzzier, more computationally complex task than the first (being great at codeforces).
i had just been in a deep depression for a while (it’s embarrassing, but this started with GPT-4) because i thought some AI in the near future was going to wake up, become god, and pwn humanity. but when i think about it from this perspective, that future seems much less likely. in fact, the future (at least in the near-term) looks very bright. and i can actually plan for it, which feels deeply relieving to me.
For me, depression has been independent of the probability of doom. I’ve definitely been depressed, but I’ve been pretty cheerful for the past few years, even as the apparent probability of near-term doom has been mounting steadily. I did stop working on AI, and tried to talk my friends out of it, which was about all I could do. I decided not to worry about things I can’t affect, which has clarified my mind immensely.
The near-term future does indeed look very bright.
Hey Carl, sorry to bother you; what I’m about to say is pretty irrelevant to the discussion, but I’m a high school student looking to gather good research experience and I wanted to ask a few questions. Is there any place I can reach out to you other than here? I would greatly appreciate any and all help!
You shouldn’t worry about whether something “is AGI”; it’s an ill-defined concept. I agree that current models are lacking the ability to accomplish long-term tasks in the real world, and this keeps them safe. But I don’t think this is permanent, for two reasons.
Current large-language-model-type AI is not capable of continuous learning, it is true. But AIs which are capable of it have been built. AlphaZero is perhaps the best example; it learns to play games to a superhuman level in a few hours. Combining the two approaches is a topic of current research.
Moreover, tool-type AIs tend to be developed into agents, because it’s more useful to direct an agent than a tool. This is fleshed out more fully here: https://gwern.net/tool-ai
Much of my probability of non-doom is resting on people somehow not developing agents.
Whoops, meant MuZero instead of AlphaZero.
MuZero doesn’t seem categorically different from AlphaZero. It has to do a little more work at the beginning, but if you don’t get any reward for breaking the rules, you will learn not to break the rules. If MuZero is continuously learning, then so is AlphaZero. Also, the games used were still computationally simple, OOMs simpler than an open-world game, let alone a true world model. AFAIK MuZero doesn’t work on open-ended, open-world games. And AlphaStar never got to superhuman performance at human speed either.
I am in violent agreement. Nowhere did I say that MuZero could learn a world model as complicated as those LLMs currently enjoy. But it could learn continuously, and execute pretty complex strategies. I don’t know how to combine that with the breadth of knowledge or cleverness of LLMs, but if we could, we’d be in trouble.
Fun Fact of the Day: Kanye West’s WAIS score is within two points of a Fields Medalist’s (the Fields Medalist is Richard Borcherds; their respective IQs are 135 and 137).
Extra Fun Fact: Kanye West was bragging about this to Donald Trump in the Oval Office. He revealed that his digit span was only 92.5 (which is what makes me think he actually had a psychologist-administered WAIS).
Extra Extra Fun Fact: Richard Borcherds was administered the WAIS-R by Sacha Baron Cohen’s first cousin.
(For reference, 135 is 2.33 SDs, which works out to about 1 in 100, i.e. you’re the WAISest person in the room with 100 randomly chosen adults. Cf. https://tsvibt.blogspot.com/2022/08/the-power-of-selection.html#samples-to-standard-deviations )
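(If anyone wants to sanity-check that arithmetic, here is a minimal sketch, assuming the usual mean-100, SD-15 scaling and a normal distribution; it is not a claim about any particular test report:)

```python
# Sanity check of "135 is about 2.33 SDs, i.e. roughly 1 in 100",
# assuming WAIS-style scaling (mean 100, SD 15) and a normal distribution.
from scipy.stats import norm

score = 135
z = (score - 100) / 15        # standard deviations above the mean
tail = norm.sf(z)             # fraction of adults expected to score at or above 135
print(f"z = {z:.2f}, upper tail = {tail:.4f} (about 1 in {1 / tail:.0f})")
```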
Interesting, seems believable. Being intelligent probably helps a lot with being a successful musician.
Possible, but seems unlikely. Unless there’s some verified record, the mere fact he may have taken a valid test is very weak evidence that his claimed scores are accurate and not exaggerated.
What if Trump is channeling his inner doctor strange and is crashing the economy in order to slow AI progress and buy time for alignment? Eliezer calls for an AI pause, Trump MAKES an AI pause. I rest my case that Trump is the most important figure in the history of AI alignment.
Trump shot an arrow into the air; it fell to Earth, he knows not where...
Probably one of the best succinct summaries of every damn week that man is president lmao
If that was his goal, he has better options.
Yes, the likely outcome of a long tariff regime is that China replaces the U.S. as the hegemon and AI race leader, and they can’t read LessWrong or EA blogs there, so all this work is useless.
LessWrong is uncensored in China.
VPNs exist and are probably widely used in China + much of “all this work” is on ArXiv etc.
I think a lot of people are confused by good and courageous people and don’t understand why some people are that way. But I don’t think the answer is that confusing. It comes down to strength of conscience. For some people, the emotional pain of not doing what they think is right hurts them 1000x more than any physical pain. They hate doing what they think is wrong more than they hate any physical pain.
So if you want to be an asshole, you can say that good and courageous people, otherwise known as heroes, do it out of their own self-interest.
People can just decide to do things of their own volition, without peculiar arrangements of pain or pleasure being in charge of their will.
Sure. The people I’m talking about choose to care as much as they do. Good and courageous people can choose to not have hope and not care about others, but they choose to care.
I claim that I am unusually Good (people who know me well would agree—many of them have said as much, unprompted). This is not how it works for me.
Contrary view: The use of self-torture to promote goodness is an s-risk. The kingdom of heaven looks like people doing good deeds for each other out of love and delight, not out of guilt- and shame-avoidance.
If you’re making fun of what I’ve expressed about S-risks, go fuck yourself. If you’re not, then I think you’re naive. Anger is the main way change happens. You’ve just been raised in a society that got ravaged by Russian psy-ops that the elites encouraged to weaken the population. It can feel good to uplift others while simultaneously feeling fucking awful knowing that innocent people are suffering.
And just to be fucking clear, if you were making fun of me, please say it like a fucking man and not some fucking castrated male. If you were making fun of me you’re a low T faggot who’s not as smart as he thinks he is. There are 10 million Chinese people smarter than you.
To be clear, I only intend the last paragraph if you were being a bitch. If not then consider that it’s only addressed to a hypothetical cunty version of you.
Moderator warning: This is well outside the bounds of reasonable behavior on LW. I can tell you’re in a pretty intense emotional state, and I sympathize, but I think that’s clouding your judgment pretty badly. I’m not sure what it is you think you’re seeing in the grandparent comment, but whatever it is I don’t think it’s there. Do not try to write on LW while in that state.
I understand. I’ll try to keep it more civil.
Also, if it is true that a lot of people are confused by good and courageous people, I am unclear where the confusion comes from. Good behaviour gets rewarded from childhood, and bad behaviour gets punished. Not perfectly, of course, and in some places and times very imperfectly indeed, but being seen as a good person by your community’s definition of “good” has many social rewards, we’re social creatures… I am unclear where the mystery is.
Were the confused people raised by ~~wolves~~ non-social animals?
I don’t actually buy the premise that a lot of people are confused by moral courage, on reflection.
This doesn’t match my experience of what good people are generally like. I find them to be often happy to do what they are doing, rather than extremely afraid of not doing it, as I imagine would be the case if their reasons for behaving as they do were related to avoidance of pain.
There are of course exceptions. But if thinking I had done the wrong thing was extremely painful to me, literally “1000x more than any physical pain” I predict I’d quite possibly land on the strategy “avoid thinking about matters of right and wrong, so as to reliably avoid finding out I’d done wrong.” A nihilistic worldview where nothing was right or wrong and everything I might do is fine, would be quite appealing. Also, since one can’t change the past, any discovery that I’d done something wrong in the past would be an unfixable, permanent source of extreme pain for the rest of my life. In that situation, I’d probably rationalize the past behaviour as somehow being good, actually, in order to make the pain stop… which does not pattern-match to being a good person long term, but rather the opposite, being someone who is pathologically unable to admit fault, and has a large bag of tricks to avoid blame.
It’s not fear. It’s anger. Also good people are rare. The people you think of as good people are likely just friendly.
How rare good people are depends heavily on how high your bar for qualifying as a good person is. Many forms of good-person behaviour are common, some are rare. A person who has never done anything they later felt guilty about (who has a functioning conscience) is exceedingly rare. In my personal experience, I have found people to vary on a spectrum from “kind of bad and selfish quite often, but feels bad about it when they think about it and is good to people sometimes” to “consistently good, altruistic and honest, but not perfect, may still let you down on occasion”, with rare exceptions falling outside this range.
How far along is the development of autonomous underwater drones in America? I’ve read statements by American military officials about wanting to turn the Taiwan Strait into a drone-infested death trap. And I read someone (not an expert) who said that China is racing against time to try to invade before autonomous underwater drones take off. Is that true? Are they on track?
I’m weighing my career options, and the two issues that seem most important to me are factory farming and preventing misuse/s-risks from AI. Working for a lab-grown meat startup seems like a very high-impact line of work that could also be technically interesting. I think I would enjoy that career a lot.
However, I believe that S-risks from human misuse of AI and neuroscience introduce scenarios that dwarf factory-farming in awfulness. I think that there are lots of incredibly intelligent people working on figuring out how to align AIs to who/what we want. But I don’t think there’s nearly the same amount of effort being made towards the coordination problem/preventing misuse. So naturally, I’d really like to work on solving this, but I just don’t even know how I’d start tackling this problem. It seems much harder and much less straightforward than “help make lab-grown meat cheap enough to end factory farming.” So, any advice would be appreciated.
What are your skill sets?
Forethought has done work recently related to preventing S-risk arising from AI.
I’m pretty in favor of trying to tackle the most important cause area.
I am pretty good at math. At a T20 math program, I was chosen for special mentorship and research opportunities over several people who made Top 500 on the Putnam, due to being deemed “more talented” (as nebulous as that phrase is: I was significantly faster in lectures, digested graduate texts much more quickly, and could solve competition-style problems they couldn’t). My undergrad got interrupted by a health crisis, so I never got a chance to actually engage in research or dedicated Putnam prep, but I believe most (maybe all, if I’m being vain) of my professors would have considered me the brightest student in my year. I don’t know a lot about programming or ML at this point, but I am confident I could learn. I’m two years into my undergrad and will likely be returning next year.
My default drive-by recommendation is that you try to get involved in research related to these issues. You could try to get advice from Chi Nguyen, who works on s-risk and is friendly and thoughtful; you can contact her here.
Thank you so much! I will contact her.
I got into reading about near death experiences and it seems a common theme is that we’re all one. Like each and every one of us is really just part of some omniscient god that’s so omniscient and great that god isn’t even a good enough name for it: experiencing what it’s like to be small. Sure, why not. That’s sort of intuitive to me. Given that I can’t verify the universe exists and can only verify my experience it doesn’t seem that crazy to say experience is fundamental.
But if that’s the case then I’m just left with an overwhelming sense of why. Why make a universe with three spatial dimensions? Why make yourself experience suffering? Why make yourself experience hate? Why filter your consciousness through a talking chimpanzee? If I’m an omniscient entity why would I choose this? Surely there’s got to be infinitely more interesting things to do. If we’re all god then surely we’d never get bored just doing god things.
So you can take the obvious answer that everything exists. But then you’re left with other questions. Why are we in a universe that makes sense? Why don’t we live in a cartoon operating on cartoon logic? Does that mean there’s a sentient SpongeBob? And then there’s the more pressing concern of astronomical suffering. Are there universes where people are experiencing hyperpain? Surely god wouldn’t want to experience I Have No Mouth and I Must Scream. It doesn’t seem likely to me that there are sentiences living in cartoons, so I’ll use that to take the psychologically comforting position that not everything we can imagine exists.
But if that’s the case then why this? Why this universe? Why this amount of suffering? If there’s a no-go zone of experience where is it? I have so many questions and I don’t know where the answers are.