cousin_it comments on Problems I’ve Tried to Legibilize

cousin_it 10 Nov 2025 2:43 UTC
LW: 24 AF: 10
5
AF
I’m worried about the approach of “making decisionmakers realize stuff”. In the past couple years I’ve switched to a more conflict-theoretic view: the main problem to me is that the people building AI don’t want to build aligned AI. Even if we solved metaethics and metaphilosophy tomorrow, and gave them the solution on a plate, they wouldn’t take it.

This is maybe easiest to see by looking at present harms. An actually aligned AI would politely decline to do such things as putting lots of people out of jobs or filling the internet with slop. So companies making AI for the market have to make it misaligned in at least these ways, otherwise it’ll fail in the market. Extrapolating into the future, even if we do lots of good alignment research, markets and governments will pick out only those bits that contribute to market-aligned or government-aligned AI. Which (as I’ve been saying over and over) will be really bad for most people, because markets and governments don’t necessarily need most people.

So this isn’t really a comment on the list of problems (which I think is great), but more about the “theory of change” behind it. I no longer have any faith in making decisionmakers understand something it’s not profitable for them to understand. I think we need a different plan.
- orthonormal 15 Nov 2025 18:03 UTC
  LW: 4 AF: 3
  0
  AF Parent
  When it specifically comes to loss-of-control risks killing or sidelining all of humanity, I don’t believe Sam or Dario or Demis or Elon want that to happen, because it would happen to them too. (Larry Page is different on that count, of course.) You do have conflict theory over the fact that some of them would like ASI to make them god-emperor of the universe, but all of them would definitely take a solution to “loss of control” if it were handed to them on a silver platter.
- Wei Dai 10 Nov 2025 3:55 UTC
  LW: 4 AF: 2
  0
  AF Parent
  I’m uncertain between conflict theory and mistake theory, and think it partly depends on metaethics, and therefore it’s impossible to be sure which is correct in the foreseeable future—e.g., if everyone ultimately should converge to the same values, then all of our current conflicts are really mistakes. Note that I do often acknowledge conflict theory, like in this list I have “Value differences/conflicts between humans”. It’s also quite possible that it’s really a mix of both, that some of the conflicts are mistakes and others aren’t.
  
  In practice I tend to focus more on mistake-theoretic ideas/actions. Some thoughts on this:
  1. If conflict theory is true, then I’m kind of screwed anyway, having invested little human and social capital into conflict-theoretic advantages, as well as not having much talent or inclination in that kind of work in the first place.
  2. I do try not to interfere people doing conflict-theoretic work (on my side), e.g., not berate them for having “bad epistemics” or not adopting mistake theory lenses, etc.
  3. It may be nearly impossible to convince some decision makers that they’re making mistakes, but perhaps others are more open to persuasion, e.g. people in charge of or doing ground-level work on AI advisors or AI reasoning.
  4. Maybe I can make a stronger claim that a lot of people are making mistakes, given current ethical and metaethical uncertainty. In other words, people should be unsure about their values, including how selfish or altruistic they should be, and under this uncertainty they shouldn’t be doing something like trying to max out their own power/resources at the expense of the commons or by incurring societal-level risks. If so, then perhaps an AI advisor who is highly philosophically competent can realize this too and convince its principle of the same, before it’s too late.
  (I think this is probably the first time I’ve explicitly written down the reasoning in 4.)
  
  I think we need a different plan.
  
  Do you have any ideas in mind that you want to talk about?
  - cousin_it 10 Nov 2025 13:11 UTC
    LW: 13 AF: 3
    5
    AF Parent
    I’m pretty slow to realize these things, and I think other people are also slow, so the window is already almost closed. But in any case, my current thinking is that we need to start pushing on the big actors from outside, try to reduce their power. Trying to make them see the light is no longer enough.
    
    What it means in practical terms: - Make it clear that we frown on people who choose to work for AI labs, even on alignment. This social pressure (on LW and related forums maybe) might already do some good. - Make it clear that we’re allied with the relatively poor majority of people outside the labs, and in particular those who are already harmed by present harms. Make amends with folks on the left who have been saying such things for years. - Support protests against labs, support court cases against them having to do with e.g. web scraping, copyright infringement, misinformation, suicides. Some altruist money in this might go a long way. - Think more seriously about building organizations that will make AI power more spread out. Open source, open research, open training. Maybe some GPL-like scheme to guarantee that things don’t get captured. We need to reduce concentration of power in the near term, enable more people to pose a challenge to the big actors. I understand it increases other risks, but in my opinion it’s worth it.
    - Wei Dai 10 Nov 2025 23:27 UTC
      LW: 4 AF: 4
      2
      AF Parent
      
      even on alignment
      
      I see a disagreement vote on this, but I think it does make sense. Alignment work at the AI labs will almost by definition be work on legible problems, but we should make exceptions for people who can give reasons for why their work is not legible (or otherwise still positive EV), or who are trying to make illegible problems more legible for others at the labs.
      
      Think more seriously about building organizations that will make AI power more spread out.
      
      I start to disagree from here, as this approach would make almost all of the items on my list worse, and I’m not sure which ones it would make better. You started this thread by say “Even if we solved metaethics and metaphilosophy tomorrow, and gave them the solution on a plate, they wouldn’t take it.” which I’m definitely very worried about, but how does making AI power more spread out help with this? Is the average human (or humanity collectively) more likely to be concerned about metaethics and metaphilosophy than a typical AI lab leader, or easier to make concerned? I think the opposite is more likely to be true?
      What links here?
      StanislavKrym's comment on The problem of graceful deference by TsviBT (11 Nov 2025 23:09 UTC; 1 point)
      - cousin_it 11 Nov 2025 13:53 UTC
        LW: 17 AF: 7
        10
        AF Parent
        I think on the level of individual people, there’s a mix of moral and self-interested actions. People sometimes choose to do the right thing (even if the right thing is as complicated as taking metaethics and metaphilosophy into account), or can be convinced to do so. But with corporations it’s another matter: they choose the profit motive pretty much every time.
        
        Making an AI lab do the right thing is much harder than making its leader concerned. A lab leader who’s concerned enough to slow down will be pressured by investors to speed back up, or get replaced, or get outcompeted. Really you need to convince the whole lab and its investors. And you need to be more convincing than the magic of the market! Recall that in many of these labs, the leaders / investors / early employees started out very concerned about AI safety and were reading LW. Then the magic of the market happened and now the labs are racing at full speed, do you think our convincing abilities can be stronger than the thing that did that? The profit motive, again. In my first comment there was a phrase about things being not profitable to understand.
        
        What it adds up to is, even with our uncertainty about ethics and metaethics, it seems to me that concentration of power is itself a force against morality. The incentives around concentrated power are all wrong. Spreading out power is a good thing that enables other good things, enables individuals to sometimes choose what’s right. I’m not absolutely certain but that’s my current best guess.
        Wei Dai 15 Nov 2025 5:34 UTC
        LW: 2 AF: 2
        8
        AF Parent
        
        A lab leader who’s concerned enough to slow down will be pressured by investors to speed back up, or get replaced, or get outcompeted. Really you need to convince the whole lab and its investors. And you need to be more convincing than the magic of the market!
        
        This seems to imply that lab leaders would be easier to convince if there were no investors and no markets, in other words if they had more concentrated power.
        
        If you spread out the power of AI more, won’t all those decentralized nodes of spread out AI power still have to compete with each other in markets? If market pressures are the core problem, how does decentralization solve that?
        
        I’m concerned that your proposed solution attacks “concentration of power” when the real problem you’ve identified is more like market dynamics. If so, it could fail to solve the problem or make it even worse.
        
        My own perspective is that markets are a definite problem, and concentration of power per se is more ambiguous (I’m not sure if it’s good or bad). To solve AI x-safety we basically have to bypass or override markets somehow, e.g., through international agreements and government regulations/bans.
        cousin_it 15 Nov 2025 10:11 UTC
        LW: 5 AF: 3
        1
        AF Parent
        I think AI offers a chance of getting huge power over others, so it would create competitive pressure in any case. In case of a market economy it’s market pressure, but in case of countries it would be a military arms race instead. And even if the labs didn’t get any investors and raced secretly, I think they’d still feel under a lot of pressure. The chance of getting huge power is what creates the problem, that’s why I think spreading out power is a good idea. There would still be competition of course, but it would be normal economic levels of competition, and people would have some room to do the right things.
        [ ]
        [deleted]
      - StanislavKrym 11 Nov 2025 13:43 UTC
        1 point
        0
        Parent
        Wouldn’t discussions of high-level philosophy benefit from concrete examples like my attempts to show that mankind shouldn’t actually populate many stellar systems because there are many other lifeforms that would be oppressed?
        Another concrete example could be Buck’s Christian homeschoolers or David Matolcsi’s superpersuasive AI girlfriends. These examples imply that the AIs are not to be allowed to do… what exactly? To be persuasive over a certain level? To keep Christian homeschoolers in the dark? And is the latter fixable by demanding that OpenBrain moves major parts of the Spec to root level, making it a governance issue?
        As for preventing researchers from working on alignment, this simply means that work related to aligning the AIs to any targets is either done by agents as trustworthy as Agent-4 or CCP’s DeepCent or suppressed by an international ASI ban. Your proposal means that the ASI ban has to include alignment work until illegible troubles are solved, then capabilities work until alignment is solved. But it is likely easier to include the clause about “alignment work until illegible troubles are solved” into an existing ASI ban, especially if the negative effects of AI girlfriends, slop, pyramid replacement, etc, become obvious.
      - [ ]
        [deleted]