I see a disagreement vote on this, but I think it does make sense. Alignment work at the AI labs will almost by definition be work on legible problems, but we should make exceptions for people who can give reasons why their work is not legible (or otherwise still positive EV), or who are trying to make illegible problems more legible for others at the labs.
Think more seriously about building organizations that will make AI power more spread out.
I start to disagree from here, as this approach would make almost all of the items on my list worse, and I’m not sure which ones it would make better. You started this thread by saying “Even if we solved metaethics and metaphilosophy tomorrow, and gave them the solution on a plate, they wouldn’t take it.”, which I’m definitely very worried about, but how does making AI power more spread out help with this? Is the average human (or humanity collectively) more likely to be concerned about metaethics and metaphilosophy than a typical AI lab leader, or easier to make concerned? I think the opposite is more likely to be true.
I think on the level of individual people, there’s a mix of moral and self-interested actions. People sometimes choose to do the right thing (even if the right thing is as complicated as taking metaethics and metaphilosophy into account), or can be convinced to do so. But with corporations it’s another matter: they choose the profit motive pretty much every time.
Making an AI lab do the right thing is much harder than making its leader concerned. A lab leader who’s concerned enough to slow down will be pressured by investors to speed back up, or get replaced, or get outcompeted. Really you need to convince the whole lab and its investors. And you need to be more convincing than the magic of the market! Recall that in many of these labs, the leaders / investors / early employees started out very concerned about AI safety and were reading LW. Then the magic of the market happened, and now the labs are racing at full speed. Do you think our convincing abilities can be stronger than the thing that did that? It’s the profit motive, again. In my first comment there was a phrase about things being not profitable to understand.
What it adds up to is: even with our uncertainty about ethics and metaethics, it seems to me that concentration of power is itself a force against morality. The incentives around concentrated power are all wrong. Spreading out power is a good thing that enables other good things, and it lets individuals sometimes choose what’s right. I’m not absolutely certain, but that’s my current best guess.
A lab leader who’s concerned enough to slow down will be pressured by investors to speed back up, or get replaced, or get outcompeted. Really you need to convince the whole lab and its investors. And you need to be more convincing than the magic of the market!
This seems to imply that lab leaders would be easier to convince if there were no investors and no markets; in other words, if they had more concentrated power.
If you spread out the power of AI more, won’t all those decentralized nodes of spread out AI power still have to compete with each other in markets? If market pressures are the core problem, how does decentralization solve that?
I’m concerned that your proposed solution attacks “concentration of power” when the real problem you’ve identified is more like market dynamics. If so, it could fail to solve the problem or make it even worse.
My own perspective is that markets are a definite problem, and concentration of power per se is more ambiguous (I’m not sure if it’s good or bad). To solve AI x-safety we basically have to bypass or override markets somehow, e.g., through international agreements and government regulations/bans.
I think AI offers a chance of getting huge power over others, so it would create competitive pressure in any case. In a market economy it’s market pressure; between countries it would be a military arms race instead. And even if the labs didn’t take any investors and raced secretly, I think they’d still feel a lot of pressure. The chance of getting huge power is what creates the problem; that’s why I think spreading out power is a good idea. There would still be competition of course, but it would be normal economic levels of competition, and people would have some room to do the right things.
Wouldn’t discussions of high-level philosophy benefit from concrete examples, like my attempts to show that mankind shouldn’t actually populate many stellar systems because there are many other lifeforms that would be oppressed?
Another concrete example could be Buck’s Christian homeschoolers or David Matolcsi’s superpersuasive AI girlfriends. These examples imply that the AIs are not to be allowed to do… what exactly? To be persuasive above a certain level? To keep Christian homeschoolers in the dark? And is the latter fixable by demanding that OpenBrain move major parts of the Spec to root level, making it a governance issue?
As for preventing researchers from working on alignment, this simply means that work related to aligning the AIs to any targets is either done by agents as trustworthy as Agent-4 or the CCP’s DeepCent, or suppressed by an international ASI ban. Your proposal means that the ASI ban has to cover alignment work until the illegible problems are solved, and then capabilities work until alignment is solved. But it is likely easier to include the clause about “alignment work until illegible problems are solved” in an existing ASI ban, especially if the negative effects of AI girlfriends, slop, pyramid replacement, etc., become obvious.